Re: e1000 performance hack for ppc64 (Power4)

Anton Blanchard (anton@samba.org)
Sat, 14 Jun 2003 09:15:01 +1000

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Anton Blanchard: "Re: e1000 performance hack for ppc64 (Power4)"
Previous message: Greg KH: "Re: [RFC PATCH] Add lm78 sensor chip support (2.5.70)"

> 2-byte alignment is bad for ppc64, so what is minimum alignment that is
> good? (I hope it's not 2K!) What could you do at a higher level to
> present properly aligned buffers to the driver?

Best case 128 bytes - ppc64 cacheline size. Worst case 512 bytes - our
pci-pci bridge likes to prefetch in 512 byte increments. From Herman's
data you can see this in action:

Linux sending one 1514 byte packet (777 Mbits sec rate)

Address PCI command Bytes xfered
420A8 MRM 344
42200 MRM 512
42400 MRM 512
42600 MRL 128
42680 MR 18

With the buffer 512 byte aligned this is completed in 3 transactions.
Yes the hardware could be more intelligent about these unaligned
transactions but we cant do much about that now.

We might be able to do the alignment at a higher level but its not
straightforward (see my previous mail).

> If I'm understanding the patch correctly, you're saying unaligned DMA +
> TCE lookup is more expensive than a data copy? If we copy the data, we
> loss some of the benefits of TSO and Zerocopy and h/w checksum
> offloading! What is more expensive, unaligned DMA or TCE?

Lets ignore TCE lookup overhead for the moment. As Herman pointed all
these DMAs should occur on the same page.

Anton
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Anton Blanchard: "Re: e1000 performance hack for ppc64 (Power4)"
Previous message: Greg KH: "Re: [RFC PATCH] Add lm78 sensor chip support (2.5.70)"