Best case 128 bytes, the ppc64 cacheline size. Worst case 512 bytes: our
PCI-PCI bridge likes to prefetch in 512-byte increments. From Herman's
data you can see this in action:
Linux sending one 1514-byte packet (777 Mbit/s rate):

    Address   PCI command   Bytes transferred
    420A8     MRM           344
    42200     MRM           512
    42400     MRM           512
    42600     MRL           128
    42680     MR             18

(MRM = Memory Read Multiple, MRL = Memory Read Line, MR = Memory Read)
With the buffer 512-byte aligned, this would be completed in 3 transactions.
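A rough way to see why alignment matters is to count how many 512-byte prefetch blocks a buffer touches. The sketch below assumes each 512-byte block costs at least one bridge transaction, which is a simplification: it does not model the MRL/MR split of the tail seen in the trace above. The helper names align_up() and blocks_touched() are made up for illustration; align_up() does the same arithmetic as the kernel's ALIGN().

```c
#include <stdint.h>

/* Round x up to the next multiple of a; a must be a power of two.
 * Same arithmetic as the kernel's ALIGN() macro. */
uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Number of a-byte aligned blocks touched by [addr, addr + len).
 * Assumes len is non-zero. Simplified model: one block ~ one prefetch. */
uint64_t blocks_touched(uint64_t addr, uint64_t len, uint64_t a)
{
	uint64_t first = addr / a;
	uint64_t last = (addr + len - 1) / a;

	return last - first + 1;
}
```

For the trace above, blocks_touched(0x420A8, 1514, 512) gives 4, while a 512-byte-aligned start such as 0x42000 gives 3, matching the 3 transactions quoted for the aligned case.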
Yes, the hardware could be more intelligent about these unaligned
transactions, but we can't do much about that now.
We might be able to do the alignment at a higher level, but it's not
straightforward (see my previous mail).
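For comparison, doing the alignment at the driver level presumably means copying the frame into an aligned buffer before DMA, which is the copy the quoted question below is about. This is only a user-space sketch under that assumption: copy_to_aligned() and DMA_ALIGN are hypothetical names, and a real driver would use kernel allocation and DMA mapping calls rather than aligned_alloc().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed alignment: the 512-byte bridge prefetch size from the text. */
#define DMA_ALIGN 512

/* Hypothetical helper: copy a frame into a new buffer whose start is
 * DMA_ALIGN-aligned, so device reads begin on a prefetch boundary.
 * Returns NULL on failure or if len is 0; caller frees with free(). */
void *copy_to_aligned(const void *frame, size_t len)
{
	/* C11 aligned_alloc() wants size to be a multiple of the alignment. */
	size_t alloc = (len + DMA_ALIGN - 1) & ~(size_t)(DMA_ALIGN - 1);
	void *buf;

	if (len == 0)
		return NULL;
	buf = aligned_alloc(DMA_ALIGN, alloc);
	if (buf)
		memcpy(buf, frame, len);
	return buf;
}
```

The cost being traded here is the memcpy() of every frame against the extra bridge transactions of an unaligned DMA.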
> If I'm understanding the patch correctly, you're saying unaligned DMA +
> TCE lookup is more expensive than a data copy? If we copy the data, we
> lose some of the benefits of TSO and Zerocopy and h/w checksum
> offloading! What is more expensive, unaligned DMA or TCE?
Let's ignore TCE lookup overhead for the moment. As Herman pointed out, all
these DMAs should occur on the same page.
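The same-page point can be checked with simple address arithmetic. This sketch assumes a 4 KB TCE (IOMMU) page size; TCE_PAGE_SHIFT and dma_within_one_page() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed TCE page size of 4 KB (hypothetical constant). */
#define TCE_PAGE_SHIFT 12

/* True if [addr, addr + len) lies within a single TCE page, i.e. the
 * whole DMA is covered by one translation entry. Assumes len > 0. */
bool dma_within_one_page(uint64_t addr, uint64_t len)
{
	return (addr >> TCE_PAGE_SHIFT) == ((addr + len - 1) >> TCE_PAGE_SHIFT);
}
```

With the buffer from Herman's trace, dma_within_one_page(0x420A8, 1514) is true: the frame ends at 0x42691, still inside the page starting at 0x42000, so only one TCE is involved.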
Anton