RE: e1000 performance hack for ppc64 (Power4)

Feldman, Scott (scott.feldman@intel.com)
Thu, 12 Jun 2003 18:16:11 -0700


David, arch-specific code in the driver concerns me. I would really
like to avoid any arch-specific code in the driver if at all possible.
Can we find a way to move this work to the arch-dependent areas? This
doesn't seem to be an issue unique to e1000, so moving this up one level
should benefit other devices as well. More questions below.

> Peculiarities in the PCI bridges on Power4 based ppc64 machines mean
> that unaligned DMAs are horribly slow. This hits us hard on gigabit
> transfers, since the packets (starting from the MAC header) are
> usually only 2-byte aligned.

2-byte alignment is bad for ppc64, so what is minimum alignment that is
good? (I hope it's not 2K!) What could you do at a higher level to
present properly aligned buffers to the driver?

> The patch below addresses this by copying and realigning packets into
> nicely 2k aligned buffers. As well as fixing the alignment that
> minimises the number of TCE lookups, and because we allocate the
> buffers pci_alloc_consistent(), we avoid setting up and tearing down
> the TCE mappings for each packet.

If I'm understanding the patch correctly, you're saying unaligned DMA +
TCE lookup is more expensive than a data copy? If we copy the data, we
loss some of the benefits of TSO and Zerocopy and h/w checksum
offloading! What is more expensive, unaligned DMA or TCE?

-scott
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/