DETAILS: We have a Motorola G4-based compact PCI card with dual DEC 21143-based
ethernet ports. We are using a 2.2.17 kernel with various patches, but none to
the ethernet driver. We've been working on this card for months now without
seeing any real problems, but recently someone was doing bandwidth tests through
each link and noticed a discrepancy. We then looked at the output of ifconfig
and saw all kinds of carrier errors (one for each packet transmitted) on the
problem link.
The normal method of booting this card is with a largish (34MB uncompressed)
ramdisk as the root filesystem. In this scenario, one ethernet link is
configured by the system based on information obtained from the bootp server,
and the other ethernet link is brought up automatically later on based on the
first address that was configured. What we've noticed is that the ethernet link
that is configured later on shows a carrier error for every packet transmitted
through that link. Interestingly, the vast majority of those packets are
actually making it through--a "ping -f" from another machine to the affected
link shows about 0.1% packet loss. It doesn't matter which link is configured
automatically by the system (we've tried it both ways), the carrier errors
always occur on the other link.
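For anyone who wants to check for the same symptom, here is a minimal sketch of how the "one carrier error per transmitted packet" ratio could be read from /proc/net/dev instead of eyeballing ifconfig. It assumes the usual two header lines followed by 16 numeric columns per interface (8 receive, 8 transmit, with TX packets in column 10 and carrier errors in column 15); the column layout can differ between kernel versions, so treat the indices as an assumption. The function names are made up for illustration.

```python
def parse_proc_net_dev(text):
    """Return {iface: (tx_packets, tx_carrier)} from /proc/net/dev-style text.

    Assumes 16 numeric columns per interface; lines that do not match
    (such as the two header lines) are skipped.
    """
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # header lines carry no "iface:" prefix
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 16:
            continue  # unexpected layout, do not guess
        try:
            tx_packets = int(fields[9])   # assumed: TX packets column
            tx_carrier = int(fields[14])  # assumed: TX carrier errors column
        except ValueError:
            continue
        stats[name.strip()] = (tx_packets, tx_carrier)
    return stats


def carrier_ratio(tx_packets, tx_carrier):
    """Carrier errors per transmitted packet; 0.0 if nothing was sent."""
    if tx_packets == 0:
        return 0.0
    return tx_carrier / tx_packets


def report(path="/proc/net/dev"):
    """Print the carrier-error ratio for every interface found in `path`."""
    with open(path) as f:
        stats = parse_proc_net_dev(f.read())
    for iface, (packets, carrier) in sorted(stats.items()):
        print(f"{iface}: tx_packets={packets} carrier={carrier} "
              f"ratio={carrier_ratio(packets, carrier):.3f}")
```

Calling report() on the affected card should show a ratio close to 1.0 on the problem link and close to 0.0 on the healthy one, if the symptom is as described above.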
If we boot the exact same kernel (actually it's the kernel and ramdisk glommed
together into one file, loaded via tftp) but then override the boot args to use
an nfs-mounted root filesystem that is identical to the one in the ramdisk, then
everything works fine. We configure one ethernet link at startup based on bootp
requests and the other one gets configured later on. Everything works
perfectly: "ping -f" from another machine gives a few dropped packets out of a
few hundred thousand, through either link. No errors.
Can anyone think of what could possibly be causing this? Somehow, the act of
using a ramdisk as our root filesystem is causing problems with our ethernet
links. Are there any known gotchas that may be biting us? Other than the
problems with one of the two links, the system seems to be working perfectly.
Thanks for any theories you might have,
Chris Friesen
Nortel Networks
Ottawa, ON
PS. This is my third time sending this, since my first two tries (from two
different addresses) don't seem to have made it onto the list at all. Anyone
else seeing this?