After a transmission error (e.g. excessive collisions) the chip stops to
let the driver handle it. The driver does its thing and restarts the
transmission engine. The problem is that the ring buffer pointer on the chip
has skidded too far, so the chip picks up work from the wrong entry.
If an error occurred on entry n, the chip continues on n+2. The driver stops
harvesting transmitted buffers because the next entry in the ring (n+1)
remains marked as owned by the chip, which skipped it and never hands it
back. A few more packets may be sent after
the restart, then the card stalls. After a while the watchdog kicks in and
resets the chip and buffers. Transmission continues.
You can verify this easily by dumping ring pointer information and the
status bits associated with the ring buffer.
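
To make the sequence concrete, here is a toy model of the skid and of the
kind of dump I mean. It is only a sketch, not the driver code: the struct,
the function names, the ring size and the single "pending" flag per entry
are all made up for illustration, and the real descriptors carry more
status bits than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define RING_SIZE 8  /* hypothetical ring length */

/* Toy model of a Tx ring; every name here is invented for illustration. */
struct tx_ring {
    bool pending[RING_SIZE]; /* queued by the driver, not yet completed by the chip */
    unsigned chip_ptr;       /* entry the chip works on next */
    unsigned cur_tx;         /* next entry the driver fills */
    unsigned dirty_tx;       /* oldest entry the driver has not harvested */
};

/* Driver queues one packet. */
static void queue_packet(struct tx_ring *r)
{
    assert(r->cur_tx - r->dirty_tx < RING_SIZE); /* ring must not be full */
    r->pending[r->cur_tx % RING_SIZE] = true;
    r->cur_tx++;
}

/* Chip completes the entry under its pointer; returns false if it has no work. */
static bool chip_send(struct tx_ring *r)
{
    unsigned i = r->chip_ptr % RING_SIZE;

    if (!r->pending[i])
        return false;
    r->pending[i] = false;
    r->chip_ptr++;
    return true;
}

/* Error on entry n: the entry is finished (with an error), but the
 * chip's pointer ends up on n+2 instead of n+1. */
static void chip_error_with_skid(struct tx_ring *r)
{
    r->pending[r->chip_ptr % RING_SIZE] = false;
    r->chip_ptr += 2;
}

/* Driver reclaims completed entries in order and stops at the first
 * one that is still pending. Returns the number reclaimed. */
static unsigned harvest(struct tx_ring *r)
{
    unsigned n = 0;

    while (r->dirty_tx != r->cur_tx &&
           !r->pending[r->dirty_tx % RING_SIZE]) {
        r->dirty_tx++;
        n++;
    }
    return n;
}

/* Dump the ring pointers and the per-entry flags. */
static void dump_ring(const struct tx_ring *r)
{
    printf("chip_ptr=%u cur_tx=%u dirty_tx=%u pending=",
           r->chip_ptr % RING_SIZE, r->cur_tx % RING_SIZE,
           r->dirty_tx % RING_SIZE);
    for (unsigned i = 0; i < RING_SIZE; i++)
        putchar(r->pending[i] ? '1' : '0');
    putchar('\n');
}
```

Queue a few packets, send one, call chip_error_with_skid() and then
harvest(): the driver's dirty_tx gets stuck on n+1 while chip_ptr has
already moved past it, which is exactly the mismatch to look for in a
dump from the real hardware.
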
The fix is to have the interrupt handler set the ring buffer pointer to
what the driver knows to be the current entry.
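
In the same spirit, a minimal sketch of the idea behind the fix. Again
this is a made-up model and not the actual patch: the names are
hypothetical, assigning chip_ptr stands in for writing the chip's ring
pointer register, and taking the oldest unharvested entry as "the current
entry" is an assumption of the sketch.

```c
#include <stdbool.h>

#define RING_SIZE 8  /* hypothetical ring length */

/* Toy model of a Tx ring; every name here is invented for illustration. */
struct tx_ring {
    bool pending[RING_SIZE]; /* queued by the driver, not yet completed by the chip */
    unsigned chip_ptr;       /* entry the chip works on next */
    unsigned cur_tx;         /* next entry the driver fills */
    unsigned dirty_tx;       /* oldest entry the driver has not harvested */
};

/* Driver reclaims completed entries in order and stops at the first
 * one that is still pending. Returns the number reclaimed. */
static unsigned harvest(struct tx_ring *r)
{
    unsigned n = 0;

    while (r->dirty_tx != r->cur_tx &&
           !r->pending[r->dirty_tx % RING_SIZE]) {
        r->dirty_tx++;
        n++;
    }
    return n;
}

/* Hypothetical error path of the interrupt handler: reclaim what is
 * done, then point the chip at the entry the driver considers current
 * before the transmission engine is restarted. */
static void handle_tx_error(struct tx_ring *r)
{
    harvest(r);
    r->chip_ptr = r->dirty_tx; /* model of rewriting the chip's ring pointer */
}
```

With the pointer forced back like this, the chip resumes on the entry the
driver is waiting for, no matter how far it skidded.
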
Btw: The stalling you've seen, was that at 10 or 100 Mbps? Hub or Switch?
With debug level 2 (and fixed driver), do you find Abort or Underrun errors
in your log in situations where stalling occurred with the old driver?
Roger