It is a bug. np->rx_ring[i].next_desc is initialized by 2.4.6 with
gcc-2.95.1 to
[0]: 0x...210
[1]: 0x00000000
[2]: 0x...220
[3]: 0x...230
[4]: 0x...240.
etc.
> I realize the 2.4 code was changed to have the descriptor ring base
> be pre-translated from a virtual address to PCI bus-accessable physical
> memory address, and used offsets from that base. I'm pointing out that
> this problem doesn't exist in the 2.2. Note that the code above
> explicitly translates each descriptor ring entry to a physical address
> individually.
>
Correct. The problem was introduce by the virt_to_desc to pci_dma
conversion.
> > The patch also cleans up the suspend/resume synchronization and removes
> > 2 superflous (& wrong) spin_unlock calls.
>
> The (large) patch seems to add some unnecessary locking.
>
It's possible that some locking outside of the tx and rx codepath is not
required, I'm concentrating on a race free suspend & resume
implementation.
It shouldn't affect the critical functions:
rx interrupts run without a spinlock, and start_tx only acquires the
lock around "status = DescOwn;np->cur_rx++". netdev_tx_done() during
start_tx is an idea for tx interrupt mitigation, it's not yet finished.
-- Manfred - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/