Re: [patch] severe softirq handling performance bug, fix, 2.4.5

Rusty Russell (rusty@rustcorp.com.au)
Sun, 10 Jun 2001 20:40:20 +1000


In message <Pine.LNX.4.33.0105261920030.3336-200000@localhost.localdomain> you
write:
> i've been seeing really bad average TCP latencies on certain gigabit cards
> (~300-400 microseconds instead of the expected 100-200 microseconds), ever
> since softnet went into the main kernel, and never found a real
> explanation for it, until today.

The S/390 guys hit similar issues with the hotplug CPU patch, with
softirqs still pending on the downed CPU.

There are two cases when this happens:

(1) softirq's disabled when the interrupt came in.

(2) softirq scheduled within softirq, such as when networking is
overloaded (net/core/dev.c's net_rx_action()).

Solving (2) is hard: the choices are to risk livelock (by resetting
the mask inside the softirq loop), or accept that bursty traffic may
have latencies of up to one timer tick.

Either way, we should solve (1) by checking in local_bh_enable() is
fine (despite Dave's reservations, and Alexey's "don't do that unless
you are about to schedule()" is obviously crap). Then we can drop the
hackish checks inside schedule(), system calls returns and idle loops,
all of which were simply masking this problem.

Rusty.

--
Premature optmztion is rt of all evl. --DK
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/