I have to apologize for my misstatements of the problem here. You
yourself pointed out to me that the hold time was, in fact, linear. Despite
the linearity of the algorithm, the failure mode persists. I've
postponed further investigation until later, when more invasive
techniques are admissible; /proc/ alone will not suffice if linear
algorithms under tasklist_lock can trigger this failure mode.
I believe further work is needed, but I can't think of a 2.5.x-mergeable
method to address it. I've attempted to devolve the work to others in
the hope that future solutions might be devised. It's unfortunate, but
general algorithmic scalability for scenarios like this has a real cost
for the low end, and it's a problem I don't feel comfortable trying to
fix for more general systems in the middle of 2.5.x stabilization.
Unless a refinement of either Manfred's or your patches can be made to
pass the test (apologies again; I don't recall the results, as my time
on the whole system is very limited and it was a while ago), I suspect
very little can be done for 2.5.x here. IMHO a series of patches to
eliminate all remaining linear scans under tasklist_lock, alongside a
fair locking construct, will eventually be required, though, of course,
only a solution is required, not necessarily the one I expect.
-- wli