Jakob Oestergaard and I have seen the same race, and the tentative
fix from Trond was similar to yours. I haven't been able to reproduce
the problem since applying that fix.
Perhaps it's time to propagate the patch upstream? Most recent 2.4.x
kernels are affected...
> What appears to happen is that rpc_call_sync allocates a struct rpc_task
> (with its embedded tk_timer) on the stack, and the timer gets set up
> sometime during rpc_execute. However, the timer actually triggers at
> a point in time where the original call to rpc_call_sync has already
> returned, and the stack space overwritten by other data. That data is
> now interpreted as an rpc_task struct holding a tk_timeout_fn pointer by
> rpc_run_timer, which causes the Oops (actually, Aieee).
Yup, that's the race all right.
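
For anyone following along, here is a rough user-space model of the
pattern described above, and of the obvious way to close the window:
make sure the embedded timer is off the pending list before the stack
frame that holds the task goes away. This is only a sketch, not Trond's
patch. Every model_* name is made up for illustration, the timer list is
a toy stand-in for the kernel's, and there is no real concurrency here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a kernel timer. */
struct model_timer {
    struct model_timer *next;
    void (*fn)(void *data);
    void *data;
};

/* Toy pending-timer list plus a counter of handlers that ran. */
static struct model_timer *pending_timers;
static int timeouts_fired;

/* Queue a timer on the pending list. */
static void model_add_timer(struct model_timer *t)
{
    t->next = pending_timers;
    pending_timers = t;
}

/* Unlink a timer; returns 1 if it was pending, 0 otherwise. */
static int model_del_timer(struct model_timer *t)
{
    struct model_timer **pp;

    for (pp = &pending_timers; *pp; pp = &(*pp)->next) {
        if (*pp == t) {
            *pp = t->next;
            t->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* Fire everything still queued; returns the number of handlers run. */
static int model_run_timers(void)
{
    int fired = 0;

    while (pending_timers) {
        struct model_timer *t = pending_timers;

        pending_timers = t->next;
        t->fn(t->data);
        fired++;
    }
    return fired;
}

/* Task with an embedded timer, loosely mirroring rpc_task/tk_timer. */
struct model_task {
    struct model_timer tk_timer;
    int tk_status;
};

/* Timeout handler: dereferences the task, so the task must be alive. */
static void model_timeout(void *data)
{
    struct model_task *task = data;

    task->tk_status = -1;
    timeouts_fired++;
}

/* Synchronous call with the task on the stack, as in rpc_call_sync. */
static int model_call_sync(void)
{
    struct model_task task = { .tk_status = 0 };

    task.tk_timer.fn = model_timeout;
    task.tk_timer.data = &task;
    model_add_timer(&task.tk_timer);

    /* ... the call completes before the timeout expires ... */

    /*
     * Without this, the list keeps a pointer into a dead stack frame
     * and a later model_run_timers() would call through garbage.
     */
    model_del_timer(&task.tk_timer);
    return task.tk_status;
}
```

Calling model_call_sync() and then model_run_timers() should fire
nothing, because the list no longer references the dead frame. In the
kernel proper I would expect something like del_timer_sync() to be
needed on SMP, so that a handler already running on another CPU has
finished before the function returns, but that is my assumption about
the shape of the fix rather than a description of the actual patch.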
Ion
--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.