The bug is very simple and it have nothing to do with hardware details,
it is a plain software bug:
- if (wait) {
- mb();
- atomic_inc(&call_data->finished);
- }
+ if (wait)
+ set_bit(smp_processor_id(), &call_data->finished);
since 2.4.10pre12 (and in all previous -ac also) you still use the above
"call_data" for notifying "finished", but you also allowed "call_data"
to be changed under the IPI handler by another incoming IPI (the
scalability optimization), so the first IPI handler will notify the
completion of the second IPI (instead of the first one) generating the
"missed IPI" generating kernel instability (deadlock for Roboert in
drain_cpu_caches that waits all cpus to "finish").
So you wonder about reply IPI or whatever, and you add further hackery
plus you left the NR_CPUS array that now is useless since there's
nothing to scale and we hold the spinlock for the whole duration of the
IPI cycle, while the only fix in your ac5<->ac6 diff is that you now
spin_unlock after touching ->finished like the old code correctly did,
so you notify completion of the right IPI and not of the wrong one, so
it won't deadlock anymore.
I don't think it's necessary to read ->func and friends before checking
->started, also there's no need of testing ->started. we have the
guarantee that only 1 IPI cycle can run at once because of the spinlock
held for the whole duration of the IPI cycle, so if we are running in
smp_call_function_interrupt we know all ->func,->started == 0 fields are
just coherent and freely usable in memory, we only need to care to mb()
between reading them and increasing ->started like the old code
correctly did. So I think the old code was just perfect.
Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/