Re: x86 smp_call_function recent breakage

Andrea Arcangeli (andrea@suse.de)
Tue, 23 Oct 2001 19:23:07 +0200


On Tue, Oct 23, 2001 at 05:49:35PM +0100, Alan Cox wrote:
> > This isn't the right fix:
>
> The cache thing you may be right on. The core fix is not about caching
> its about fixing races on IPI replay. We have to handle an IPI reoccuring
> potentially with a small time delay due to a retransmit of a message
> lost by one party on the bus. Furthermore it seems other messages can
> pass the message that then gets retransmitted.
>
> > Robert, this explains the missed IPI during drain_cpu_caches, it isn't
> > ram fault or IPI missed by the hardware, so I suggest to just backout
> > the second diff above and try again. Will be fixed also in next -aa of
> > course.
>
> I'm not sure I follow why it explains a missed IPI. Please explain in
> more detail.

The bug is very simple and it has nothing to do with hardware details;
it is a plain software bug:

-	if (wait) {
-		mb();
-		atomic_inc(&call_data->finished);
-	}
+	if (wait)
+		set_bit(smp_processor_id(), &call_data->finished);

Since 2.4.10pre12 (and in all previous -ac too) you still use the above
"call_data" to notify "finished", but you also allowed "call_data" to be
replaced under the IPI handler by another incoming IPI (the scalability
optimization). So the first IPI handler can end up notifying the
completion of the second IPI instead of the first one, which produces
the "missed IPI" and in turn the kernel instability (the deadlock for
Robert in drain_cpu_caches, which waits for all cpus to "finish").
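
To make the race concrete, the handler shape in question is roughly the
following (a sketch from memory, not the literal -ac code; the names
follow the stock arch/i386/kernel/smp.c, and the exact representation of
->finished doesn't matter for the argument):

asmlinkage void smp_call_function_interrupt(void)
{
	void (*func)(void *) = call_data->func;		/* request A */
	void *info = call_data->info;
	int wait = call_data->wait;

	mb();
	atomic_inc(&call_data->started);
	/*
	 * Once ->started is accounted for, the initiator of A can
	 * proceed, and since the lock is no longer held across the
	 * whole cycle another caller can point call_data at a new
	 * request B right here.
	 */
	(*func)(info);

	if (wait)
		set_bit(smp_processor_id(), &call_data->finished);
	/*
	 * call_data may already be request B, so this acknowledges the
	 * wrong IPI: A's initiator keeps waiting for all cpus to
	 * "finish" forever.
	 */
}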

So you wonder about IPI replays or whatever, and you add further
hackery, plus you left in the NR_CPUS array, which is now useless since
there's nothing left to scale and we hold the spinlock for the whole
duration of the IPI cycle. Meanwhile the only real fix in your ac5<->ac6
diff is that you now spin_unlock only after touching ->finished, like
the old code correctly did, so you notify the completion of the right
IPI and not of the wrong one, and it won't deadlock anymore.
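
For reference, the caller-side structure I'm saying should simply be
kept is roughly this (again a sketch along the lines of the stock 2.4
code, so call_data_struct, call_lock, smp_num_cpus, send_IPI_allbutself
and CALL_FUNCTION_VECTOR are the stock names, nothing new). The point is
that call_data cannot be reused until the unlock at the very end of the
cycle:

struct call_data_struct {
	void (*func)(void *info);
	void *info;
	atomic_t started;
	atomic_t finished;	/* only meaningful when wait != 0 */
	int wait;
};

static spinlock_t call_lock = SPIN_LOCK_UNLOCKED;
static struct call_data_struct *call_data;

int smp_call_function(void (*func)(void *), void *info, int nonatomic, int wait)
{
	struct call_data_struct data;
	int cpus = smp_num_cpus - 1;

	if (!cpus)
		return 0;

	data.func = func;
	data.info = info;
	atomic_set(&data.started, 0);
	data.wait = wait;
	if (wait)
		atomic_set(&data.finished, 0);

	spin_lock(&call_lock);
	call_data = &data;
	wmb();				/* make call_data visible before the IPI */
	send_IPI_allbutself(CALL_FUNCTION_VECTOR);

	while (atomic_read(&data.started) != cpus)
		barrier();
	if (wait)
		while (atomic_read(&data.finished) != cpus)
			barrier();
	spin_unlock(&call_lock);	/* only here may call_data be replaced */

	return 0;
}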

I don't think it's necessary to read ->func and friends before checking
->started, and there's no need to test ->started at all. We have the
guarantee that only one IPI cycle can run at once, because the spinlock
is held for the whole duration of the IPI cycle, so if we are running in
smp_call_function_interrupt we know the ->func and ->started == 0 fields
are already coherent and freely usable in memory; we only need to take
care to mb() between reading them and increasing ->started, like the old
code correctly did. So I think the old code was just perfect.
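
Concretely, the handler ordering I mean is just the old one (sketch,
with ack_APIC_irq assumed as in the stock handler):

asmlinkage void smp_call_function_interrupt(void)
{
	void (*func)(void *) = call_data->func;
	void *info = call_data->info;
	int wait = call_data->wait;

	ack_APIC_irq();
	mb();			/* order the reads above before ->started */
	atomic_inc(&call_data->started);
	(*func)(info);
	if (wait) {
		mb();		/* order func's stores before ->finished */
		atomic_inc(&call_data->finished);
	}
}

With the spinlock held across the whole cycle on the caller side,
call_data cannot change between those two increments, so the completion
always reaches the right initiator.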

Andrea