Re: NMI handling rework for x86

Dipankar Sarma (dipankar@in.ibm.com)
Fri, 15 Nov 2002 13:43:38 +0530


On Fri, Nov 15, 2002 at 02:40:10AM -0500, Zwane Mwaikambo wrote:
> On Fri, 15 Nov 2002, Dipankar Sarma wrote:
> > The RCU part is fairly simple - you want to avoid having to acquire
> > a lock for every NMI event to walk the handler so you do it
> > lockfree. If a process running in a different CPU tries to
> > free an nmi handler, it removes it from the list, issues an
> > rcu callback (to be invoked after all CPUs have gone through
> > a context switch or executed user-level code ensuring that the
> > deleted nmi handler can't be running) and waits for completion of
>
> How are you so sure the handler isn't running? You can get an NMI after
> any cpu instruction inbetween all of that happening, not to mention that
> it can happen on multiple processors means since its a shared nmi handler
> list, you're almost never going to find that list not being traversed at
> some stage by a processor. Try synchronising the cpus for a removal when
> they're all handling an NMI every millisecond.

Once you remove a handler from the list, any subsequent NMI is *not*
going to see the handler. So even if another CPU is executing the same
handler, if you wait for the RCU callback, you can guarantee that
no-one is executing the deleted handler. RCU will wait for all the
CPUs to context switch or execute user-level code atleast once.
That meant any other CPU which might have been executing the NMI
handler at the time of deletion has now exited from the handler.

> I don't think you can rely on completion() to ensure this. Its hardly an

Corey's code doesn't rely on completion() to ensure this, it relies
on RCU to make sure that nobody is running the handler. The key is
that once the pointers between the prev and the next of the deleted
NMI handler are set, subsequent NMIs aren't going to see that handler.
context switch/user-level means that CPU must have exited any
NMI handler it may have been executing at the time of deletion.

> atomic operation in this context, whats wrong with spin_trylock(nmi_handler_lock)
> and do an early bailout on failure?

spin_trylock modifies the lock cacheline, so cacheline bouncing.

Thanks

-- 
Dipankar Sarma  <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/