Re: kswapd deadlock 2.4.3-pre6

Linus Torvalds (torvalds@transmeta.com)
Wed, 21 Mar 2001 12:29:55 -0800 (PST)

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Steve Best: "Announcing Journaled File System (JFS) release 0.2.1 available"
Previous message: Kevin Buhr: "Re: Linux 2.4.2 fails to merge mmap areas, 700% slowdown."

On Wed, 21 Mar 2001, Mike Galbraith wrote:
>
> I have a repeatable deadlock when SMP is enabled on my UP box.
>
> >>EIP; c021e29a <stext_lock+1556/677b> <=====

When you see something like this, please do

gdb vmlinux

(gdb) x/10i 0xc021e29a

and it will basically show you where the code jumps back to.

It's almost certainly the beginning of swap_out_mm() where we get the
page_table_lock, but it would still be good to verify.

The deadlock implies that somebody scheduled with page_table_lock held.
Which would be really bad. You should be able to do something like

if (current->mm && spin_is_locked(&current->mm->page_table_lock))
BUG():

in the scheduler to see if it triggers (this only works on UP hardware
with a SMP kernel - on a real SMP machine it's entirely legal to have the
lock during a schedule, as the lock may be held by any of the _other_
CPU's, of course, and the above assert would be the wrong thing to do in
general)

Of course, it might not be somebody scheduling with a spinlock, it might
just be a recursive lock bug, but that sounds really unlikely.

> ac20+2.4.2-ac20-rwmmap_sem3 does not deadlock doing the same
> churn/burn via make -j30 bzImage.

it won't do the page table locking for page table allocations, so it will
have other bugs, though.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Steve Best: "Announcing Journaled File System (JFS) release 0.2.1 available"
Previous message: Kevin Buhr: "Re: Linux 2.4.2 fails to merge mmap areas, 700% slowdown."