> On Sun, 8 Jul 2001, Linus Torvalds wrote:
> > On Sun, 8 Jul 2001, Rik van Riel wrote:
> > >
> > > ... Bingo. You hit the infamous __wait_on_buffer / ___wait_on_page
> > > bug. I've seen this for quite a while now on our quad xeon test
> > > machine, with some kernel versions it can be reproduced in minutes,
> > > with others it won't trigger at all.
> >
> > Hmm.. That would explain why the "tar" gets stuck, but why does the whole
> > machine grind to a halt with all other processes being marked runnable?
>
> If __wait_on_buffer and ___wait_on_page get stuck, this could
> mean a page doesn't get unlocked. When this is happening, we
> may well be running into dozens of pages which aren't getting
> properly unlocked on IO completion.
>
> This in turn would get the rest of the system stuck in the
> pageout code path, eating CPU like crazy.

I don't know exactly what is happening, but I do know _who_ is causing
the problem I'm seeing: it's tmpfs. With tmpfs mounted on /tmp and
X/KDE running, the tar [1] will oom my box every time, because
page_launder keeps trying and always fails to get anything scrubbed
once the tar has run for a while. Unmount tmpfs, restart X, and do the
same tar, and all is well.

(It's apparently not locked pages. I modified page_launder to move those
to the active list, and refill_inactive_scan to rotate them to the end
of the active list. The inactive_dirty list still grows ever larger,
filling with 'stuff' that page_launder can't clean, until you're totally
oom.)
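
To make the reasoning behind that experiment concrete, here is a toy
userspace model. It is only a sketch, not the actual patch and not
kernel code: struct toy_page, toy_launder and the "cleanable" flag are
hypothetical names invented for illustration. It assumes each page is
either locked, cleanable, or neither, and shows that once locked pages
are moved off to the active list, whatever is left on inactive_dirty
must be pages that are unlocked yet still cannot be cleaned.

```c
/* Toy userspace model, NOT kernel code. All names here are hypothetical. */
#include <assert.h>
#include <stddef.h>

enum page_list { INACTIVE_DIRTY, ACTIVE, FREED };

struct toy_page {
    int locked;          /* I/O in flight, page still locked */
    int cleanable;       /* a launder pass is able to write it out */
    enum page_list list; /* which list the page currently sits on */
};

/*
 * One launder pass over the inactive_dirty pages:
 *   - locked pages are moved to the active list (the experiment above),
 *   - cleanable pages are freed,
 *   - everything else stays where it is.
 * Returns how many pages remain on the inactive_dirty list.
 */
size_t toy_launder(struct toy_page *pages, size_t n)
{
    size_t remaining = 0;

    assert(pages != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        struct toy_page *p = &pages[i];

        if (p->list != INACTIVE_DIRTY)
            continue;
        if (p->locked)
            p->list = ACTIVE;
        else if (p->cleanable)
            p->list = FREED;
        else
            remaining++;
    }
    return remaining;
}
```

In this model, repeated passes never shrink the leftover count, which
matches the symptom: the stuck pages are not the locked ones.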
-Mike