The only other path that uses NOFS is grow_buffers. But the page probably
got dirtied in generic_file_write, which already put buffers on it. A NOFS
allocation could also be triggered by Ext2 (and other filesystems) by having
lots of dirty mmaps: when page_launder calls page->writepage then the page
won't have buffers on it. That's probably not what's happening though.
I don't think NOFS is causing the problem by the way, it's just a convenient
marker to recognize where the allocation is coming from.
What happens is, page_launder calls reiserfs_writepage which for some
reason recursively allocates a page (I don't have time to look for the exact
path - it's probably for the journal - but whoever has the problem can check
it via show_trace). We now are in a recursive allocation situation (with
PF_MEMALLOC) so page_launder doesn't get called and we drop through to get
the "failed" message.
It's not nice for __alloc_pages to fail back to a caller that's willing to
wait. See below for one idea about what to do about it.
> How is preemption related?
I'll speculate: page_launder is now yielding to other tasks when it releases
spinlocks to do a writepage. One of them is likely to come back in and
attempt another allocation while we're at rock bottom.
If that's true then I think we should consider something I've wanted to try:
make callers block on a wait queue in __alloc_pages when memory is really
tight.
Hmm. We could do that just in this specific case of PF_MEMALLOC+GFP_WAIT.
Semaphores work well for this kind of thing, something like:
    if (!(current->flags & PF_MEMALLOC)) {
        <the existing reclaim/launder logic>
    } else if (gfp_mask & __GFP_WAIT) {
        wakeup_kswapd();
        atomic_inc(&memwaiters);
        down(&memwait);
        goto try_again;
    }
Then in kswapd:
    waiters = atomic_read(&memwaiters);
    atomic_sub(waiters, &memwaiters);
    while (waiters--)
        up(&memwait);
when we detect that free memory is restored to something reasonable. This
won't deadlock on memwait because kswapd doesn't use __GFP_WAIT.
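The counted wake-up above can be tried out in userspace. What follows is only a rough sketch under loud assumptions: POSIX semaphores and C11 atomics stand in for the kernel primitives, threads stand in for allocating tasks, and free_pages, try_take_page, alloc_page_sim, release_waiters and run_demo are made-up names that exist nowhere in the kernel. It shows that each waiter gets exactly one up(), even if the up() arrives before the waiter has reached its down().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_SIM_THREADS 16

static sem_t memwait;           /* allocators sleep here; starts at 0 */
static atomic_int memwaiters;   /* allocators that announced a wait */
static atomic_int free_pages;   /* hypothetical stand-in for the free pool */

/* Take one page if any are free. Returns 1 on success, 0 if the pool is empty. */
static int try_take_page(void)
{
    int n = atomic_load(&free_pages);

    while (n > 0) {
        /* On failure n is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&free_pages, &n, n - 1))
            return 1;
    }
    return 0;
}

/* Allocator side: announce ourselves, sleep on memwait, then try again. */
static void *alloc_page_sim(void *unused)
{
    (void)unused;
    while (!try_take_page()) {
        atomic_fetch_add(&memwaiters, 1);
        while (sem_wait(&memwait) == -1 && errno == EINTR)
            ;   /* interrupted by a signal: wait again */
    }
    return NULL;
}

/* "kswapd" side: one up() per waiter counted so far. Returns how many. */
static int release_waiters(void)
{
    int waiters = atomic_load(&memwaiters);

    atomic_fetch_sub(&memwaiters, waiters);
    for (int i = 0; i < waiters; i++)
        sem_post(&memwait);
    return waiters;
}

/* Start allocators against an empty pool, refill it, release them all. */
static int run_demo(int nthreads)
{
    pthread_t tid[MAX_SIM_THREADS];
    int started = 0, woken;

    assert(nthreads > 0 && nthreads <= MAX_SIM_THREADS);
    atomic_store(&free_pages, 0);
    atomic_store(&memwaiters, 0);
    if (sem_init(&memwait, 0, 0) != 0)
        return -1;

    while (started < nthreads &&
           pthread_create(&tid[started], NULL, alloc_page_sim, NULL) == 0)
        started++;

    /* Wait until every started allocator has announced itself. */
    while (atomic_load(&memwaiters) < started)
        sched_yield();

    atomic_store(&free_pages, started);   /* "free memory is restored" */
    woken = release_waiters();

    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    sem_destroy(&memwait);
    return woken;
}
```

Calling run_demo(4) should release four waiters and leave both counters at zero. The sketch says nothing about how kswapd decides that memory is "reasonable" again; that part is simply faked by refilling free_pages.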
We also have to make kswapd count wakeups so we can be sure it doesn't sleep
while somebody is waiting in __alloc_pages. The only way I know to do this
reliably is with another semaphore:
    void wakeup_kswapd(void)
    {
        up(&kswapd_sleep);
    }
and kswapd downs that semaphore instead of doing interruptible_sleep_on_timeout.
Additionally, a timer has to up() the semaphore periodically, to recover the
sleep_on_timeout behaviour.
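To illustrate why a counting semaphore does not lose wakeups, here is a small userspace sketch, again under assumptions: a POSIX semaphore stands in for kswapd_sleep, and sem_timedwait is swapped in for the "timer that does up()" idea, since it gives the same wake-on-request-or-timeout behaviour in one call. The names wakeup_kswapd_sim, kswapd_wait_sim and count_pending_wakeups are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <time.h>

static sem_t kswapd_sleep;   /* counts wakeup requests not yet consumed */

/* Requester side: never blocks, and the request is remembered. */
static void wakeup_kswapd_sim(void)
{
    sem_post(&kswapd_sleep);
}

/*
 * "kswapd" side: sleep until a request is pending or the timeout expires.
 * Returns 1 if a request was consumed, 0 on timeout, -1 on error.
 */
static int kswapd_wait_sim(long timeout_ms)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return -1;
    ts.tv_sec += timeout_ms / 1000;
    ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec++;
        ts.tv_nsec -= 1000000000L;
    }

    while (sem_timedwait(&kswapd_sleep, &ts) == -1) {
        if (errno == ETIMEDOUT)
            return 0;
        if (errno != EINTR)
            return -1;
        /* EINTR: retry with the same absolute deadline */
    }
    return 1;
}

/* Post some wakeups before "kswapd" sleeps, then count how many it sees. */
static int count_pending_wakeups(int posted)
{
    int seen = 0;
    int rc = sem_init(&kswapd_sleep, 0, 0);

    assert(rc == 0);
    (void)rc;
    for (int i = 0; i < posted; i++)
        wakeup_kswapd_sim();

    /* Each earlier request is consumed at once; only then do we time out. */
    while (kswapd_wait_sim(10) == 1)
        seen++;

    sem_destroy(&kswapd_sleep);
    return seen;
}
```

With three requests posted before the waiter ever sleeps, count_pending_wakeups(3) should see all three and only then fall back to the timeout.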
Sound like overkill? The alternative is to let GFP_WAIT allocations fail which
forces users like journalling filesystems to busy wait and load up the runqueue.
Sorry I don't have time to code this just now, but I'd like to give it a try
if the problem's still there next week. Or if you're in the mood...
--
Daniel