Re: 2.4.16 & OOM killer screw up (fwd)

Marcelo Tosatti (marcelo@conectiva.com.br)
Mon, 10 Dec 2001 17:42:38 -0200 (BRST)


On Mon, 10 Dec 2001, Andrew Morton wrote:

> Marcelo Tosatti wrote:
> >
> > Andrea,
> >
> > Could you please start looking at any 2.4 VM issues which show up ?
> >
>
> Just fwiw, I did some testing on this yesterday.
>
> Buffers and cache data are sitting on the active list, and shrink_caches()
> is *not* getting them off the active list, and onto the inactive list
> where they can be freed.
>
> So we end up with enormous amounts of anon memory on the inactive
> list, so this code:
>
> /* try to keep the active list 2/3 of the size of the cache */
> ratio = (unsigned long) nr_pages * nr_active_pages / ((nr_inactive_pages + 1) * 2);
> refill_inactive(ratio);
>
> just calls refill_inactive(0) all the time. Nothing gets moved
> onto the inactive list - it remains full of unfreeable anon
> allocations. And with no swap, there's nowhere to go.
>
> I think a little fix is to add
>
> if (ratio < nr_pages)
> ratio = nr_pages;
>
> so we at least move *something* onto the inactive list.
>
> Also refill_inactive needs to be changed so that it counts
> the number of pages which it actually moved, rather than
> the number of pages which it inspected.
>
> In my swapless testing, I burnt HUGE amounts of CPU in flush_tlb_others().
> So we're madly trying to swap pages out and finding that there's no swap
> space. I beleive that when we find there's no swap left we should move
> the page onto the active list so we don't keep rescanning it pointlessly.
>
> A fix may be to just remove the use-once stuff. It is one of the
> sources of this problem, because it's overpopulating the inactive list.
>
> In my testing last night, I tried to allocate 650 megs on a 768 meg
> swapless box. Got oom-killed when there was almost 100 megs of freeable
> memory: half buffercache, half filecache. Presumably, all of it was
> stuck on the active list with no way to get off.
>
> We also need to do something about shrink_[di]cache_memory(),
> which seem to be called in the wrong place.
>
> There's also the report concerning modify_ldt() failure in a
> similar situation. I'm not sure why this one occurred. It
> vmallocs 64k of memory and that seems to fail.

I haven't applied the modify_ldt() patch because I want to make sure its
needed: It may just be a bad effect of this one bug.

> I did some similar testing a week or so ago, also tested
> the -aa patches. They seemed to maybe help a tiny bit,
> but not significantly.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/