Re: mempool design

Andrea Arcangeli (andrea@suse.de)
Mon, 17 Dec 2001 16:04:26 +0100


On Sat, Dec 15, 2001 at 11:18:33PM +0100, Ingo Molnar wrote:
>
> On Sat, 15 Dec 2001, Benjamin LaHaise wrote:
>
> > [...] The design for reservations is to use enforced accounting limits
> > to achive the effect of seperate memory pools. [...]
>
> how is this going to handle higher-order pools? How is this going to
> handle the need for composite allocations?
>
> I think putting in reservation for higher-order pages is going to make
> some parts of page_alloc.c *really* complex, and this complexity i dont
> think we need.
>
> > [...] Mempool's design is to build seperate pools on top of existing
> > pools of memory. Can't you see the obvious duplication that implies?
>
> no. Mempool's design is to build preallocated, reserved,
> guaranteed-allocation pools on top of simpler allocators. Simpler
> allocators will try reasonably hard to get something allocated, but might
> fail as well. The majority of allocations within the kernel has no
> deadlock relevance at all. If we allocate a new file structure, or create
> a new socket, or are trying to page in overcommitted memory then we can
> return with -ENOMEM (or OOM) just fine. Allocators are simpler and faster
> without built-in deadlock avoidance and 'reserves'.
>
> so the difference in design is that you are trying to add reservations as
> a feature of the allocators themselves, while i'm trying to add it as a
> generic, allocator-independent subsystem, which also solved a number of
> other problems we always had in the IO code. Under this design, the 'pure'
> allocators themselves have no concept of reserved pools at all, so there
> is no duplication in concepts. (and no duplication of code.)
>
> so the fundamental question is, should reservation be a core functionality
> of allocators, or should it be a separate subsystem. *If* it can be put
> into the core allocators (page allocator, SLAB) that gives us all the
> features that mempool gives us today then i think i'd like that approach.
> But i cannot really see how the composite allocation thing can be done via
> reservations.

This whole long thread can be resumed in two points:

1 mempool reserved memory is "wasted" i.e. not usable as cache
2 if the mempool code is moved inside the memory balancing of the
VM we could use this memory as clean, atomically-freeable cache

however the option 2 is quite complex, think when somebody mmap the page
and we find_lock etc... we cannot "lock" a reserved page, or it would be
unfreeable, at least unless we're sure this "lock" will go away without
us deadlocking on it while waiting.

so in short solution 1 is much simpler and much more obviously correct,
and the only disavantage is that it reduces the amount of clean cache
that could be pontentially be used by the kernel.

If implementation details and code complexity would be our last design
priority solution 2 advocated by Ben, Rik and SCT would be obviously
superior.

At the moment in 2.5 and also in 2.4 we use the "mempool outside VM"
logic just because we can keep it under control without being killed by
the huge complexity of the implementation details with the locking of
clean cache, nesting into the vm etc... Of course I'm considering a
correct implementation of it, not an hack where cache can be mlocked and
the kernel deadlocks because the reserved memory isn't freeable anymore.

Personally I'm more relaxed with the mempool approch because it reduces
the complexity of an order of magnitude, it abstracts the thing without
making the memory balancing more complex and it definitely solve the
problem (if used correctly i.e. not two alloc_bio in a row from the same
pool from multiple tasks at the same time as pointed out by Ingo).

If somebody wants such 1% of ram back he can buy another dimm of ram and
plug it into his hardware. I mean such 1% of ram lost is something that
can be solved by throwing a few euros into the hardware (and people buys
gigabyte boxes anyways so they don't need all of the 100% of ram), the
other complexy cannot be solved with a few euros, that can only be
solved with lots braincycles and it would be a maintainance work as
well. Abstraction and layering definitely helps cutting down the
complexity of the code.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/