Re: large page patch

William Lee Irwin III (wli@holomorphy.com)
Thu, 1 Aug 2002 22:11:51 -0700


On Thu, Aug 01, 2002 at 09:32:44PM -0700, Linus Torvalds wrote:
> I bet that is mainly because of CPU scalability, and being able to avoid
> touching the buddy lists from multiple CPU's - the same reason _we_ have
> the per-CPU front-ends on various allocators.
> I doubt it is because buddy matters past the 4MB mark. I just can't see
> how you can avoid the naive math which says that it should be 1/512th as
> common to coalesce to 4MB as it is to coalesce to 8kB.
> Walking the buddy bitmaps for a few levels (ie up to order 3 or 4) is
> probably quite common, and it's likely to be bad from a SMP cache
> standpoint (touching a few bits with what must be fairly random patterns).
> So avoiding the buddy with a simple front-end is likely to win you
> something, without actually being meaningful at the MAX_ORDER point.

This is actually part of my strategy.

By properly organizing the deferred queues into lists of lists and
maintaining a small per-cpu cache of pages, a "cache fill" involves
a single list deletion under the zone->lock, and the remainder of
the work to fill a pagevec occurs outside the lock, reducing the
mean hold time to ridiculous lows. And since the allocations are
batched, the arrival rate at the lock is divided by the batch size.
Conversely, frees are also batched, and the dual operations achieve
the same effect.

i.e. magazines for the page-level allocator
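
Roughly, it looks like the following user-space model (not the patch
itself; the struct and function names here are only illustrative).
The zone keeps its free pages pre-grouped into fixed-size batches, so
refilling a per-cpu pagevec holds the lock only for a single list
deletion; the per-page work happens after the lock is dropped.
Freeing is the dual operation: pages accumulate in the per-cpu cache
and go back to the zone a whole batch at a time.

/*
 * User-space sketch of the batched front-end described above; names
 * like struct batch and cache_fill() are illustrative only.
 */
#include <pthread.h>
#include <stddef.h>

#define BATCH_SIZE 16

struct page {
	struct page *next;		/* next page in the same batch */
};

struct batch {
	struct batch *next;		/* next batch on the zone's list */
	struct page *pages;		/* BATCH_SIZE pages, singly linked */
};

struct zone {
	pthread_mutex_t lock;		/* stands in for zone->lock */
	struct batch *batches;		/* the "lists of lists" of free pages */
};

struct pagevec {
	struct page *pages[BATCH_SIZE];
	int nr;
};

/*
 * Refill a per-cpu pagevec.  Only the removal of one batch node
 * happens under the lock; the O(BATCH_SIZE) work of filling the
 * pagevec is done after the lock is dropped, and the lock is taken
 * once per BATCH_SIZE pages instead of once per page.
 */
int cache_fill(struct zone *zone, struct pagevec *pvec)
{
	struct batch *b;
	struct page *p;

	pthread_mutex_lock(&zone->lock);
	b = zone->batches;
	if (b)
		zone->batches = b->next;	/* single list deletion */
	pthread_mutex_unlock(&zone->lock);

	if (!b)
		return 0;

	pvec->nr = 0;
	for (p = b->pages; p && pvec->nr < BATCH_SIZE; p = p->next)
		pvec->pages[pvec->nr++] = p;
	return pvec->nr;
}

In practice the batch header can simply live in the first page of the
batch, so no separate allocation is needed for it.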

This can't be achieved with a pure buddy system, as it must examine
individual pages one by one to keep its bitmaps updated. Vahalia
discusses the general approach in another section, and integration with
buddy systems (and other allocators) in an exercise.
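
For contrast, here is an equally rough model of a buddy free path
(again user space, not kernel source): every page handed back does
its own flag updates and coalescing walk, so there is no single
operation that returns a whole batch of pages.

#include <stdint.h>

#define MAX_ORDER	10
#define NR_PAGES	(1ul << MAX_ORDER)

/*
 * One "is this block free" flag per block base per order.  A real
 * buddy system packs these into bitmaps; this per-page state is
 * exactly what has to be touched on every individual free.
 */
static uint8_t block_free[MAX_ORDER + 1][NR_PAGES];

/* Free one order-0 page, merging with its buddy while possible. */
void buddy_free_page(unsigned long pfn)
{
	unsigned int order = 0;

	while (order < MAX_ORDER) {
		unsigned long buddy = pfn ^ (1ul << order);

		if (!block_free[order][buddy])
			break;			/* buddy busy: stop merging */
		block_free[order][buddy] = 0;	/* absorb the buddy */
		pfn &= buddy;			/* base of the merged block */
		order++;
	}
	block_free[order][pfn] = 1;		/* mark the result free */
}

/* Freeing a batch still costs one coalescing walk per page. */
void buddy_free_batch(const unsigned long *pfns, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		buddy_free_page(pfns[i]);
}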

Cheers,
Bill