Re: [Lse-tech] Re: 10.31 second kernel compile

Daniel Phillips (phillips@bonn-fries.net)
Sat, 16 Mar 2002 23:25:45 +0100


On March 16, 2002 08:01 pm, Linus Torvalds wrote:
> On Sat, 16 Mar 2002, Daniel Phillips wrote:
> > It could be a lot more abstract than that. Chuck Cranor's UVM (which
> > seems to bear some sort of filial relationship to the FreeBSD VM) buries
> > all accesses to the page table behind a 'pmap' API, and implements the
> > standard Unix VM semantics at the 'memory object' level.
>
> Who knows, maybe we'll change the abstraction in Linux some day too..
> However, I personally tend to prefer "thin" abstractions that don't hide
> details.
>
> The problem with the thick abstractions ("high level") is that they often
> lead you down the wrong path. You start thinking that it's really cheap to
> share partial address spaces etc ("hey, I just map this 'memory object'
> into another process, and it's just a matter of one linked list operation
> and incrementing a reference ount").

My opinion, which I implied in the previous post but didn't state in so many
words, is that the whole Real Unix crowd - Chuck Cranor and Matt Dillon, Sun,
SGI and IBM etc - got off on the wrong track with respect to implementing
Unix VM semantics, and that we will achieve all the design goals they set for
themselves in a simpler, more efficient way. (That is, assuming I ever
finish debugging the page table sharing[1] and extend it to shared mmaps.) I
attribute that whole wrong turn to a too-heavy abstraction of the page table,
distracting the eye from the observation that the page table itself provides
sufficient state to do the same job as memory objects. I'm curious to hear
Matt's opinion on that by the way, I have to go bother him about this.

> Until you realize that the actual sharing still implies a TLB switch
> between the two "threads", and that you need to instantiate the TLB in
> both processes etc. And suddenly that instantiation is actually the _real_
> cost - and your clever highlevel abstraction was actually a lot more
> expensive than you realized.

Well I don't have any problem with the TLB cost being hidden, what bothers me
is the complexity of the mechanism required to make the abstraction work.
Sort-of work I mean, just google 'all-shadowed case' to see one nasty
difficulty.

> [ Side note: I'm very biased by reality. In theory, a non-page-table based
> approach which used only a front-side TLB and a fast lookup into higher-
> level abstractions might be a really nice setup. However, in practice,
> the world is 99%+ based on hardware that natively looks up the TLB in a
> tree, and is really good at it too. So I'm biased. I'd rather do good
> on the 99% than care about some theoretical 1% ]

It breaks down somewhat as virtual memory range goes way beyond 4GB.
There's the relatively minor issue of extra levels of tree traversal,
currently limited to 4 by AMD's architecture but not so limited on other
architectures. A bigger problem is what to do about internal fragmentation
in the page table tree, say if somebody mmaps a 2 TB sparse file, then writes
one byte every 2 meg. Bang, 4 gig worth of page tables, this is probably not
what we want. IMHO, 'don't do that then' isn't a reasonable response.

What we might want to do there is evict some page tables when they start
proliferating too much, and that's when we find out we have no good model for
doing that. I think this needs to be looked at.

[1] I finally got a little more work done on it today

-- 
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/