My opinion, which I implied in the previous post but didn't state in so many
words, is that the whole Real Unix crowd - Chuck Cranor and Matt Dillon, Sun,
SGI, IBM and the rest - got off on the wrong track with respect to
implementing Unix VM semantics, and that we can achieve all the design goals
they set for themselves in a simpler, more efficient way. (That is, assuming
I ever
finish debugging the page table sharing[1] and extend it to shared mmaps.) I
attribute that whole wrong turn to a too-heavy abstraction of the page table,
distracting the eye from the observation that the page table itself provides
sufficient state to do the same job as memory objects. I'm curious to hear
Matt's opinion on that, by the way; I'll have to go bother him about it.
> Until you realize that the actual sharing still implies a TLB switch
> between the two "threads", and that you need to instantiate the TLB in
> both processes etc. And suddenly that instantiation is actually the _real_
> cost - and your clever highlevel abstraction was actually a lot more
> expensive than you realized.
Well, I don't have any problem with the TLB cost being hidden; what bothers
me is the complexity of the mechanism required to make the abstraction work.
Sort of work, I mean - just google 'all-shadowed case' to see one nasty
difficulty.
> [ Side note: I'm very biased by reality. In theory, a non-page-table based
> approach which used only a front-side TLB and a fast lookup into higher-
> level abstractions might be a really nice setup. However, in practice,
> the world is 99%+ based on hardware that natively looks up the TLB in a
> tree, and is really good at it too. So I'm biased. I'd rather do good
> on the 99% than care about some theoretical 1% ]
It breaks down somewhat as the virtual memory range goes way beyond 4 GB.
There's the relatively minor issue of extra levels of tree traversal,
currently limited to 4 by AMD's architecture but not so limited on other
architectures. A bigger problem is what to do about internal fragmentation
in the page table tree: say somebody mmaps a 2 TB sparse file, then writes
one byte every 2 MB. Since each 4 KB page table page maps 2 MB of virtual
space, that instantiates about a million page tables - bang, 4 GB worth of
page tables, which is probably not what we want. IMHO, 'don't do that then'
isn't a reasonable response.
What we might want to do in that situation is evict some page tables when
they start proliferating too much, and that's where we find out we have no
good model for doing it. I think this needs to be looked at.
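Just to illustrate the kind of policy I mean - a toy userspace model, where
the cutoff and the structures are invented and nothing corresponds to actual
kernel interfaces - the idea is simply to flag page table pages whose live
PTE count has fallen below some threshold as candidates for reclaim:

	#include <stdio.h>
	#include <stdlib.h>

	#define SPARSE_CUTOFF	8	/* invented tunable */

	struct toy_ptpage {
		int live_ptes;	/* populated entries in this page table page */
		int resident;	/* is the 4 KB table itself allocated? */
	};

	static int reclaim_candidate(const struct toy_ptpage *pt)
	{
		return pt->resident && pt->live_ptes < SPARSE_CUTOFF;
	}

	int main(void)
	{
		enum { NTABLES = 1 << 20 };	/* the 2 TB / 2 MB case above */
		struct toy_ptpage *pt = calloc(NTABLES, sizeof *pt);
		long reclaimed = 0;
		int i;

		if (!pt)
			return 1;

		for (i = 0; i < NTABLES; i++) {
			pt[i].resident = 1;
			pt[i].live_ptes = 1;	/* one byte written per 2 MB */
		}

		for (i = 0; i < NTABLES; i++)
			if (reclaim_candidate(&pt[i])) {
				pt[i].resident = 0;	/* would free the 4 KB table */
				reclaimed++;
			}

		printf("would reclaim %ld of %d page table pages (%ld MB)\n",
		       reclaimed, NTABLES, reclaimed * 4 / 1024);
		free(pt);
		return 0;
	}

The hard part is everything this toy leaves out, of course: where the
evicted entries go and how cheaply they can be refaulted, which is exactly
the part we have no model for yet.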
[1] I finally got a little more work done on it today
--
Daniel