Re: [RFC] Page table sharing

Daniel Phillips (phillips@bonn-fries.net)
Sat, 16 Feb 2002 22:08:05 +0100


On February 16, 2002 09:21 pm, Linus Torvalds wrote:
> On Sat, 16 Feb 2002, Daniel Phillips wrote:
> >
> > I think this patch is ready to look at now. It's been pretty stable,
> > though I haven't gone as far as booting with it - page table sharing is
> > still restricted to uid 9999. I'm running it on a 2 way under moderate
> > load without apparent problems. The speedup on forking from a parent
> > with large vm is *way* more than I expected.
>
> I'd really like to hear what happens when you enable it unconditionally,
> and then run various real loads along with things like "lmbench".

I'm running on it unconditionally now. I'm still up though I haven't run
really heavy stress tests. I've had one bug report, curiously with the
version keying on uid 9999 while *not* running as uid 9999. This makes me
think that there are some uninitialized page use counts on page tables
somewhere in the system.

> Also, when you test forking over a parent, do you test just the fork, or
> do you test the "fork+wait" combination that waits for the child to exit
> too? The latter is the only really meaningful thing to test.

Will do. Does this do the trick:

wait();
gettimeofday(&etime, NULL);

If so, it doesn't affect the timings at all, in other words I haven't just
pushed the work into the child's exit.

> Anyway, the patch certainly looks pretty simple and small. Great.
>
> > I haven't fully analyzed the locking yet, but I'm beginning to suspect it
> > just works as is, i.e., I haven't exposed any new critical regions. I'd
> > be happy to be corrected on that though.
>
> What's the protection against two different MM's doing a
> "zap_page_range()" concurrently, both thinking that they can just drop the
> page table directory entry, and neither actually freeing it? I don't see
> any such logic there..

Nothing prevents that, duh.

> I suspect that the only _good_ way to handle it is to do
>
> pmd_page = ..
>
> if (put_page_testzero(pmd_page)) {
> .. free the actual page table entries ..
> __free_pages_ok(pmd_page, 0);
> }
>
> instead of using the free_page() logic. Maybe you do that already, I
> didn't go through the patches _that_ closely.

I do something similar in clear_page_tables->free_one_pmd, after the entries
are all gone. I have to do something different in zap_page_range - it wants
to free the pmd only if the count is *greater* than one, and can't tolerate
two mms thinking that at the same time. I think I'd better lock the pmd page
there.

> Ie you'd do a "two-phase" page free - first do the count handling, and if
> that indicates you should really free the pmd, you free the lower page
> tables before you physically free the pmd page (ie the page is "live" even
> though it has a count of zero).

Actually, I'm not freeing the pmd I'm freeing *pmd, a page table. So, if the
count is > 1 it can be freed without doing anything to the ptes on it. This
is the entire source of the speedup, by the way.

-- 
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/