Even if we doing nothing more than the algorithm on the table, I doubt you'll
see a measurable overhead on fork+exec. Certainly it will be as good or
better than what we have currently.
If that's not good enough, I'm considering keeping a bit on the page table
indicating whether are particular page table is currently in the 'all CoWable
ptes set RO' state, and if so, don't do it again. I think that with this
small optimization, the value of further improvements will be small indeed.
That said, Linus's suggestion of using the x86's ability to have the
writeprotect bits in a page directory override the protections at the page
level is a good one, and reduces the cost of detecting the fork+exec case to
a very small number of faults - none if we are clever. But this is entirely
secondary to the main goal of sharing page tables at all, which is a rather
fundamental shift in the way the Linux VM works. (Though it seems the patch
will be small.)
> Also, it is my impression that
> the tree of _running_ processes isn't usually very deep (Say init --> X -->
> [Random processes] --> [compilations &c], this would make 5 or 6 deep, no
> more.
Worst case is just as important as typical case here, since there will always
be x% of users out there whose normal workload consists entirely of worst
case.
> Should take a pstree(1) listing on a busy machine and work out some
> statistics... here (a personal worstation) the tree is very fat at the
> first level below init(8), and just 5 deep when running pstree(1)).
Here's my tree - on a non-very-busy laptop. Why is my X tree so much deeper?
I suppose if I was running java this would look considerably more interesting.
init-+-apache---8*[apache]
|-apmd
|-bash---bash---xinit-+-XFree86
| `-xfwm-+-xfce---gnome-terminal-+-bash---pstree
| | `-gnome-pty-helpe
| `-xfgnome
|-cardmgr
|-cupsd---http
|-5*[getty]
|-gpm
|-kapm-idled
|-kdeinit---kdeinit
|-5*[kdeinit]
|-kdesud
|-keventd
|-kmail
|-mozilla-bin---mozilla-bin---3*[mozilla-bin]
|-portmap
|-sshd
`-xchat
> Sure, all processes will all end up sharing glibc, and the graphical stuff
> will share the X &c libraries, so this would end up being a win this way.
Nobody has suggested that the sharing algorithm as described isn't a win,
IMHO, we are quibbling over the last few percent of the win. It's getting
high time to end the suspense by benchmarking the code.
Caveat: the page table sharing as described does not do a lot for shared
mmaps, such as glibc. (Unless those are inherited through a fork of course,
then it helps a lot.) Let me reiterate my goal with this patch: *Fix The
Fork Problem With Rmap* so that we can quit spending months fiddling with
virtual scanning, trying to get it to work properly (it never will).
I see the value in the various suggestions I've received, but what I don't
see is the value in delaying, or getting stuck adding new features. Let's
concentrate on making the simple thing I've described work *now* and add
features to it later.
I'm gratified that nobody has yet pointed out any fundamental flaws that
would keep it from working. I wasn't at all sure of that when I set out on
this path a month ago.
-- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/