Sounds like something worth experimenting with. I doubt you could
really avoid (effectively) flushing the caches, but even if there are
just a few zero bits in the bitmap at the time of the tear-down, a
fair amount of time could be saved.
You'd be surprised how many 0 bits there will be in the average
process. Even if you bring in all of emacs, glibc, X11R6 libs
etc. and the anonymous memory, there are still a HUGE portion of the
address space totally unused.
But like you said, worth experimenting with :-) First test would be,
start with 1 unsigned long as the bitmask in mm_context_t. Just
implement the bit setting part. Then at exit() count how many 0 bits
are left, record this into some counter table which has one counter
for 0 --> N_BITS_IN_LONG. Make some debug /proc thing which spits the
table out. (hint: at fork, clear out the child's bitmask before
copy_page_range is run for best results :-)
You can use this to do vaious things and see how much there is to gain
by going to two unsigned longs, three, etc.
Then you can hack up the actual clear_page_tables optimization (to
start) and measure the result.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/