In theory, practice and theory are the same...
I think the point I'm trying to make is that the VM stuff costs something
and it shouldn't be that hard to dummy up a system call to measure it.
It was counterintuitive as hell at SGI that the VM stuff would cost that
much and the reasons are subtle. Part of the problem turned out to be
falling out of the instruction cache - the network stack and the VM system
didn't fit and that left no room at all for the app.
If you are trading instruction cache misses for data misses, err, dude,
I think that might be a problem. The point is to process all the data
with less, not more, cache misses, right? In fact, if we agree on that
then that leads you to considering the various ways you could do this
and maybe your way is the right way but maybe there is a less cache
intensive way.
If you're right you're right, so peace. But I'd like the definition of
"right" to be "less cache misses to do the same thing". In fact, if
I managed to communicate only one thing in my entire set of rants and
it was "pay attention to cache misses", hey, that'd be cool with me.
That's how you make things go fast and I like fast.
Think about it, a 3GHz machine is a .3ns clock cycle and the suckers are
super scalar and hyper threaded and all that crud. Memory is about 133ns
away. That's 400 clocks of stall for each cache miss. Lotta code can run
in 400 clocks of super scalar/hyper threaded/fully buzzword enabled processors.
----- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/