Great question! The answer is that you are absolutely right. SGI tried
a pile of things in this area, both on NUMA and on traditional SMPs (the
NUMA stuff was more page migration and the SMP stuff was more process
migration, but the problem is the same: you screw up the cache). They
never got the page migration to give them better performance while I was
there and I doubt they have today. And the process "migration" from CPU
to CPU didn't work either; people tended to lock processes to processors
for exactly the reason you alluded to.
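
For anyone who wants to try the "lock processes to processors" approach from user space, here is a minimal sketch. It assumes Linux and Python's os.sched_setaffinity / os.sched_getaffinity wrappers; the helper name pin_to_cpu is made up for illustration and is not anything SGI or the kernel provides.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict a process (pid 0 = the caller) to one CPU; return the new affinity set.

    Assumption: Linux. os.sched_setaffinity is not available on every platform.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity is not exposed by os on this platform")
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        allowed = os.sched_getaffinity(0)   # CPUs we are allowed to run on right now
        target = min(allowed)               # arbitrary pick, just for the demo
        print("before:", sorted(allowed))
        print("after: ", sorted(pin_to_cpu(target)))
```

Once pinned, the scheduler can no longer move the process, so whatever it has pulled into that CPU's cache stays useful across time slices. The cost is that the process cannot take advantage of an idle CPU elsewhere.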
If you read the early hardware papers on SMP, they all claim "Symmetric
Multi Processor", i.e., you can run any process on any CPU. Skip forward
3 years and read the cache affinity papers from the same hardware people.
You have to step back and squint but what you'll see is that these papers
could be summarized in one sentence:
"Oops, we lied, it's not really symmetric at all"
You should treat each CPU as a mini system and think of a process reschedule
someplace else as a checkpoint/restart, and assume that it is heavyweight. In
fact, I'd love to see the scheduler code forcibly sleep the process for
500 milliseconds each time it lands on a different CPU. Tune the system
to work well with that, then take out the sleep, and you'll have the right
answer.
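
To make the thought experiment concrete, here is a toy model, not scheduler code. It charges a fixed penalty every time a process lands on a different CPU than the one it last ran on. The 500 ms figure is the sleep proposed above; the 10 ms time slice and the two placement histories are invented for illustration.

```python
MIGRATION_PENALTY_MS = 500  # the artificial sleep proposed in the text


def count_migrations(placements):
    """Number of times consecutive time slices ran on different CPUs."""
    return sum(1 for prev, cur in zip(placements, placements[1:]) if prev != cur)


def total_time_ms(placements, slice_ms=10, penalty_ms=MIGRATION_PENALTY_MS):
    """Time for one process: its slices plus one penalty per CPU change."""
    return len(placements) * slice_ms + count_migrations(placements) * penalty_ms


# Hypothetical placement histories for the same process over 8 time slices.
bouncing = [0, 1, 0, 2, 3, 3, 1, 0]  # scheduler grabs whichever CPU is idle
sticky = [0, 0, 0, 0, 1, 1, 1, 1]    # scheduler prefers the previous CPU

if __name__ == "__main__":
    for name, history in (("bouncing", bouncing), ("sticky", sticky)):
        print(name, count_migrations(history), "migrations,",
              total_time_ms(history), "ms")
```

With the penalty in place, the bouncing history costs 3080 ms against 580 ms for the sticky one, so any policy that moves processes casually shows up immediately. That is the point of tuning with the sleep in and only then removing it.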
-----
Larry McVoy  lm at bitmover.com  http://www.bitmover.com/lm