Thanks for the reply.
What I'd like to see is a scheme where we route interrupts preferentially to
idle CPUs. If none are idle, then aim them at the CPUs running the "least
important" tasks. Also, since each CPU's local APIC only has two interrupt
latches per `level' (the upper nibble of the IRQ's vector), it would be a
good idea to avoid sending IRQs to those CPUs that are already processing one.
Since we only have 4 bits in each XTPR, suppose we used the high bit to
indicate IRQ-in-progress and the bottom three for some kind of compressed
task goodness measure. Idle equals zero. The XTPR would be updated on every
interrupt entry/exit and on every task switch.
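
To make the encoding concrete, here is a rough sketch in C. Everything in it is hypothetical: the names (xtpr_encode, compress_goodness, pick_target), the linear scaling of goodness into three bits, and the assumption that a larger goodness value means a more important task. The selection loop only imitates in software what lowest priority delivery would do within one cluster; it is not existing kernel code.

```c
#include <stddef.h>
#include <stdint.h>

#define XTPR_IRQ_BUSY  0x8u  /* high bit: this CPU is already handling an IRQ */
#define XTPR_PRIO_MASK 0x7u  /* low three bits: compressed task goodness */

/*
 * Hypothetical compression of a goodness value into 0..7.
 * 0 is reserved for idle; anything at or above max_goodness saturates at 7.
 * The linear scaling is an assumption, not something the scheduler defines.
 */
static uint8_t compress_goodness(unsigned int goodness, unsigned int max_goodness)
{
    if (goodness == 0)
        return 0;
    if (goodness >= max_goodness)
        return 7;
    /* Map 1..max_goodness-1 onto 1..6. */
    return (uint8_t)(1 + ((unsigned long long)goodness * 6) / max_goodness);
}

/* Build the 4-bit value that would be written on IRQ entry/exit and task switch. */
static uint8_t xtpr_encode(int irq_in_progress, uint8_t level)
{
    return (uint8_t)((irq_in_progress ? XTPR_IRQ_BUSY : 0u) | (level & XTPR_PRIO_MASK));
}

/*
 * Pick the CPU with the lowest 4-bit value among the CPUs of one cluster.
 * Ties go to the lowest index (an arbitrary choice for this sketch).
 * Returns 0 if ncpus is 0, so callers should pass at least one CPU.
 */
static size_t pick_target(const uint8_t *xtpr, size_t ncpus)
{
    size_t best = 0;
    for (size_t i = 1; i < ncpus; i++) {
        if (xtpr[i] < xtpr[best])
            best = i;
    }
    return best;
}
```

With this layout any CPU that is servicing an interrupt (value 8 or higher) ranks above every CPU that is not, so an idle CPU wins first, then the CPU running the least important task, and a CPU already in an interrupt handler is chosen only when all the others are too.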
What's the catch? The cost of doing the above. Plus, we still only have two
CPUs in each cluster, four if hyperthreading is turned on. (Of course, the
hyperthreaded "sibling processors" don't have the power of a full CPU --
maybe 20% to 40% of one.) All interrupts have to be targeted at a particular
cluster. So, lowest priority interrupt delivery may not buy us very much.
To make matters even more interesting, the economy version (non-NUMA) of this
hardware started shipping last month and the full version will be shipping
soon. I wonder if Marcelo is going to allow this kind of futzing around with
interrupt and scheduler code in 2.4....
--
James Cleverdon, IBM xSeries Platform (NUMA), Beaverton
jamesclv@us.ibm.com | cleverdj@us.ibm.com