This is a know problem. It is caused by the way 2.2+ "current" works on i386.
It is at the bottom of the kernel stack and is computed from the stack pointer
using an and. The kernel stack needs to be page aligned because of the
allocator and the way the and works.
The reason current cannot be put into a normal global variable is that there is
usually no easy way to find out which CPU you're running on and the global
variable would need to be indexed by the CPU. Also using a real global on UP
is a real loser in terms of code size on i386 (several KB difference)
This unfortunately means that task_struct ends up on similar cache
colours and depending on the CPU could not use the caches very well.
One hope for i386 is that future CPUs have better caches so that too
aggressive cache colouring may not be worth it (actually I think that was already
hoped for the P2)
E.g. on IA64, m68k, x86-64 and other architectures which have enough registers
the kernel can afford to just put current into a global register variable.
This means a cache colouring allocation could be used for the task_struct because
it doesn't need to be on a maskable address [the ports currently do not do that,
but they could]
It would be interesting if you could also rerun your tests with prefetching
in the scheduler loop enabled. I can supply a patch for that.
-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/