According to my benchmarks the slowdown on context switch is a lot
more than 5-9% on P4:
Host OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw
with wrmsr Linux 2.5.54 2.410 3.5600 6.0300 3.9900 34.8 8.59000 43.7
no wrmsr Linux 2.5.54 1.270 2.3300 4.7700 2.5100 29.5 4.16000 39.2
That looks more like between 10%-51%
[Note I don't trust the numbers completely, the slowdown looks a bit too
extreme especially for the 16p case. But it is clear that it is a lot
slower]
I haven't benchmarked pushfl/popfl, but I cannot imagine it being that
slow to offset that. I agree that syscalls are a slightly hotter path than the
context switch, but hurting one for the other that much looks a bit
misbalanced.
>
> And part of the reason for the huge speedup is that the vsyscall/sysenter
> path is actually pretty much the fastest possible. Yes, it would have been
I can think of some things to speed it up more. e.g. replace all the
push / pop in SAVE/RESTORE_ALL with sub $frame,%esp ; movl %reg,offset(%esp)
and movl offset(%esp),%reg ; addl $frame,%esp. This way the CPU has
no dependencies between all the load/store options unlike push/pop.
(that is what all the code optimization guides recommend and gcc / icc
do too for saving/restoring of lots of registers)
Perhaps that would offset a pushfl / popfl in kernel mode, may be worth
a try.
-Andi
P.S.: For me it is actually good if the i386 context switch is slow.
On x86-64 I have some ugly wrmsrs in the context switch for the
64bit segment base rewriting too and slowing down i386 like this will
make the 64bit kernel look better compared to 32bit ;););)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/