I did that originally, but timings from Jamie convinced me that it's
actually a quite noticeable overhead for the system call path.
You should realize that the 5-9% slowdown in schedule (which I don't like)
comes with a 360% speedup on a P4 in simple system call handling (which I
_do_ like). My P4 does a system call in 428 cycles as opposed to 1568
cycles according to my benchmarks.
And part of the reason for the huge speedup is that the vsyscall/sysenter
path is actually pretty much the fastest possible. Yes, it would have been
faster just from using sysenter/sysexit, but not by 360%. The other
speedups come from not reloading segment registers multiple times
(noticeable on a PIII, not a P4) and from avoiding things liek the flags
pushing.
NOTE! We could trivially speed up the task switching by making
"load_esp0()" a bit smarter. Right now it actually re-writes _both_
SYSENTER_CS and SYSENTER_ESP on a taskswitch, and that's because a process
that was in vm86 mode will have cleared SYSENTER_CS (so that sysenter will
cause a GP fault inside vm86 mode).
Now, that SYSENTER_CS thing is very rare indeed, and by keeping track of
what the previous value was (ie just caching the SYSENTER_CS value in the
thread_struct), we could get rid of it with a conditional jump instead.
Want to try it?
> This would also eliminate the random IOPL problem Luca noticed.
Nope, it wouldn't. A "popfl" in user mode does nothing for iopl. You have
to have the popfl in kernel mode.
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/