No, the other one _is_ necessary. I did timings, and having it in the
context switch path made system calls clearly faster on a P4 (as
compared to my original trampoline approach).
It may be only two instructions difference ("movl xx,%esp ; jmp common")
in the system call path, but it was much more than two cycles. I don't
know why, but I assume the system call causes a total pipeline flush,
and then the immediate jmp basically means that the P4 has a hard time
getting the pipe restarted.
This might be fixable by moving more (all?) of the kernel-side fast
system call code into the per-cpu trampoline page, so that you wouldn't
have the immediate jump. Somebody needs to try it and time it, otherwise
the wrmsr stays in the context switch.
I want fast system calls. Most people don't see it yet (because you need
a glibc that takes advantage of it), but those fast system calls are
more than enough to make up for some scheduling overhead.
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/