Re: Intel P6 vs P7 system call performance

Jamie Lokier (lk@tantalophile.demon.co.uk)
Sun, 22 Dec 2002 16:00:34 +0000


Mikael Pettersson wrote:
> Manfred had a version with fixed MSR values and the varying data
> in memory. Maybe that's actually faster.

The stack pointer is already changed on context switches since Ingo
changed the kernel to use per-cpu TSS segments.

Manfred's code used a tiny stack (without a valid task struct,
a different method of recovering than Linus' code). You can get that
stack down to 6 words, (3 for debug trap, 3 more for nested NMI).

Combining these leads to an IMHO beautiful hack, which does work btw:
the 6 words fit into unused parts of the per-CPU TSS (just before
tss->es). The MSR has a constant value:

wrmsr(MSR_IA32_SYSENTER_ESP, (u32) &tss->es, 0);

I found the fastest sysenter entry code looks like this:

sysenter_entry_point:
cld # Faster before sti.
sti # Re-enable interrupts after next insn.
movl -68(%esp),%esp # Load per-CPU stack from tss->esp0.

with appropriate fixups at the start of the NMI and debug trap handlers.

enjoy,
-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/