Re: [PATCH] Re: SSE related security hole

Andi Kleen (ak@suse.de)
Sat, 20 Apr 2002 21:41:14 +0200


> Besides, I seriously doubt it is any faster than what is there already.
>
> Time it, and notice how:
>
> - fninit takes about 200 cycles
> - fxrstor takes about 215 cycles

On what CPU?

I checked the Athlon4 optimization manual and fxrstor is listed as 68/108
cycles (i guess depending on whether there is XMM state or not so 68 cycles
probably apply here) and fninit as 91 cycles. It doesn't list the SSE1
timings, but i guess the instructions don't take more than 3 cycles
(MMX instructions take that long). So Andrea's way should be
91+16*3=139+some cycles for emms (or 107 if sse ops take only a single cycle)
vs 68 or 108. So the fxrstor wins well.

On x86-64 the difference is even bigger because it has 16 XMM registers instead
of 8.

> In short, your "fast" code isn't actually any faster than doing it right.

At least on Athlon it should be slower.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/