Re: [BENCHMARK] Lmbench 2.5.54-mm2 (impressive improvements)

Andi Kleen (ak@muc.de)
Mon, 6 Jan 2003 03:05:42 +0100


On Mon, Jan 06, 2003 at 02:33:28AM +0100, Linus Torvalds wrote:
> > I can think of some things to speed it up more, e.g. replace all the
> > push / pop in SAVE/RESTORE_ALL with sub $frame,%esp ; movl %reg,offset(%esp)
> > and movl offset(%esp),%reg ; addl $frame,%esp. This way the CPU has
> > no dependencies between the individual load/store operations, unlike push/pop.
>
> Last I remember, that only made a difference on Athlons, and Intel CPUs

I didn't benchmark it, but as a data point: ICC 7 now generates the movls
instead of pushes too, even though that makes the code bigger. In fact it is
even more aggressive about it than gcc: gcc does it only when saving more than
three or four registers, while icc does it for two or more. So I expect it to
be faster on Intel CPUs too, at least on the P4. I doubt they tuned it for
Athlons.
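For illustration, here is a minimal sketch of the two save/restore styles in
i386 AT&T syntax. It is an assumption-laden example, not the kernel's code:
it saves only three registers, whereas the real SAVE_ALL/RESTORE_ALL macros
save more (including segment registers), so the register set and the 12-byte
frame size are hypothetical. Both variants leave the same frame layout on the
stack.

```
# Push-based save/restore: every pushl/popl implicitly reads and writes
# %esp, so consecutive instructions are chained through the stack pointer.
	pushl %eax		# %eax ends up at 8(%esp)
	pushl %ebx		# %ebx ends up at 4(%esp)
	pushl %ecx		# %ecx ends up at 0(%esp)
	# (handler body would go here)
	popl %ecx
	popl %ebx
	popl %eax

# mov-based save/restore: %esp is adjusted once per direction; the stores
# and loads use fixed offsets from the same %esp value, so they do not
# depend on each other through the stack pointer.
	subl $12, %esp		# hypothetical frame size: 3 registers * 4 bytes
	movl %eax, 8(%esp)
	movl %ebx, 4(%esp)
	movl %ecx, (%esp)
	# (handler body would go here)
	movl (%esp), %ecx
	movl 4(%esp), %ebx
	movl 8(%esp), %eax
	addl $12, %esp
```

One difference worth keeping in mind: subl/addl modify EFLAGS, while
pushl/popl of general registers do not, so the substitution is only safe at
points where the flags are not live.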

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/