You should understand that this is mostly the special case.
Atlas loop is hand tuned to compile well with gcc 2.x.x and 3.x.x
prduces worse code on it.
I've tracked down and fixed problem that made Atlas loop to run out
of registers in 3.1.x so it works well again. (It is not that dificult
to prepare corresponding patch for 3.0.x in case there is interest)
There was nothing wrong with the scheduler and the analysis on page
are somewhat missleading. Real problem was that gcc "forgotten" about
posibility of using memory operand in certain cases of commutative
i387 fp instructions requiring one additional register. (this happent as
result of two independent major change sin the compiler)
This register is not available in the loop curefully written for 8 registers
and causes the performance drop.
In specFP perofmrance, gcc 3.0.1 is about 4% better on specfp according to
the results at http://www.suse.de/~aj/SPEC
Honza
> > > >
> > > >
> > > > I would be interested to know if this apply to gcc 3.1 too.
> > >
> > > Well, concerning reg-stack, you can completely get away without it in 3.1
> > > by using -mfpmath=sse if you are targeting Pentium 3,4 or Athlon 4,xp,mp
> > > (for float math, for higher precision only for Pentium 4).
> >
> > Yes, but the lot of users (like me) who are still using Athlon TB, 1330 or
> > 1400 Mhz, and who do not have any reason to upgrade to MP since the
> > performance gain is not really considerable, they cannot use sse instructions.
> > So, what could they do? should they stay with gcc 2.95?
>
> Linux kernel doesn't use floating point math at all, so this is irrelevant
> on lkml, moving to an more appropriate list...
>
> Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/