Re: Some functions are not inlined by gcc 3.2, resulting code is ugly

Denis Vlasenko (vda@port.imtp.ilyichevsk.odessa.ua)
Mon, 4 Nov 2002 14:00:20 -0200


On 3 November 2002 19:28, Jussi Laako wrote:
> On Mon, 2002-11-04 at 02:17, Denis Vlasenko wrote:
> > Alignment does not eliminate jump. It only moves jump target to 16
> > byte boundary.
>
> Exactly. And P4 cache is _very_ bad at anything not 16-byte aligned.
> The speed penalty is big. This seems to be problem only with Intel
> CPU's, no such large effects on AMD ones.

So we are going to waste that space in _all_ kernels just for Intel P4
being stupid? I compile fernels for _486_! At home I have a Duron.
I don't want to pay "P4 tax" ;)

Also, once it gets cached at first access, subsequent accesses won't
hurt that much, right?

> > This _probably_ makes execution slightly faster but on average
> > it costs you 7,5 bytes. This price is too high when you take into
> > account L1 instruction cache wastage and current bus/core clock
> > ratios.
>
> 7.5 bytes is not much compared to possibility of trashed cache or
> pipeline flush.

You definitely need to play with objdump -d just to see those 7.5 bytes
everywhere.

Pipeline flush is moot, you have jump penalty regrdless of alignment.

> Do you have execution time numbers of jump to 16-byte aligned address
> vs unaligned address?

I will do.

Let me say this clear:

If you do million iterations loop, maybe you should align it.

It is very dumb to align everything in the hope it will
magically make your kernel run 2x faster.

--
vda
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/