Re: pipe and af/unix latency differences between aa and jam on smp

Andrea Arcangeli (andrea@suse.de)
Tue, 9 Jul 2002 16:53:27 +0200


On Tue, Jul 09, 2002 at 10:05:58AM -0400, rwhron@earthlink.net wrote:
> > *Local* Communication latencies in microseconds - smaller is better
>
> > kernel                         Pipe     AF/Unix
> > -----------------------------  -------  -------
> > 2.4.19-pre7-jam6               29.513   42.369
> > 2.4.19-pre8-jam2                7.697   15.274
> > 2.4.19-pre8-jam2-nowuos         7.739   14.929
>
> > (last line says that wake-up-sync is not responsible...)
>
> > Main changes between first two were irqbalance and ide6->ide10.
>
> The system is scsi only. pre7-jam6 and pre8-jam2 .config's were
> identical.
>
> > Could you try latest -rc1-aa2 ? It includes also irqbalance,
>
> Based on Andrea's diff logs, irqbalance appeared in 2.4.19pre10aa3.
> There are small differences between the pre10-jam2 and aa irqbalance
> patches. One new datapoint with pre10-jam3:
>
> *Local* Communication latencies in microseconds - smaller is better
> -------------------------------------------------------------------
> kernel                         Pipe     AF/Unix
> -----------------------------  -------  -------
> 2.4.19-pre10-jam2               7.877   16.699
> 2.4.19-pre10-jam3              33.133   66.825
> 2.4.19-pre10-aa2               34.208   62.732
> 2.4.19-pre10-aa4               33.941   70.216
> 2.4.19-rc1-aa1-1g-nio          34.989   52.704

Now if this was AF_INET over ethernet I could imagine irqbalance
making a difference (or even irqrate, although irqrate should make no
difference until your hardware hits the limit of irqs it can handle).

But neither pipe nor AF_UNIX should generate any irq load (other than
the IPIs for the reschedule_task wakeups, and those depend only on the
scheduler; IPI delivery isn't influenced by the irqrate/irqbalance
patches). It's all transmission in software, internal to the kernel,
with no hardware events and so no irqs, so I would be very surprised if
irqbalance or irqrate could make any difference. I would look elsewhere
first at least. No idea why you're looking at those irq related patches
for this workload.
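
To make the point concrete, here is a rough sketch of the kind of
ping-pong a pipe latency test performs: two processes bounce a one-byte
token over a pair of pipes, so each round trip is just write(), a
wakeup, and read(), with no device involved. This is not the lmbench
code, only an illustration under that assumption; the function name
pipe_round_trip_us is made up for the example, and the absolute numbers
from Python will not be comparable to the table above.

```python
import os
import time


def pipe_round_trip_us(rounds=10000):
    """Mean round-trip time in microseconds for a 1-byte token bounced
    between a parent and a forked child over two pipes.

    Hypothetical helper for illustration only (not part of lmbench).
    """
    p2c_r, p2c_w = os.pipe()  # parent -> child
    c2p_r, c2p_w = os.pipe()  # child -> parent

    pid = os.fork()
    if pid == 0:
        # Child: echo each byte back until the parent closes its write end.
        try:
            os.close(p2c_w)
            os.close(c2p_r)
            while True:
                token = os.read(p2c_r, 1)
                if not token:  # EOF: parent is done
                    break
                os.write(c2p_w, token)
        finally:
            os._exit(0)  # never fall through into the parent's code

    # Parent: send the token and wait for it to come back, `rounds` times.
    os.close(p2c_r)
    os.close(c2p_w)
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(p2c_w, b"x")
        os.read(c2p_r, 1)
    elapsed = time.perf_counter() - start

    os.close(p2c_w)  # signals EOF to the child
    os.close(c2p_r)
    os.waitpid(pid, 0)
    return elapsed / rounds * 1e6


if __name__ == "__main__":
    print("pipe round trip: %.3f us" % pipe_round_trip_us())
```

Every iteration blocks one process and wakes the other, so on smp the
result is dominated by where the scheduler places the two tasks and by
the cost of the wakeup, not by anything in the irq path.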

At first glance I would say either it's a compiler issue that generates
some very inefficient code one way or the other (seems very unlikely,
but cache effects can be quite huge in tight loops where a very small
part of the kernel is exercised), or it has something to do with the
scheduler or similar core non-irq related areas.

>
> A config difference between pre10-jam2 and pre10-jam3 is:
> CONFIG_X86_SFENCE=y # pre10-jam2
> pre10-jam2 was compiled with -Os and pre10-jam3 with -O2.
>
> > Out of interest, is that a P4/Xeon?
>
> Quad P3/Xeon 700 MHz with 1MB cache.
>
> --
> Randy Hron
> http://home.earthlink.net/~rwhron/kernel/bigbox.html

Andrea