Re: Gigabit/SMP performance problem

Avery Fay (avery_fay@symantec.com)
Mon, 6 Jan 2003 15:25:40 -0500


Right now, I have 4 interfaces in and 4 interfaces out (ideal routing
setup). I'm using just shy of 1500 byte udp packets for testing.

I tried binding the irqs for each pair of interfaces to a cpu... so for
example, if eth0 is sending to eth2, they would be bound to the same cpu.
This seemed to improve performance a little, but I didn't get definite
numbers and it certainly wasn't much.
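
For reference, a minimal sketch of that kind of binding, assuming the
usual /proc/irq/<n>/smp_affinity interface (a hex cpu bitmask written as
root). The irq numbers in PAIRS are hypothetical, and the helper names
cpu_mask, bind_irq and bind_pairs are made up for illustration; the real
irq numbers come from /proc/interrupts.

```python
import os

def cpu_mask(cpus):
    """Return the hex bitmask string for a list of cpu numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

def bind_irq(irq, cpus, proc_root="/proc"):
    """Write the affinity mask for one irq (needs root on a real system)."""
    path = os.path.join(proc_root, "irq", str(irq), "smp_affinity")
    with open(path, "w") as f:
        f.write(cpu_mask(cpus) + "\n")
    return path

# Hypothetical layout: cpu -> (irq of the "in" interface, irq of the "out" one).
PAIRS = {0: (24, 26), 1: (25, 27)}

def bind_pairs(pairs, proc_root="/proc"):
    """Bind both irqs of each in/out interface pair to the same cpu."""
    for cpu, irqs in pairs.items():
        for irq in irqs:
            bind_irq(irq, [cpu], proc_root)
```

Calling bind_pairs(PAIRS) as root would apply the layout; proc_root is
only a parameter so the logic can be tried against a scratch directory.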

I'm currently playing around with UP kernels, but when I go back to SMP
I'll check out softnet_stat.

Avery Fay

Robert Olsson <Robert.Olsson@data.slu.se>
01/03/2003 04:20 PM


To: "Avery Fay" <avery_fay@symantec.com>
cc: linux-kernel@vger.kernel.org
Subject: Gigabit/SMP performance problem

Avery Fay writes:
>
> I'm working with a dual xeon platform with 4 dual e1000 cards on different
> pci-x buses. I'm having trouble getting better performance with the second
> cpu enabled (ht disabled). With a UP kernel (redhat's 2.4.18), I can route
> about 2.9 gigabits/s at around 90% cpu utilization. With a SMP kernel
> (redhat's 2.4.18), I can route about 2.8 gigabits/s with both cpus at
> around 90% utilization. This suggests to me that the network code is
> serialized. I would expect one of two things from my understanding of the
> 2.4.x networking improvements (softirqs allowing execution on more than
> one cpu):

Well you have a gigabit router :-)

How is your routing setup? Packet size?

Also you'll never get increased performance of a single flow with SMP.
Aggregate performance gains are possible at best. I've been fighting with
this for some time too.

You have some important data in /proc/net/softnet_stat. The counters are
per cpu; packets received and "cpu collisions" should interest you.

As far as I understand there is no serialization in the forwarding path
except where it has to be: when softirqs from different cpus feed packets
into a single device. This is seen in "cpu collisions".
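
A rough sketch of pulling those per-cpu counters out of softnet_stat.
It assumes one row per cpu with hex fields, the total in the first
column and cpu collisions in the ninth; that layout is an assumption
based on 2.4-era kernels and should be checked against your kernel
source. The SAMPLE text is made up, not real output.

```python
# Column positions are an assumption (2.4-era layout); verify before trusting.
COL_TOTAL = 0
COL_DROPPED = 1
COL_TIME_SQUEEZE = 2
COL_CPU_COLLISION = 8

def parse_softnet_stat(text):
    """Parse softnet_stat text into one dict per row (row order taken as cpu)."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) <= COL_CPU_COLLISION:
            continue  # unexpected layout: skip rather than guess
        fields = [int(tok, 16) for tok in tokens]
        rows.append({
            "cpu": len(rows),
            "total": fields[COL_TOTAL],
            "dropped": fields[COL_DROPPED],
            "time_squeeze": fields[COL_TIME_SQUEEZE],
            "cpu_collision": fields[COL_CPU_COLLISION],
        })
    return rows

def read_softnet_stat(path="/proc/net/softnet_stat"):
    """Read and parse the live file."""
    with open(path) as f:
        return parse_softnet_stat(f.read())

# Made-up sample with two cpus, for illustration only.
SAMPLE = """\
0001a2b3 00000000 00000004 00000000 00000000 00000000 00000000 00000000 0000002a
0001a100 00000000 00000001 00000000 00000000 00000000 00000000 00000000 00000010
"""

for row in parse_softnet_stat(SAMPLE):
    print("cpu%d total=%d collisions=%d"
          % (row["cpu"], row["total"], row["cpu_collision"]))
```

Sampling the file twice during a test run and comparing the collision
counts per cpu would show whether both cpus are contending for the same
output device.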

Also here we get into the inherent SMP cache bouncing problem with TX
interrupts, when TX has skb's which were processed/created on different
CPUs. Which CPU is going to take the interrupt? No matter how we run
kfree, we are going to see a lot of cache bouncing. For systems that have
the same in/out interface, smp_affinity can be used. In practice this is
impossible for forwarding.

And this bouncing hurts especially for small packets....

A little TX test illustrates this. The sender runs on cpu0.

UP                          186 kpps
SMP Aff to cpu0             160 kpps
SMP Aff to cpu0, cpu1       124 kpps
SMP Aff to cpu1             106 kpps
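
To put the figures above in relative terms, a small sketch that computes
each result's loss against the UP number. Only the four kpps values come
from the test; the function name loss_vs_baseline is made up for
illustration.

```python
# TX rates in kpps, copied from the test results above.
RESULTS_KPPS = {
    "UP": 186,
    "SMP Aff to cpu0": 160,
    "SMP Aff to cpu0, cpu1": 124,
    "SMP Aff to cpu1": 106,
}

def loss_vs_baseline(results, baseline="UP"):
    """Return the percentage loss of each result relative to the baseline."""
    base = results[baseline]
    return {
        name: round(100.0 * (base - kpps) / base, 1)
        for name, kpps in results.items()
    }

for name, loss in loss_vs_baseline(RESULTS_KPPS).items():
    print("%-24s %5.1f%% below UP" % (name, loss))
```

That works out to roughly 14% below UP with the interrupt on the
sender's cpu, 33% with it on either cpu, and 43% with it on the other
cpu.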

We are playing with some code that might reduce this problem.

Cheers.
--ro
