Yes. The figures below show this. Disabling SG+checksums speeds
up write() and send().
> Split buffers are more expensive and we have to pay for this.
> You have paid too much for a slow card though. 8)
>
> Do you measure load correctly?
Yes, I'm quite confident about this. Here's the algorithm:
1: Run a cycle-soaker on each CPU on an otherwise unloaded system.
   See how much "work" they all do per second.
2: Run the cycle-soakers again, but with network traffic happening.
   See how much their "work" is reduced, and deduce the networking
   CPU load from this difference.
The networking code all runs SCHED_FIFO or in interrupt context,
so the cycle-soakers have no effect upon the network code's access
to the CPU.
The "cycle-soakers" just sit there spinning and dirtying 10,000
cachelines per second.
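This is an illustration only, not the actual measurement tool; the
cacheline size and the per-pass line count are assumptions:

#include <stdio.h>
#include <time.h>

#define NLINES	10000		/* cachelines dirtied per pass (assumed) */
#define LINE	64		/* assumed cacheline size in bytes */

static volatile char buf[NLINES * LINE];

int main(void)
{
	unsigned long work = 0;
	time_t start = time(NULL);

	for (;;) {
		int i;

		/* Touch one byte in each cacheline to dirty it */
		for (i = 0; i < NLINES; i++)
			buf[i * LINE]++;
		work++;

		/* Report "work" per second; the drop in this number
		 * under network load is the networking CPU cost. */
		if (time(NULL) - start >= 1) {
			printf("work/sec: %lu\n", work);
			work = 0;
			start = time(NULL);
		}
	}
}

Pin one of these to each CPU; because they run at normal priority they
soak up exactly the cycles the networking code doesn't use.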
> > 2.4.1-pre10+zerocopy, using read()/write(): 39.2% CPU * hardware tx checksums disabled
>
> This is an illegal combination of parameters. You force two memory
> accesses by doing this. The fact that it does not add to load is dubious. 8)8)
mm.. Perhaps with read()/write() the data is already in cache?
Anyway, I've tweaked the tool again so it can do send() or
write() (then I looked at the implementation and wondered why
I'd bothered). It also does TCP_CORK now; the corked send path
looks roughly like the sketch below.
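This is a sketch under the assumption that the socket is corked around
each chunk; send_corked() is an illustrative helper, not the tool's
actual function:

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

static ssize_t send_corked(int sock, const char *buf, size_t len)
{
	int on = 1, off = 0;
	size_t done = 0;

	/* Cork: queue data, don't transmit partial frames yet */
	if (setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof(on)) < 0)
		return -1;

	while (done < len) {
		ssize_t n = send(sock, buf + done, len - done, 0);
		if (n < 0)
			return -1;
		done += n;
	}

	/* Uncork: flush any queued sub-frame residue */
	if (setsockopt(sock, IPPROTO_TCP, TCP_CORK, &off, sizeof(off)) < 0)
		return -1;
	return done;
}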
I ran another set of tests. The zerocopy patch improves sendfile()
hugely but slows down send()/write() significantly, with a 3c905C:
http://www.uow.edu.au/~andrewm/linux/#zc
The kernels which were tested were 2.4.1-pre10 with and without the
zerocopy patch. We only look at client load (the TCP sender).
In all tests the link throughput was 11.5 mbytes/sec (saturated
100baseT) unless otherwise noted.
The client (the thing which sends data) is a dual 500MHz PII with a
3c905C.
For the write() and send() tests, the chunk size was 64 kbytes.
The workload was 63 files with an average length of 350 kbytes.
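For reference, the two transmit paths being compared look roughly like
this (a sketch, not the actual tool; the helper names are made up and
error handling is trimmed):

#include <sys/types.h>
#include <sys/sendfile.h>
#include <unistd.h>

#define CHUNK	(64 * 1024)	/* chunk size used in these tests */

/* Zerocopy path: pages go from the page cache to the socket */
static int send_via_sendfile(int sock, int fd, off_t size)
{
	off_t off = 0;

	while (off < size) {
		ssize_t n = sendfile(sock, fd, &off, CHUNK);
		if (n <= 0)
			return n ? -1 : 0;	/* error, or short file */
	}
	return 0;
}

/* Copy path: data goes through a 64 kbyte user buffer */
static int send_via_write(int sock, int fd)
{
	static char buf[CHUNK];
	ssize_t n;

	while ((n = read(fd, buf, CHUNK)) > 0)
		if (write(sock, buf, n) != n)
			return -1;
	return n;	/* 0 on EOF, -1 on read error */
}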
CPU load (sender):

2.4.1-pre10+zerocopy, using sendfile():  9.6%
2.4.1-pre10+zerocopy, using send():     24.1%
2.4.1-pre10+zerocopy, using write():    24.2%
2.4.1-pre10+zerocopy, using sendfile(): 16.2%  * checksums and SG disabled
2.4.1-pre10+zerocopy, using send():     21.5%  * checksums and SG disabled
2.4.1-pre10+zerocopy, using write():    21.5%  * checksums and SG disabled
2.4.1-pre10-vanilla,  using sendfile(): 17.1%
2.4.1-pre10-vanilla,  using send():     21.1%
2.4.1-pre10-vanilla,  using write():    21.1%
Bearing in mind that a large amount of the load is in the device
driver, the zerocopy patch makes a large improvement in sendfile()
efficiency. But write() and send() efficiency is decreased by 10%,
and by more than this if you factor out the constant device driver
overhead.
TCP_CORK makes no difference: the files being sent are much larger
than a single frame.
Conclusions:
For a NIC which cannot do scatter/gather/checksums, the zerocopy
patch makes no change in throughput in any case.
For a NIC which can do scatter/gather/checksums, sendfile()
efficiency is improved by 40% and send() efficiency is decreased by
10%. The increase and decrease caused by the zerocopy patch will in
fact be significantly larger than these two figures, because the
measurements here include a constant base load caused by the device
driver.