The story so far:
I've been continuing to muck around with the stack, trying both to improve
overall performance, and specifically to improve rx relative to tx
performance, primarily in gig-and-beyond (e.g., Quadrics) environments.
To this end, I have begun by profiling and analyzing the RX side stack.
The profiling is being done as I write, and the analysis is what prompts
me to write.
The direct question:
How many times is data copied between the time that it is received at the
NIC and when the user's call to read() returns the data?
The reason for the question:
I could've sworn I heard the stack was single-copy on both the TX and RX
sides. But, it doesn't look to me like it is. Rather, it looks like there
is one copy in tcp_rcv_estabilshed() (via tcp_copy_to_iovec()), and a
second copy in tcp_recvmsg() (which is called when the user calls read()).
Both of these copies are, I believe, done by skb_copy_datagram_iovec().
The ancilary questions:
If I am wrong about this- does anyone care to publicly humiliate me by
telling me how/why (and possibly calling me stupid)?
If I am right about this- is there a specific reason that it is
implemented this way? Are there any thoughts on changing it? Our specific
inclination is to keep the skbs around until the user calls read(), at
which point we do an iovec memcopy to the userspace buffer, eliminating a
copy- the danger here is if the user doesn't read from the socket, this
might needlessly lock up skbs. To avoid this, we can implement either some
watermark or timeout for skb-consilidation- if the user doesn't call
read() soon enough or before too many skbs are used we copy the skbs to a
socket buffer like normal.
Cheers,
--Gus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/