You need to be careful with this stuff. Cache effects dominate.

I believe the /dev/sgX driver uses a fixed kernel-side buffer for
the transfer. So the source of the copy_to_user() will always come
out of cache if the CPU snoops the bus-master DMA, but not if the
CPU has to invalidate its cache in response to that DMA.

But `dd' has to copy the data out of the pagecache, so its
copy_to_user() will get 100% cache misses on the source, guaranteed.

Also, the `sg_read' command reads everything into the same (small)
chunk of userspace memory, so the destination of copy_to_user()
is always in cache. The same is probably true of `dd bs=512',
but one would have to read the dd source to verify.

This is also why the scsi_debug driver runs so much faster than real
devices: it copies everything out of a fixed in-kernel buffer, i.e.
out of L1 cache. Fast.

Similarly, `sg_dd' against scsi_debug copies a fixed kernel buffer
into a fixed userspace buffer. But when `dd' does the same thing,
it incurs an additional copy into the pagecache. If the pagecache
readahead window exceeds your L1 cache size (it does), then dd will
appear to be a lot slower.

Summary: the block layer ain't slow - it's memory which is slow ;)