Maybe. For your basic:
	for (many) {
		read(fd1, buf, 8192);
		write(fd2, buf, 8192);
	}
you want `buf' cached, but not the pagecache for fd1 and fd2.
If the prefetch hints can express that then yes, nice.
> The rep ; movsl loop used in copy*user isn't
> very good on modern x86 anyways (it is ok on PPro, but loses on Athlon
> and P4)
On PII and PIII, rep;movsl is slower than an open-coded
Duff's-device copy for all src/dest alignments except for
the case where both are eight-byte-aligned, by up to
20%, iirc. Four-byte-aligned to four-byte-aligned isn't
too bad.
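For readers who haven't seen one, here is a minimal user-space sketch of an open-coded Duff's-device copy. It is not the code that was benchmarked: the name duff_copy_words, the 8-way unroll and the word-at-a-time granularity are assumptions for illustration, and it assumes both pointers are suitably aligned for 32-bit access.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy `nwords` 32-bit words using an 8-way
 * unrolled Duff's device. The switch jumps into the middle of the
 * unrolled loop to handle the remainder (nwords % 8) on the first
 * pass; every later pass copies a full 8 words.
 */
static void duff_copy_words(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    size_t passes;

    if (nwords == 0)
        return;

    passes = (nwords + 7) / 8;  /* trips through the do/while body */

    switch (nwords % 8) {
    case 0: do { *dst++ = *src++;   /* fall through */
    case 7:      *dst++ = *src++;   /* fall through */
    case 6:      *dst++ = *src++;   /* fall through */
    case 5:      *dst++ = *src++;   /* fall through */
    case 4:      *dst++ = *src++;   /* fall through */
    case 3:      *dst++ = *src++;   /* fall through */
    case 2:      *dst++ = *src++;   /* fall through */
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}
```

The point of the construct is that the remainder handling and the unrolled body share one loop, so there is no separate tail copy.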
Of course, a lot of copy_*_users are well-aligned. But
a lot are not. I ended up deciding that switching to
the duff-device copy would be a very small overall win, when
you weight it by the alignment patterns of normal kernel
usage.
But making a runtime selection of which copy function to
use (based on src/dest alignment) could speed up the
kernel's most expensive function by maybe 10-15% overall.
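A rough user-space sketch of what that runtime selection might look like. All the names here (copy_fn, copy_aligned8, copy_unaligned, select_copy, copy_dispatch) are hypothetical, and the two copy routines are plain C stand-ins for the rep;movsl and open-coded variants, not the real kernel functions. The only policy it encodes is the one measured above: take the rep;movsl-style path when both pointers are eight-byte-aligned, otherwise take the open-coded path.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Stand-in for the rep;movsl path (assumed best when both are 8-byte-aligned). */
static void *copy_aligned8(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Stand-in for the open-coded path used for every other alignment. */
static void *copy_unaligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Pick a copy routine from the low three address bits of src and dst. */
static copy_fn select_copy(const void *dst, const void *src)
{
    uintptr_t bits = (uintptr_t)dst | (uintptr_t)src;

    if ((bits & 7) == 0)
        return copy_aligned8;
    return copy_unaligned;
}

/* Dispatch one copy through whichever routine the alignment selects. */
static void *copy_dispatch(void *dst, const void *src, size_t n)
{
    return select_copy(dst, src)(dst, src, n);
}
```

The check is one OR, one AND and a branch per call, so whether it pays off depends on how the alignment mix of real copies compares with that overhead.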
The test proggy is in http://www.zip.com.au/~akpm/linux/cptimer.tar.gz