Re: [PATCH] 2.5.14 IDE 56

Andi Kleen (ak@muc.de)
Fri, 10 May 2002 03:06:45 +0200

Next message: Henrik Mitsch: "kernel panic, 2.4.7 (Redhat 7.2) and i don't know why"
Previous message: Alexander Viro: "Re: PATCH: kernel mount of initrd fails unless mke2fs uses 1024 byte"

On Fri, May 10, 2002 at 02:48:15AM +0200, Andrew Morton wrote:
> Andi Kleen wrote:
> >
> > Andrew Morton <akpm@zip.com.au> writes:
> >
> > > For bulk read() and write() I/O the best sized buffer is 8 kbytes. 4k is
> > > pretty good, too. Anything larger blows the user-side buffer out of L1.
> > > This is for x86.
> >
> > Modern x86 CPUs support prefetch hints that tell the CPU not to
> > pollute the caches with "streaming data". I bet using them would
> > be a big win.
>
> Maybe. For your basic:
>
> 	for (many) {
> 		read(fd1, buf, 8192);
> 		write(fd2, buf, 8192);
> 	}
>
> you want `buf' cached, but not the pagecache for fd1 and fd2.
> If the prefetch hints can express that then yes, nice.

SSE has prefetchnta.

3DNow! has something similar.

In addition you can use the movnt* non-temporal stores (movntq, movnti
and friends). These should be faster because they use write combining
and avoid the latency of fetching the destination's cache line just to
overwrite it.

The tricky bit is to avoid prefetching past the end of your copy.
Prefetching from an uncached or write-combined area (like the AGP
GART, which could start in the next page) triggers hardware bugs in
various boxes. This unfortunately complicates the prefetching loops
a bit.
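
A minimal sketch of such a loop, in C with SSE2 intrinsics. The function
name copy_streaming, the 64-byte line size and the 256-byte prefetch
distance are assumptions for illustration, not what copy*user does.
_mm_prefetch with _MM_HINT_NTA is expected to compile to prefetchnta and
_mm_stream_si32 to movnti. The bounds check only issues a prefetch for a
cache line that still holds source bytes, so it never reaches into the
next page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_prefetch, _mm_stream_si32, _mm_sfence */
#endif

#define CACHE_LINE     64u   /* assumed cache line size in bytes */
#define PREFETCH_AHEAD 256u  /* assumed prefetch distance in bytes */

/* Hypothetical helper: copy n bytes using non-temporal loads and stores. */
static void copy_streaming(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t off = 0;

    assert(dst != NULL && src != NULL);

#if defined(__SSE2__)
    /* Byte-copy the head until the destination is 4-byte aligned. */
    while (off < n && ((uintptr_t)(d + off) & 3u) != 0) {
        d[off] = s[off];
        off++;
    }

    /* Main loop: one cache line per iteration. */
    while (n - off >= CACHE_LINE) {
        /*
         * Prefetch only when the target address is still inside the
         * source buffer. The prefetched line then contains at least one
         * byte we are going to copy, so it cannot belong to a foreign
         * page that follows the buffer.
         */
        if (n - off > PREFETCH_AHEAD)
            _mm_prefetch((const char *)(s + off + PREFETCH_AHEAD),
                         _MM_HINT_NTA);

        for (size_t i = 0; i < CACHE_LINE; i += 4) {
            int w;
            memcpy(&w, s + off + i, sizeof w);   /* unaligned-safe load */
            _mm_stream_si32((int *)(void *)(d + off + i), w);
        }
        off += CACHE_LINE;
    }

    /* Make the write-combined stores visible before returning. */
    _mm_sfence();
#endif

    /* Tail (or everything, if SSE2 is not available at compile time). */
    for (; off < n; off++)
        d[off] = s[off];
}
```

The prefetch distance would need tuning per CPU, and whether this beats
a plain copy depends on the copy size and on whether the destination is
read again soon afterwards.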

>
> > The rep ; movsl loop used in copy*user isn't
> > very good on modern x86 anyways (it is ok on PPro, but loses on Athlon
> > and P4)
>
> On PII and PIII, rep;movsl is slower than an open-coded
> Duff's-device copy for all src/dest alignments except for
> the case where both are eight-byte-aligned. By up to
> 20%, iirc. Four-byte-aligned to four-byte-aligned isn't
> too bad.

That's surprising. AFAIK on PPro rep ; movs does magic prefetch
tricks in microcode, so for bigger copies it should eventually be
faster, as long as you do not use explicit prefetching and the data is
not cache hot (in smaller ones the setup overhead may dominate).

On Athlon rep ; movs clearly loses compared to an unrolled loop.
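
For comparison, a rough sketch of the kind of unrolled copy meant here,
in portable C. The names copy_unrolled, load32 and store32 are
hypothetical, and the unroll factor of eight 32-bit words is an
assumption, not a tuned value. The compiler is expected to turn the
fixed-size memcpy calls into plain moves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Alignment-safe 32-bit load and store helpers (hypothetical names). */
static inline uint32_t load32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static inline void store32(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Hypothetical helper: copy n bytes, eight 32-bit words per iteration. */
static void copy_unrolled(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);

    while (n >= 32) {
        /* Issue all loads before the stores so they can overlap. */
        uint32_t w0 = load32(s);
        uint32_t w1 = load32(s + 4);
        uint32_t w2 = load32(s + 8);
        uint32_t w3 = load32(s + 12);
        uint32_t w4 = load32(s + 16);
        uint32_t w5 = load32(s + 20);
        uint32_t w6 = load32(s + 24);
        uint32_t w7 = load32(s + 28);

        store32(d, w0);
        store32(d + 4, w1);
        store32(d + 8, w2);
        store32(d + 12, w3);
        store32(d + 16, w4);
        store32(d + 20, w5);
        store32(d + 24, w6);
        store32(d + 28, w7);

        s += 32;
        d += 32;
        n -= 32;
    }

    /* Copy the remaining 0..31 bytes one at a time. */
    for (; n > 0; n--)
        *d++ = *s++;
}
```

Timing this against rep ; movsl for different alignments and sizes is
the only way to tell which one wins on a given CPU.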

-Andi
