Re: Pentium SSE prefetcht0 instruction... How do you make it work

Tony Hagale (tony@hagale.net)
Thu, 27 Sep 2001 13:43:53 -0500 (CDT)


On Thu, 27 Sep 2001, Bulent Abali wrote:

> I have a system with a large external L3 cache. Uses Pentium III
> Coppermine (stepping 3) processor. L3 block size is relatively long. I
> am trying to increase the L3 performance by prefetching L3 lines. I
> thought prefetcht0, t1, t2, nta instructions may help. These instructions
> prefetch L1/L2 lines in to the processor therefore they should prefetch L3
> as well. However I see no benefit. It is as if prefetch never make it to
> the front side bus. I took the example of arch/i386/lib/mmx.c to implement
> the asm level routines. Perhaps I am overlooking something. I'd
> appreciate any suggestions. /Bulent
>

Intel's p3/4 prefetch instructions are hints only. They are only executed
asynchronously, and depend heavily on the other load on the processor at
the time. They are not required to prefetch, *and* they are not required
to be executed when you think they should in the flow of the program. You
can serialize them by using an MFENCE instruction, but they still aren't
guaranteed to run.

Check the p4 manuals. In fact, I'm not sure prefetch was implemented in
p3. I could be wrong, check the manual.

--Tony

-- 

Tony Hagale -- tony@hagale.net -- http://tony.hagale.net "Quis custodiet ipsos custodiens?"

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/