> > > I would be (pleasantly) surprised to see gcc turn a C memcpy into faster
> > > assembly than our current implementation. And I'll bet
> >
> > gcc has hand-coded assembly inside itself, if gcc compiled memcpy is slower
> > than hand-optimized one, you found a compiler bug.
>
> Not at all. gcc compiled memcpy just has no knowledge of things like
> non-temporal stores, and using mmx/sse to move 64 bits at a time
non-temporal stores are bypassing cache? Is it always good idea?
> instead
> of 32 bit registers. (It's only recently it got prefetch abilities
> too).
gcc knows about mmx/sse, and could decide to use it.
Pavel
-- Casualities in World Trade Center: ~3k dead inside the building, cryptography in U.S.A. and free speech in Czech Republic. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/