> For the CONFIG_<<cputype>> options, is it only gcc compiler-type settings that
> are affected? I thought the assembly parts of the kernel were also switched on
> occasion.
Typically the MMX/3dnow/SSE memory copies get enabled where possible.
> I was wondering whether anyone has checked that these assembly snippets were
> decently optimal for the cpu type selected...
Arjan and a few others spent weeks tuning the memset/memcpy functions,
there's really not that much room for improvement with them imo.
I spent a week or so a while back trying in vain to squeeze out just a
few more MB/s, and failed to get good results across the board.
There are a few tricks that can be pulled to do things like copying
backwards to trick hardware prefetchers, but this starts to get into
voodoo land.
For now, the i386 optimised memcpy is probably a closed book.
x86-64 may reopen that book to revisit some lessons learned..
> > > In a lot of cases, the kernel 'knows better' and is adding the
> Be careful about 'knowing better' than compilers. I really don't want to start
> a flamefest, but modern compilers are very good at doing all sorts of
> optimisations, and hand-coded 'optimisations' can _sometimes_ actually hurt
> performance over the unadorned version of the code.
I would be (pleasantly) surprised to see gcc turn a C memcpy into faster
assembly than our current implementation. And I'll bet
$beverage_of_choice it'll stay that way for some time.
gcc has come on in leaps and bounds, but for something as performance
critical as memory copying/clearing, hand tuned assembly wins hands down.
Even AMD's/Intel's performance guides suggest doing this. It's the only
way to fly.
Dave.
-- | Dave Jones. http://www.codemonkey.org.uk | SuSE Labs - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/