>> Also, we don't want this data to arrive late or whatever.
>> fast_copy_page must copy page (make it so that memcmp()==0).
>> If it does not, it is "optimized" too much.
AC> See the "sfence" instruction
I meant that the instrumented fast_copy_page() cannot fail because of
a late movntq commit to memory, since memcmp() runs after both sfence
and kernel_fpu_end():
>> ...
>> kernel_fpu_end();
>> + /* Check for "Athlon bug" - remove when resolved */
>> + from-=4096;
>> + to-=4096;
>> + if(memcmp(from,to,4096)!=0) {
>> + printk(KERN_ERR "Athlon bug! from=%p to=%p\n", from, to);
>> + memcpy(to,from,4096);
>> + }
>> }
If we still see an oops with this instrumentation,
then fast_copy_page() must be clobbering
RAM elsewhere, right?
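
For illustration, the same check can be sketched in userspace C. This is
only a sketch under assumptions: checked_copy_page(), plain_copy_page()
and broken_copy_page() are hypothetical names, and the deliberately broken
routine merely stands in for a misbehaving fast_copy_page(). It does not
use movntq or sfence at all.

```c
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096

typedef void (*copy_page_fn)(void *to, const void *from);

/* Hypothetical correct routine: a plain full-page copy. */
static void plain_copy_page(void *to, const void *from)
{
    memcpy(to, from, PAGE_SIZE);
}

/* Hypothetical buggy routine: leaves the last byte of the page uncopied. */
static void broken_copy_page(void *to, const void *from)
{
    memcpy(to, from, PAGE_SIZE - 1);
}

/*
 * Run the given copy routine, then verify the result the same way the
 * instrumented patch does: compare, report, and repair with memcpy().
 * Returns 0 if the copy verified, 1 if it had to be repaired.
 */
static int checked_copy_page(copy_page_fn copy, void *to, const void *from)
{
    copy(to, from);
    if (memcmp(from, to, PAGE_SIZE) != 0) {
        fprintf(stderr, "copy mismatch! from=%p to=%p\n",
                (void *)from, to);
        memcpy(to, from, PAGE_SIZE);
        return 1;
    }
    return 0;
}
```

After checked_copy_page() returns, the destination always matches the
source, whichever routine was used. So if corruption is still seen with
such a check in place, it has to come from writes outside the
destination page.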
>> A better way to do it is to benchmark several routines at
>> startup time and pick the best one. This is already done
>> for the RAID xor'ing routine.
AC> Not in this case. It is Athlon specific code. It was fine
AC> tuned when it was written
Yes, but sometimes we have routines which perform
differently on different CPUs. See include/asm-i386/string.h
and string-486.h: on Pentium rep movsd is faster, on 386 an unrolled
loop is faster... so the optimal routine can be picked only at runtime.
CPU-specific routines can compete in such a runtime benchmark
too when the proper processor is detected - see how the KNI-specific
RAID xor routine does that.
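
A rough userspace sketch of the idea, not the kernel's actual RAID xor
code: time each candidate and keep the fastest, skipping candidates the
CPU cannot run. All names here (copy_candidate, pick_fastest, the
"usable" flag standing in for a CPU feature check, BENCH_ROUNDS) are
made up for illustration, and clock() is a much cruder timer than
anything the kernel would use.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

#define PAGE_SIZE 4096
#define BENCH_ROUNDS 20000   /* arbitrary; just long enough to measure */

typedef void (*copy_page_fn)(void *to, const void *from);

struct copy_candidate {
    const char *name;
    copy_page_fn fn;
    int usable;   /* assumption: result of some CPU feature detection */
};

/* Candidate 1: let the C library do the copy. */
static void copy_memcpy(void *to, const void *from)
{
    memcpy(to, from, PAGE_SIZE);
}

/* Candidate 2: a naive byte-by-byte loop. */
static void copy_bytes(void *to, const void *from)
{
    unsigned char *d = to;
    const unsigned char *s = from;
    size_t i;

    for (i = 0; i < PAGE_SIZE; i++)
        d[i] = s[i];
}

/* CPU time in seconds for BENCH_ROUNDS page copies with one routine. */
static double time_candidate(copy_page_fn fn, void *to, const void *from)
{
    clock_t start = clock();
    int i;

    for (i = 0; i < BENCH_ROUNDS; i++)
        fn(to, from);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Return the fastest usable candidate, or NULL if none is usable. */
static const struct copy_candidate *
pick_fastest(const struct copy_candidate *c, size_t n,
             void *to, const void *from)
{
    const struct copy_candidate *best = NULL;
    double best_time = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        double t;

        if (!c[i].usable)
            continue;   /* e.g. Athlon-only code on a non-Athlon CPU */
        t = time_candidate(c[i].fn, to, from);
        if (best == NULL || t < best_time) {
            best = &c[i];
            best_time = t;
        }
    }
    return best;
}
```

With this shape an Athlon-specific routine is simply one more entry in
the table, marked usable only when that CPU is detected, and it wins
only if it really is the fastest on the machine at hand.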
--
Best regards, VDA
mailto:VDA@port.imtp.ilyichevsk.odessa.ua
http://port.imtp.ilyichevsk.odessa.ua/vda/