SP> Well, not necessarily. It might be that data just hasn't "arrived" yet because
SP> of the movntq instruction.
So why does it oops then?
Also, we don't want this data to arrive late or whatever.
fast_copy_page must actually copy the page (so that memcmp()==0 afterwards).
If it does not, it is "optimized" too much.
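To spell out what I mean by that requirement, here is a minimal user-space
sketch, not kernel code. The names page_copy_fn, copy_is_exact, plain_copy_page,
half_copy_page and SKETCH_PAGE_SIZE are made up for illustration, and the
4096-byte page size is an assumption. It only checks the result as seen by the
CPU that did the copy, so it says nothing about ordering as seen by other CPUs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size, illustration only */

/* Hypothetical signature for a page copy routine under test. */
typedef void (*page_copy_fn)(void *to, const void *from);

/* Reference routine: plain memcpy, known to copy the whole page. */
static void plain_copy_page(void *to, const void *from)
{
    memcpy(to, from, SKETCH_PAGE_SIZE);
}

/* Deliberately broken routine: copies only the first half of the page. */
static void half_copy_page(void *to, const void *from)
{
    memcpy(to, from, SKETCH_PAGE_SIZE / 2);
}

/* Returns 1 if 'fn' produced a byte-exact copy of the page, 0 otherwise. */
static int copy_is_exact(page_copy_fn fn)
{
    static unsigned char src[SKETCH_PAGE_SIZE], dst[SKETCH_PAGE_SIZE];
    size_t i;

    assert(fn != NULL);

    for (i = 0; i < SKETCH_PAGE_SIZE; i++)
        src[i] = (unsigned char)(i * 31 + 7);   /* non-trivial pattern */
    memset(dst, 0xAA, sizeof(dst));             /* poison the destination */

    fn(dst, src);

    /* The whole point: after the copy, memcmp() must return 0. */
    return memcmp(dst, src, SKETCH_PAGE_SIZE) == 0;
}
```

A routine that passes copy_is_exact() is at least a candidate; one that fails
it is not a copy routine at all, however fast it is.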
SP> One thing that also puzzles me is why the fast_copy_page() routine is laid
SP> out like this:
SP> "2: movq (%0), %%mm0\n"
SP> " movntq %%mm0, (%1)\n"
SP> " movq 8(%0), %%mm1\n"
SP> " movntq %%mm1, 8(%1)\n"
SP> " movq 16(%0), %%mm2\n"
SP> " movntq %%mm2, 16(%1)\n"
SP> " movq 24(%0), %%mm3\n"
SP> " movntq %%mm3, 24(%1)\n"
SP> " movq 32(%0), %%mm4\n"
SP> " movntq %%mm4, 32(%1)\n"
SP> " movq 40(%0), %%mm5\n"
SP> " movntq %%mm5, 40(%1)\n"
SP> " movq 48(%0), %%mm6\n"
SP> " movntq %%mm6, 48(%1)\n"
SP> " movq 56(%0), %%mm7\n"
SP> " movntq %%mm7, 56(%1)\n"
SP> When intuitively it would be more effective to fill the registers with reads
SP> first and then write them out with "movntq", like this:
SP> "2: movq (%0), %%mm0\n"
SP> " movq 8(%0), %%mm1\n"
SP> " movq 16(%0), %%mm2\n"
SP> " movq 24(%0), %%mm3\n"
SP> " movq 32(%0), %%mm4\n"
SP> " movq 40(%0), %%mm5\n"
SP> " movq 48(%0), %%mm6\n"
SP> " movq 56(%0), %%mm7\n"
SP> " movntq %%mm0, (%1)\n"
SP> " movntq %%mm1, 8(%1)\n"
SP> " movntq %%mm2, 16(%1)\n"
SP> " movntq %%mm3, 24(%1)\n"
SP> " movntq %%mm4, 32(%1)\n"
SP> " movntq %%mm5, 40(%1)\n"
SP> " movntq %%mm6, 48(%1)\n"
SP> " movntq %%mm7, 56(%1)\n"
A better way to do it is to benchmark several routines at
startup time and pick the best one. This is already done
for the RAID xor'ing routines.
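Roughly what I have in mind, as a user-space sketch only. This is not the RAID
xor code, and every name here (copy_candidate, time_candidate, pick_fastest,
copy_memcpy, copy_words, BENCH_*) is hypothetical. It times each candidate with
clock() over a fixed number of rounds and returns the index of the fastest; a
real version would use a better timer and would also have to verify that each
candidate copies correctly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define BENCH_PAGE_SIZE 4096    /* assumed page size, illustration only */
#define BENCH_WORDS     (BENCH_PAGE_SIZE / sizeof(unsigned long))
#define BENCH_ROUNDS    2000    /* arbitrary number of timed copies */

typedef void (*bench_copy_fn)(void *to, const void *from);

struct copy_candidate {
    const char *name;
    bench_copy_fn fn;
};

/* Candidate 1: let the C library do it. */
static void copy_memcpy(void *to, const void *from)
{
    memcpy(to, from, BENCH_PAGE_SIZE);
}

/* Candidate 2: naive word-at-a-time loop. */
static void copy_words(void *to, const void *from)
{
    unsigned long *d = to;
    const unsigned long *s = from;
    size_t i;

    for (i = 0; i < BENCH_WORDS; i++)
        d[i] = s[i];
}

/* CPU time consumed by BENCH_ROUNDS page copies with 'fn'. */
static clock_t time_candidate(bench_copy_fn fn, void *dst, const void *src)
{
    clock_t start = clock();
    int i;

    for (i = 0; i < BENCH_ROUNDS; i++)
        fn(dst, src);
    return clock() - start;
}

/* Returns the index of the candidate with the smallest measured time. */
static size_t pick_fastest(const struct copy_candidate *c, size_t n)
{
    static unsigned long src[BENCH_WORDS], dst[BENCH_WORDS];
    size_t i, best = 0;
    clock_t best_time = 0;

    assert(n > 0);
    for (i = 0; i < BENCH_WORDS; i++)
        src[i] = i;

    for (i = 0; i < n; i++) {
        clock_t t = time_candidate(c[i].fn, dst, src);

        if (i == 0 || t < best_time) {
            best_time = t;
            best = i;
        }
    }
    return best;
}
```

The interleaved and the "reads first" movntq variants would simply be two more
entries in the candidate table, and the machine decides which one wins instead
of us arguing about it.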
-- Best regards, VDA mailto:VDA@port.imtp.ilyichevsk.odessa.ua http://port.imtp.ilyichevsk.odessa.ua/vda/