Same, my dual reports:
[bcrl@toomuch ~]$ ./a.out
nothing: 11 cycles
locked add: 11 cycles
cpuid: 68 cycles
Which is pretty good.
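
For anyone without the test program handy, here is a rough sketch of how numbers like these can be gathered. It is not the program that produced the output above (that source isn't shown here), and the names rdtsc(), min_cycles(), report() and the op_* helpers are made up for illustration. It assumes x86 and GCC-style inline asm, and it takes the minimum over many trials to filter out interrupts and cache misses.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Read the x86 time-stamp counter. Assumes an x86 CPU and GCC-style asm. */
static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Baseline: measures only the overhead of the timing harness itself. */
static void op_nothing(void)
{
}

/* A locked add of zero to a memory word, usable as a full memory barrier. */
static void op_locked_add(void)
{
    static volatile int counter;
    __asm__ __volatile__("lock; addl $0,%0" : "+m"(counter) : : "memory", "cc");
}

/* cpuid is a serializing instruction, so it is a heavier alternative. */
static void op_cpuid(void)
{
    uint32_t a = 0, b, c = 0, d;
    __asm__ __volatile__("cpuid"
                         : "+a"(a), "=b"(b), "+c"(c), "=d"(d)
                         : : "memory");
    (void)b;
    (void)d;
}

typedef void (*op_fn)(void);

/* Return the smallest cycle count seen over `trials` runs of `op`.
 * Returns UINT64_MAX if trials is zero or negative. */
static uint64_t min_cycles(op_fn op, int trials)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < trials; i++) {
        uint64_t start = rdtsc();
        op();
        uint64_t elapsed = rdtsc() - start;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Print one line in the same "label: N cycles" shape as the output above. */
static void report(const char *label, op_fn op)
{
    printf("%s: %" PRIu64 " cycles\n", label, min_cycles(op, 100000));
}
```

Calling report("nothing", op_nothing), report("locked add", op_locked_add) and report("cpuid", op_cpuid) from main() prints three lines in that format. Every figure includes the cost of the rdtsc pair and the indirect call, so only the difference from the "nothing" line says anything about the instruction itself.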
> That said, it _can_ be real even on SMP. There's no reason why a memory
> barrier would have to be as heavy as it is on some machines (even the P4
> looks positively _fast_ compared to most older machines that did memory
> barriers on the bus and took hundreds of much slower cycles to do it).
I had discussions with a few people from Intel about the P4 having much
improved locking performance, including the ability to speculatively
execute locked instructions. How much of that is enabled in the current
cores is another question entirely (gotta love microcode patches).
-ben