However, rdtsc will completely change how your decoders fill and how busy
your execution units are; it will interfere with every single stage in the
CPU, from the fetch logic to the retirement and write buffer.
Your code will not behave "as usual" if you put rdtscs around it.
I usually put an rdtsc at the very beginning of a routine, and another
rdtsc at the end. The assembly following the last rdtsc then increments
a counter in an array (after a bounds check). The index into the array
is the number of clock cycles I spent.
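
A minimal C sketch of that bookkeeping follows. It is not the assembly
described above: it times a whole function call from the outside, so the
call overhead ends up in the count. The names read_cycles, hist_record,
hist_measure and the HIST_BINS limit are made up for illustration, the
__rdtsc() intrinsic assumes GCC or Clang on x86, and the non-x86 fallback
counts nanoseconds rather than cycles.

```c
#define _POSIX_C_SOURCE 199309L     /* clock_gettime() in the fallback path */
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>              /* __rdtsc() on GCC/Clang */
static inline uint64_t read_cycles(void) { return __rdtsc(); }
#else
#include <time.h>
/* Fallback for non-x86 builds only: nanoseconds, not clock cycles. */
static inline uint64_t read_cycles(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

#define HIST_BINS 4096              /* hypothetical upper bound on cycles */

static unsigned long hist[HIST_BINS];
static unsigned long hist_overflow; /* samples that fell outside the array */

/* Bounds-checked increment: the array index is the elapsed cycle count. */
static void hist_record(uint64_t cycles)
{
    if (cycles < HIST_BINS)
        hist[cycles]++;
    else
        hist_overflow++;
}

/* Time fn() `runs` times and add one histogram sample per call. */
static void hist_measure(void (*fn)(void), unsigned long runs)
{
    for (unsigned long i = 0; i < runs; i++) {
        uint64_t start = read_cycles();
        fn();
        uint64_t end = read_cycles();
        hist_record(end - start);
    }
}
```

Calling hist_measure(my_routine, 10000) (my_routine being whatever you
want to time) and then dumping hist[] to a file gives the data for the
plot. Counting the out-of-range samples separately, instead of dropping
them, makes it obvious when HIST_BINS is too small.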
After running the function some thousands of times, you will have a nice
histogram in your array, usually with some skewed bell-like curve (you can
see many interesting kinds of double/triple bells etc. all depending on
your code).
Changing just one or two instructions in the function, then re-running the
code and re-plotting the histogram, will displace the peak of the bell curve
in some direction. This will tell you how your change affected the code
in "typical number of clock cycles".
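
If you would rather compute the displacement than read it off the plot,
something like the following would do. It is only a sketch: hist_peak and
hist_peak_shift are made-up names, and it takes the single fullest bin as
"the peak", which is too simple for the double/triple bell cases.

```c
#include <stddef.h>

/* Return the index (cycle count) of the fullest bin; first one wins ties. */
static size_t hist_peak(const unsigned long *hist, size_t bins)
{
    size_t peak = 0;
    for (size_t i = 1; i < bins; i++)
        if (hist[i] > hist[peak])
            peak = i;
    return peak;
}

/* Signed displacement of the peak between two runs, in bins (cycles).
 * Negative means the peak moved towards fewer cycles. */
static long hist_peak_shift(const unsigned long *before,
                            const unsigned long *after, size_t bins)
{
    return (long)hist_peak(after, bins) - (long)hist_peak(before, bins);
}
```

Both histograms must use the same number of bins, and both should come
from the same number of runs on the same machine for the comparison to
mean anything.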
--
Jakob Østergaard, OZ9ABN
jakob@unthought.net

  And I see the elder races,
  putrid forms of man
  See him rise and claim the earth,
  his downfall is at hand.
      {Konkhra}