> That modulo is likely slower than dereference.
>
> > + if (count % 256 == 0) {
You are forgetting that this case should be converted to and 255 or a
plain byte reference by any optimizing compiler - and gcc surely is,
on x86 this code can be reduced to around 2 cycles (Pentium: mov, or, jnz,
with preceding code intertwined to cancel stalls and jnz being likely in
the code buffer)...
Maciek
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/