The test reports the fastest time for 100 kmalloc calls in a tight loop
(Duron 700). Loop/test overhead substracted.
32-byte alloc:
current: 41 ticks
generic_fls: 56 ticks
bsrl: 54 ticks
4096 byte alloc: 84 ticks
generic_fls: 53 ticks
bsrl: 54 ticks
40 ticks difference for -current between 4096 and 32 bytes - ~4 cycles
for each loop.
bit scan is 10 ticks slower for 32 byte allocs, 30 ticks faster for 4096
byte allocs.
No difference between generic_fls and bsrl - the branch predictor can
easily predict all branches in generic_fls for constant kmalloc calls.
-- Manfred- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/