I completed a couple of runs of tiobench, and thought I would share a
few notes. The test was a 128 thread tiobench, using varying queue
sizes: ranging from 128 per list (the default), 512 per list, and 4096
per list. IO performance generally increased with bigger queues, for now
lets just look at profiles though.
Testing was done on a 9GB SCSI drive, SMP machine, 512megs of RAM. File
system used was ext2. Kernel used was 2.5.45-vanilla with and without
the deadline-rbtree patch.
With 128 requests per list, the stock kernel has the following relevant
oprofile results:
c0236000 1972 0.0233741 deadline_merge
c0232cd0 683 0.00809559 __make_request
c0235f30 315 0.00373369 deadline_find_hash
c02364a0 283 0.0033544 deadline_add_request
c0236310 248 0.00293954 deadline_move_requests
c02363b0 196 0.00232319 deadline_next_request
rbtree version has the following highscore list:
c0232cd0 560 0.00680737 __make_request
c02361d0 267 0.00324566 deadline_rb_find
c0235f80 234 0.00284451 deadline_hash_add
c0235fe0 225 0.0027351 deadline_hash_find
c0236100 204 0.00247983 deadline_rb_add
c0236660 157 0.00190849 deadline_next_request
No regression when compared to the stock settings, in fact it looks
pretty darn good.
With 512 requests per list, stock kernel scores:
c0236000 6094 0.0860452 deadline_merge
c0232cd0 714 0.0100814 __make_request
c0235f30 545 0.00769521 deadline_find_hash
c02364a0 307 0.00433474 deadline_add_request
c0236310 234 0.003304 deadline_move_requests
c02363b0 163 0.0023015 deadline_next_request
and rbtree version:
c0232cd0 665 0.00950867 __make_request
c0235fe0 350 0.00500456 deadline_hash_find
c02361d0 287 0.00410374 deadline_rb_find
c0235f80 236 0.0033745 deadline_hash_add
c0236100 197 0.00281685 deadline_rb_add
c0236660 121 0.00173015 deadline_next_request
stock kernel shows worse behaviour with 512 requests vs 128, rbtree
version is basically the same. Final test was with 4096 requests per
list. stock kernel highscore:
c0236000 11753 0.228824 deadline_merge
c0232cd0 707 0.0137649 __make_request
c0235f30 625 0.0121684 deadline_find_hash
c02364a0 309 0.00601604 deadline_add_request
c02363b0 134 0.0026089 deadline_next_request
and rbtree for 4096 requests:
c0232cd0 1152 0.0163495 __make_request
c0235fe0 931 0.013213 deadline_hash_find
c02361d0 523 0.00742254 deadline_rb_find
c0235f80 424 0.00601751 deadline_hash_add
c0236100 293 0.00415832 deadline_rb_add
c0236660 166 0.00235591 deadline_next_request
Now the back merge hash table is getting too small, obviously. It's
worth nothing that with 4096 requests, deadline_merge in the stock
kernel is the biggest cpu user in the kernel apart from
copy_to/from_user and mark_offset_tsc.
As expected, the stock version O(N) insertion scan really hurts. Even
with 128 requests per list, rbtree version is far superior. Once bigger
lists are used, there's just no comparison whatsoever.
Full profile numbers are available from:
http://www.kernel.org/pub/linux/kernel/people/axboe/misc/
-- Jens Axboe- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/