I have uploaded the profiles from Anton's benchmark runs -
http://lse.sourceforge.net/locking/dcache/results/2.5.40/200-base.html
http://lse.sourceforge.net/locking/dcache/results/2.5.40/200-base-nops.html
http://lse.sourceforge.net/locking/dcache/results/2.5.40/200-mm.html
http://lse.sourceforge.net/locking/dcache/results/2.5.40/200-mm-nops.html
A quick comparison of the base and base-nops profiles shows this -
base :
 Hits   Percentage   Function
75185   100.00       total
11215    14.92       path_lookup
 8578    11.41       atomic_dec_and_lock
 5763     7.67       do_lookup
 5745     7.64       proc_pid_readlink
 4344     5.78       page_remove_rmap
 2144     2.85       page_add_rmap
 1587     2.11       link_path_walk
 1531     2.04       proc_check_root
 1461     1.94       save_remaining_regs
 1345     1.79       inode_change_ok
 1236     1.64       ext2_free_blocks
 1215     1.62       ext2_new_block
 1067     1.42       d_lookup
base-nops :
 Hits   Percentage   Function
50895   100.00       total
 8222    16.15       page_remove_rmap
 3837     7.54       page_add_rmap
 2222     4.37       save_remaining_regs
 1618     3.18       release_pages
 1533     3.01       pSeries_flush_hash_range
 1446     2.84       do_page_fault
 1343     2.64       find_get_page
 1273     2.50       copy_page
 1228     2.41       copy_page_range
 1186     2.33       path_lookup
 1186     2.33       pSeries_insert_hpte
 1171     2.30       atomic_dec_and_lock
 1152     2.26       zap_pte_range
  841     1.65       do_generic_file_read
Clearly dcache_lock is the killer when the 'ps' command is used in
the benchmark: path walking and dentry refcounting (path_lookup,
atomic_dec_and_lock, do_lookup, link_path_walk, d_lookup) account
for about 37.5% of the ticks in the base profile, versus under 5%
in base-nops. My guess (without having looked at the 'ps' code) is
that it opens and closes a lot of files in /proc, which increases
the number of dcache_lock acquisitions. The increased number of
acquisitions adds to cache line bouncing and contention.
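To make the locking pattern concrete, here is a rough sketch of the
2.5-era code paths (simplified from memory, not the exact 2.5.40
source). dput() releases its reference through atomic_dec_and_lock()
on dcache_lock, and the generic version in lib/dec_and_lock.c takes
the spinlock unconditionally; d_lookup() takes the same lock again
for every path component:

    /* lib/dec_and_lock.c, generic version: the spinlock is taken
     * on every call, even when the count does not reach zero. */
    int atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
    {
            spin_lock(lock);
            if (atomic_dec_and_test(atomic))
                    return 1;       /* count hit 0; return with lock held */
            spin_unlock(lock);
            return 0;
    }

    /* fs/dcache.c, simplified: the hash-chain walk is serialized
     * on the single global dcache_lock. */
    struct dentry *d_lookup(struct dentry *parent, struct qstr *name)
    {
            struct dentry *dentry = NULL;

            spin_lock(&dcache_lock);
            /* ... walk the d_hash chain, take a reference ... */
            spin_unlock(&dcache_lock);
            return dentry;
    }

So on a 24-way box, every open()/close() pair from 'ps' means several
round trips through the same global spinlock cache line.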
I should add that this is a general trend we see in all workloads
that do a lot of opens and closes, so much so that performance is
very sensitive to how close to / your application's working directory
is. You would get much better system time compiling a kernel in
/linux than in, say, /home/fs01/users/akpm/kernel/linux ;-)
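That depth sensitivity falls straight out of the name walk:
link_path_walk() does one locked d_lookup() per path component, so
the six-component path above costs roughly six dcache_lock round
trips per lookup against one for /linux. A hypothetical userspace
microbenchmark (my example, not from the thread; it assumes both
directories exist) to see the effect:

    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/time.h>

    /* Time n stat() calls on a path; each call walks every
     * component, taking dcache_lock at least once per component
     * on a pre-dcache-rcu kernel. */
    static double time_lookups(const char *path, int n)
    {
            struct stat st;
            struct timeval t0, t1;
            int i;

            gettimeofday(&t0, NULL);
            for (i = 0; i < n; i++)
                    stat(path, &st);
            gettimeofday(&t1, NULL);
            return (t1.tv_sec - t0.tv_sec) +
                   (t1.tv_usec - t0.tv_usec) / 1e6;
    }

    int main(void)
    {
            int n = 1000000;

            /* One component vs. six: the deep path should generate
             * roughly 6x the dcache_lock traffic per lookup. */
            printf("shallow: %.3fs\n", time_lookups("/linux", n));
            printf("deep:    %.3fs\n", time_lookups(
                    "/home/fs01/users/akpm/kernel/linux", n));
            return 0;
    }

Run several copies in parallel on an SMP box to see the contention;
a single copy mostly measures the uncontended path.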
> And it appears that dcache-rcu made a ~10% difference on a 24-way PPC64,
> yes? That is nice, and perhaps we should take that, but it is not a
> tremendous speedup.
Hmm.. based on Anton's graph it looks more like a ~25% difference
for 60 or more scripts. At 200 scripts it is ~27.6%. Without the ps
command, it seems more like ~4%.
Thanks
Dipankar