The system has two CPUs on a chip sharing an L2 cache, 4 of these chips
built into a multichip module, and up to 4 of these modules connected
together. The L3 and memory are distributed amongst the modules.
As you can imagine there is a hierarchy of latencies, but the ratio is
quite low.
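If it helps to picture the sharing domains, here is a rough userspace
sketch of how a flat CPU number maps onto them, going purely by the
description above (the constants and helper names are just for
illustration, not anything from the kernel source):

#include <stdio.h>

#define CPUS_PER_CHIP    2	/* two CPUs sharing an L2 */
#define CHIPS_PER_MODULE 4	/* four chips per multichip module */
#define MAX_MODULES      4	/* up to four modules connected together */

static int cpu_to_chip(int cpu)   { return cpu / CPUS_PER_CHIP; }
static int cpu_to_module(int cpu) { return cpu / (CPUS_PER_CHIP * CHIPS_PER_MODULE); }

/* Three broad latency classes: shared L2, module-local L3/memory, remote module. */
static const char *relation(int a, int b)
{
	if (cpu_to_chip(a) == cpu_to_chip(b))
		return "same chip (shared L2)";
	if (cpu_to_module(a) == cpu_to_module(b))
		return "same module (local L3/memory)";
	return "remote module";
}

int main(void)
{
	printf("cpu 0 vs cpu 1:  %s\n", relation(0, 1));
	printf("cpu 0 vs cpu 5:  %s\n", relation(0, 5));
	printf("cpu 0 vs cpu 17: %s\n", relation(0, 17));
	return 0;
}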
> OK, now I'm going to have to build a bigger system ;-)
I've got 8 more CPUs in store too :)
> I have some strange plans for the lru stuff, but it'll take me a while.
> I'm curious as to why lru_cache_del is so much lower in your list than
> add, whereas the ratio for me is about:
>
> 719 lru_cache_add 7.8152
> 477 lru_cache_del 21.6818
Not sure about that.
> > 6714 .local_flush_tlb_range ppc64 specific
> > 2773 .local_flush_tlb_page ppc64 specific
>
> Do you know what's causing the tlb flushes? Just context switches?
That's due to the way we manipulate the ppc hashed page table. Every
time we update the Linux page tables we have to update the hashed
page table. There are some obvious optimisations we need to make;
hopefully this will go away once those are done. The TLB flushes here
are probably from process exits and things like COW faults.
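To make that a bit more concrete, here is a toy userspace model of the
pattern (all names made up, nothing like the real code): the hashed
page table is what the hardware actually walks, so it shadows the
Linux page tables, and every PTE update has to invalidate the matching
hash entry, which is the work that shows up under the flush routines
above.

#include <stdio.h>

#define HPT_SLOTS 256

struct hpte { unsigned long va; unsigned long pa; int valid; };

static struct hpte hpt[HPT_SLOTS];		/* toy hashed page table */
static unsigned long linux_pte[HPT_SLOTS];	/* toy "Linux page table" */

static unsigned long hpt_hash(unsigned long va)
{
	return (va >> 12) % HPT_SLOTS;		/* real hash is fancier */
}

/* Stand-in for the hash invalidate that the flush routines end up doing. */
static void hpt_invalidate(unsigned long va)
{
	struct hpte *slot = &hpt[hpt_hash(va)];

	if (slot->valid && slot->va == va) {
		slot->valid = 0;
		printf("flushed hash slot for va 0x%lx\n", va);
	}
}

/*
 * Every Linux PTE update has to keep the hashed page table in sync -
 * this is the extra work showing up under .local_flush_tlb_page/_range
 * when a COW fault rewrites a PTE or an exiting process tears down
 * its mappings.
 */
static void set_pte(unsigned long va, unsigned long pa)
{
	linux_pte[(va >> 12) % HPT_SLOTS] = pa;
	hpt_invalidate(va);
}

int main(void)
{
	unsigned long va = 0x10000000UL;

	hpt[hpt_hash(va)].va = va;
	hpt[hpt_hash(va)].pa = 0x1000;
	hpt[hpt_hash(va)].valid = 1;

	set_pte(va, 0x2000);	/* e.g. a COW fault installing a new page */
	return 0;
}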
> > 554 .d_lookup
>
> Did you try the dcache patches?
Not for this. I did do some benchmarking of the RCU dcache patches a
while ago which I should post.
> Can you publish lockmeter stats?
I didn't get a chance to run lockmeter. I tend to use the kernel profiler
with a hacked readprofile (originally from tridge) that displays
profile hits per assembly instruction. That's usually good enough to work
out which spinlocks are a problem.
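For what it's worth, the attribution step is pretty simple: take the
per-bucket hit counts from the kernel profiler and charge each bucket
to whichever System.map symbol covers that address; the hack just
prints the hits against each instruction address instead of summing
them per symbol. A rough sketch of that lookup (the addresses and
counts below are invented):

#include <stdio.h>

struct sym { unsigned long addr; const char *name; };

/* A few made-up System.map entries, sorted by address. */
static const struct sym symtab[] = {
	{ 0xc0001000, ".lru_cache_add" },
	{ 0xc0001200, ".lru_cache_del" },
	{ 0xc0001400, ".d_lookup" },
};
#define NSYMS (sizeof(symtab) / sizeof(symtab[0]))

/* Find the symbol whose range covers this address. */
static const char *sym_for(unsigned long addr)
{
	int i;

	for (i = NSYMS - 1; i >= 0; i--)
		if (addr >= symtab[i].addr)
			return symtab[i].name;
	return "?";
}

int main(void)
{
	/* Pretend profile buckets: (address, hits). */
	struct { unsigned long addr; unsigned int hits; } prof[] = {
		{ 0xc0001010, 30 }, { 0xc0001024, 689 }, { 0xc0001210, 477 },
	};
	unsigned int i;

	for (i = 0; i < sizeof(prof) / sizeof(prof[0]); i++)
		printf("%#lx %-16s %u hits\n",
		       prof[i].addr, sym_for(prof[i].addr), prof[i].hits);
	return 0;
}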
Anton