Andrew,
I tested Mingming's RCU ipc lock patch using a *new* microbenchmark, semopbench.
semopbench was written to test the performance of Mingming's patch.
I also ran a 3-hour stress test, and it completed successfully.
An explanation of the microbenchmark follows the results.
Here is a link to the microbenchmark source.
http://www-124.ibm.com/developerworks/opensource/linuxperf/semopbench/semopbench.c
SUT: 8-way 700 MHz PIII
I tested 2.5.44-mm2 and 2.5.44-mm2 + the RCU ipc patch.
>semopbench -g 64 -s 16 -n 16384 -r > sem.results.out
>readprofile -m /boot/System.map | sort -n +0 -r > sem.profile.out
The metric is seconds per repetition. Lower is better.
kernel                run 1     run 2
                      seconds   seconds
==================    =======   =======
2.5.44-mm2            515.1     515.4
2.5.44-mm2+rcu-ipc     46.7      46.7
With Mingming's patch, the test completes more than 10X faster (515.1 s vs. 46.7 s).
-----
The 2.5.44-mm2 readprofile shows the 8 CPUs spending 70% of their time spinning on .text.lock.sem:
http://www-124.ibm.com/developerworks/opensource/linuxperf/semopbench/sem.profile.1.out
The 2.5.44-mm2 + Mingming's patch readprofile shows that the spinning on .text.lock.sem is gone:
http://www-124.ibm.com/developerworks/opensource/linuxperf/semopbench/sem.rcu.profile.1.out
Here are the semopbench results for 2.5.44-mm2:
http://www-124.ibm.com/developerworks/opensource/linuxperf/semopbench/sem.results.1.out
Here are the semopbench results for 2.5.44-mm2 + Mingming's patch:
http://www-124.ibm.com/developerworks/opensource/linuxperf/semopbench/sem.rcu.results.1.out
-----
Here is some info on how the microbenchmark works :
>semopbench -g 64 -s 16 -n 16384 -r
-g 64 creates 64 sema4 groups
group0
group1
...
group63
-s 16 creates 16 sema4s in each group
group0 - sem0, sem1, ... sem15
group1 - sem0, sem1, ... sem15
...
group63 - sem0, sem1, ... sem15
For each of the 1024 (64*16) sema4s, a process is forked and sleeps on
its own sema4. When the test starts, the master process posts the
sema4 for the 1st process in each group.
When the 1st process in each group wakes up, it will:
(a) reset its own sema4
(b) post the sema4 for the next process in the group
(c) wait on its own sema4
-n 16384 runs through each sema4 group in the above manner 16384 times.
semopbench reports:
(1) the average time, in microseconds, each process takes to complete its repetitions
(2) CPU utilization
-d turns on debug printfs.
-v turns on per-process times.
-r runs readprofile -r to reset the profile buffer before the test starts.
Bill Hartner
-- IBM Linux Technology Center Performance Team http://www-124.ibm.com/developerworks/oss/linux hartner@austin.ibm.com