I caught it now because it hard-locks my alpha as soon as I play with
any rwsem testcase; I'm not sure why x86 is apparently immune to the hard lockup.
Then I also added your trick of returning the semaphore so I can declare "a"
(sem) as read-only (that is an improvement for the fast path).
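
To make the constraint point concrete, here is a rough sketch (made-up names,
not the real code from either patch, x86 only): because the out-of-line slow
path returns the semaphore, the pointer register is unchanged across it, so
gcc can be told the asm never modifies the operand and it can be the
read-only input "a"(sem) instead of the read-write "+a"(sem):

#include <stdio.h>

struct rw_semaphore {
        volatile int count;     /* readers add 1, a writer sets the sign bit
                                 * via a large negative bias */
        /* wait queue omitted */
};

/* hypothetical stand-in for the out-of-line slow path: the real one blocks,
 * but what matters here is only that it returns the semaphore pointer */
static struct rw_semaphore *down_read_slow(struct rw_semaphore *sem)
{
        return sem;
}

static void down_read_sketch(struct rw_semaphore *sem)
{
        unsigned char contended;

        __asm__ __volatile__(
                "lock; incl (%2)\n\t"   /* one atomic inc on the fast path */
                "sets %0"               /* sign set => writer bias present */
                : "=q" (contended), "+m" (sem->count)
                : "a" (sem)             /* read-only input: the asm never has
                                         * to write the pointer back */
                : "memory", "cc");

        if (contended)
                down_read_slow(sem);
}

int main(void)
{
        struct rw_semaphore sem = { 0 };

        down_read_sketch(&sem);
        printf("count after down_read: %d\n", sem.count);
        return 0;
}
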
Because of those changes I reran all the benchmarks. I have now finished
re-benchmarking the asm version: it runs even faster than before under write
contention, and the other numbers are basically unchanged. The down-read fast
path now runs exactly like yours (so yes, it seems the "+a" was giving a
nonsensical improvement to my code for the down-read fast path).
Of course my down-write fast path remains significantly faster than yours, and
that really makes sense because my smarter algorithm allows me to avoid all
your cmpxchg stuff.
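
To show what I mean about the cmpxchg (again only a sketch, with a made-up
bias value and helper names, not either of our actual fast paths): a
cmpxchg-based down_write has to compare against an expected value and can
fail, while a single xadd-style locked op just adds the write bias and looks
at the old value, pushing all the contended-case complexity into the slow
path:

#define WRITE_BIAS      (-0x10000)      /* made-up bias value */

/* cmpxchg style: only succeeds if the count was exactly 0 */
static int down_write_fast_cmpxchg(int *count)
{
        int expected = 0;

        return __atomic_compare_exchange_n(count, &expected, WRITE_BIAS,
                                           0, __ATOMIC_ACQUIRE,
                                           __ATOMIC_RELAXED);
}

/* single locked op: add the bias unconditionally and check the old value;
 * if it wasn't 0 the bias is already in the count and the slow path has to
 * cope with that, which is where the algorithm has to be smarter */
static int down_write_fast_xadd(int *count)
{
        return __atomic_fetch_add(count, WRITE_BIAS, __ATOMIC_ACQUIRE) == 0;
}

int main(void)
{
        int a = 0, b = 0;

        /* both fast paths succeed on an uncontended semaphore */
        return !(down_write_fast_cmpxchg(&a) && down_write_fast_xadd(&b));
}
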
I'm starting the benchmarks of the C version and will post updated numbers
and a new patch in a few minutes.
If you can ship me the testcase (even a theoretical one) that breaks my
algorithm in the next few minutes, that would help.
Andrea