There are two issues with this implementation that will likely not affect you.
First, when dumping core of MT applications with LOTS of threads the pthread
library signals all the threads in the application to exit. Sometimes
the process that is dumping core fails to suspend the other threads in the
application before some of them exit. The result is that, for such
applications, the threads that exited early are missing from the core file.
You have to work at it to see this failure. The way I reproduce it is to
run a test application with about 555 pthread threads in it and send it a
SIGQUIT. When I look at the core file, it won't have all 555 threads. SMP
makes this effect a bit more noticeable.
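
For anyone who wants to try the reproduction, here is a minimal sketch of
such a test application. It is not the test program referred to above. The
names start_idle_threads, idle_worker, NUM_THREADS and REPRO_MAIN, the
256 KiB stack size and the one second delay are all assumptions made for
illustration.

```c
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

#define NUM_THREADS 555   /* thread count mentioned in the text */

/* Hypothetical worker: sleeps forever so it is alive when the signal lands. */
static void *idle_worker(void *arg)
{
    (void)arg;
    for (;;)
        pause();
    return NULL;
}

/* Start up to n detached idle threads; returns how many were created. */
static size_t start_idle_threads(size_t n)
{
    pthread_attr_t attr;
    size_t started = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    /* Assumed stack size; if the call is rejected the default is kept. */
    pthread_attr_setstacksize(&attr, 256 * 1024);

    while (started < n) {
        pthread_t tid;
        if (pthread_create(&tid, &attr, idle_worker, NULL) != 0)
            break;   /* out of resources: stop and report what we got */
        started++;
    }
    pthread_attr_destroy(&attr);
    return started;
}

#ifdef REPRO_MAIN
int main(void)
{
    size_t n = start_idle_threads(NUM_THREADS);

    printf("started %zu threads, pid %d\n", n, (int)getpid());
    sleep(1);                  /* give the threads time to settle */
    kill(getpid(), SIGQUIT);   /* default action: terminate and dump core */
    return 1;                  /* only reached if SIGQUIT was blocked or handled */
}
#endif
```

Build with something like "cc -pthread -DREPRO_MAIN repro.c" and make sure
core dumps are enabled (for example "ulimit -c unlimited") before running.
Then count the threads the debugger finds in the core file and compare that
with the number the program printed.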
Ingo's design fixes this by changing the exit path for threads to wait for
the core file to get dumped before finishing the process cleanup. I like this
approach, I just wish I had thought of it ;)
Second, the controversial issue is in the way my design pauses the other
threads in the MT application. It's not semaphore lock safe. Although no
instance of the following failure has been seen, it is possible with new
kernel code.
If one of the processes in the MT application is currently holding a semaphore
lock when the dumping process pauses it, AND the dumping process does any
blocking operation that could attempt to grab this same semaphore, THEN the
core dump will deadlock. Boom.
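
The shape of that deadlock can be shown with a user-space analogy. This is
only a sketch with POSIX semaphores, not kernel code and not part of the
patch. The names shared_lock, holder, dumper_would_block and DEADLOCK_MAIN
are hypothetical. The "dumper" uses sem_trywait() so the demo reports the
problem instead of hanging; a blocking sem_wait() at the same point would
never return while the holder stays paused.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t shared_lock;    /* stands in for the kernel semaphore */
static sem_t holder_ready;   /* posted once the holder owns shared_lock */
static sem_t holder_resume;  /* posted to let the "paused" holder continue */

/* Takes the lock, then sits "paused" while still holding it. */
static void *holder(void *arg)
{
    (void)arg;
    sem_wait(&shared_lock);
    sem_post(&holder_ready);
    sem_wait(&holder_resume);   /* paused here, lock still held */
    sem_post(&shared_lock);
    return NULL;
}

/*
 * Returns 1 if the dumper would have blocked on shared_lock while the
 * holder was paused, 0 if it got the lock, -1 on setup failure.
 */
static int dumper_would_block(void)
{
    pthread_t tid;
    int would_block;

    if (sem_init(&shared_lock, 0, 1) != 0 ||
        sem_init(&holder_ready, 0, 0) != 0 ||
        sem_init(&holder_resume, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, holder, NULL) != 0)
        return -1;

    sem_wait(&holder_ready);    /* holder now owns shared_lock */

    /* The dumper needs the same lock; a blocking wait would hang here. */
    would_block = (sem_trywait(&shared_lock) == -1 && errno == EAGAIN);

    sem_post(&holder_resume);   /* un-pause the holder so the demo can end */
    pthread_join(tid, NULL);

    sem_destroy(&shared_lock);
    sem_destroy(&holder_ready);
    sem_destroy(&holder_resume);
    return would_block;
}

#ifdef DEADLOCK_MAIN
int main(void)
{
    printf("dumper would block: %d\n", dumper_would_block());
    return 0;
}
#endif
```

In the real case nothing un-pauses the holder until the dump is finished, and
the dump cannot finish without the lock, which is why it never completes.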
My patch is good for developers, pending the backport of Ingo's version.
Do let me know how it works out for you.
--mgross