What was happening was rather interesting. The init process was stuck
inside prepare_namespace(), in the while loop here (this is lines 749
- 751 of init/main.c):
	pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD);
	if (pid>0)
		while (pid != wait(&i));
The installer had sent a HUP signal to init. The init process thus
had current->sigpending == 1. When it called wait(), it got down into
sys_wait4(), which worked out that there were children but that none
were zombies. At that point it would normally sleep, but because there
were signals pending, it returned -ERESTARTSYS instead. Now, on the way out
from the system call, the kernel noticed that it was returning to
kernel mode and thus didn't deliver any signals, and sigpending stayed
at 1.
Thus the system was sitting in a tight loop calling wait() over and
over again in kernel mode in the init process.
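To make the sequence above concrete, here is a small userspace toy model
of it. This is only a sketch, not kernel code: every toy_* name is made
up for illustration, the "would sleep" case is reduced to returning 0,
and signal delivery is reduced to clearing a flag. The only value taken
from the kernel is ERESTARTSYS == 512.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: names mimic the kernel's, but none of this is kernel code. */
#define TOY_ERESTARTSYS 512	/* same value the kernel uses for ERESTARTSYS */

struct toy_task {
	bool sigpending;	/* models current->sigpending */
	bool in_kernel_mode;	/* true if the syscall was made from kernel mode */
	bool child_is_zombie;	/* true once the child has exited */
	int child_pid;
};

/* Models the part of sys_wait4 described above. */
static int toy_wait4(const struct toy_task *t)
{
	if (t->child_is_zombie)
		return t->child_pid;		/* child reaped */
	if (t->sigpending)
		return -TOY_ERESTARTSYS;	/* would sleep, but a signal is pending */
	return 0;				/* stands in for "sleep until a child exits" */
}

/* Models the syscall exit path: signals are only handled on return to user mode. */
static void toy_syscall_exit(struct toy_task *t)
{
	if (t->in_kernel_mode)
		return;			/* returning to kernel mode: sigpending untouched */
	t->sigpending = false;		/* user mode: signal delivered, flag cleared */
}

/* Models one wait() call as a full syscall round trip. */
static int toy_wait(struct toy_task *t)
{
	int ret = toy_wait4(t);

	toy_syscall_exit(t);
	return ret;
}

/*
 * Runs the equivalent of "while (pid != wait(&i));" for at most max_iters
 * iterations and returns how many calls came back with -ERESTARTSYS.
 */
static int toy_count_restarts(struct toy_task *t, int max_iters)
{
	int restarts = 0;

	assert(max_iters > 0);
	for (int n = 0; n < max_iters; n++) {
		int ret = toy_wait(t);

		if (ret == t->child_pid)
			break;
		if (ret == -TOY_ERESTARTSYS)
			restarts++;
	}
	return restarts;
}
```

With sigpending set and in_kernel_mode true, every iteration up to
max_iters comes back with -ERESTARTSYS and the flag never clears, which
is the tight loop. With in_kernel_mode false, only the first call is
interrupted, because the exit path clears the flag.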
This was on PPC. I had a look at the i386 code and AFAICS it will do
the same thing. The check for whether we are returning to user mode
is in do_signal there (whereas PPC does the check in entry.S) but the
net effect in both cases is that we don't execute the main body of
do_signal when we are returning from a syscall from a process running
in kernel mode.
I'm not sure what the best way to fix this is. The problem would crop
up whenever we have a kernel thread which wants to wait for a child
process. I don't think we want to start delivering signals to kernel
threads in the same way that we do to usermode processes though.
Any suggestions?
Paul.