Again I cross-post, and hope that people will accept my apologies. I do
not know if this is a vanilla kernel problem, or something that has to
do with XFS.
I have been noticing problems on two of my boxes here with some
processes somehow getting stuck in "D" state, which cannot be killed
even by a SIGKILL by root, and stay on running forever. I have noticed
the problem, so far, on 2.4.18-xfs and 2.4.19-rc2-xfs kernels.
The 2.4.18-xfs kernel did not have any binary-only modules loaded (like
NVdriver) and aside from the modules from the kernel only had the i2c
and lm-sensors modules. It was compiled using gcc 2.95.4 and was running
on a single Pentium III.
The 2.4.19-rc2-xfs kernel had the NVdriver binary-only module loaded
(coincidentally the same unit I reported kernel BUG reports in
page_alloc.c:91 about earlier), plus kernel modules and the i2c and
lm-sensors modules. It was compiled using gcc 3.1.1 (Debian prerelease)
and was running on a single AMD Duron.
I have not found a pattern to the occurence of these stuck processes.
They don't happen at given intervales, or with only a particular type of
process, or on certain workloads or tasks.
They've happened on everything from X11 (consequently freezing the box),
to the apt-method of dpkg (consequently preventing me from doing any
apt-get installations), to the dhcp3 daemon (consequently preventing new
boxes from getting IP addresses). They've also happened on boxes with
two days uptime, or with 7 days uptime, or with 14.
So far the only "solution" I've found to this problem has been to
reboot.
There is unfortunately absolutely nothing I can find in the logs to help
explain what's going on with these processes. Because I have not figured
out how to reproduce this, I do not know which process to run through
strace or a similar tool to figure out what's going on.
I hope someone can help shed light on this. Thank you very much. I have
upgraded the 2.4.18-xfs box to 2.4.19-rc3-xfs, and will upgrade the
2.4.19-rc2-xfs box to 2.4.19-rc3-xfs as well, and will see if the
problem goes away. If anyone else is experiencing similar "processes
stuck in state 'D'" problems, please do chime in and add your
experiences about the problem.
--> Jijo
-- Federico Sevilla III : <http://jijo.free.net.ph/> Network Administrator : The Leather Collection, Inc. GnuPG Key ID : 0x93B746BE - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/