So, I'm looking for a solution, preferably a set of patches, to help with
the problems described below.
On Monday, I installed a 2.4.20 kernel on our Dell PowerEdge 6600. This
machine is configured as follows:
    4x 1.6GHz Xeon CPUs
    8GB RAM
    Built-in:
        ATI Rage 128 graphics
        dual Broadcom Gigabit Ethernet
        serial/parallel/USB
        Adaptec SCSI
    Dell PERC3/DC (AMI/LSI MegaRAID) dual-channel
The kernel was built from the linux-2.4.20.tar.bz2 from kernel.org, and
patched with only the lvm-1.0.7 and linux-2.4.20 VFS locking patches from
Sistina's LVM-1.0.7 package.
The primary problem: whenever any process (or set of processes) initiates
intensive disk I/O, the system grinds to a halt, kswapd and kupdated each
consume 40% to 60% CPU, and system load averages can climb above 21.00.
The problem can be reproduced with a simple find command
("find / -print" seems to do it nicely).
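For anyone trying to reproduce this, a rough sketch of how the load could be logged while the find runs is below. It is only an illustration, not what I actually ran: it assumes a Linux /proc/loadavg and a Python 3 interpreter, and the function names (parse_loadavg, read_loadavg, sample_during) are made up for this example.

```python
import subprocess
import time


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the current load averages (assumes a Linux /proc)."""
    with open(path) as handle:
        return parse_loadavg(handle.read())


def sample_during(command, interval=5.0):
    """Run command and log the load averages every `interval` seconds until it exits."""
    proc = subprocess.Popen(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    samples = []
    while proc.poll() is None:
        samples.append(read_loadavg())
        print(time.strftime("%H:%M:%S"), "load: %.2f %.2f %.2f" % samples[-1])
        time.sleep(interval)
    return samples


# Hypothetical usage, matching the reproduction command above:
#     sample_during(["find", "/", "-print"])
```

Logging to a serial console or a remote host would be safer than a local file, since the machine may stop responding before anything is flushed to disk.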
I have had two rather painful nights dealing with this (Monday and Tuesday
nights). Luckily, I have a serial null-modem cable rigged up between the
troubled server and another server, and was able to capture all the info
from the Magic Sysrq commands that I could.
Full details are at http://castandcrew.com/~gregory/lkmlstuff/burpr/2.4.20
I've included the kernel config, the kernel and initrd images, the system
map file, output from "ps auxfww" and a couple screen scrapings from top,
and captures from magic sysrq commands from both crashes.
I had problems like this with 2.4.19, and was directed to apply a patch to
inode.c, which appears to be part of a patch set for 2.4.19pre9aa2. I've
archived it at:
http://castandcrew.com/~gregory/lkmlstuff/burpr/2.4.19/patches/10_inode-highmem-2
For 2.4.19, this solves _most_ of the stability issues, but I still have to
work with the LVM people and possibly whoever is responsible for the VM in
2.4.19/2.4.20 to track down some kernel oopses (possibly a separate
problem).
I will happily provide whatever other information is needed, though my
opportunities to test things on the machine in question are limited by the
fact that it's a production server.
Thanks in advance,
Gregory
--
Gregory K. Ruiz-Ade <gregory@castandcrew.com>
Sr. Systems Administrator
Cast & Crew Entertainment Services, Inc.