I recently set up my spiffy new SMP system. The system consists of:
Abit VP6 motherboard
Dual 1 GHz Pentium III CPUs
128MB memory (for now)
EIDE boot drive on the VIA EIDE port for most of the system
RAID 5 setup using an Ultra2 ICP RAID controller (gdth driver)
After setting up the new RAID I decided to run a burn-in test on it with
the following script:
while true
do
    rm -rv linux-2.4.16.old
    mv -v linux-2.4.16 linux-2.4.16.old
    cp -av linux-2.4.16.old linux-2.4.16
done
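A slightly more defensive variant of the same loop, as a sketch only and not the script I actually ran: it stops at the first failed command and appends a timestamped line to a log after every pass, so that after a hard freeze the log shows roughly how far the test got. The function name burn_in, its arguments, and the default log name burn-in.log are made up for illustration, and the log should live on a disk other than the one under test.

```shell
#!/bin/sh
# Hypothetical helper: repeat the remove / rename / copy cycle a fixed
# number of times inside a directory.
# usage: burn_in DIR TREE COUNT [LOGFILE]
burn_in() {
    dir=$1 tree=$2 count=$3 log=${4:-burn-in.log}
    i=1
    while [ "$i" -le "$count" ]; do
        # -f keeps rm quiet on the first pass, when no .old copy exists yet
        rm -rf "$dir/$tree.old" || return 1
        mv "$dir/$tree" "$dir/$tree.old" || return 1
        cp -a "$dir/$tree.old" "$dir/$tree" || return 1
        # Record progress and flush it to disk before the next pass
        echo "$(date) iteration $i done" >> "$log"
        sync
        i=$((i + 1))
    done
}
```

It would be called along the lines of `burn_in /mnt/raid linux-2.4.16 1000 /root/burn-in.log`, where both paths are placeholders.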
It didn't take very long. After 15 minutes or so the entire system
deadlocked, and very badly at that. Not even the magic SysRq key
combinations worked anymore. I've spent the past few days trying to debug
the problem and this is what I've found so far.
The freezing appears to happen only when running my burn-in test on the RAID
drive. (There may have been one instance of it happening on the IDE drive
while compiling a kernel too. However, I did a file operation on the RAID
drive shortly before that freeze, so I can't say for sure. The EIDE burn-in
test, on the other hand, can run for quite a long while.)
The freezing only happens on an SMP kernel on dual CPUs. (I tried out a
non-SMP kernel. No lockup.)
It doesn't appear to be filesystem related. (Tried both ext3 and ext2.)
It doesn't appear to be a hardware issue. Ran a burn-in test with FreeBSD.
(Worked beautifully.)
Also tried switching the RAID controller to different PCI slots and
disabling the built-in Highpoint HPT370 EIDE pseudo raid controller so that
the card didn't share an irq.
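For anyone who wants to double-check IRQ sharing on their own box: /proc/interrupts lists every driver registered on an IRQ, separated by commas, so a shared line is easy to spot. A rough sketch (the function name shared_irqs is mine, and the optional file argument only exists so it can be run against a saved copy of the table):

```shell
#!/bin/sh
# Print the lines of an interrupts table that have more than one
# handler registered, i.e. IRQs shared between drivers.
# usage: shared_irqs [FILE]    (FILE defaults to /proc/interrupts)
shared_irqs() {
    # Shared IRQs show up as "driver1, driver2" at the end of the line.
    grep ', ' "${1:-/proc/interrupts}"
}
```

Assuming the controller's handler shows up under the name gdth, `shared_irqs | grep gdth` should print nothing when the card has an IRQ to itself.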
Tried different BIOS settings, no change.
Tried passing "noapic" to the kernel. Deadlock still remained.
Upgraded the BIOS. (deadlock on 2 different BIOSes)
Best I can tell there appears to be a problem with the ICP raid
controller driver (gdth) in an SMP system, or at least in an SMP system
running on this motherboard. Does anybody else have an ICP RAID controller
with a RAID 5 setup running successfully in an SMP system? If so are you on
an Abit VP6? Anyone know of any 2.4.16 kernel bug that could be doing this?
If so, is there a fix or a workaround?
Anyone know how I could debug the cause of this problem? The machine
deadlocks without even an Oops, so I'm short on ideas on how to track the
problem down. Please help. 8-( My new SMP system sucks on Linux. 8-(