Do you have RAID1 on the disks?
Apparently the "noapic" boot option helps, i.e. it breaks the SYMMETRIC part of SMP.
You may also try "nmi_watchdog=1" if you have a serial console attached
to the box for kernel message logging (and commands).
> Since the kernel doesn't provide any info in syslog when it dies, I just
> ran a vmstat 30 to a file and waited for the next untimely demise.
> Here's what happened when it died last time. Note the sudden surge in
> disk activity (bi)
Yes, that looks familiar. My hangs have also happened during high disk activity.
My box is in a place that is difficult for me to get to, i.e. I can't
easily use it to collect debug data, or do magic (press reset) to
recover.
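Since both hangs seem to follow a burst of disk reads, a long `vmstat 30` log can be scanned for sudden jumps in the "bi" (blocks in) column instead of eyeballing it. What follows is a minimal sketch, not a tool from this thread. It assumes the usual vmstat layout, where a header line names the columns (including "bi") and data lines start with a number. The function name `find_bi_surges` and its `factor`/`min_blocks` thresholds are hypothetical choices for illustration.

```python
def find_bi_surges(lines, factor=5.0, min_blocks=100):
    """Return (sample_index, bi) pairs where 'bi' jumps sharply.

    Hypothetical helper: a sample is flagged when its bi value is at
    least `min_blocks` and at least `factor` times the previous sample.
    `lines` is any iterable of text lines from a `vmstat N` log.
    """
    bi_col = None      # column index of 'bi', learned from the header line
    prev = None        # bi value of the previous data sample
    surges = []
    index = 0          # counts data samples only, not header lines
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if "bi" in fields:
            # Column header line (may repeat in long logs): remember 'bi'.
            bi_col = fields.index("bi")
            continue
        if bi_col is None or not fields[0].isdigit():
            # Skip group-title lines ("procs ... memory ...") and any
            # text that appears before the first column header.
            continue
        try:
            bi = int(fields[bi_col])
        except (IndexError, ValueError):
            continue
        # Note: vmstat's first sample is an average since boot.
        if prev is not None and bi >= min_blocks and bi >= factor * max(prev, 1):
            surges.append((index, bi))
        prev = bi
        index += 1
    return surges
```

For example, `with open("vmstat.log") as f: print(find_bi_surges(f))` lists the suspicious samples. The file name is only a placeholder. Comparing against the previous sample keeps the check independent of the box's normal I/O level. The thresholds are guesses, so tune them for the workload.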
> I'd be more than willing to collect any other data required here, just
> let me know what would be of assistance. Note though that I only have
> remote access to this box, so getting magic sysrq info could be
> difficult/impossible (tho I do have console access if that helps).
>
> Thanks,
>
> Phil Oester
/Matti Aarnio
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/