The systems fail at relatively low temperatures. While the failures
are not specifically memory related (ECC errors are never a factor),
we have a memory test that's pretty good at triggering them. Data is
apparently getting corrupted on the front-side bus.
Here's the curious thing: when we run the same memory test on a
Windows 2000 system (same hardware; we just swap the disk), we can
run the ambient temperature up to 60C with no problem at all; the
test will run for days. (It occurred to us to try Win2K because the
hardware vendor was using it to test systems at temperature without
seeing problems.)
Swap in the Linux disk, and at that temperature it'll barely run at
all. The memory test fails quickly at 40C ambient.
FWIW, CPU cooling is pretty good in this box.
So, the puzzle: what might account for temperature sensitivity, of
all things, under Linux 2.4.9-31 (RH 7.2), but not Win2K?
-- /Jonathan Lundell. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/