I have an S2460 with dual 1800MPs using BIOS rev 1.04. I had very similar
problems (random hangs, sometimes after 2 minutes, sometimes after 36
hours). Here's what I did to solve them:
1) Turn off power management in the BIOS. I still have power management
enabled in Linux and all is fine.
2) (this is the most important one) Make sure your power supply is rated
for at least 500 watts. Each CPU alone is rated for 66 watts of consumption.
3) I still get random hangs at boot (usually after rebooting Linux) and I
believe this is due to some ACPI problem. A hard reboot (turn the power
supply off and on) fixes it for me.
4) There are a couple of bugs with the 760MP chipset and APICs. To see if
they're affecting you, add "mem=nopentium noapic" to your kernel
parameters (I can run fine without them).
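To confirm the parameters from item 4 actually reached the kernel after a reboot, something like the following should work. This is only a sketch: cmdline_has is a helper name I made up, and it assumes /proc/cmdline is available on your kernel.

```shell
# Hypothetical helper: succeeds only if every listed parameter appears
# as a whole word in the given kernel command line string.
cmdline_has() {
    line=$1
    shift
    for param in "$@"; do
        case " $line " in
            *" $param "*) ;;      # found, check the next one
            *) return 1 ;;        # missing
        esac
    done
    return 0
}

# Report whether the running kernel was booted with both parameters.
if [ -r /proc/cmdline ] && cmdline_has "$(cat /proc/cmdline)" mem=nopentium noapic; then
    echo "workaround parameters are active"
else
    echo "workaround parameters are not active"
fi
```

If you boot with LILO, the usual place to add them is an append="mem=nopentium noapic" line in the image section of lilo.conf, followed by rerunning lilo.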
> 4. Then I noticed that the CPU1 heatsink was quite warm (maybe 70C
> feeling around the thick bit of the aluminium) whereas CPU0 heatsink is
> just above room temp.
>
> 5. Checking the Winbond monitoring in the BIOS** menu, it comes up
> showing both CPU's at 77C, then as you hit keys it takes proper
> readings, and claims both CPUs within 1-2 degrees of each other (??). It
> seems accurate on fan speeds though. Both fans running pretty fast,
> 5500-6200 RPM.
My BIOS reports the right temps, but lm_sensors didn't at first: I too was
getting temps in the 75C+ range. To fix lm_sensors, do the following:
echo "2" > /proc/sys/dev/sensors/w83782d-i2c-0-2d/sensor1
echo "2" > /proc/sys/dev/sensors/w83782d-i2c-0-2d/sensor2
echo "2" > /proc/sys/dev/sensors/w83782d-i2c-0-2d/sensor3
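The same three writes can be wrapped in a small function so they are easy to rerun after each boot. This is a sketch only: set_sensor_types is a made-up name, and the default directory is simply the path from the commands above, which may differ on your board.

```shell
# Write the given sensor type (default 2) to sensor1..sensor3 under the
# chip directory. Files that are missing or not writable are skipped
# with a warning instead of aborting.
set_sensor_types() {
    dir=${1:-/proc/sys/dev/sensors/w83782d-i2c-0-2d}
    stype=${2:-2}
    for n in 1 2 3; do
        f="$dir/sensor$n"
        if [ -w "$f" ]; then
            echo "$stype" > "$f"
        else
            echo "skipping $f (not present or not writable)" >&2
        fi
    done
}

# Usage (as root, after the sensor modules are loaded):
#   set_sensor_types
```

Calling it from a local boot script after the modules load should keep the readings sane across reboots.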
> 7. Brought it up to single user mode console, to see if it was video
> card etc. - did some testing of just letting it mostly idle (while true
> - uptime - sleep 1 - etc.) and locked up 1-2 more times.
I thought it was my video card too... so I went out and spent $90 on a new
one only to find it does the same thing.
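For anyone wanting to repeat the idle test from step 7, a loop along these lines would do. It is a rough sketch: idle_watch and its arguments are my own invention, not what the original poster ran. The last line printed before a hang tells you roughly when the box locked up.

```shell
# Print uptime every <delay> seconds, <count> times.
# A count of 0 (the default) means run until interrupted.
idle_watch() {
    count=${1:-0}
    delay=${2:-1}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        uptime
        sleep "$delay"
        i=$((i + 1))
    done
}

# Usage: idle_watch          (once per second until Ctrl-C)
#        idle_watch 60 5     (60 samples, 5 seconds apart)
```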
> 8. Rebooted again, now it's up and running and appears stable (still 1
> CPU), so I took it up to full init 5 and it stayed up (and so I'm
> writing this email :-) Once or twice seemed to stall again for 1-2
> seconds (interrupt storm ???) but recovered.
I notice this sometimes too... I chalk it up to some SMP locking
somewhere. My current uptime is 6 days, 3:53, and my longest was around
40 days (ended only by a reboot to upgrade the kernel).
> Other observation, possibly unrelated: the unpacking of the kernel seems
> very slow for an otherwise pretty quick machine - the dots when it says
> "Loading xxx..." tick at about 1 per second, much like a laptop with
> PC-66 memory, compared with 4-5 per second for the Pentium III
> 800/PC-133 motherboard I just hauled out.
Mine does this when it hasn't reset properly (the aforementioned ACPI
lockup). It was especially prevalent before I upgraded my power supply
from 400 to 550 watts.
> ** The temperature sensor driver stuff didn't seem to come with the
> kernel ??
Pick up the lm_sensors package.
-- Ken Witherow <phantoml AT rochester.rr.com> ICQ: 21840670 AIM: phantomlordken http://www.krwtech.com/ken