[snip]
Sounds very similar. Our servers are all identical (except for RAM).
What's unusual is that the machines we *expect* to fail sooner don't
(not even an oops). Those are very busy cache servers (several of them
in a sibling cluster) which do a lot of swapping. The machines which
*do* fail (or oops without any further catastrophe) are typically
web/mail hosting servers (reasonably busy with about 25% swap being
used). Increasing swap did not help on 2.4.5. We're still waiting for
something to happen on 2.4.6 (i.e., an oops has already appeared and
we're waiting for the meltdown, which, hopefully, will not occur). We
used to auto-reboot every morning at 2am or so to keep things stable,
which I *hate* because I remember having a 2.0.35/6 workstation that
had an uptime of 6 months a couple of years ago. God, I loved that box.
cheers
Henry