I have a 33-node AMD Athlon Beowulf cluster using the KT266 chipset.
I compiled kernel 2.4.14 optimized for Athlons.
If I leave the computers up for several days, random nodes in the
Beowulf cluster invariably start to drop like flies. Every other day, a
different, random node will get those Aiiiee messages and complain about
some virtual page request being invalid or somesuch, hanging the machine.
I am sure all the machines have good hardware, as we ran thorough tests on
them with tools such as memtest86. I only started experiencing problems
after upgrading from the stock Red Hat kernels that came with those
machines.
I haven't yet tried just compiling the kernel without the Athlon
optimizations. I was wondering, though, if there are any known or
suspected issues with Athlons and the latest kernel?
Any help/advice/thoughts/even flames would be appreciated... :)
-Calin