Re: Oops in prune_dcache (2.4.0-prerelease)

Udo A. Steinberg (sorisor@Hell.WH8.TU-Dresden.De)
Wed, 03 Jan 2001 05:04:32 +0100


Hi,

Linus Torvalds wrote:
>
> The strange thing is that 0x01000000 value, which almost certainly should
> just be NULL. A one-bit error.
>
> Now, I assume this machine has been historically stable, with no history
> of memory corruption problems.. It's entirely possible (and likely) that
> the one-bit error is due to some wild kernel pointer. Which makes this
> _really_ hard to debug.

Yes the machine is otherwise rock stable, not overclocked and memory timings
are rather conservative. Before the oops the machine had been compiling some
major application for like 5 hours and maybe the excessive stress kicked a
bit somewhere - who knows.

> I'll try to think about it some more, but I'd love to have more reports to
> go on to try to find a pattern..

That's one I can't reproduce. I've just run memtest86 over the entire ram
and it doesn't show any oddities - which doesn't really rule out an
occassional bit-flip due to neutrino storms though ;-)
If someone else has seen something similar lately, it's time to speak up.

-Udo.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/