RE: [STATUS 2.5] October 30, 2002

Ed Vance (EdV@macrolink.com)
Fri, 1 Nov 2002 14:25:55 -0800


On Fri, November 01, 2002 at 11:56 AM, Richard B. Johnson wrote:
> [...]
> So, ten seconds after you have some cosmic-ray upset, you guarantee
> that your machine will crash if you read everything every ten
> seconds. This will never be acceptable. You need to leave the
> machine alone and not try to "pick scabs". That's how you get
> the best reliability. Also, at some periodic intervals, you
> re-boot (restart) the whole machine, reinitializing everything
> including all the RAM.
>
Here's a Monty Python analogy to ECC memory scrubbing:

Do you remember the battle between Arthur and the Black Knight?

Without scrubbing, the memory bits suffer damage at a more or less constant
rate, like the Black Knight. The damage accumulates and eventually renders
the Black Knight non-functional. For the memory, this would be an
uncorrectable error: separate bit error events pile up until some word holds
more bad bits than the ECC can correct.

With scrubbing, the memory bits and the Black Knight suffer damage at the
same rate, but this time the Black Knight is able to stick his limbs back on
(while fighting) after Arthur hacks them off. If the Black Knight's rate of
sticking his limbs back on equals Arthur's rate of hacking his limbs off,
the Black Knight will sustain the same amount of damage, but will remain
functional as long as he can keep up. For the memory, the many separate bit
error events would cause only correctable errors, as long as the scrubbing
can keep up.
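
To put rough numbers on the analogy, here is a toy simulation, a sketch
rather than a model of any real memory controller. Everything in it is my
own assumption: the function name simulate(), its parameters, an ECC that
corrects one bad bit per word and gives up at two (SEC-DED style), and hits
that always land on a fresh bit of a randomly chosen word.

```python
import random


def simulate(words, steps, flips_per_step, scrub_interval=None, seed=0):
    """Return how many words became uncorrectable during the run.

    Assumptions (illustrative only): the ECC corrects one bit error per
    word and fails at two, and every hit damages a fresh bit, so a second
    hit on a word never cancels the first.
    """
    rng = random.Random(seed)
    errors = [0] * words      # latent bit errors currently held by each word
    dead = set()              # words that reached two or more bit errors

    for step in range(1, steps + 1):
        # Arthur swings: random single-bit upsets land on random words.
        for _ in range(flips_per_step):
            w = rng.randrange(words)
            errors[w] += 1
            if errors[w] >= 2:
                dead.add(w)

        # The Black Knight sticks limbs back on: a scrub pass rewrites
        # every word that still has a correctable (single-bit) error.
        if scrub_interval and step % scrub_interval == 0:
            for w in range(words):
                if errors[w] == 1:
                    errors[w] = 0

    return len(dead)


if __name__ == "__main__":
    for interval in (None, 100, 10, 1):
        lost = simulate(words=256, steps=2000, flips_per_step=1,
                        scrub_interval=interval)
        print(f"scrub_interval={interval}: {lost} uncorrectable words")
```

Both runs see the same sequence of hits for a given seed, so the scrubbed
memory can never lose more words than the unscrubbed one. With one hit per
step and a scrub after every step, no word ever holds two errors at once.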

cheers,
Ed