Re: Are linux-fs's drive-fault-tolerant by concept?

Jos Hulzink (josh@stack.nl)
Sat, 19 Apr 2003 23:13:53 +0200


On Saturday 19 Apr 2003 17:29, Alan Cox wrote:
> On Sad, 2003-04-19 at 17:04, Stephan von Krawczynski wrote:
> > after shooting down one of this bloody cute new very-big-and-poor IDE
> > drives today I wonder whether it would be a good idea to give the
> > linux-fs (namely my preferred reiser and ext2 :-) some fault-tolerance. I
> > remember there have been some discussions along this issue some time ago
> > and I guess remembering that it was decided against because it should be
> > the drivers issue to give the fs a clean space to live, right?
>
> Sometimes disks just go bang. They seem to do it distressingly more
> often nowdays which (while handy for criminals and pirates) is annoying
> for the rest of us. Putting magic in the file system to handle this is
> hard to do well, and at best you get things like ext2/ext3 have now -
> the ability to recover data in the event of some corruption, unless you
> get into really fancy stff.

Basically, disks 1) die or 2) get bad sectors. Unfortunately, all disk
problems I had so far belong in the 1st category. There is nothing to recover
there, or it must be done by professionals (electrical / mechanical
reconstruction of the drive) Talking about the second category: any disk has
ECC these days, and recoverable errors (sector dying, but data valid) are
detectable and can be handled (badblocks + sector remapping). This all has
nothing to do with filesystems.

Now there is one error left: The unrecoverable data error. Basically this
means you can't trust the data of an entire sector. It might be possible that
only one bit is wrong, true, but for any read/write mounted filesystem, you
don't want to continue beyond this point before a decent filesystem check has
been done. It might be an option to mount a partition readonly as soon as
errors are discovered (don't make the mess bigger than it is already).

Fault tolerance in a filesystem layer means in practical terms that you are
guessing what a filesystem should look like, for the disk doesn't answer that
question anymore. IMHO you don't want that to be done automagically, for it
might go right sometimes, but also might trash everything on RW filesystems.

Fault tolerance OK, but the fs layer should only detect errors reported by the
lower level drivers and handle them gracefully (which is something that might
need impovement a little for some fs drivers), or else trust the data it
gets.

Jos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/