Basically, disks either 1) die or 2) develop bad sectors. Unfortunately, all
the disk problems I have had so far belong in the first category. There is
nothing to recover there, or it must be done by professionals (electrical /
mechanical reconstruction of the drive). As for the second category: any disk
has ECC these days, and recoverable errors (sector dying, but data still
valid) are detectable and can be handled (badblocks + sector remapping). None
of this has anything to do with filesystems.
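
To make the distinction concrete, here is a rough sketch of how the outcome of
a single sector read could map to an action. The names ReadStatus and
plan_action are made up for illustration; real drives and drivers report
these conditions through their own status codes.

```python
from enum import Enum


class ReadStatus(Enum):
    """Hypothetical outcomes of one sector read (illustration only)."""
    OK = "ok"
    CORRECTED = "corrected"          # ECC repaired the data; sector is dying
    UNCORRECTABLE = "uncorrectable"  # ECC could not repair the data


def plan_action(status):
    """Return (data_is_trustworthy, action) for one sector read."""
    if status is ReadStatus.OK:
        return True, "none"
    if status is ReadStatus.CORRECTED:
        # Data is still valid, so it can be copied to a spare sector.
        return True, "remap sector"
    # Nothing below the filesystem can repair this; pass the error upwards.
    return False, "report I/O error"
```

Only the last case ever needs to reach the filesystem; the first two can be
dealt with entirely below it.
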
Now there is one error left: the unrecoverable data error. Basically, this
means you can't trust the data of an entire sector. It may be that only one
bit is wrong, true, but for any filesystem mounted read/write you don't want
to continue beyond this point before a decent filesystem check has been done.
It might be an option to remount the partition read-only as soon as errors are
discovered (don't make the mess bigger than it already is).
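
The read-only idea can be sketched with a toy model. ToyVolume below is a
hypothetical in-memory stand-in for a mounted partition, not a real kernel or
library interface; it only shows the policy of refusing writes after the first
reported read error.

```python
import errno


class ToyVolume:
    """Hypothetical in-memory model of a mounted partition (not a real API)."""

    def __init__(self, sectors, bad=()):
        self.sectors = list(sectors)
        self.bad = set(bad)      # sectors the "drive" reports as unreadable
        self.read_only = False

    def read(self, n):
        if n in self.bad:
            # First reported error: refuse further writes until a check is run.
            self.read_only = True
            raise OSError(errno.EIO, "unrecoverable read error")
        return self.sectors[n]

    def write(self, n, data):
        if self.read_only:
            raise OSError(errno.EROFS, "read-only after an I/O error")
        self.sectors[n] = data
```

After a failed read, every later write fails with EROFS, so nothing on the
volume changes until somebody has run a filesystem check.
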
Fault tolerance in the filesystem layer means, in practical terms, that you
are guessing what the filesystem should look like, because the disk no longer
answers that question. IMHO you don't want that done automagically: it might
go right sometimes, but it might also trash everything on an RW filesystem.
Fault tolerance is OK, but the fs layer should only detect errors reported by
the lower-level drivers and handle them gracefully (something that might need
a little improvement in some fs drivers), and otherwise trust the data it
gets.
Jos