>> I wonder whether it would be a good idea to give the linux-fs
>> (namely my preferred reiser and ext2 :-) some fault-tolerance.
>
> Fault tolerance should be done at a lower level than the filesystem.
I would (partly) disagree. At the FS level you still have to deal with
the data having gone away (or become corrupted); simply handing a
known-corrupted block to the FS isn't going to do anything useful.
If the FS knows that "this data is known crap", it can (rough sketch
after the list):
a) go look at a backup structure (e.g. one of the many superblock copies)
b) guess (e.g. treat every block in a damaged disk-allocation bitmap as used)
c) fail with a clear error (e.g. "cannot read directory due to a
physical problem with the disk")
d) try to reconstruct the data (e.g. search around the disk for magic numbers)
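
To make that concrete, here's a rough userspace mock-up of how an FS
might dispatch between those options when the layer below flags a block
as bad. Every name in it is invented for illustration; it's just the
decision logic, not real kernel code:

/* sketch.c -- hypothetical names throughout, nothing here is a real
 * kernel interface; it only illustrates options (a)-(d) above. */
#include <stdio.h>
#include <string.h>

enum read_status { READ_OK, READ_KNOWN_BAD };

/* stand-in for the lower layer saying "this data is known crap" */
static enum read_status read_block(long blkno, unsigned char *buf)
{
	memset(buf, 0, 4096);
	return (blkno == 42) ? READ_KNOWN_BAD : READ_OK; /* fake one bad block */
}

/* option (a): walk the backup superblock copies until one reads cleanly */
static int read_superblock(unsigned char *buf)
{
	static const long sb_copies[] = { 1, 8193, 16385 }; /* made-up offsets */
	unsigned i;

	for (i = 0; i < sizeof(sb_copies) / sizeof(sb_copies[0]); i++)
		if (read_block(sb_copies[i], buf) == READ_OK)
			return 0;
	return -1;
}

/* option (b): a damaged allocation bitmap -- guess, marking every block used */
static void degrade_bitmap(unsigned char *buf)
{
	memset(buf, 0xff, 4096);
}

int main(void)
{
	unsigned char buf[4096];

	if (read_block(42, buf) == READ_KNOWN_BAD) {
		if (read_superblock(buf) == 0)
			printf("recovered via backup copy (option a)\n");
		else
			degrade_bitmap(buf); /* last resort before (c)/(d) */
	}
	/* option (c) would be failing the call with a clear error message;
	 * option (d), scanning nearby blocks for magic numbers. */
	return 0;
}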
<snip>
> The filesystem doesn't know or care what device it is stored on, and
> therefore shouldn't try to predict likely failures.
But it should be tolerant of them and able to recover to some extent.
Generally, the first sign (to an end user) that a disk is dying is when
really-weird-stuff(tm) starts happening. A clear error message from the
filesystem when they try to enter the directory (or whatever) would be
a lot friendlier.
You could generalize the failure down to an extents-type record (i.e.
offset and length), which would suit 99.9% of cases (I think :). Where
the error is detected after the fact, the extra effort is probably
worth it.
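
Something like this, say (struct and function names are mine, purely
for illustration):

#include <stdbool.h>

/* one known-bad region on the device, as the lower layer reported it */
struct fault_extent {
	unsigned long long offset; /* byte offset of the damaged region */
	unsigned long long length; /* its length in bytes */
};

/* does the request [off, off+len) touch a recorded fault? */
static bool overlaps_fault(const struct fault_extent *fe,
			   unsigned long long off, unsigned long long len)
{
	return off < fe->offset + fe->length && fe->offset < off + len;
}

The FS could check incoming reads against a list of these and take one
of the (a)-(d) paths before ever handing bad data to the caller.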
These kinds of issues are coming up in my honors thesis too, so there
might even be some (dreaded) code and discussion sometime near the end
of the year :)
------------------------------
Stewart Smith
stewartsmith@mac.com
Ph: +61 4 3884 4332
ICQ: 6734154