Now, I can either buy new drives and dd the raw partition over, or I
can hack the kernel to make it a bit smarter about unrecoverable
reads.
Obviously, if I have raid5_error not mark the drive bad, it will hammer
away on it, failing over and over. My thought was to let the read
fail, catch it in raid5_end_read_request, and tag the stripe_head
with the device that failed. If a device has already failed on that
stripe, return EIO. That way further reads on the stripe_head will be
served from the parity disk until the stripe is eventually freed.
(One I/O error per stripe isn't too harsh a price to pay for disaster
recovery.)
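
Here's a rough sketch of what I mean, against my reading of
raid5_end_read_request() in 2.4.20. read_failed would be a new int
field in struct stripe_head (initialized to -1 when the stripe is set
up) and STRIPE_READ_ERROR a new state bit; the rest follows how I read
the existing code, so treat this as illustration, not a patch:

	static void raid5_end_read_request(struct buffer_head *bh, int uptodate)
	{
		struct stripe_head *sh = bh->b_private;
		raid5_conf_t *conf = sh->raid_conf;
		int disks = conf->raid_disks;
		int i;

		/* which disk's cached buffer just completed? */
		for (i = 0; i < disks; i++)
			if (bh == sh->bh_cache[i])
				break;

		if (uptodate) {
			set_bit(BH_Uptodate, &bh->b_state);
		} else if (sh->read_failed < 0 || sh->read_failed == i) {
			/* First bad sector on this stripe: remember the disk
			 * but do NOT call md_error(), so the drive stays in
			 * the array and the block gets rebuilt from parity. */
			sh->read_failed = i;
		} else {
			/* A second disk failed within the same stripe; nothing
			 * is left to reconstruct from, so flag the stripe and
			 * let handle_stripe() fail the request with -EIO. */
			set_bit(STRIPE_READ_ERROR, &sh->state);
		}

		set_bit(STRIPE_HANDLE, &sh->state);
		release_stripe(sh);
	}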
In 2.4.20, I'm at raid5.c:421, where we're about to call md_error.
What happens to the bh from that point? Obviously it's not up to date,
so when one drive fails, how does the read get re-issued so the block
can be reconstructed from the parity drive?
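
For what it's worth, my reading so far is that once the disk is marked
failed, handle_stripe() sees the wanted block isn't uptodate on a
working disk, schedules reads of the surviving blocks, and then
compute_block() XORs them (data plus parity) into the missing one.
Condensed from how I read 2.4.20 raid5.c (double-check against the
real source):

	/* Rebuild the block for disk dd_idx by XOR-ing every other
	 * block in the stripe, parity included. */
	static void compute_block(struct stripe_head *sh, int dd_idx)
	{
		raid5_conf_t *conf = sh->raid_conf;
		int i, count, disks = conf->raid_disks;
		void *ptr[MAX_XOR_BLOCKS];

		/* Destination buffer starts zeroed so it can accumulate XORs. */
		ptr[0] = page_address(sh->bh_cache[dd_idx]->b_page);
		memset(ptr[0], 0, STRIPE_SIZE);
		count = 1;

		for (i = disks; i--; ) {
			if (i == dd_idx)
				continue;
			ptr[count++] = page_address(sh->bh_cache[i]->b_page);
			/* xor_block() takes at most MAX_XOR_BLOCKS buffers,
			 * so flush the accumulator in chunks. */
			if (count == MAX_XOR_BLOCKS) {
				xor_block(count, ptr);
				count = 1;
			}
		}
		if (count != 1)
			xor_block(count, ptr);

		/* The reconstructed block is now as good as a read. */
		set_bit(BH_Uptodate, &sh->bh_cache[dd_idx]->b_state);
	}

So if I understand it right, the re-read never has to touch the bad
platter at all; the cache buffer just gets filled by the XOR above. Is
that right?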
Please CC me; I read via the archives.
Thanks,
--Dan
Obviously, this would ONLY be for recovery in the face of bad sectors;
the bad drives still need to be replaced as quickly as possible.