> Yay! We noticed that if a controller fails in such a way that
> no interrupts are generated then md driver doesn't notice anything is
> wrong. Commands don't fail, but don't complete either.
Uhhhhh. They should timeout though and then be counted as errors; hard to
catch this differently.
You are saying it doesn't work?
> (Better than putting a timeout on every command.) Also, md multipath
> doesn't notice if the backup path has failed, to warn the user that
> redundancy is no longer in effect. (Though if you set up things so i/o
> is going down both paths, not such a big deal, as md will notice.
> Probably you know all this already.
Yes. We intentionally don't do active _monitoring_ on non-active paths, same
as we don't do reprobing of failed paths to see whether they are alive again.
(The LVM m-p patch does periodically send down live requests to failed paths
to check this; I consider this intentional data corruption, but I'm
paranoid.)
This is something which can easily be implemented safely in user-space
though; the md approach still exposes the lower level devices and you can
check them periodically if you care, and be it with a
while sleep 1 ; do
if ! dd if=/dev/sdaX of=/dev/null ; then
mdadm /dev/md0 --fail /dev/sdaX
logger -p kern.alert "Path /dev/sdaX failed!"
fi
done
;-) Obviously, not something we need to run in kernel space.
> Well, the cciss driver is not a SCSI driver (except for purpsoes of
> tape drives & tape changers) and HP/Compaq has sold more than one
> million of those controllers (does popularity mean they aren't
> "weird"? :-), and we have mulitpath capable storage boxes they
> can connect to.
Indeed. Yes, we'll need to figure out how to do this for 2.5/2.6; maybe
porting forward the md m-p patch to 2.5 is indeed the best choice. It should
be way easier, as md has been greatly cleaned up...
However, past discussions on LKML regarding "How to do m-p cleanly in 2.5"
have never reached a conclusion ;-) We'll see. The good thing about the SCSI
m-p is that it can also handle multipathed tape drives...
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
-- Principal Squirrel SuSE Labs - Research & Development, SuSE Linux AG "If anything can go wrong, it will." "Chance favors the prepared (mind)." -- Capt. Edward A. Murphy -- Louis Pasteur - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/