Re: [PATCH] queue barrier support

James Bottomley (James.Bottomley@SteelEye.com)
Fri, 15 Feb 2002 10:15:58 -0500


James.Bottomley@steeleye.com said:
> A further issue is that you haven't added anything to the error
> recovery code for this. If error recovery is activated for the device
> at the reset level, all tags will be discarded by the device. The eh
> will retry the failing command and then the other tagged commands will
> be re-issued from the scsi_bottom_half_handler (assuming the low level
> device driver immediately fails them with DID_RESET) in the order in
> which the low level driver failed them. Thus you have potentially
> completely messed up the ordering when the commands all get retried.

axboe@suse.de said:
> I already have this fixed locally by just maintaining a fifo list of
> queued commands so we can retry them in the correct order. For 2.5
> there's a busy and free list (so no more scanning for a free command
> either), I would imagine that the easiest for 2.4 is to just maintain
> a busy list in addition to the current array of pending/free commands.

mason@suse.com said:
> I was wondering about this, we would need to change the error handler
> to fail all the requests after the barrier. I was hoping the driver
> did this for us ;-)

Unfortunately, this is going to involve deep hackery inside the error handler.
The current initial premise is that it can simply retry the failing command
by issuing an ABORT to the tag and resending it (which can cause a tag to move
past your barrier). In an error situation, it really wouldn't be wise to try
to abort lots of potentially running tags to preserve the barrier ordering
(because of the overload placed on a known failing component), so I think the
error handler has to abandon the concept of aborting commands and move
straight to device reset. We then carefully resend the commands in FIFO order.

Additionally, you must handle the case that a device is reset by something
else (in error handler terms, the cc_ua [check condition/unit attention]).
Here also, the tags would have to be sent back down in FIFO order as soon as
the condition is detected.

mason@suse.com said:
> Yes, this could get sticky. Does anyone know if other OSes have
> already done this?

Other OSs (well the ones I've heard about: Solaris and HP-UX) try to avoid
ordered tags, mainly because of the performance impact they have---the drive
tag service algorithms become inefficient in the presence of ordered tags
since they're usually optimised for all simple tags.

James

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/