FS-corrupting IDE bug still in 2.4.20-rc3

Neil Conway (nconway_kernel@yahoo.co.uk)
Wed, 27 Nov 2002 10:23:44 +0000 (GMT)

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Marc-Christian Petersen: "Fwd: Re: [OOPS] 2.4.20-rc4-ac1 (also occurs 2.4.20-rc2-ac3) in radeon DRI for XFree86"
Previous message: Michael Kaufman: "null TTY in tty_fasync"
Next in thread: Marc-Christian Petersen: "Re: FS-corrupting IDE bug still in 2.4.20-rc3"
Reply: Marc-Christian Petersen: "Re: FS-corrupting IDE bug still in 2.4.20-rc3"

--0-1190404082-1038392624=:21963
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Content-Disposition: inline

Guys - you may remember this one from May this year.

I've been off-list and not paying much attention since Andre
acknowledged it was a bug (and didn't like my patch).

I recently needed to compile 2.4.19 and was surprised to find the bug
still present. On examining 2.4.20-rc3 it still seems to be there too
-- no time to compile yet, sorry, but since 2.4.20 is imminent I
thought I should err on the side of caution and remind people about the
bug.

Let me be very clear: this bug has corrupted filesystems on three
machines of mine. All of these had PIIX chipsets. I have also
reproduced it on a VIA chipset, but since that machine was production I
didn't try very hard to corrupt the fs.

The patch is not a real fix, merely a workaround. But since 6 months
have already elapsed, can I request that the patch be applied now, and
when Andre creates a proper fix we can use that.

I've updated the comments in the patch to reflect the fact that I now
realise it's not only DMA transfers that can be trashed by the bug.

Neil

__________________________________________________
Do You Yahoo!?
Everything you'll ever need on one web page
from News and Sport to Email and Music Charts
http://uk.my.yahoo.com
--0-1190404082-1038392624=:21963
Content-Type: text/plain; name="ide_patch_2.4.20-rc3_261102.txt"
Content-Description: ide_patch_2.4.20-rc3_261102.txt
Content-Disposition: inline; filename="ide_patch_2.4.20-rc3_261102.txt"

--- ide-features.c.orig Wed Nov 27 09:34:18 2002
+++ ide-features.c Wed Nov 27 10:06:49 2002
@@ -272,12 +272,39 @@
*/
int ide_config_drive_speed (ide_drive_t *drive, byte speed)
{
+ ide_hwgroup_t *hwgroup = HWGROUP(drive);
ide_hwif_t *hwif = HWIF(drive);
int i, error = 1;
- byte stat;
+ byte stat,unit;
+ unsigned long flags;
+
+ spin_lock_irqsave(&io_request_lock, flags);
+ /*
+ * XXXXX FIXME:
+ * The next test is a band-aid. This is because this routine can be
+ * called while the hwgroup is busy - e.g., after a DMA or PIO
+ * transfer has been initiated. Known culprits: so far, the only
+ * known way to trigger the bug is to load an IDE CD module (both
+ * ide-scsi and ide-cd count) - on most chipsets, this ultimately
+ * causes a call to this routine with no regard for the busy-ness of
+ * the hwgroup. If a transfer is in progress, then as soon as we issue
+ * the SELECT_DRIVE() command below, we trash it. This has caused
+ * fs corruption (it probably shouldn't!).
+ *
+ * The RIGHT way to deal with this is probably either to queue the
+ * call for execution when the hwgroup isn't busy, OR (dodgy?) to sleep
+ * right here in this routine until it isn't busy. We also now have
+ * to use the io_request_lock spinlock to keep SMP systems honest.
+ * This lot is temporary, pending a real fix. NJC 9/5/02, 26/11/02
+ */
+ if (hwgroup) if (hwgroup->busy) {
+ spin_unlock_irqrestore(&io_request_lock, flags);
+ printk("Argh: hwgroup is busy in ide_config_drive_speed\n");
+ return error;
+ }

#if defined(CONFIG_BLK_DEV_IDEDMA) && !defined(CONFIG_DMA_NONPCI)
- byte unit = (drive->select.b.unit & 0x01);
+ unit = (drive->select.b.unit & 0x01);
outb(inb(hwif->dma_base+2) & ~(1<<(5+unit)), hwif->dma_base+2);
#endif /* (CONFIG_BLK_DEV_IDEDMA) && !(CONFIG_DMA_NONPCI) */

@@ -338,6 +365,7 @@
enable_irq(hwif->irq);

if (error) {
+ spin_unlock_irqrestore(&io_request_lock, flags);
(void) ide_dump_status(drive, "set_drive_speed_status", stat);
return error;
}
@@ -371,6 +399,7 @@
case XFER_SW_DMA_0: drive->id->dma_1word |= 0x0101; break;
default: break;
}
+ spin_unlock_irqrestore(&io_request_lock, flags);
return error;
}

--0-1190404082-1038392624=:21963--
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Marc-Christian Petersen: "Fwd: Re: [OOPS] 2.4.20-rc4-ac1 (also occurs 2.4.20-rc2-ac3) in radeon DRI for XFree86"
Previous message: Michael Kaufman: "null TTY in tty_fasync"
Next in thread: Marc-Christian Petersen: "Re: FS-corrupting IDE bug still in 2.4.20-rc3"
Reply: Marc-Christian Petersen: "Re: FS-corrupting IDE bug still in 2.4.20-rc3"