Counter-kludge for 2.5.x hanging when writing to block device

Adam J. Richter (adam@yggdrasil.com)
Tue, 3 Jun 2003 01:48:58 -0700


For at least the past few months, the Linux 2.5 kernels have
hung when I try to write a large amount of data to a block device.
I most commonly notice this when trying to clear a disk with a command
like "dd if=/dev/zero of=/dev/discs/disc1/disc". Sometimes doing
an mkfs on a big file system is enough to cause the hang.
I wrote a little program to repeatedly write a 4kB block of zeroes
to the kernel so I could track how far it got before hanging, and it
would write 210-215MB of zeroes to the disk on a computer that had
512MB of RAM before hanging. When these hangs occur, other processes
continue to run fine, and I can do syncs, which return, but the
hung process never resumes. In the past, I've verified with a
printk that it is looping in balance_dirty_pages, repeatedly
calling blk_congestion_wait, and never leaving the loop.

Here is a counter-kludge that seems to stop the problem.
This is certainly not the "right" fix. It just illustrates a way
to stop the problem.

By the way, I say "counter-kludge", because I get the impression
that blk_congestion_wait is itself a kludge, since it calls
blk_run_queues and waits a fixed amount of time, 100ms in this case,
potentially a big waste of time, rather than awaiting some more
accurate criterion.

Adam J. Richter __ ______________ 575 Oroville Road
adam@yggdrasil.com \ / Miplitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

--- linux-2.5.70-bk7/mm/page-writeback.c 2003-06-02 14:02:39.000000000 -0700
+++ linux/mm/page-writeback.c 2003-06-02 13:59:31.000000000 -0700
@@ -177,7 +177,12 @@
if (pages_written >= write_chunk)
break; /* We've done our duty */
}
+#if 0 /* AJR */
blk_congestion_wait(WRITE, HZ/10);
+#else
+ blk_run_queues();
+ break;
+#endif
}

if (nr_reclaimable + ps.nr_writeback <= dirty_thresh)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/