Ok, both your trace and Bob's trace show the problem clearly. Thanks
to both of you for the helpful feedback, btw.
The deadlock happens because of a collision between write_some_buffers()
and the GFP_NOHIGHIO logic. It was not introduced by the VM rewrite; it
was introduced with the NOHIGHIO logic.
The problem is that we lock a batch of buffers and only later, once
they're all locked, start writing them via write_locked_buffers().
The deadlock happens in the middle of write_locked_buffers() when we hit
a highmem buffer: while allocating with GFP_NOHIGHIO we end up running
sync_page_buffers() on a page that isn't highmem, and that page
incidentally holds one of the next buffers in the array. Those buffers
were locked by write_some_buffers() but aren't in the I/O queue yet, so
we wait forever, since they depend on us to be written.
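To make the ordering problem concrete, here is a rough user-space model. It's a sketch, not kernel code: `struct model_bh`, `wait_on_model_buffer()` and `run_batch()` are hypothetical names, and "waiting" is modeled as a check on whether the buffer has been queued yet (a locked buffer that was never submitted can never be unlocked, so the real wait would hang).

```c
#include <stdbool.h>

#define NBUF 4 /* hypothetical batch size, stands in for NRSYNC */

/* Hypothetical stand-in for a buffer_head; NOT the kernel structure. */
struct model_bh {
    bool locked;    /* set while the batch is being collected */
    bool submitted; /* true once the buffer is in the I/O queue */
};

/*
 * Models waiting on a locked buffer. Returns true if the wait could
 * finish (buffer unlocked, or already queued so I/O completion will
 * unlock it), false if it would block forever.
 */
static bool wait_on_model_buffer(const struct model_bh *bh)
{
    if (!bh->locked)
        return true;
    return bh->submitted;
}

/*
 * Models write_some_buffers() followed by write_locked_buffers():
 * phase 1 locks the whole batch, phase 2 submits buffers in order.
 * When phase 2 reaches the highmem buffer at highmem_idx, a
 * GFP_NOHIGHIO-style allocation tries to free memory by waiting on
 * buffer victim_idx. Returns true if the batch completes, false if
 * it would hang.
 */
static bool run_batch(int highmem_idx, int victim_idx)
{
    struct model_bh bufs[NBUF] = {0};

    for (int i = 0; i < NBUF; i++) /* phase 1: lock everything */
        bufs[i].locked = true;

    for (int i = 0; i < NBUF; i++) { /* phase 2: submit one by one */
        if (i == highmem_idx && !wait_on_model_buffer(&bufs[victim_idx]))
            return false; /* victim locked by us but not queued yet */
        bufs[i].submitted = true;
    }
    return true;
}
```

With `highmem_idx = 1` and `victim_idx = 3` the model reports a hang, because buffer 3 is locked but not yet queued; if the victim is an earlier buffer that was already submitted (say `run_batch(2, 0)`), the batch completes.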
Robert just confirmed that dropping the NOHIGHIO logic fixes the
problem.
So the fix is either:
1) to drop the NOHIGHIO logic, like my test patch did, or
2) to keep track of which buffers we must not wait on while releasing
RAM.
I'll try approach 2) in the untested patch below (the NOHIGHIO logic
makes sense, so I'd prefer not to drop it). Robert and Bob, can you give
it a spin on the highmem boxes and check if it helps?
I suggest testing it on top of 2.4.10+vm-tweaks-2.
--- 2.4.10aa2/fs/buffer.c.~1~ Wed Sep 26 18:45:29 2001
+++ 2.4.10aa2/fs/buffer.c Fri Sep 28 00:04:44 2001
@@ -194,6 +194,7 @@
struct buffer_head * bh = *array++;
bh->b_end_io = end_buffer_io_sync;
submit_bh(WRITE, bh);
+ clear_bit(BH_Pending_IO, &bh->b_state);
} while (--count);
}
@@ -225,6 +226,7 @@
if (atomic_set_buffer_clean(bh)) {
__refile_buffer(bh);
get_bh(bh);
+ set_bit(BH_Pending_IO, &bh->b_state);
array[count++] = bh;
if (count < NRSYNC)
continue;
@@ -2519,7 +2521,9 @@
int tryagain = 1;
do {
- if (buffer_dirty(p) || buffer_locked(p)) {
+ if (unlikely(buffer_pending_IO(p)))
+ tryagain = 0;
+ else if (buffer_dirty(p) || buffer_locked(p)) {
if (test_and_set_bit(BH_Wait_IO, &p->b_state)) {
if (buffer_dirty(p)) {
ll_rw_block(WRITE, 1, &p);
--- 2.4.10aa2/include/linux/fs.h.~1~ Wed Sep 26 18:51:25 2001
+++ 2.4.10aa2/include/linux/fs.h Fri Sep 28 00:01:54 2001
@@ -217,6 +217,7 @@
BH_New, /* 1 if the buffer is new and not yet written out */
BH_Async, /* 1 if the buffer is under end_buffer_io_async I/O */
BH_Wait_IO, /* 1 if we should throttle on this buffer */
+ BH_Pending_IO, /* 1 if the buffer is locked but not in the I/O queue yet */
BH_PrivateStart,/* not a state bit, but the first bit available
* for private allocation by other entities
@@ -277,6 +278,7 @@
#define buffer_mapped(bh) __buffer_state(bh,Mapped)
#define buffer_new(bh) __buffer_state(bh,New)
#define buffer_async(bh) __buffer_state(bh,Async)
+#define buffer_pending_IO(bh) __buffer_state(bh,Pending_IO)
#define bh_offset(bh) ((unsigned long)(bh)->b_data & ~PAGE_MASK)
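A rough user-space model of what the patch is meant to do, for anyone who wants to reason about the ordering without booting a highmem box. It's only a sketch under stated assumptions: `struct model_bh`, `reclaim_try_buffer()` and `run_batch_with_flag()` are made-up names, the `pending_io` field plays the role of BH_Pending_IO, and no real locking or races are modeled. The flag is set while the batch is collected, cleared right after each submission, and the reclaim side gives up on a pending buffer (`tryagain = 0` in the patch) instead of waiting on it.

```c
#include <stdbool.h>

#define NBUF 4 /* hypothetical batch size, stands in for NRSYNC */

/* Hypothetical stand-in for a buffer_head; NOT the kernel structure. */
struct model_bh {
    bool locked;     /* set while the batch is being collected */
    bool submitted;  /* true once the buffer is in the I/O queue */
    bool pending_io; /* models BH_Pending_IO: locked, not queued yet */
};

/*
 * Models the patched check in sync_page_buffers(): a pending buffer
 * is skipped (tryagain cleared) rather than waited on. Returns false
 * only if the model would still block forever.
 */
static bool reclaim_try_buffer(const struct model_bh *bh, bool *tryagain)
{
    if (bh->pending_io) {
        *tryagain = false; /* don't wait on a buffer we will submit */
        return true;
    }
    if (bh->locked)
        return bh->submitted; /* safe to wait only if already queued */
    return true;
}

/* Same two-phase batch as before, with the pending flag maintained. */
static bool run_batch_with_flag(int highmem_idx, int victim_idx, bool *tryagain)
{
    struct model_bh bufs[NBUF] = {0};

    *tryagain = true;
    for (int i = 0; i < NBUF; i++) { /* like write_some_buffers() */
        bufs[i].locked = true;
        bufs[i].pending_io = true;   /* set_bit(BH_Pending_IO, ...) */
    }
    for (int i = 0; i < NBUF; i++) { /* like write_locked_buffers() */
        if (i == highmem_idx &&
            !reclaim_try_buffer(&bufs[victim_idx], tryagain))
            return false;
        bufs[i].submitted = true;    /* submit_bh(WRITE, bh) */
        bufs[i].pending_io = false;  /* clear_bit(BH_Pending_IO, ...) */
    }
    return true;
}
```

In the case that hung in the unpatched model (highmem buffer 1, victim buffer 3), this version completes and simply reports `tryagain == false` for that page; an already-submitted victim is still waited on normally.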
Thanks,
Andrea