It is up.
>
> > If the fix is to avoid page_launder in these cases then the number of
> > occurrences when an alloc_pages fails will go up.
>
> > I was attempting to come up with a way of making try_to_free_buffers
> > fail on buffers which are being processed in the generic_make_request
> > path by marking them, the problem is there is no single place to reset
> > the state of a buffer so that try_to_free_buffers will wait for it.
> > Doing it after the end of the loop in generic_make_request is race
> > prone to say the least.
>
> I really want to fix things like this in 2.5. (ie not avoid the deadlock
> by completely avoiding physical IO, but avoid the deadlock by avoiding
> physical IO on the "device" which is doing the allocation)
>
> Could you send me your code ? No problem if it does not work at all :)
>
Well, the basic idea is simple, but I suspect the implementation might
rapidly become historical in 2.5. Basically I added a new buffer state bit,
although BH_Req looks like it could be cannibalized instead: no one appears
to check for it (is it really dead code?).
Using a flag to skip buffers in try_to_free_buffers is easy:
===========================================================================
Index: linux/fs/buffer.c
===========================================================================
--- /usr/tmp/TmpDir.3237-0/linux/fs/buffer.c_1.68 Sat Jun 30 12:56:29 2001
+++ linux/fs/buffer.c Sat Jun 30 12:57:52 2001
@@ -2365,7 +2365,7 @@
/*
* Can the buffer be thrown out?
*/
-#define BUFFER_BUSY_BITS ((1<<BH_Dirty) | (1<<BH_Lock) | (1<<BH_Protected))
+#define BUFFER_BUSY_BITS ((1<<BH_Dirty) | (1<<BH_Lock) | (1<<BH_Protected) | (1<<BH_Clamped))
#define buffer_busy(bh) (atomic_read(&(bh)->b_count) | ((bh)->b_state & BUFFER_BUSY_BITS))
/*
@@ -2430,7 +2430,11 @@
spin_unlock(&free_list[index].lock);
write_unlock(&hash_table_lock);
spin_unlock(&lru_list_lock);
- if (wait) {
+ /* Buffers in the middle of generic_make_request processing cannot
+ * be waited for, they may be allocating memory right now and be
+ * locked by this thread.
+ */
+ if (wait && !buffer_clamped(tmp)) {
sync_page_buffers(bh, wait);
/* We waited synchronously, so we can free the buffers. */
if (wait > 1 && !loop) {
===========================================================================
Index: linux/include/linux/fs.h
===========================================================================
--- /usr/tmp/TmpDir.3237-0/linux/include/linux/fs.h_1.99 Sat Jun 30 12:56:29 2001
+++ linux/include/linux/fs.h Sat Jun 30 07:05:37 2001
@@ -224,6 +224,8 @@
BH_Mapped, /* 1 if the buffer has a disk mapping */
BH_New, /* 1 if the buffer is new and not yet written out */
BH_Protected, /* 1 if the buffer is protected */
+ BH_Clamped, /* 1 if the buffer cannot be reclaimed
+ * in its current state */
BH_Delay, /* 1 if the buffer is delayed allocate */
BH_PrivateStart,/* not a state bit, but the first bit available
@@ -286,6 +288,7 @@
#define buffer_mapped(bh) __buffer_state(bh,Mapped)
#define buffer_new(bh) __buffer_state(bh,New)
#define buffer_protected(bh) __buffer_state(bh,Protected)
+#define buffer_clamped(bh) __buffer_state(bh,Clamped)
#define buffer_delay(bh) __buffer_state(bh,Delay)
#define bh_offset(bh) ((unsigned long)(bh)->b_data & ~PAGE_MASK)
The tricky part, which I have not yet worked out, is clearing the state bit
in all the correct places. You would have to set it when the buffer is
locked just before I/O starts; it becomes clearable after the last memory
allocation in the I/O submission path. I do not like the approach because
there are so many ways a buffer can go once you get into
generic_make_request. At first I thought I could just explicitly set and
clear a flag around memory allocations such as the bounce buffer path.
However, that can lead to AB-BA deadlocks between multiple threads
submitting I/O requests. At this point I started to think I was going to
build an unmaintainable rat's nest and decided I had not got the correct
answer.
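To make the intended life cycle of the bit concrete, here is a rough
userspace model of it. It is only a sketch: struct model_bh and every
model_* function are hypothetical names invented for illustration, C11
atomics stand in for the kernel's bit operations, and the three
"submission" steps are an idealized single path, which is exactly what
generic_make_request does not give you.

```c
/* Userspace model only: names mirror the patch, but this is not kernel code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum model_bh_bits { BH_Dirty, BH_Lock, BH_Protected, BH_Clamped };

#define BUFFER_BUSY_BITS ((1u << BH_Dirty) | (1u << BH_Lock) | \
                          (1u << BH_Protected) | (1u << BH_Clamped))

struct model_bh {
    atomic_uint b_state;  /* bitmask of BH_* bits */
    atomic_int  b_count;  /* reference count */
};

static void model_bh_init(struct model_bh *bh)
{
    atomic_init(&bh->b_state, 0u);
    atomic_init(&bh->b_count, 0);
}

static bool model_test_bit(struct model_bh *bh, int bit)
{
    return (atomic_load(&bh->b_state) & (1u << bit)) != 0;
}

static void model_set_bit(struct model_bh *bh, int bit)
{
    atomic_fetch_or(&bh->b_state, 1u << bit);
}

static void model_clear_bit(struct model_bh *bh, int bit)
{
    atomic_fetch_and(&bh->b_state, ~(1u << bit));
}

/* Same test as the buffer_busy() macro in the patch. */
static bool model_buffer_busy(struct model_bh *bh)
{
    return atomic_load(&bh->b_count) != 0 ||
           (atomic_load(&bh->b_state) & BUFFER_BUSY_BITS) != 0;
}

/* Reclaim may sleep on a busy buffer only if it is not clamped. */
static bool model_may_wait_on(struct model_bh *bh)
{
    return model_buffer_busy(bh) && !model_test_bit(bh, BH_Clamped);
}

/* Hypothetical step 1: buffer is locked and I/O is about to start. */
static void model_submit_begin(struct model_bh *bh)
{
    model_set_bit(bh, BH_Lock);
    model_set_bit(bh, BH_Clamped);
}

/* Hypothetical step 2: the last allocation in the submission path is done. */
static void model_last_alloc_done(struct model_bh *bh)
{
    model_clear_bit(bh, BH_Clamped);
}

/* Hypothetical step 3: I/O completion unlocks the buffer. */
static void model_io_complete(struct model_bh *bh)
{
    model_clear_bit(bh, BH_Lock);
}

/* Walk one buffer through the idealized life cycle. */
static void model_lifecycle_demo(void)
{
    struct model_bh bh;

    model_bh_init(&bh);
    model_submit_begin(&bh);
    assert(!model_may_wait_on(&bh));  /* clamped: reclaim must skip it */
    model_last_alloc_done(&bh);
    assert(model_may_wait_on(&bh));   /* still locked, but safe to wait on */
    model_io_complete(&bh);
    assert(!model_buffer_busy(&bh));  /* free to be reclaimed */
}
```

The model has exactly one place that calls model_last_alloc_done(). The
real submission path has no such single point, which is the whole problem.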
I am not sure that an approach which avoids a specific device will fly
either: all the I/O can be on one device, and what does "device" mean when
it comes to md/lvm and request remapping?
Steve