The problem with this approach is the waitqueue: several tasks pile
up on the waitqueue, and bdflush loses the race - some other
thread steals the r1bh and bdflush goes back to sleep.
Replacing the wait_event() with a special raid1_wait_event()
which unplugs *each time* the caller is woken does help - but
it is still easy to deadlock the system.
Clearly this approach is racy: it assumes that the reserved buffers have
actually been submitted when we unplug - they may not yet have been.
But the lockup is too easy to trigger for that to be a satisfactory
explanation.
The most effective, aggressive, successful and grotty fix for this
problem is to remove the wait_event altogether and replace it with:
run_task_queue(&tq_disk);		/* unplug: push out queued I/O */
current->policy |= SCHED_YIELD;		/* yield the CPU on the next schedule() */
__set_current_state(TASK_RUNNING);	/* don't actually sleep */
schedule();
This can still deadlock in bad OOM situations, but I think we're
dead anyway. A combination of this approach plus the PF_FLUSH
reservations would work even better, but I found the PF_FLUSH
stuff was sufficient.
> Mind you, if I was really serious about being
> gentle on the memory allocation, I would use
> kmem_cache_alloc(bh_cachep,SLAB_whatever)
> instead of
> kmalloc(sizeof(struct buffer_head), GFP_whatever)
get/put_unused_buffer_head() should be exported API functions.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/