Well, I know what the problem is, but I don't know how to fix
it in 2.4.
With four adapters and six disks on each, you have 27 processes
responsible for submitting IO: the 24 bonnies, plus kswapd,
bdflush and kupdate.
If one or two of the request queues get filled up, *all* these
threads pile into those queues and go to sleep. Nobody is left
submitting IO for the other queues, so they fall idle.
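
To make the mechanics concrete, here is a toy userspace model of
that blocking. The toy_* names are mine, not the kernel's; the real
path is the request allocation code in drivers/block/ll_rw_blk.c.

    #include <pthread.h>

    /*
     * Toy model of a 2.4 request queue: a submitter which finds the
     * queue full goes to sleep, and completions only wake sleepers
     * once batch_requests slots have drained.  A sketch for
     * illustration, not the kernel code.  Assume the mutex and
     * condvar are set up with PTHREAD_*_INITIALIZER.
     */
    struct toy_queue {
            pthread_mutex_t lock;
            pthread_cond_t wait_for_requests;
            int free_requests;
            int batch_requests;
    };

    void toy_get_request(struct toy_queue *q)
    {
            pthread_mutex_lock(&q->lock);
            while (q->free_requests == 0)
                    pthread_cond_wait(&q->wait_for_requests, &q->lock);
            q->free_requests--;
            pthread_mutex_unlock(&q->lock);
    }

    void toy_put_request(struct toy_queue *q)  /* called at IO completion */
    {
            pthread_mutex_lock(&q->lock);
            /* Only wake sleepers once a batch of requests has freed up */
            if (++q->free_requests >= q->batch_requests)
                    pthread_cond_broadcast(&q->wait_for_requests);
            pthread_mutex_unlock(&q->lock);
    }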
In 2.4 we have a single global LRU of dirty buffers, and everyone
walks that list in old->new order. It is a jumble of buffers dirty
against all the queues, so inevitably, as soon as one queue fills
up, it blocks everyone.
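
Continuing the toy model, that walk looks roughly like this (the
real walk lives in fs/buffer.c; again, these names are made up):

    /*
     * One global LRU of dirty buffers, interleaved across every
     * device.  As soon as the walk reaches a buffer whose queue is
     * full, the walker sleeps in toy_get_request() above -- even
     * though buffers further down the list belong to idle queues
     * it could have been feeding.
     */
    struct toy_buffer {
            struct toy_buffer *next;    /* old -> new LRU order */
            struct toy_queue *queue;    /* queue for this buffer's disk */
    };

    void toy_write_some_buffers(struct toy_buffer *lru_head, int count)
    {
            struct toy_buffer *bh;

            for (bh = lru_head; bh && count > 0; bh = bh->next) {
                    toy_get_request(bh->queue); /* may block on one full queue */
                    /* ... build and submit the request here ... */
                    count--;
            }
    }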
A naive fix would be to have callers of balance_dirty() skip over
buffers on that list which do not belong to their own blockdev.
But the CPU cost of that search would be astronomical.
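
In toy form the naive fix is just a filter on that walk, which shows
where the time goes: with 24 disks, each walker scans roughly 24
buffers for every one it writes, and every dirtying process repeats
the whole scan.

    /*
     * The naive fix, sketched: each balance_dirty() caller only
     * touches buffers for its own blockdev.  Correct, but each call
     * now walks past everyone else's buffers to find its own.
     */
    void toy_write_own_buffers(struct toy_buffer *lru_head, int count,
                               struct toy_queue *my_queue)
    {
            struct toy_buffer *bh;

            for (bh = lru_head; bh && count > 0; bh = bh->next) {
                    if (bh->queue != my_queue)
                            continue;   /* someone else's disk: skip */
                    toy_get_request(my_queue);
                    /* ... build and submit the request here ... */
                    count--;
            }
    }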
A more intrusive fix would be to make callers of balance_dirty()
walk the superblock->inode->i_dirty[_data]_buffers list instead
of the buffer LRU. That's a sort-of-2.5 approach.
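
Sketched with the same toy structures (the real lists are the 2.4
inode's i_dirty_buffers/i_dirty_data_buffers; everything else here
is made up), that walk would be per-device by construction:

    /*
     * Find dirty buffers via sb -> inode -> dirty-buffer lists
     * instead of the global LRU.  Every buffer reached this way
     * belongs to this superblock's device, so no filtering needed.
     */
    struct toy_inode {
            struct toy_inode *i_next;               /* sb's dirty-inode list */
            struct toy_buffer *i_dirty_buffers;     /* this inode's dirty buffers */
    };

    struct toy_super_block {
            struct toy_inode *s_dirty_inodes;
            struct toy_queue *s_queue;      /* the one queue this fs feeds */
    };

    void toy_write_sb_buffers(struct toy_super_block *sb, int count)
    {
            struct toy_inode *inode;
            struct toy_buffer *bh;

            for (inode = sb->s_dirty_inodes; inode && count > 0;
                 inode = inode->i_next) {
                    for (bh = inode->i_dirty_buffers; bh && count > 0;
                         bh = bh->next) {
                            toy_get_request(sb->s_queue);
                            /* ... build and submit the request here ... */
                            count--;
                    }
            }
    }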
But even that wouldn't help, because then you hit the second
problem: your 24 bonnie threads run into the same queue congestion
in the page reclaim code when they encounter dirty pages on
the page LRU.
I have a fix for the first problem in 2.5. And the second problem
(the page reclaim code) I have sorta-bandaided.
So. Hard.
I haven't tested this yet, but you may get some benefit from
this patch:
--- linux-2.4.19-rc1/drivers/block/ll_rw_blk.c Thu Jul 4 02:01:16 2002
+++ linux-akpm/drivers/block/ll_rw_blk.c Fri Jul 12 15:28:42 2002
@@ -359,6 +359,7 @@ int blk_grow_request_list(request_queue_
         q->batch_requests = q->nr_requests / 4;
         if (q->batch_requests > 32)
                 q->batch_requests = 32;
+        q->batch_requests = 1;
         spin_unlock_irqrestore(&io_request_lock, flags);
         return q->nr_requests;
 }
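
(The idea being: batch_requests is the number of free slots we wait
for before waking sleepers on a full queue, so forcing it to 1
should get blocked threads moving again as soon as a single request
completes, rather than after a quarter of the queue has drained.)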