Now this is true _whatever_ we do.
We all agree that we have to cap the thing somewhere, no?
Which means that we may be cutting off at a point where if we didn't cut
off, we could have merged better etc. So that problem we have regardless
of whether we count bh's submitted to ll_rw_block() or we count requests
submitted to the actual IO layer.
The advantage of cutting off on a per-request basis is:
- doing contiguous IO is "almost free" on most hardware today. So it's ok
to allow a lot more IO if it's contiguous - because the cost of doing
one request (even if large) is usually much lower than the cost of
doing two (smaller) requests.
- What we really want to do is to have a sliding window of active
requests - enough to get reasonable elevator behaviour, and small
enough to get reasonable latency. Again, for both of these, the
"request" is the right entity - latency comes mostly from seeks (ie
between request boundaries), and similarly the elevator obviously works
on request boundaries too, not on "bh" boundaries.
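To make the sliding-window point concrete, here's a minimal userspace
sketch (plain pthreads, not kernel code - the cap, the names and the
layout are made up for illustration) of throttling on request slots
rather than on bh's. A buffer that merges into an already-active
contiguous request wouldn't need a new slot, which is exactly why
contiguous IO stays "almost free" under a per-request cutoff:

	/*
	 * Illustrative only: throttle submitters on in-flight requests,
	 * not on individual buffer heads.
	 */
	#include <pthread.h>

	#define MAX_ACTIVE_REQUESTS 128	/* hypothetical cap */

	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t  slot_free = PTHREAD_COND_INITIALIZER;
	static int active_requests;	/* requests handed to the IO layer */

	/* Block until a request slot is available, then claim it. */
	void get_request_slot(void)
	{
		pthread_mutex_lock(&lock);
		while (active_requests >= MAX_ACTIVE_REQUESTS)
			pthread_cond_wait(&slot_free, &lock);
		active_requests++;
		pthread_mutex_unlock(&lock);
	}

	/* On IO completion: release the slot and wake one waiter. */
	void put_request_slot(void)
	{
		pthread_mutex_lock(&lock);
		active_requests--;
		pthread_cond_signal(&slot_free);
		pthread_mutex_unlock(&lock);
	}

Latency is then bounded by the number of outstanding requests (ie
seeks), not by how many bh's happen to be queued behind them.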
Also, I doubt it makes all that much sense to change the number of queue
entries based on memory size. It probably makes more sense to scale the
number of requests by disk speed, for example.
[ Although there's almost certainly some amount of correlation - if you
have 2GB of RAM, you probably have fast disks too. But not the linear
function that we currently have. ]
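Purely as a toy illustration (the constants and the helper name are
invented, not anything we actually do), "scale by disk speed" could
look something like this instead of scaling with RAM size:

	#include <stdio.h>

	/* Hypothetical sizing: queue depth follows disk throughput. */
	static int nr_requests_for_disk(unsigned int disk_mb_per_sec)
	{
		/* eg one slot per MB/s, clamped to a sane range */
		int nr = disk_mb_per_sec;

		if (nr < 32)
			nr = 32;
		if (nr > 1024)
			nr = 1024;
		return nr;
	}

	int main(void)
	{
		printf("slow disk (20 MB/s):  %d requests\n",
			nr_requests_for_disk(20));
		printf("fast disk (200 MB/s): %d requests\n",
			nr_requests_for_disk(200));
		return 0;
	}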
> This situation only gets worse
> as more and more tasks find that they need to clean buffers in order to
> allocate memory, and start throwing more and more buffers from different
> tasks into the io queue (think what happens when two tasks are walking
> the dirty buffer lists locking buffers and then attempting to allocate a
> request which then delays one of the tasks).
Note that this really is a situation we've had forever.
There are good reasons to believe that we should do a better job of
sorting the IO requests at a higher level in _addition_ to the low-level
elevator. Filesystems should strive to allocate blocks contiguously etc,
and we should strive to keep (and write out) the dirty lists etc in a
somewhat chronological order to take advantage of usually contiguous writes
(and maybe actively sort the dirty queue on writes that are _not_ going to
have good locality, like swapping).
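As a rough sketch of what "actively sort the dirty queue" could mean
(the struct and helper are illustrative, not the real buffer code),
sorting pending buffers by on-disk position before submission hands
the low-level elevator mostly-ascending runs:

	#include <stdlib.h>

	struct dirty_buf {
		unsigned long block;	/* on-disk block number */
		/* data, dev, size etc would live here in real code */
	};

	static int cmp_by_block(const void *a, const void *b)
	{
		const struct dirty_buf *x = a, *y = b;

		if (x->block < y->block)
			return -1;
		return x->block > y->block;
	}

	/* Sort pending dirty buffers so writeout proceeds mostly
	 * forward across the disk, even for poor-locality writes
	 * like swap. */
	void sort_dirty_list(struct dirty_buf *bufs, size_t n)
	{
		qsort(bufs, n, sizeof(*bufs), cmp_by_block);
	}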
Linus