If the above is what has been observed in the real world, then there
would be no problem. Let's say I have 32 tags pending, all writes. Now I
issue a read. Then I go ahead and keep throwing my writes at the drive,
basically keeping it at 32 tags all the time. When will this read
complete? The answer is, well, it might not within any reasonable time,
because the drive happily starves the read to get the best write
throughput.
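To make the starvation concrete, here is a toy model of that scenario.
The "always pick the queued command nearest the head" policy, the LBA
numbers, and the instant tag refill are my assumptions for illustration
only, not a description of any real firmware:

	/*
	 * Toy model: 31 tags kept full of sequential writes near the
	 * head, plus one read at a distant LBA.  The seek-greedy policy
	 * and all numbers are assumptions for illustration only.
	 */
	#include <stdio.h>
	#include <stdlib.h>

	#define TAGS 32

	int main(void)
	{
		long head = 0;           /* current head position (LBA) */
		long writes[TAGS - 1];   /* pending writes, all near the head */
		long read_lba = 1000000; /* one read, far away */
		long next_write = 0;
		int i, serviced_read = 0;

		for (i = 0; i < TAGS - 1; i++)
			writes[i] = next_write++;

		/* service 10000 commands, always picking the nearest LBA */
		for (long n = 0; n < 10000; n++) {
			int best = -1;
			long best_dist = labs(read_lba - head);

			for (i = 0; i < TAGS - 1; i++) {
				long d = labs(writes[i] - head);
				if (d < best_dist) {
					best_dist = d;
					best = i;
				}
			}
			if (best < 0) {
				serviced_read = 1;
				printf("read serviced after %ld commands\n", n);
				break;
			}
			head = writes[best];
			writes[best] = next_write++; /* tag refilled at once */
		}
		if (!serviced_read)
			printf("read still starved after 10000 commands\n");
		return 0;
	}

As long as the writer keeps the queue full, the read is never the
nearest command and simply never gets picked.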
The size of the dirty cache backlog, or whatever you want to call it,
does not matter _at all_. I don't know why both you and Matt keep
bringing that point up. The 'backlog' is just that: it will be
processed in due time. If a read comes in, the io scheduler will decide
it's the most important thing on earth. So I may have 1 gig of dirty
cache waiting to be flushed to disk, but that _does not_ mean that the
read that now comes in has to wait for the 1 gig to be flushed first.
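As a rough sketch of that point (a deliberate simplification for
illustration, not the actual elevator code): with one FIFO for reads
and one for the write backlog, the dispatcher hands the drive the newly
arrived read next, no matter how long the write list is.

	/*
	 * Minimal sketch: reads are dispatched before the write backlog.
	 * Simplified for illustration, not the real Linux io scheduler.
	 */
	#include <stdio.h>

	enum dir { READ_REQ, WRITE_REQ };

	struct request {
		enum dir dir;
		long sector;
		struct request *next;
	};

	struct sched {
		struct request *reads;   /* FIFO of pending reads */
		struct request *writes;  /* FIFO of pending writes (the backlog) */
	};

	/* pick the next request to send to the drive: reads win */
	static struct request *dispatch(struct sched *s)
	{
		struct request *rq;

		if (s->reads) {
			rq = s->reads;
			s->reads = rq->next;
			return rq;
		}
		rq = s->writes;
		if (rq)
			s->writes = rq->next;
		return rq;
	}

	int main(void)
	{
		/* an arbitrarily large dirty backlog queued as writes... */
		struct request w2 = { WRITE_REQ, 2048, NULL };
		struct request w1 = { WRITE_REQ, 1024, &w2 };
		/* ...and one read that just arrived */
		struct request r1 = { READ_REQ, 9000, NULL };
		struct sched s = { &r1, &w1 };

		struct request *rq = dispatch(&s);
		printf("dispatched %s at sector %ld first\n",
		       rq->dir == READ_REQ ? "read" : "write", rq->sector);
		return 0;
	}

The backlog only decides how long the drive stays busy overall, not
what gets dispatched next.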
> Now consider the read case. I maintain that any reasonable drive will
> *always* outperform the OS's transaction reordering/elevator algorithms
> for seek reduction. This is the whole point of having high tag depths.
Well, given that the drive has intimate knowledge of itself, then yes,
of course it is the only one that can order any number of pending
requests most optimally. So the drive might provide the best layout of
requests when it comes to total seek time spent, and throughput. But
often at the cost of increased (sometimes much increased, see the
trivial examples given) latency.
However, I maintain that going beyond any reasonable number of tags for
a standard drive is *stupid*. The Linux io scheduler gets very good
performance without any queueing at all. Going from 4 to 64 tags gets
you very, very little increase in performance, if any at all.
> In all I/O studies that have been performed todate, reads far outnumber
> writes *unless* you are creating an ISO image on your disk. In my opinion
Well, it's my experience that it's pretty balanced, at least for my own
workload. atime updates, compiles, etc. put a nice load on writes.
> it is much more important to optimize for the more common, concurrent
> read case, than it is for the sequential write case with intermittent
> reads. Of course, you can fix the latter case too without any change to
> the driver's queue depth as outlined above. Why not have your cake and
> eat it too?
If you care to show me this cake, I'd be happy to devour it. I see
nothing even resembling a solution to this problem in your email,
except for you saying above that I should ignore it and optimize for
the 'common' concurrent read case.
It's pointless to argue that tagging is oh so great and always
outperforms the OS io scheduler, and that we should just use 253 tags
because the drive knows best, when several examples have shown that
this is _not the case_.
--
Jens Axboe