Actually I thought your "queue" was "head of queue" and that 5,6,7,8 and 9
were reads....
If the queue contains, say:
(head) R1 R2 R3 W1 W2 W3 W4 W5 W6 W7
Then a new R4 will be inserted between W6 and W7. So if R5 is mergeable
with R4 there is still plenty of time for that.
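To make that concrete, here is a rough sketch in C of the two insertion
policies (a toy singly-linked list; this is not the real elevator/struct
request code, and all names are invented): 2.4's put-it-at-the-tail versus
the put-the-read-just-ahead-of-the-tail behaviour in the example above.

    /* Toy model of the request queue -- not the real request code. */
    struct req {
            struct req *next;
            int is_read;
    };

    /* 2.4-style: an unmergeable, uninsertable read just goes to the tail. */
    static void insert_at_tail(struct req **head, struct req *r)
    {
            struct req **p = head;

            while (*p)
                    p = &(*p)->next;
            r->next = NULL;
            *p = r;
    }

    /*
     * The behaviour described above: slot the new read (R4) in just ahead
     * of the last request (W7), so a later read (R5) still has time to
     * merge with it before it reaches the head of the queue.
     */
    static void insert_before_tail(struct req **head, struct req *r)
    {
            struct req **p = head;

            while (*p && (*p)->next)
                    p = &(*p)->next;
            r->next = *p;
            *p = r;
    }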
> > > > > However I think even read-latency is more a workaround to a
> > > > > problem in the I/O queue dimensions.
> > > >
> > > > The problem is the 2.4 algorithm. If a read is not mergeable or
> > > > insertable it is placed at the tail of the queue. Which is the
> > > > worst possible place it can be put because applications wait on
> > > > reads, not on writes.
> > >
> > > O_SYNC/-osync waits on writes too, so are you saying writes must go to
> > > the head because of that?
> >
> > It has been discussed: boost a request to head-of-queue when a thread
> > starts to wait on a buffer/page which is inside that request.
> >
> > But we don't care about synchronous writes. As long as we don't
> > starve them out completely, optimise the (vastly more) common case.
>
> yes, it should be worthwhile to trade a little global throughput for a
> significant improvement in read latency; I'm not against that. But
> before I care about that I'd prefer to get a limit on the size of the
> queue in bytes, not in requests,
Really, it should be in terms of "time". If you assume a 6 msec seek and
30 mbyte/sec bandwidth, the crossover is around a 180 kbyte I/O
(30 mbyte/sec * 6 msec). Not that I'm sure this means anything
interesting ;) But the lesson is that the size of a request isn't very
important.
> actually it's probably much worse than a 10 times ratio, since the writer
> is going to use big requests, while the reader is probably seeking with
> <=4k requests.
>
Yup. This is one case where improving latency improves throughput,
if there's computational work to be done.
2.5 (and read-latency) sort-of solve these problems by creating a
massive seekstorm when there are competing reads and writes. It's
a pretty sad solution really.
Better would be to perform those reads and writes in nice big batches.
That's easy for the writes, but for reads we need to wait for the
application to submit another one. That means deliberately leaving the
disk head idle for a few milliseconds in anticipation that the
application will submit another nearby read. This is called
"anticipatory scheduling", and it has been shown to provide a 20%-70%
performance boost in web-serving workloads. It just makes heaps of
sense to me and I'd love to see it in Linux...
See http://www.cs.ucsd.edu/sosp01/papers/iyer.pdf
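To give a feel for the mechanism, here is a toy sketch of the decision in
C -- purely illustrative, with invented names and numbers; it is not the
code from the paper and not any real Linux elevator:

    /*
     * Sketch of the anticipation decision: after a read completes, hold
     * off dispatching queued writes for a short window in case the same
     * process submits another nearby read.  A real scheduler would also
     * look at how close the next read lands and at past behaviour.
     */
    #define ANTIC_WINDOW_MS 4       /* invented number: how long we idle */

    struct antic_state {
            int  last_was_read;     /* last completed request was a read */
            long antic_deadline;    /* time at which we give up waiting */
    };

    /* Called when only writes are ready but we could choose to wait. */
    static int should_keep_disk_idle(const struct antic_state *as,
                                     long now, int reads_pending)
    {
            if (reads_pending)
                    return 0;       /* a read is queued: dispatch it */
            if (!as->last_was_read)
                    return 0;       /* only anticipate on behalf of readers */
            if (now >= as->antic_deadline)
                    return 0;       /* waited long enough: let writes go */
            return 1;               /* keep the head where it is a bit longer */
    }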