Re: RFC on io-stalls patch

Andrea Arcangeli (andrea@suse.de)
Tue, 15 Jul 2003 11:48:26 +0200


On Tue, Jul 15, 2003 at 10:28:50AM +0200, Jens Axboe wrote:
> no_load:
> Kernel [runs] Time CPU% Loads LCPU% Ratio
> 2.4.21 3 133 197.0 0.0 0.0 1.00
> 2.4.22-pre5 2 134 196.3 0.0 0.0 1.00
> 2.4.22-pre5-axboe 3 133 196.2 0.0 0.0 1.00
> ctar_load:
> Kernel [runs] Time CPU% Loads LCPU% Ratio
> 2.4.21 3 190 140.5 15.0 15.8 1.43
> 2.4.22-pre5 3 235 114.0 25.0 22.1 1.75
> 2.4.22-pre5-axboe 3 194 138.1 19.7 20.6 1.46
>
> 2.4.22-pre5-axboe is way better than 2.4.21, look at the loads
> completed.
>
> xtar_load:
> Kernel [runs] Time CPU% Loads LCPU% Ratio
> 2.4.21 3 287 93.0 14.0 15.3 2.16
> 2.4.22-pre5 3 309 86.4 15.0 14.9 2.31
> 2.4.22-pre5-axboe 3 249 107.2 11.3 14.1 1.87
>
> 2.4.21 beats 2.4.22-pre5, not too surprising and expected, and not
> terribly interesting either.
>
> io_load:
> Kernel [runs] Time CPU% Loads LCPU% Ratio
> 2.4.21 3 543 49.7 100.4 19.0 4.08
> 2.4.22-pre5 3 637 42.5 120.2 18.5 4.75
> 2.4.22-pre5-axboe 3 540 50.0 103.0 18.1 4.06
>
> 2.4.22-pre5-axboe completes the most loads here per time unit.
>
> io_other:
> Kernel [runs] Time CPU% Loads LCPU% Ratio
> 2.4.21 3 581 46.5 111.3 19.1 4.37
> 2.4.22-pre5 3 576 47.2 107.7 19.8 4.30
> 2.4.22-pre5-axboe 3 452 59.7 85.3 19.5 3.40
>
> 2.4.22-pre5 is again the slowest of the lot when it comes to
> workloads/time, 2.4.22-pre5-axboe is again the fastest and completes
> the workload in the shortest time.
>
> read_load:
> Kernel [runs] Time CPU% Loads LCPU% Ratio
> 2.4.21 3 151 180.1 8.3 9.3 1.14
> 2.4.22-pre5 3 150 181.3 8.1 9.3 1.12
> 2.4.22-pre5-axboe 3 152 178.9 8.2 9.9 1.14
>
> Pretty equal.

io_other and xtar_load aren't exactly equal. As for elevator-lowlatency
alone, I'm not sure why it doesn't show big benefits in the above
workloads. It was very noticeable in my tests, where I normally counted
the lines per second of `find /` or ran `time ls` (from contest
comparisons with previous kernels w/o elevator-lowlatency it looked like
it made a difference too, and I've got some positive feedback). Maybe
it's because we enlarged the queue size to 4M in this version; in the
original patches, where I ran most of the latency tests, it was 2M, but
I was concerned that it could be too small.

If it doesn't take too much time, I would be curious what happens if
you change:

MAX_QUEUE_SECTORS (4 << (20 - 9))

to

MAX_QUEUE_SECTORS (2 << (20 - 9))

(it's up to you whether to apply your patch along with this change; it
should make a noticeable difference either way)
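
For reference, a minimal sketch of what that change amounts to in the
source (I'm assuming the macro sits in drivers/block/ll_rw_blk.c as in
the elevator-lowlatency patches; the exact file may differ). The value
is in 512-byte sectors, so the shift just converts megabytes to sectors:

/* illustrative sketch, not the exact patch hunk */
/* queue size in 512-byte sectors: N << (20 - 9) == N megabytes */
#define MAX_QUEUE_SECTORS (2 << (20 - 9))  /* 2M = 4096 sectors; was (4 << (20 - 9)) = 4M */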

Obviously, the smaller the queue, the higher the fairness and the lower
the latency, but the less pipelining there will be in the I/O queue, so
it's less guaranteed to keep the spindle constantly working; that's not
an issue for low-end devices though. Ideally it should be tunable
per-device. On a 50 MB/sec array 2M didn't show any degradation during
contiguous I/O either, but I didn't run any tests on faster storage, so
I felt safer using 4M in the latest versions, knowing latency would be
slightly hurt.
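
To make the tradeoff concrete (back-of-the-envelope numbers, not
measurements): a full queue bounds how much already-submitted I/O a new
request may have to wait behind, so the worst-case added latency is
roughly queue size divided by sustained throughput:

/* Illustrative only: rough worst-case drain time of a full request
 * queue, assuming sustained sequential throughput (example figures). */
#include <stdio.h>

int main(void)
{
	const double queue_mb[] = { 2.0, 4.0 };    /* MAX_QUEUE_SECTORS as MB */
	const double disk_mb_s[] = { 10.0, 50.0 }; /* slow disk vs 50 MB/sec array */
	int q, d;

	for (q = 0; q < 2; q++)
		for (d = 0; d < 2; d++)
			printf("%.0fM queue at %.0f MB/sec: ~%.0f ms to drain\n",
			       queue_mb[q], disk_mb_s[d],
			       queue_mb[q] / disk_mb_s[d] * 1000.0);
	return 0;
}

That works out to ~40ms vs ~80ms on the 50 MB/sec array, but ~200ms vs
~400ms on a 10 MB/sec disk, which is why the smaller queue mostly pays
off on slower devices while it's the faster storage that risks going
idle.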

Andrea