[patch] O(1) scheduler, -I0

Ingo Molnar (mingo@elte.hu)
Tue, 15 Jan 2002 17:04:29 +0100 (CET)


the -I0 patch is available at:

http://redhat.com/~mingo/O(1)-scheduler/sched-O1-2.5.2-final-I0.patch

stock 2.5.2 includes a 'interactivity estimator' method that includes most
of the things i think to be important for good interactivity:

- sleep time based priority boost/penalty.

- constant frequency runqueue sampling instead of recalculation/switch
based runqueue sampling.

- interactivity based runqueue insertion on timeslice expire.

I'm very happy about the 2.5.2 solution, it's simpler than the one i used
in -H7 - good work Davide!

There are a number of problems in 2.5.2 that need fixing though:

- renicing is broken - it does not work at all, neither up nor down, for
CPU-bound tasks. Renicing fell victim to the attempt to penalize CPU
hogs as much as possible: every CPU-bound task reaches the lowest
priority level and stays there. This also makes kernel compile times
suffer.

- RT scheduling is broken.

- the sleep average is hidden in p->prio, which makes it harder to
recover and use the true interactiveness of the task.

- the runqueue is sampled at a frequency of 20 HZ, which can misdetect
periodic user tasks that somehow correlate with 20 HZ.

I've fixed these problems/bugs by taking some of the -H7 solutions:

- introducing p->sleep_avg, which is updated in a lightweight way. No
more 'history slots'. A single counter, updated in a very simple way.

- limiting the bonus/penalty range according to nice levels - a task can
at most get a 5 priority levels penalty over the default level, in
stock 2.5.2 it can get to the nice +19 level after a few seconds
runtime. Nice levels work again.

- introducing HZ frequency runqueue sampling. Also the MAX_SLEEP_AVG
constant tells us how long into the past we are looking. This is 2
seconds right now.

- separating the RT timeslice code in scheduler_tick(), we used to break
the RT case way too often, now we can hack the SCHED_OTHER code without
having to touch the RT part.

- plus the patch also includes all the fixes and improvements from the
-H7 patch.

i've also cleaned up and commented the priority management code and have
introduced the prio_effective(p) inline function.

i've tested the patch on UP and SMP boxes. I've measured high-load
interactivity to be on equivalent levels with that of stock 2.5.2.

Bug reports, comments, suggestions welcome.

Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/