Re: Context switch times

Mika Liljeberg (Mika.Liljeberg@welho.com)
Sun, 07 Oct 2001 12:57:29 +0300


Alan Cox wrote:
> This isnt idle speculation - I've done some minimal playing with this but
> my initial re-implementation didnt handle SMP at all and I am still not 100%
> sure how to resolve SMP or how SMP will improve out of the current cunning
> plan.

Here's some idle speculation on SMP to top it off. :) I tend to think
that the load balancing between CPUs should be a completely separate
algorithim and should not necessarily be run at every schedule(). The
idea is to compeletely decouple the problem of scheduling a single CPU
between tasks and the problem of load balancing between the CPUs, making
each problem simpler to solve.

Consider the following basic rules:

A) When a new task comes along, pick the "least loaded" CPU and lock the
new task onto that.
B) Whenever the load imbalance between least loaded CPU and most loaded
CPU becomes too great, move one or more tasks from most loaded CPU to
the least loaded CPU.

The rules themselves should be self-explanatory: A provides initial load
balancing, while B tries to keep the balance (with a sensible hysteresis
to avoid thrashing). However, there are a few minor details to solve:

1) How to determine the load of a CPU? If we can quantify this clearly,
we can easily set a hysteresis level to trigger load balancing between
two CPUs.
2) When and how often to check for load imbalance?
3) How to select the task(s) that should be moved between two CPUs to
correct an imbalance?

For problems 1 and 2 I propose the following solution: Insert the the
load balancing routine itself as a (fake) task on each CPU and run it
when the CPU gets around to it. The load balancer should behave almost
like a CPU-bound task, scheduled on the lowest priority level with other
runnable tasks. The last bit is important: the load balancer should not
be allowed to starve but should be invoked approximately once every
"full rotation" of the scheduler.

With the above it is easy to estimate the load of a CPU. We can simply
use the elapsed time between two invokations of the load balancer task.
When the load balancer task of a particular CPU gets run, it chalks up
the elapsed time on a score board somewhere, and checks whether there is
a significant imbalance between itself and some other CPU. If there is,
it commences to move some tasks between itself and the other CPU (note
rule B, though, it should be enough to mess with just two CPU queues at
a time to minimize balancing and locking overhead).

Problem 3 is tricky. Basically, there should be a cost/benefit function
F(tasks to move) that should be minimized. Ideally F(task_i), the
cost/benefit of moving a single task, would be calculated as a byproduct
of the CPU scheduler algorithm.

F(task_i) might be function of elapsed time since task_i was last
scheduled and the average time slice used by task_i, to account for the
probable cache hit. This would leave it up to the load balancer to move
as many lowest cost tasks to a new CPU as is needed to correct the
imbalance (average time slices used by each task would be needed in
order to make this decision).

Naturally, some additional rules might be necessary to make a task
eligible for moving, e.g., never move the only/last CPU bound task to
another CPU. In addition, it might actually make sense to move at most
one task at each invocation of the load balancer, to further reduce the
probability of thrashing. The load would still converge fairly quickly
towards a balanced state. It would also scale fairly well with the
number of CPUs.

How does that sound?

MikaL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/