> Before we tweak this too much, how about using the global load
> average for this? I can envisage a situation where we have two
> nodes with 8 tasks per node, one with 12 tasks, and one with four.
> You really don't want the ones with 8 tasks pulling stuff from
> the 12 ... only for the least loaded node to start pulling stuff
> later.
Hmmm, yet another idea from the old NUMA scheduler coming back,
therefore it has my full support ;-). Though we can't do it as I did
it there: in the old NUMA approach every cross-node steal was delayed,
by only 1-2ms if our own node was underloaded, and by a lot more if our
own node's load was average or above average. We might eventually need
to steal something even if we have "above average" load, because the
cpu mask of the tasks on the overloaded node might only allow them to
migrate to us... But this is also a special case which we should solve
later.
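
To make the old delay scheme concrete, here is a rough sketch, not the
actual code from the old NUMA scheduler. All names are made up for
illustration, and the 100ms value is only a placeholder for "a lot
more"; the only number taken from the description above is the 1-2ms
delay for an underloaded node.

```c
/* Hypothetical delays in milliseconds. Only the "1-2ms" figure comes
 * from the description; the loaded-node value is a placeholder. */
#define STEAL_DELAY_UNDERLOADED_MS 2
#define STEAL_DELAY_LOADED_MS      100

/* How long a cross-node steal is held back, depending on whether our
 * own node's load is below the average node load (same units). */
unsigned int steal_delay_ms(unsigned long own_load, unsigned long avg_load)
{
	if (own_load < avg_load)
		return STEAL_DELAY_UNDERLOADED_MS;
	return STEAL_DELAY_LOADED_MS;
}

/* A cross-node steal is allowed once we have waited long enough. */
int may_steal_cross_node(unsigned long own_load, unsigned long avg_load,
			 unsigned long waited_ms)
{
	return waited_ms >= steal_delay_ms(own_load, avg_load);
}
```

Note that every steal is only delayed here, never refused, so the cpu
mask case mentioned above still gets served eventually.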
> What about if we take the global load average, and multiply by
> num_cpus_on_this_node / num_cpus_globally ... that'll give us
> roughly what we should have on this node. If we're significantly
> underloaded compared to that, we start pulling stuff from
> the busiest node? And you get the damping over time for free.
Patch 05 goes in this direction, but the constants I chose were very
aggressive. I'll update the whole set of patches and try to send out
something today.
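
For reference, the check proposed in the quoted text could look roughly
like the sketch below. This is not what patch 05 does; the function
names, the LOAD_SCALE fixed-point factor and the margin_pct parameter
(what counts as "significantly" underloaded) are all assumptions made
for the example.

```c
/* Hypothetical fixed-point scale, to avoid floating point. */
#define LOAD_SCALE 128

/* Share of the global load this node should carry, scaled by
 * LOAD_SCALE: global_load * node_cpus / total_cpus. */
unsigned long node_expected_load(unsigned long global_load,
				 unsigned int node_cpus,
				 unsigned int total_cpus)
{
	if (total_cpus == 0)
		return 0;
	return global_load * LOAD_SCALE * node_cpus / total_cpus;
}

/* Pull from the busiest node only if our load is more than margin_pct
 * percent below the expected share. */
int node_should_pull(unsigned long node_load, unsigned long global_load,
		     unsigned int node_cpus, unsigned int total_cpus,
		     unsigned int margin_pct)
{
	unsigned long expected, threshold;

	if (margin_pct > 100)
		margin_pct = 100;
	expected = node_expected_load(global_load, node_cpus, total_cpus);
	threshold = expected * (100 - margin_pct) / 100;
	return node_load * LOAD_SCALE < threshold;
}
```

With the 8/8/12/4 example from the quote, assuming four nodes of 4 cpus
each and a 25% margin, the expected share is 8 tasks per node: only the
node with 4 tasks pulls, and the two nodes with 8 tasks leave the
12-task node alone.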
Best regards,
Erich