Yes, I like it! I needed some time to understand that the per_cpu
variables can spread the exec'd tasks across the nodes as well as the
atomic sched_node. Sure, I'd like to select the least loaded node instead
of the least loaded CPU. It can well be that you have just created 10
threads on a node (by fork, therefore still on their original CPU), while
an idle CPU in the same node hasn't yet stolen any of the newly created
tasks. Suppose your instant load looks like this:
node 0: cpu0: 1 , cpu1: 1, cpu2: 1, cpu3: 1
node 1: cpu4:10 , cpu5: 0, cpu6: 1, cpu7: 1
If you exec on cpu0 before cpu5 manages to steal something from cpu4,
you'll aim for cpu5. That would only increase the node imbalance and
force more of the threads on cpu4 to migrate to node 0, which may be bad
for them. Just an example... Once you start considering non-trivial
cpus_allowed masks, you get more of these cases.
We could take this as a design target for the initial load balancer
and keep the fastest version we have for the benchmarks we currently
use (Michael's).
Regards,
Erich