> > well, it needs to settle down a bit more, we are technically in a
> > codefreeze :-)
>
> We're in feature freeze. Not sure whether fixing the scheduler for one
> type of hardware supported by Linux is a feature 8)
>
> Anyway, patch 1 should certainly merged ASAP, for the other we can wait
> a bit more to settle, but I don't think it's really worth the wait.
agreed, the patch is unintrusive, but by settling down i mean things like
this:
+/* XXX(hch): this should go into a struct sched_node_data */
should be decided one way or another.
i'm also not quite happy about the conceptual background of
rq->nr_balanced. This load-balancing rate-limit is arbitrary and not
neutral at all. The way this should be done is to move the inter-node
balancing conditional out of load_balance(), and only trigger it from the
timer interrupt, with a given rate. On basically all NUMA hardware i
suspect it's much better to do inter-node balancing only on a very slow
scale. Making it dependnet on an arbitrary portion of the idle-CPU
rebalancing act makes the frequency of inter-node rebalancing almost
arbitrarily high.
ie. there are two basic types of rebalancing acts in multiprocessor
environments: 'synchronous balancing' and 'asynchronous balancing'.
Synchronous balancing is done whenever a CPU runs idle - this can happen
at a very high rate, so it needs to be low overhead and unintrusive. This
was already so when i did the SMP balancer. The asynchronous blancing
component (currently directly triggered from every CPU's own timer
interrupt), has a fixed frequency, and thus can be almost arbitrarily
complex. It's the one that is aware of the global scheduling picture. For
NUMA i'd suggest two asynchronous frequencies: one intra-node frequency,
and an inter-node frequency - configured by the architecture and roughly
in the same proportion to each other as cachemiss latencies.
(this all means that unless there's empirical data showing the opposite,
->nr_balanced can be removed completely, and fixed frequency balancing can
be done from the timer tick. This should further simplify the patch.)
Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/