Re: userspace irq balancer

Andrea Arcangeli (andrea@suse.de)
Tue, 27 May 2003 03:26:17 +0200


On Mon, May 26, 2003 at 06:13:09PM -0700, David S. Miller wrote:
> From: Andrea Arcangeli <andrea@suse.de>
> Date: Tue, 27 May 2003 03:09:03 +0200
>
> I'm not going to implement the above in 2.4, that sounds like a 2.5 thing,
>
> Then your 2.4.x load balancing is buggy for networking.

it's not buggy, it's less performant than what 2.5 could be. It's not a
matter of bugs, it's a matter of performance: this is a heuristic, and it
can very well do the wrong thing sometimes.

What I care about is whether it is less performant than any other
2.4 and any current 2.5. That is non-obvious to me. The approximation
will never be as good as perfect accounting, but it's still better
than no approximation at all IMHO, and for sure I don't want to waste
totally idle cpus on a 32way either.

> You simply cannot ignore this issue and act as if it
> does not exist and does not have huge consequence for IRQ
> load balancing decisions.

The only thing the ksoftirqd check can do is to reduce those
consequences now.

> but my point is that just ignoring ksoftirqd in the idle selection
> should avoid the biggest of the NAPI issues.
>
> On a properly functioning system, ksoftirqd should not be running.

I'd argue with that: NAPI needs to poll somehow. Either you hook into the
kernel, slowing down every single schedule, or you need to offload this
work to a kernel thread.

The other uses of ksoftirqd are meant to avoid the 1msec latency should
the cpu go idle, or should the irqs arrive faster than the network stack
can process the data. They're all legitimate usages IMHO. And we should
be fine keeping irqs running together with softirqs; that's the point of
this new check.
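
To make the idea concrete, here is a rough userspace model of the check, written in Python rather than as kernel code. It is only a sketch of one reading of the above: the CpuState type, the "idle" task name, the "ksoftirqd" name prefix and both function names are made up for illustration and are not taken from any actual patch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CpuState:
    cpu: int
    current_task: str  # hypothetical: name of the running task, "idle" if none


def is_idle_for_irq(state: CpuState) -> bool:
    # Assumption: a cpu that is only running its ksoftirqd thread is
    # treated like an idle cpu, so irqs may stay next to the softirq work.
    return state.current_task == "idle" or state.current_task.startswith("ksoftirqd")


def pick_irq_target(cpus: List[CpuState]) -> Optional[int]:
    # Return the first cpu that counts as idle for irq purposes, or None
    # when every cpu is busy with some other task.
    for state in cpus:
        if is_idle_for_irq(state):
            return state.cpu
    return None
```

With this model a cpu running only ksoftirqd is still a valid target, so hardware irqs are not pushed away from the cpu that is already processing the softirqs.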

> > But deciding how to interpret these measurements and what to do in
> > response is a userlevel policy decision. This also coincides with
> > how cpufreq works.
>
> you mean you can have slightly different modes selectable by sysctl
> right?
>
> One possibility. Another is a descriptor describing things like
> how much to weight hardware vs. software IRQ load, vs. process
> load etc.

this certainly sounds good to me.
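
Just to illustrate what such a descriptor could look like from the userspace side, here is a minimal Python sketch. Everything in it is an assumption: the BalanceWeights and CpuLoad types, the field names and the simple linear combination are hypothetical, and the load figures would have to come from whatever measurements the kernel ends up exporting.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BalanceWeights:
    # Hypothetical policy descriptor: relative weight of each load source.
    hardirq: float
    softirq: float
    process: float


@dataclass
class CpuLoad:
    # Hypothetical per-cpu measurements, each a fraction of cpu time in [0, 1].
    hardirq: float
    softirq: float
    process: float


def weighted_load(load: CpuLoad, w: BalanceWeights) -> float:
    # Combine the three measurements into one score using the policy weights.
    return (w.hardirq * load.hardirq
            + w.softirq * load.softirq
            + w.process * load.process)


def least_loaded(loads: Dict[int, CpuLoad], w: BalanceWeights) -> int:
    # Pick the cpu with the lowest weighted score as the next irq target.
    return min(loads, key=lambda cpu: weighted_load(loads[cpu], w))
```

The same measurements can then give different answers under different policies: a descriptor that weights softirq load heavily steers irqs away from a cpu busy in the network stack, while one that only looks at hardware irq load does not.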

>
> or do you really want to generate a reschedule per second
>
> No, nothing like this.

ok.

Andrea