rdtsc could do it very well: irqs and softirqs can't be rescheduled, so
you can measure in tsc ticks how long you spend in them on each cpu, and
the same goes for each task before it migrates to another cpu (I'm only
assuming this is SMP and not AMT; still, if the difference in cpu
frequency among cpus isn't huge it could still work with AMT, and a
multiplier could be applied with AMT). This "non idle" load could be
accounted in a per-cpu array.
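To make the accounting idea concrete, here is a minimal userspace sketch,
not kernel code. All names (cpu_load, load_enter, load_exit, speed_mult)
are hypothetical, the timestamps are passed in by the caller where the
kernel would read rdtsc, and nested irqs are not handled.

```c
#include <stdint.h>

#define NR_CPUS 4   /* hypothetical cpu count for this sketch */

/* Per-cpu accounting of "non idle" time (irq, softirq, task). */
struct cpu_load {
    uint64_t busy_cycles;  /* accumulated non-idle cycles, scaled */
    uint64_t start_stamp;  /* cycle counter when the section began */
    uint32_t speed_mult;   /* assumed scale factor in 1/1024 units */
};

static struct cpu_load cpu_load[NR_CPUS];

/* Reset all counters; 1024 means "no scaling" (plain SMP). */
static void load_init(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        cpu_load[cpu].busy_cycles = 0;
        cpu_load[cpu].start_stamp = 0;
        cpu_load[cpu].speed_mult = 1024;
    }
}

/* Entry of an irq/softirq, or a task being scheduled in on 'cpu'.
 * 'now' stands in for an rdtsc reading taken on that cpu. */
static void load_enter(int cpu, uint64_t now)
{
    cpu_load[cpu].start_stamp = now;
}

/* Exit of the irq/softirq, or the task leaving 'cpu' (e.g. migrating).
 * The section cannot move to another cpu in between, so the delta of
 * two stamps from the same cpu is meaningful. */
static void load_exit(int cpu, uint64_t now)
{
    uint64_t delta = now - cpu_load[cpu].start_stamp;

    /* Apply the per-cpu multiplier, then drop the 1/1024 fraction. */
    cpu_load[cpu].busy_cycles += (delta * cpu_load[cpu].speed_mult) >> 10;
}
```

With speed_mult left at 1024 the array simply holds raw cycle deltas; a
different value per cpu is one way the multiplier for cpus with different
frequencies could be expressed.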
I'm not going to implement the above in 2.4, that sounds like a 2.5
thing, but my point is that just ignoring ksoftirqd in the idle
selection should avoid the biggest of the NAPI issues. I'm
approximating, i.e. a better-than-nothing approach (either that or
nothing). I never claimed that to be a final golden algorithm, just
obviously better than the total-trashing one, and even w/o the
ksoftirqd and HT last bits the numbers confirmed that.
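As a rough illustration of "ignoring ksoftirqd in the idle selection",
the sketch below treats a cpu whose current task is ksoftirqd the same
as an idle one. This is my reading of the idea, not the actual patch;
struct cpu_state, cpu_counts_as_idle and pick_idle_cpu are hypothetical
names.

```c
#include <stdbool.h>

/* What is currently running on a cpu, reduced to what matters here. */
struct cpu_state {
    bool running_idle;       /* the idle task is current */
    bool running_ksoftirqd;  /* ksoftirqd is current */
};

/* A cpu that only runs ksoftirqd is not counted as busy. */
static bool cpu_counts_as_idle(const struct cpu_state *c)
{
    return c->running_idle || c->running_ksoftirqd;
}

/* Scan all cpus starting at 'start' (wrapping around) and return the
 * first one that counts as idle, or -1 if every cpu is busy. */
static int pick_idle_cpu(const struct cpu_state *cpus, int ncpus, int start)
{
    for (int i = 0; i < ncpus; i++) {
        int cpu = (start + i) % ncpus;

        if (cpu_counts_as_idle(&cpus[cpu]))
            return cpu;
    }
    return -1;
}
```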
And for 2.5 there are many doors open for further optimizations of
course.
> But deciding how to interpret these measurements and what to do in
> response is a userlevel policy decision. This also coincides with
> how cpufreq works.
You mean you can have slightly different modes selectable by sysctl,
right? Or do you really want to generate a reschedule per second, with
a tlb flush and a microkernel API between user and kernel, in turn a
total waste of resources, just to avoid admitting irq balancing belongs
to the kernel?
Andrea