It seems like we may be talking about two different things.
Our 'priority queue' implementation uses almost the same
goodness function as the current scheduler. The main
difference between our 'priority queue' scheduler and the
current scheduler is the structure of the runqueue. We
break up the single runqueue into a set of priority based
runqueues. The 'goodness' of a task determines what sub-queue
a task is placed in. Tasks with higher goodness values
are placed in higher priority queues than tasks with lower
goodness values. This allows us to limit the scan of runnable
tasks to the highest priority sub-queue, as opposed to the
entire runqueue. When scanning the highest priority sub-queue
we use almost the same goodness function as the one used
today (it does take CPU affinity into account). In fact,
if we used the same goodness function I'm pretty sure we
would exactly match the behavior of the existing scheduler.
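To make the structure a bit more concrete, here is a rough
user-space sketch of the idea (the names, the number of
sub-queues, and the mapping of goodness onto a sub-queue index
are all made up for illustration; the real code obviously lives
in the kernel's schedule() path):

/*
 * Sketch only, not the actual patch: the single runqueue is
 * split into an array of sub-queues indexed by a quantized
 * goodness value.
 */
#include <stddef.h>

#define NR_SUBQUEUES 8

struct task {
	int goodness;		/* cached goodness value */
	struct task *next;	/* link within its sub-queue */
};

static struct task *subqueue[NR_SUBQUEUES];	/* [NR_SUBQUEUES-1] = highest */

/* Higher goodness maps to a higher priority sub-queue. */
static int subqueue_for(int goodness)
{
	int idx = goodness / 100;	/* arbitrary quantization */
	if (idx < 0) idx = 0;
	if (idx >= NR_SUBQUEUES) idx = NR_SUBQUEUES - 1;
	return idx;
}

static void enqueue(struct task *p)
{
	int idx = subqueue_for(p->goodness);
	p->next = subqueue[idx];
	subqueue[idx] = p;
}

/*
 * Only the highest priority non-empty sub-queue is scanned,
 * instead of every runnable task on a single list.
 */
static struct task *pick_next(void)
{
	for (int i = NR_SUBQUEUES - 1; i >= 0; i--) {
		if (subqueue[i]) {
			struct task *best = subqueue[i], *p;
			for (p = best->next; p; p = p->next)
				if (p->goodness > best->goodness)
					best = p;	/* a full goodness() would also weigh CPU affinity */
			return best;
		}
	}
	return NULL;	/* nothing runnable */
}
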
Hopefully, Hubertus Franke will speak up and provide more
details, as he is much more familiar with this implementation
than I am.
In any case, I think we have almost reached the conclusion
that our priority queue implementation may not be acceptable
due to the extra overhead at low thread counts.
> one issue regarding multiqueues is the ability of interactive processes to
> preempt other, lower priority processes on other CPUs. These sort of
> things tend to happen while using X. In a system where process priorities
> are different, scheduling decisions cannot be localized per-CPU
> (unfortunately), and 'the right to run' as such is a global thing.
Agreed. This is why our multi-queue scheduler always attempts
to make global decisions. It will preempt lower priority tasks
on other CPUs. This implementation tends to make 'more
localized' scheduling decisions as contention on the runqueue
locks increases. However, at this point one could argue that
we have moved away from a 'realistic' low task count system load.
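The general shape of the global preemption check is something
like the sketch below (cpu_curr[], goodness(), and
ipi_reschedule() are stand-ins for illustration, not the actual
interfaces in our patch):

/*
 * Sketch only: on wakeup, find the running task with the lowest
 * goodness across all CPUs and preempt it if the newly woken
 * task is better.
 */
#define NR_CPUS 4

struct task;
extern int goodness(struct task *p, int cpu);	/* same metric used for queue placement */
extern struct task *cpu_curr[NR_CPUS];		/* task currently running on each CPU */
extern void ipi_reschedule(int cpu);		/* ask that CPU to reschedule */

static void try_global_preempt(struct task *woken)
{
	int target = -1, worst = 1 << 30;

	/* Global decision: consider every CPU, not just the local one. */
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		int g = goodness(cpu_curr[cpu], cpu);
		if (g < worst) {
			worst = g;
			target = cpu;
		}
	}

	/* Preempt the lowest-goodness running task if the woken task beats it. */
	if (target >= 0 && goodness(woken, target) > worst)
		ipi_reschedule(target);
}
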
> lmbench's lat_ctx for example, and other tools in lmbench trigger various
> scheduler workloads as well.
Thanks, I'll add these to our list.
--
Mike Kravetz  mkravetz@sequent.com
IBM Linux Technology Center