That does sound like the same behavior I was seeing with lat_ctx. Like
I said in my previous note, the scheduler does try to take CPU affinity
into account. reschedule_idle() does a pretty good job of determining
what CPU a task should run on. In the case of lat_ctx (and I believe
your use of gzip), the 'fast path' in reschedule_idle() is taken because
the CPU associated with the awakened task is idle. However, before
schedule() is run on the 'target' CPU, schedule() is run on another
CPU and the task is scheduled there.
The root cause of this situation is the delay between the time
reschedule_idle() determines what is the best CPU a task should run
on, and the time schedule() is actually ran on that CPU.
I have toyed with the idea of 'temporarily binding' a task to a CPU
for the duration of the delay between reschedule_idle() and schedule().
It would go something like this,
- Define a new field in the task structure 'saved_cpus_allowed'.
With a little collapsing of existing fields, there is room to put
this on the same cache line as 'cpus_allowed'.
- In reschedule_idle() if we determine that the best CPU for a task
is the CPU it is associated with (p->processor), then temporarily
bind the task to that CPU. The task is temporarily bound to the
CPU by overwriting the 'cpus_allowed' field such that the task can
only be scheduled on the target CPU. Of course, the original
value of 'cpus_allowed' is saved in 'saved_cpus_allowed'.
- In schedule(), the loop which examines all tasks on the runqueue
will restore the value of 'cpus_allowed'.
This would preserve the 'best CPU' decision made by reschedule_idle().
On the down side, 'temporarily bound' tasks could not be scheduled
until schedule() is run on their associated CPUs. This could potentially
waste CPU cycles, and delay context switches. In addition, it is
quite possible that while a task is 'temporarily bound' the state of
the system could change in such a way that the best CPU is no longer
best.
There appears to be a classic tradeoff between CPU affinity and
context switch time.
Comments?
-- Mike Kravetz mkravetz@sequent.com IBM Linux Technology Center - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/