the patch eliminating the frozen_lock which went into 2.5.22 requires
macros for prepare_arch_schedule(), etc... On IA64 I think that we need
to use the "complex" macros, otherwise there is a potential deadlock.
Unfortunately the "complex" macros seem to have the problem that the "prev"
task can be stolen right before the context switch. In your description to
the patch you wrote:
> architectures that need to unlock the runqueue before doing the switch can
> do the following:
>
> #define prepare_arch_schedule(prev) task_lock(prev)
> #define finish_arch_schedule(prev) task_unlock(prev)
> #define prepare_arch_switch(rq) spin_unlock(&(rq)->lock)
> #define finish_arch_switch(rq) __sti()
>
> this way the task-lock makes sure that a task is not scheduled on some
> other CPU before the switch-out finishes, but the runqueue lock is
> dropped.
Suppose we have
cpu1: idle1
cpu2: prev2 -> next2 (in the switch)
I don't understand how task_lock(prev2) done on cpu2 can prevent cpu1 to
schedule prev2, which it stole after the RQ#2 lock release. It will just
try to task_lock(idle1), which will be successfull.
The problems I have are with a small and ugly testprogram which generates
128 threads which all just do system("exit"); Running this program
continuously sooner or later leads to attempts to switch to tasks which
have already exited. Inserting a small udelay after prepare_arch_switch
reveals the problem earlier.
I also tried inserting "while (spin_is_locked(&(next)->alloc_lock));"
but it didn't help.
Any idea how to fix this problem?
Thanks a lot in advance,
best regards,
Erich
PS: Below is the relevant part of schedule():
prepare_arch_schedule(prev);
...
if (likely(prev != next)) {
rq->nr_switches++;
rq->curr = next;
prepare_arch_switch(rq);
//while (spin_is_locked(&(next)->alloc_lock));
prev = context_switch(prev, next);
barrier();
rq = this_rq();
finish_arch_switch(rq);
} else
spin_unlock_irq(&rq->lock);
finish_arch_schedule(prev);
PPS: The potential deadlock on IA64 for the "simple" macros comes from
the fact that we sometimes wrap the context counter and need to
read_lock(tasklist_lock). Doing this without releasing the RQ can lead
to a deadlock with sys_wait4, where a write_lock(&tasklist_lock) is
needed, inside of which a __wake_up needs the RQ lock :-(
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/