> >> It is an idle cpu that is spending those 200 cycles.
> >
> > wrong. When it's woken up it's *not* an idle CPU anymore, and it's the
> > freshly woken up task that is going to execute 200 cycles later...
>
> I have to disagree. It is the woken up task *running on the otherwise
> idle CPU* that burns up 200 cycles at the tail.
what do you disagree with? It's a fact that any overhead added to the
idle-wakeup path is not 'idle time' but adds latency (overhead) to the
freshly woken up task's runtime.
> A CPU is wasting, say, 5,000,000 cycles (1GHz/100/2, or 1/2 tick) in hlt
> when it could have been doing work. Why worry about an alternative
> wakeup path that burns up 200-400 cycles of that on the otherwise idling
> CPU, even if it is at the tail?
it's *not* idle time. It's naive to think "it's in the idle task, so it
must be idle time" - latency added to the idle-wakeup path shows up as
direct overhead in the freshly woken up task. Let's look at an example:
CPU0 is waking up bdflush, which will run on CPU1; CPU1 is currently idle:
    CPU0                            CPU1
    [wakeup bdflush]
    [send IPI]
                                    [... IPI delivery latency ...]
                                    [IRQ entry/exit]
                                    [idle thread context switches]
                                    [bdflush runs on CPU1]
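( to make that sequence concrete, here's a minimal sketch of what a
  hlt-based idle loop looks like - simplified, the function name is made
  up, and it assumes kernel context for need_resched()/safe_halt()/
  schedule(); the real code lives in each arch's cpu_idle(): )

	/*
	 * halting idle: the CPU sleeps in hlt with interrupts
	 * enabled. A remote wakeup must raise an IPI and go
	 * through the full IRQ entry/exit path before the loop
	 * notices need_resched and switches to the new task.
	 */
	static void halt_idle_loop(void)
	{
		while (1) {
			while (!need_resched()) {
				local_irq_disable();
				/* re-check with IRQs off so we
				   cannot miss a wakeup: */
				if (!need_resched())
					safe_halt();	/* sti; hlt */
				else
					local_irq_enable();
			}
			schedule();
		}
	}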
contrasted with the idle=poll situation:
    CPU0                            CPU1
    [wakeup bdflush]
    [set need_resched]
                                    [idle thread context switches]
                                    [bdflush runs on CPU1]
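( and the corresponding idle=poll loop - again a simplified sketch with a
  made-up function name: )

	/*
	 * polling idle: the CPU never sleeps, it spins watching
	 * the need_resched flag. A remote wakeup is then a plain
	 * memory write that the poller picks up directly - no
	 * IPI, no IRQ entry/exit - at the cost of burning power.
	 */
	static void poll_idle_loop(void)
	{
		while (1) {
			while (!need_resched())
				cpu_relax();	/* rep; nop */
			schedule();
		}
	}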
as you can see, the overhead of 'send IPI', 'IPI delivery' and 'IRQ
entry/exit' delays bdflush. Even assuming that sending and receiving an
IPI is as fast as setting & detecting need_resched [which it theoretically
can be], the IPI variant still has the cost of IRQ entry (and exit), which
is 200 cycles only in the optimistic case - it's more like thousands of
cycles on a GHz box.
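( if you want to ballpark this yourself, here's a rough userspace sketch
  that times a cross-CPU wakeup via a pipe with rdtsc. All names are made
  up, it's x86-only, and it assumes the TSCs of the two CPUs are
  synchronized - the number it prints also includes syscall overhead, so
  treat it as an upper-bound estimate, not a precise IRQ-path measurement: )

	#define _GNU_SOURCE
	#include <pthread.h>
	#include <sched.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <unistd.h>

	static int pipefd[2];
	static volatile uint64_t t_wake;

	static inline uint64_t rdtsc(void)
	{
		uint32_t lo, hi;
		__asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
		return ((uint64_t)hi << 32) | lo;
	}

	static void pin_to_cpu(int cpu)
	{
		cpu_set_t set;
		CPU_ZERO(&set);
		CPU_SET(cpu, &set);
		sched_setaffinity(0, sizeof(set), &set);
	}

	static void *sleeper(void *arg)
	{
		char c;
		pin_to_cpu(1);
		read(pipefd[0], &c, 1);	/* block -> CPU1 goes idle */
		printf("wakeup took ~%llu cycles\n",
		       (unsigned long long)(rdtsc() - t_wake));
		return NULL;
	}

	int main(void)
	{
		pthread_t thr;

		pipe(pipefd);
		pin_to_cpu(0);
		pthread_create(&thr, NULL, sleeper, NULL);
		sleep(1);		/* let CPU1 settle into idle */
		t_wake = rdtsc();
		write(pipefd[1], "x", 1);	/* wakeup -> IPI to CPU1 */
		pthread_join(thr, NULL);
		return 0;
	}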
[ as mentioned before, the default idle method has power saving advantages
(even if it's not HLT, some of the better methods save a considerable
amount of power), but idle=poll is clearly an option for truly
performance-sensitive applications. ]
Ingo