please try again on top of -aa, and I have to specify this: benchmarked
in a way that can be trusted and compared, so we can make some use of
this information. This means with -18pre2aa2 alone, and with only -preempt
on top of -18pre2aa2.
NOTE: I'd be glad to say "preempt rules", "go preempt", "preempt is
cool" like you as soon as I have some proof it makes _THE_ difference
and that it is worth the mess on SMP (per-cpu, RCU locking, etc...), not
to mention the other architectures. But at the moment there's only a
number of people running xmms on mainline with the broken scheduling
points, and those numbers cannot be compared in any sane way. I
repeat, I'm not against preempt, I just want some real world
proof and measurement, and at the moment I don't think preempt is worth it.
But if you give us _any_ real world proof that a mean latency as low as
10/100 usec matters for getting most of the cpu cycles out of
the cpu during trashing (as one could speculate from the
broken benchmark posted in this thread), and that there's no real
regression from the additional branches in the spin_unlock path (sketched
below) under 100% system load, I may change my mind (and of course, only
for anything above 2.5, and still I think there are more interesting
optimizations to do rather than requiring everybody to spend lots of time
fixing drivers, auditing, fixing smp, rcu locking etc... but ok, if it is
an obviously good thing [aka no real regression and only benefits long
term] it would be ok to do it early as well).

I'm not particularly worried about the preempt lock around the per-cpu
stuff: that's cacheline local, and it could go into the schedule_data
like I did for the rcu per-cpu variables, so they're at zero cacheline
cost (the RCU_poll patch now costs only 1 instruction per schedule and
zero memory overhead [an incl instruction, precisely]).
> > Benchmarks are well and good, but until we have a solid explanation for
> > the throughput changes which people are seeing, it's risky to claim
> > that there is a general benefit.
>
> I have an explanation. We can schedule quicker off a woken task. When
> an event occurs that allows an I/O-blocked task to run, its time-to-run
> is shorter. Same event/response improvement that helps interactivity.
That's a nice speculation out of a broken comparison. It may really be
the case, but there's no way to be sure: before summing those usec you
should also sum the seconds spent walking the tasklist in the non-O(1)
scheduler (roughly the loop sketched below).
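For the record, this is roughly the loop I mean (a simplified sketch of
the non-O(1) 2.4 selection path, not the exact code):

	/* every schedule() rescans all runnable tasks to pick the one
	 * with the best goodness(); with many runnable tasks during
	 * trashing this cost dwarfs a 10/100 usec wakeup improvement */
	list_for_each(tmp, &runqueue_head) {
		p = list_entry(tmp, struct task_struct, run_list);
		if (can_schedule(p, this_cpu)) {
			int weight = goodness(p, this_cpu, prev->active_mm);
			if (weight > c)
				c = weight, next = p;
		}
	}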
Andrea