If SCHED_BATCH load does not affect the load average, then it will *not*
confuse sendmail. The box is not really loaded: SCHED_BATCH tasks
will only run when sendmail wouldn't :)
I think it's great that SCHED_BATCH doesn't show up in loadavg - it is
really confusing to look at load statistics for boxes which have these
CPU intensive batch jobs running - the load average is perhaps
constantly between 2.1 and 2.2, when the "real" load on the box would
cause it to be between 0.1 and 0.2.
Idle-time is what you pay for but do not get ;) Having CPU hogs
running is a nice way of getting the last penny out of the investment,
but I have many boxes where I don't do it, simply because it would
render the load statistics useless.
>
> I think this will confuse atd too, which is an obvious candidate
> for the batch scheduler; it may end up starting all jobs which
> sit in its "batch" queue.
True. I am inclined to say that any batch system that doesn't keep
track of its own jobs but only cares about the load average is flawed.
Load average is "guidance", it is a heuristic, it is not something one
should use as the sole measure when deciding when/where to spawn jobs.
Stuff like that only works in theory...
But atd is in use everywhere, and it does use the load as the only
metric (which is, by the way, why it doesn't start more than one job
per minute, because the loadavg needs time to rise).
Wouldn't it be pretty simple to just make atd have a hard limit on how
many concurrent at jobs (possibly on a per-user basis) it would start?
Unless you really go to extremes (thousands of jobs), the performance of
ten "concurrently" running SCHED_BATCH jobs and the same ten jobs being
launched sequentially by atd should be fairly similar - given the huge
time-slices given by the scheduler to these jobs.
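To make the idea concrete, here is a minimal sketch of such a hard limit.
It is not taken from atd; the struct and function names are hypothetical,
and a real daemon would call limiter_job_finished() after reaping a child
with waitpid(). A per-user limit would keep one such counter per uid.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for a batch daemon; not atd's real structures. */
struct job_limiter {
    int running;   /* jobs started and not yet reaped */
    int max_jobs;  /* hard cap on concurrently running jobs */
};

/* True if another job may be started without exceeding the cap. */
static bool limiter_may_start(const struct job_limiter *l)
{
    return l->running < l->max_jobs;
}

/* Record a job start; returns false and changes nothing if at the cap. */
static bool limiter_job_started(struct job_limiter *l)
{
    if (!limiter_may_start(l))
        return false;
    l->running++;
    return true;
}

/* Record that a job has exited (e.g. after waitpid() reaped it). */
static void limiter_job_finished(struct job_limiter *l)
{
    if (l->running > 0)
        l->running--;
}
```

With a counter like this, the decision to start a job no longer depends on
how quickly the load average reacts.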
>
> I think a load-average calculation scheme like this would be better:
>
> oldload: is the load average calculated the old way
> batchload: is the load average calculated only from the batch scheduler
> numcpus: number of cpus...
>
> newload(){
>     if (oldload > numcpus) return oldload;
>     if ((oldload+batchload) > numcpus) return numcpus;
>     return (oldload+batchload);
> }
>
> So the batch processes would show the CPUs maxed out, but would not show
> up as overload in the load average. (and you could run
> "atd -l <numcpus - 0.3>")
>
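For reference, the quoted scheme written out as compilable C would look
roughly like this. Passing the three values as parameters and using double
are my assumptions; the quoted pseudocode uses bare names and gives no types.

```c
/*
 * Sketch of the quoted proposal. The parameter list and the use of
 * double are assumptions; the quoted pseudocode does not specify them.
 */
static double newload(double oldload, double batchload, int numcpus)
{
    /* Genuine overload from normal tasks: report it unchanged. */
    if (oldload > numcpus)
        return oldload;
    /* Batch work would push the sum past saturation: clamp at numcpus. */
    if (oldload + batchload > numcpus)
        return numcpus;
    /* Otherwise the batch load simply adds to the normal load. */
    return oldload + batchload;
}
```

On a two-CPU box with oldload 0.5 and batchload 2.0 this reports 2.0, and
with oldload 3.0 it reports 3.0 regardless of the batch load.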
Hmmm... Such a hack might work around the shortcomings in the atd
scheduling algorithm.
I don't like it. It adds yet another level of obfuscation to loadavg.
I bet you can make the proper changes to atd in the same amount of code,
keeping the kernel clean and fixing the problem where it really is.
Just my 0.02 Euro,
-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............: