OK, this is because the early flush doesn't quit when load picks up again.
Measuring only the IO backlog, as I do now, isn't adequate for telling the
difference between load initiated by the flush itself and other load, such as
a CPU-bound process proceeding to read another file, so the flush
doesn't stop flushing when other IO starts happening. This has to be fixed.
In the meantime, you could try this simple tweak: just set the lower bound,
currently 1/10th of a second, a little higher:
- unsigned check_interval = HZ/10, ...
+ unsigned check_interval = HZ/5, ...
This may be enough to bridge the little pauses in the compiler's disk
access pattern so the flush isn't triggered. (This is not by any means a
nice solution.) If you set check_interval to HZ*5, you *should* get exactly
the old behaviour; I'd be very interested to hear if you do.
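To see why the lower bound matters, here is a user-space sketch (not the
actual kernel code) of the sampling behaviour: kflush checks the IO backlog
every check_interval jiffies, and a sample that catches an empty backlog
triggers an early flush. HZ, the trace model, and the function name are
illustrative assumptions.

```c
/* Sketch: count how many backlog samples would trigger an early flush
 * when sampling every check_interval jiffies.  backlog[t] is the
 * number of requests in flight at jiffy t. */
#define HZ 100

static int count_flush_triggers(const int *backlog, int len,
                                unsigned check_interval)
{
    int triggers = 0;
    for (int t = 0; t < len; t += (int)check_interval)
        if (backlog[t] == 0)    /* idle sample => early flush fires */
            triggers++;
    return triggers;
}
```

With a one-second trace that is busy except for two ~100 ms lulls, sampling
every HZ/10 jiffies can land inside a lull, while HZ/5 steps over it, which
is exactly the effect the tweak above is gambling on.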
Also, could you do your compiles with 'time' so you can quantify the results?
> 2. Loading programs when writing activity is occurring (even light activity
> like during the compile) is noticeably slower; actually, any reading from
> disk is.
Hmm, let me think why that may be. The loader doesn't actually read the
program into memory; it just maps it and lets the pages fault in as they're
called for. So if readahead isn't perfect (it isn't) the IO backlog may drop
to 0 briefly just as kflush decides to sample it, and it initiates a
flush. This flush cleans the whole dirty list out, stealing bandwidth from
the reads.
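For the curious, the loader's behaviour is easy to reproduce in user space:
map the file instead of read()ing it, and only the pages you actually touch
get faulted in from disk. This is just a demonstration of the mechanism,
with scaffolding names of my own invention.

```c
/* Sketch: map a file read-only and return its first byte.  Only the
 * page(s) actually touched are faulted in from disk, which is why a
 * lull in faults can make the IO backlog dip to zero. */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static int first_byte_via_mmap(const char *path, long *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    off_t size = lseek(fd, 0, SEEK_END);
    char *map = mmap(NULL, (size_t)size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                        /* the mapping survives the close */
    if (map == MAP_FAILED)
        return -1;
    *out = map[0];                    /* the page fault happens here */
    munmap(map, (size_t)size);
    return 0;
}
```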
> I also ran my simple ftp test that produced the symptom I reported earlier.
> I transferred a 750MB file via FTP, and with your patch sure enough disk
> writing started almost immediately, but it still didn't seem to write
> enough data to disk to keep up with the transfer, so at approximately the
> 200MB mark the old behavior still kicked in as it went into full flush
> mode, during which network activity halted, just like before. The big
> difference between the patched and unpatched kernels is that the patched
> kernel never seems to balance out; without the patch, once the initial
> burst is done you get a nice stream of data from the network to disk with
> the disk staying moderately active. With the patch the disk varies from
> barely active to moderate to heavy and back, and during the heavy periods
> the network transfer always pauses (although very briefly).
>
> Just my observations, you asked for comments.
Yes, I have to refine this. The inner flush loop has to know how many IO
submissions are happening, from which it can subtract its own submissions and
know somebody else is submitting IO, at which point it can fall back to the
good old 5-second buffer age limit. False positives from kflush are handled
as a fringe benefit, and flush_dirty_buffers won't do extra writeout. This
is easy and cheap.
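A user-space sketch of that heuristic, with the counter and hook names
invented for the demo (only flush_dirty_buffers comes from the text above):

```c
/* Sketch: the flush loop snapshots a global submission counter,
 * subtracts its own submissions, and bails out as soon as it sees
 * foreign IO, at which point the caller would fall back to the old
 * 5-second buffer age limit. */
#include <stddef.h>

static unsigned long total_submissions;     /* bumped by all submitters */

static void submit_io(void) { total_submissions++; }

/* 'concurrent' simulates other processes; it is called once per
 * iteration and may submit IO of its own.  Returns buffers flushed. */
static int flush_dirty_buffers(int dirty, void (*concurrent)(void),
                               int *foreign_io_seen)
{
    unsigned long before = total_submissions;
    int mine = 0;

    *foreign_io_seen = 0;
    while (dirty--) {
        submit_io();
        mine++;
        if (concurrent)
            concurrent();
        /* Submissions we didn't make => somebody else needs the disk. */
        if (total_submissions - before > (unsigned long)mine) {
            *foreign_io_seen = 1;
            break;
        }
    }
    return mine;
}

static void reader_io(void) { submit_io(); }    /* a competing reader */
```

With no competing reader the loop drains the whole dirty list; the moment a
reader submits IO, the loop quits after the current buffer, which is the
"easy and cheap" part.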
I could get a lot fancier than this and calculate IO load averages, but I'd
only do that after mining out the simple possibilities. I'll probably have
something new for you to try tomorrow, if you're willing. By the way, I'm
not addressing your fundamental problem, that's Rik's job ;-). In fact, I
define success in this effort by the extent to which I don't affect behaviour
under load.
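For completeness, the "fancier" load-average idea would look something like
the scheduler's loadavg: an exponentially decayed average of in-flight
requests. The fixed-point constants and names here are illustrative, not
from any patch.

```c
/* Sketch: fold the current number of in-flight IO requests into a
 * fixed-point exponential moving average, loadavg-style. */
#define FSHIFT  11
#define FIXED_1 (1UL << FSHIFT)
/* exp(-1/5) in fixed point: decay factor for a ~5-sample horizon */
#define EXP_5   1676UL

static unsigned long io_load;       /* fixed-point IO load average */

static void calc_io_load(unsigned long active)
{
    active <<= FSHIFT;
    io_load = (io_load * EXP_5 + active * (FIXED_1 - EXP_5)) >> FSHIFT;
}
```

Feeding it a steady 10 requests in flight converges the average toward
10 in fixed point, and a flush could then be throttled whenever the
average says the disk is already busy.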
Oh, and I'd better finish configuring my kernel and boot my laptop with this,
i.e., eat my own dogfood ;-)
-- Daniel