Absolutely right.
It's probably worth fixing the "sequential accesses of < pagesize count
as 'active'" problem too, but the real issue is that if you get into a
situation with _many_ more active pages than inactive, the plain 2.4.10
VM doesn't age the active list nearly fast enough.
That's fixed in Andrea's VM tweaks, but if you want to look into this,
the basic problem is in mm/vmscan.c, shrink_caches(), which in plain
2.4.10 does
/* Do we want to age the active list? */
if (nr_inactive_pages < nr_active_pages*2)
refill_inactive(nr_pages);
which doesn't take into account just _how_ imbalanced the active list
is. So if the active list is huge, it will still just scan a small fixed
percentage of it (and to make matters worse, the small part of it is
proportional to the size of the _inactive_ list, so if the inactive list
is small, that just makes the problem worse.
What Andreas fix does is to make the refill rate be proportional to the
sizes of the lists, which should fix this problem for you.
However, I'd also like to fix generic_file_read() to only mark the page
accessed when we're touching it for the first time, and notice
sequential accesses automatically. That way the use-once logic doesn't
depend on the read size - which is a totally independent problem.
If you want to test, the fix for _that_ is in mm/filemap.c:
do_generic_file_read(), where the code does:
...
ret = actor(desc, page, offset, nr);
offset += ret;
index += offset >> PAGE_CACHE_SHIFT;
offset &= ~PAGE_CACHE_MASK;
mark_page_accessed(page);
...
and it would be interesting to hear if the behaviour improves with the
above mark_page_accessed() logic moved a bit and changed to:
...
ret = actor(desc, page, offset, nr);
if (!offset || !file->f_reada)
mark_page_accessed(page);
offset += ret;
index += offset >> PAGE_CACHE_SHIFT;
offset &= ~PAGE_CACHE_MASK;
...
(which basically says: we only mark the page accessed if we read the
_beginning_ of the page, or if we just did a seek to it)
Btw, if you test the above change out and confirm that it fixes te
behaviour, please send me an acknowledgement email - I've not done it in
my own tree yet, and unless I get a "yes, that works well" email I won't
be doing it..
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/