Well, I actually like parts of this. The "always swap out current mm" one
looks rather dangerous, and the lastscan jiffy thing is too ugly for
words, but refill_inactive() looks much nicer. There is beauty in
simplicity.
The page aging in drop_pte feels pretty harsh, though.
Have you looked at "free_pte()"? I don't like that function, and it might
make a difference. There are several small nits with it:
- it should probably try to deactivate the page. If drop_pte does that
when it deactivates a page involuntarily, why not do it for a real "we
just freed the page voluntarily" case?
- swap-cache pages should probably not just be de-activated, but actively
aged down. Right now, they are neither, so we have to work all the
way through refill_inactive() and then page_launder() to clear them
out, even though the page may be entirely useless by now. (We had a
complex special case that caught and short-circuited some of the pages,
and maybe it was worth it. But maybe the right thing is to just age
them down and let them deactivate naturally?)
After all, we aged them up for references to this virtual
mapping, and free_pte() just made it go away. Unlike normal page cache
pages, we don't get any advantage from trying to cache the things
across multiple VMs.
- we're dropping the accessed bit on the floor. In the vmscan case the
accessed bit would have aged the page up.
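A toy model of the second point. It assumes an age-down step that halves the age on every unreferenced scan (an assumption, loosely in the spirit of the current aging code), and every name in it is hypothetical rather than kernel code. It counts how many refill_inactive()-style visits a page that nobody maps any more needs before it even reaches the inactive list, where a page_launder()-style pass could deal with it:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a page on the active list; not kernel code. */
struct toy_page {
    unsigned age;
    bool active;
};

/* Assumed age-down rule: halve the age on each unreferenced scan. */
static void toy_age_down(struct toy_page *page)
{
    page->age /= 2;
}

/* One refill_inactive()-style visit to a page that is no longer referenced:
 * age it down, and move it to the inactive list once its age reaches zero. */
static void toy_refill_visit(struct toy_page *page)
{
    assert(page->active);
    toy_age_down(page);
    if (page->age == 0)
        page->active = false;
}

/* Number of such visits an unmapped, unreferenced page needs before a
 * page_launder()-style pass can even see it on the inactive list. */
static unsigned toy_visits_until_inactive(unsigned age)
{
    struct toy_page page = { .age = age, .active = true };
    unsigned visits = 0;

    while (page.active) {
        toy_refill_visit(&page);
        visits++;
    }
    return visits;
}
```

Under these assumptions a page left at age 64 takes seven visits (64, 32, 16, 8, 4, 2, 1, 0) before it becomes a laundering candidate, all for a page whose only mapping is already gone.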
On the other hand, to offset some of these, we actually count the page
accessed _twice_ sometimes: we count it on lookup, and we count it when we
see the accessed bit in vmscan.c. Which results in some pages getting aged
up twice for just one access if we go through the vmscan logic, while if
we just map and unmap them they get counted just once.
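A toy illustration of the double counting, with hypothetical names and an assumed fixed age-up step (not kernel code). One access ages the page up at lookup time and also leaves the accessed bit set; the vmscan-style path then test-and-clears that bit and ages the page up again, while the free_pte()-style path just drops it:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_AGE_ADV 3u  /* assumed, hypothetical age-up step */

/* Hypothetical stand-ins for a page and its PTE. */
struct toy_page { unsigned age; };
struct toy_pte  { bool young; };

static void toy_age_up(struct toy_page *page)
{
    page->age += TOY_AGE_ADV;
}

/* Lookup counts as an access; using the mapping sets the accessed bit. */
static void toy_lookup_and_map(struct toy_page *page, struct toy_pte *pte)
{
    toy_age_up(page);
    pte->young = true;  /* models the CPU setting the accessed bit on use */
}

/* vmscan-style teardown: test-and-clear the accessed bit, age up if it was set. */
static void toy_vmscan_pte(struct toy_page *page, struct toy_pte *pte)
{
    if (pte->young) {
        pte->young = false;
        toy_age_up(page);
    }
}

/* free_pte()-style teardown: the accessed bit is simply discarded. */
static void toy_free_pte(struct toy_page *page, struct toy_pte *pte)
{
    (void)page;
    pte->young = false;
}

/* Age gained from one access, depending on how the mapping goes away. */
static unsigned toy_age_gain(bool via_vmscan)
{
    struct toy_page page = { .age = 0 };
    struct toy_pte pte = { .young = false };

    toy_lookup_and_map(&page, &pte);
    if (via_vmscan)
        toy_vmscan_pte(&page, &pte);
    else
        toy_free_pte(&page, &pte);
    assert(!pte.young);  /* both paths leave the accessed bit clear */
    return page.age;
}
```

Here toy_age_gain(true) returns 6 and toy_age_gain(false) returns 3: the same single access counts twice when vmscan tears the mapping down and once when it is simply unmapped.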
Obviously the page aging logic seems to be making a noticeable difference
to you. So looking at page aging logic issues in the bigger picture might
be worthwhile - not just staring at the actual swap-out code. The fact is,
the swap-out-code cannot get the aging right if the rest of the system
ignores it or does it only for some cases.
I _think_ the logic should be something along the lines of: "freeing the
page amounts to an implied down-aging of the page, but the 'accessed' bit
would have aged it up, so the two take each other out". But if so, the
free_pte() logic should have something like
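    if (page->mapping) {
        /* freeing is an implied age-down; a young pte cancels it out,
         * except for swap cache, which gains nothing from staying cached */
        if (!pte_young(pte) || PageSwapCache(page))
            age_page_down_ageonly(page);
        if (!page->age)
            deactivate_page(page);
    }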
instead of just ignoring these issues completely.
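The proposed rule can be exercised outside the kernel with stand-ins. This is a minimal sketch, not kernel code: struct toy_page, the toy_* helpers and the halving age-down are hypothetical substitutes for struct page, pte_young(), PageSwapCache(), age_page_down_ageonly() and deactivate_page():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel's layout. */
struct toy_page {
    const void *mapping;  /* non-NULL stands in for page->mapping being set */
    bool swap_cache;      /* stands in for PageSwapCache(page) */
    bool active;
    unsigned age;
};

/* Assumed behaviour of age_page_down_ageonly(): halve the age. */
static void toy_age_down(struct toy_page *page)
{
    page->age /= 2;
}

static void toy_deactivate(struct toy_page *page)
{
    page->active = false;
}

/* The aging step proposed for free_pte(), applied to the toy page. */
static void toy_free_pte_aging(struct toy_page *page, bool pte_young)
{
    assert(page != NULL);
    if (page->mapping == NULL)
        return;  /* mirrors the if (page->mapping) test */
    /* Freeing is an implied age-down; a young pte would have aged the page
     * up, so the two cancel. Swap-cache pages are aged down regardless. */
    if (!pte_young || page->swap_cache)
        toy_age_down(page);
    if (page->age == 0)
        toy_deactivate(page);
}

/* Convenience wrapper: apply the rule to a fresh active page. */
static struct toy_page toy_after_free(bool has_mapping, bool swap_cache,
                                      unsigned age, bool pte_young)
{
    static const int toy_mapping = 0;  /* any non-NULL address will do */
    struct toy_page page = {
        .mapping = has_mapping ? &toy_mapping : NULL,
        .swap_cache = swap_cache,
        .active = true,
        .age = age,
    };
    toy_free_pte_aging(&page, pte_young);
    return page;
}
```

For example, a swap-cache page of age 1 is deactivated immediately even if its PTE was young, while a page-cache page with a young PTE keeps its age.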
Comments?
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/