You beat me to it, with some minutes...
(I sent a email to Stephan...)
> Of course it would be much better if we had some positive mechanism for
> defragging physical memory instead of just relying on chance and hoping
> for the best the way we do now. IMHO, such a mechanism can be built
> practically and I'm willing to give it a try. Basically, kswapd would try
> to restore a depleted zone order list by scanning mem_map to find buddy
> partners for free blocks of the next lower order. This strategy, together
> with the one used in the patch above, could largly eliminate atomic
> allocation failures. (Although as I mentioned some time ago, getting rid
> of them entirely is an impossible problem.)
>
> The question remains why we suddenly started seeing more atomic allocation
> failures in the recent Linus trees. I'll guess that the changes in
> scanning strategy have caused the system to spend more time close to the
> zone->pages_min amount of free memory. This idea seems to be supported by
> your memstat listings. In some sense, it's been good to have the issue
> forced so that we must come up with ways to make atomic and higher order
> allocations less fragile.
It might be that the elevator works now... :-)
You will only see it once there are no remaining free pages of an even higher
order left - then you will start to fail...
Two things are required:
1) You have lots of memory.
2) You have used it all at some point.
Another thing to do could be to add a order parameter to free.
The pages allocated has to be freed sometime... if we make sure that
they are freed together it could simplify things - no chance that the
first part gets allocated directly...
Or/and we could remove the sources of higher order allocs, as an example:
why is the SCSI layer allowed to allocate order 3 allocs (32 kB) several
times per second? Will we really get a performance hit by not allowing
higher order allocs in active driver code?
/RogerL
-- Roger Larsson Skellefteċ Sweden - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/