One way NUMA binding APIs have been used successfully is for a group of
processes to "hang out" together in the same vicinity so that they can
optimize access to shared memory. On existing NUMA systems, this has
meant choosing to execute on a subset of the nodes. On a 4-node NUMAQ
machine, latency is either local or remote. Thus, if one sets a
binding/latency argument such that remote accesses are allowed, then all
of the nodes are fair game; otherwise only the local node is used. Now
suppose a set of processes decides to occupy two nodes. Using strictly
a latency argument, there is no way to specify this. One could use CPU
binding to restrict the processes to the two nodes, but not the memory -
unless one also associates CPUs with memory/nodes, which is back to
maintaining topology info.
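To make that CPU-binding workaround concrete, here is a rough user-space
sketch using the sched_setaffinity() interface (nothing to do with Matt's
patch). The cpus_on_node[] table is made up; it stands in for exactly the
topology information the application would have to maintain itself, and
note that nothing here constrains where the memory comes from:

/*
 * Illustrative sketch only: restrict a process to two nodes via CPU
 * affinity.  The cpus_on_node[] table is a hypothetical hand-maintained
 * topology map -- CPU binding says nothing about memory placement.
 */
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical 4-node box, 4 CPUs per node. */
static const int cpus_on_node[4][4] = {
        {  0,  1,  2,  3 },
        {  4,  5,  6,  7 },
        {  8,  9, 10, 11 },
        { 12, 13, 14, 15 }
};

int bind_self_to_nodes(int node_a, int node_b)
{
        cpu_set_t mask;
        int i;

        CPU_ZERO(&mask);
        for (i = 0; i < 4; i++) {
                CPU_SET(cpus_on_node[node_a][i], &mask);
                CPU_SET(cpus_on_node[node_b][i], &mask);
        }
        /* Pins execution to the two nodes' CPUs; memory placement is
         * still entirely up to the kernel's default allocator. */
        return sched_setaffinity(0, sizeof(mask), &mask);
}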
Another shortcoming of latency-based binding arises when a process
executes on a node for a while and then gets moved to another node. In
that case the best memory allocation depends on several factors. The
first is determining where latency is measured from: the node the
process is currently executing on, or the node where it had been
executing? From a scheduling perspective we are playing with the idea
of a home node. If a process is dispatched off of its home node, should
memory allocations come from the home node or the current node? If
neither has available memory, should latency be considered from the home
node or the current node?
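To spell those questions out, here is a purely hypothetical sketch -
struct task_placement, pick_alloc_node(), and has_free_pages() are
invented names, not anything in the tree. It just makes the two
candidate policies, and the unresolved fallback case, explicit:

/* Hypothetical only: which node do we allocate from when a task is
 * dispatched off its home node? */
struct task_placement {
        int home_node;          /* node the scheduler considers "home" */
        int current_node;       /* node the task is running on right now */
};

enum alloc_policy { PREFER_HOME, PREFER_CURRENT };

static int pick_alloc_node(const struct task_placement *tp,
                           enum alloc_policy policy,
                           int (*has_free_pages)(int node))
{
        int first  = (policy == PREFER_HOME) ? tp->home_node : tp->current_node;
        int second = (policy == PREFER_HOME) ? tp->current_node : tp->home_node;

        if (has_free_pages(first))
                return first;
        if (has_free_pages(second))
                return second;
        /* Neither has memory: should "nearest" now be measured from the
         * home node or the current node?  That is the open question. */
        return -1;
}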
The other way that memory binding has been used is by a large process,
typically a database, that wants control over where it places its memory
and data structures. This is not a latency issue, but rather one of how
the work is distributed across the system.
The memory binding that Matt proposed allows an architecture to define
what a memory block is, and the application to determine how it wants to
bind to those memory blocks. It is only intended for use by applications
that are aware of the complexities involved, and these will have to be
knowledgeable about the systems they are running on. Hopefully, by
default, Linux will do the right thing as far as memory placement for
processes that choose to leave it up to the system - which will be the
majority of apps. However, the apps that do want to specify their memory
placement want explicit control over the nodes their memory lands on. It
is not strictly a latency-based decision.
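For illustration only, a sketch of what such an interface could look
like - memblk_mask_t, memblk_bind(), and the block numbering are invented
here and are not the interface in Matt's patch. The point is just the
division of labor: the architecture defines the blocks, and the
application picks which ones its memory may come from.

/*
 * Purely illustrative sketch of a memory-block binding call.
 */
#include <sys/types.h>

typedef unsigned long memblk_mask_t;  /* one bit per architecture-defined block */

/* Stub standing in for the real binding call (which would be a system
 * call); here it just records the request so the snippet is self-contained. */
static memblk_mask_t bound_blocks;

static int memblk_bind(pid_t pid, memblk_mask_t mask, unsigned int flags)
{
        (void)pid;
        (void)flags;
        bound_blocks = mask;
        return 0;
}

/* A placement-aware application (e.g. a database) asks that its memory
 * come only from blocks 0 and 2 -- an explicit choice about where its
 * data lives, not a latency threshold. */
int bind_self_to_blocks_0_and_2(void)
{
        memblk_mask_t mask = (1UL << 0) | (1UL << 2);
        return memblk_bind(0, mask, 0);
}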
Michael
--
Michael Hohnbaum            503-578-5486
hohnbaum@us.ibm.com         T/L 775-5486