I agree. I will have to look closer but unless there is more
juice than I have seen in Hyper-Transport it is going to become
one of the architectural bottlenecks of the Hammer.
Currently you get 1600MB/s in a single direction. Not to bad.
But when the memory controllers get out to dual channel DDR-II 400,
the local bandwidth to that memory is 6400MB/s, and the bandwidth to
remote memory 1600MB/s, or 3200MB/s (if reads are as common as
writes).
So I suspect bandwidth intensive applications will really benefit
from local memory optimization on the Hammer. I can buy that the
latency is negligible, the fact the links don't appear to scale
in bandwidth as well as the connection to memory may be a bigger
issue.
> > And then you associate that zone-list with the process, and use that
> > zone-list for all process allocations.
>
> That's the basic idea sure for normal allocations from applications
> that do not care much about NUMA.
>
> But "numa aware" applications want to do other things like:
> - put some memory area into every node (e.g. for the numa equivalent of
> per CPU data in the kernel)
> - "stripe" a shared memory segment over all available memory subsystems
> (e.g. to use memory bandwidth fully if you know your interconnect can
> take it; that's e.g. the case on the Hammer)
The latter I really quite believe. Even dual channel PC2100 can
exceed your interprocessor bandwidth.
And yes I have measured 2000MB/s memory copy with an Athlon MP and
PC2100 memory.
Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/