> Are you concerned with increase in memory used per counter Here? I
suppose
> that must not be that much of an issue for a 16 processor box....
Nope, I'm concerned that if this mechanism is to be used for all counters,
the improvement in cache coherence might be less significant to the point
where the additional overhead isn't worth it.
Arjab van de Ven voiced similar concerns but he also said:
> There's several things where per cpu data is useful; low frequency
> statistics is not one of them in my opinion.
...which may be true for 4-ways and even 8-ways but when you get to
32-ways and greater, you start seeing cache problems. That was the
case on AIX and per-cpu counters was one of the changes that helped
get the spectacular scalability on Regatta.
Anyway, since we just had a long thread going on NUMA topology, maybe
it would be proper to investigate if there is a better way, such as
using the topology to decide where to put counters? I think so, seeing
as it is that most Intel based 8-ways and above will have at least some
NUMA in them.
> Well, I wrote a simple kernel module which just increments a shared
global
> counter a million times per processor in parallel, and compared it with
> the statctr which would be incremented a million times per processor in
> parallel..
I suspected that. Would it be possible to do the test on the real
counters?
Niels Christiansen
IBM LTC, Kernel Performance
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/