Re: [PATCH 4/3] Replace dynamic percpu implementation

Rusty Russell (rusty@au1.ibm.com)
Thu, 22 May 2003 18:36:31 +1000


In message <20030522081423.GC27614@in.ibm.com> you write:
> 4. Extra dereferences in alloc_percpu were not significant, but alloc_percpu
> was interlaced and kmalloc_percpu_new wasn't. Insn profile seemed to
> indicate extra cost in memory dereferencing of alloc_percpu was
> offset by the interlacing/objects sharing the same cacheline part.
> but then insn profiles are only indicative...not accurate.

Interesting: personally I consider the cacheline sharing a feature,
and unless you've done something special, the static declaration
should be interlaced too, no?

If you don't want interlaced, you should make your type
____cacheline_aligned for alloc_percpu, or use

__alloc_percpu(ALIGN(sizeof(x), SMP_CACHE_BYTES), SMP_CACHE_BYTES)

Aside: if kmalloc_percpu uses the per-cpu offset too, it probably
makes sense to make the per-cpu offset to a first class citizen, and
smp_processor_id to be derived, rather than the other way around as at
the moment. This would offer further speedup by removing a level of
indirection.

If you're interested I can probably produce such a patch for x86...

Thanks for the results!
Rusty.

--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/