kmem_cache_alloc is simple - the complex operation is kmem_cache_free.
The current implementation
- assumes that virt_to_page() and reading one cacheline from the page
  structure are fast. Is that true for your setups?
- uses an array to batch several free calls together: if the array
  overflows, then up to 120 objects are freed in one call, to reduce
  cacheline thrashing (see the sketch after this list).
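Roughly, the batching works like this. This is a simplified sketch, not
the actual mm/slab.c source: the struct layout, the cpu_array field and
free_batch() are illustrative stand-ins for the real per-cpu array
machinery.

#define BATCH_MAX 120	/* objects flushed in one call on overflow */

struct cpu_array {
	unsigned int avail;	/* objects currently cached */
	unsigned int limit;	/* capacity before we must flush */
	void *objects[0];	/* cached object pointers follow */
};

static void sketch_free(kmem_cache_t *cachep, void *objp)
{
	struct cpu_array *ac = cachep->cpu_array[smp_processor_id()];

	if (likely(ac->avail < ac->limit)) {
		/* fast path: no lock, no slab walk, one store */
		ac->objects[ac->avail++] = objp;
		return;
	}
	/*
	 * Overflow: take the cache spinlock once and return up to
	 * BATCH_MAX objects to their slabs, so the lock and cacheline
	 * traffic is amortized over the whole batch.
	 */
	spin_lock(&cachep->spinlock);
	free_batch(cachep, ac->objects, BATCH_MAX);
	spin_unlock(&cachep->spinlock);
	ac->avail -= BATCH_MAX;
	memmove(ac->objects, &ac->objects[BATCH_MAX],
		ac->avail * sizeof(void *));
	ac->objects[ac->avail++] = objp;
}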
If virt_to_page is fast, then a NUMA allocator would be a simple
extension of the current implementation:
* one slab chain for each node, one spinlock for each node.
* two per-cpu arrays for each cpu: one for "correct node" kmem_cache_free
  calls, one for "foreign node" kmem_cache_free calls.
* kmem_cache_alloc allocates from the "correct node" per-cpu array, falls
  back to the per-node slab chain, then to __get_free_pages.
* kmem_cache_free checks which node the freed object belongs to and adds
  it to the appropriate per-cpu array. The array overflow function then
  sorts the objects into the correct slab chains (see the sketch after
  this list).
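The free path of that design could look roughly like this, reusing the
cpu_array sketch above. Again illustrative only: obj_to_node(),
flush_to_nodes() and the field names are invented for the example, not
existing code.

struct node_lists {
	spinlock_t lock;		/* one spinlock per node */
	struct list_head slabs;		/* one slab chain per node */
};

struct cpu_cache {
	struct cpu_array *home;		/* "correct node" frees */
	struct cpu_array *foreign;	/* "foreign node" frees */
};

static void numa_free(kmem_cache_t *cachep, void *objp)
{
	/* one virt_to_page() plus one cacheline read from the page
	 * structure tells us which node owns the object */
	int node = obj_to_node(virt_to_page(objp));
	struct cpu_cache *cc = &cachep->cpu_cache[smp_processor_id()];
	struct cpu_array *ac;

	ac = (node == numa_node_id()) ? cc->home : cc->foreign;
	if (ac->avail < ac->limit) {
		ac->objects[ac->avail++] = objp;
		return;
	}
	/*
	 * Overflow: sort the batched objects back into the slab chains
	 * of their owning nodes, taking each node's spinlock once.
	 */
	flush_to_nodes(cachep, ac);
	ac->objects[ac->avail++] = objp;
}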
If virt_to_page is slow, we need a different design: it's currently
called in every kmem_cache_free/kfree call, as the sketch below shows.
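For comparison, this is roughly what every free already pays today:
kfree receives only a bare pointer, so the owning cache must be
recovered through the page structure. A sketch of the path, with
__kmem_cache_free() as a stand-in for the internal free routine; the
exact page->cache accessor varies between kernel versions.

void kfree_sketch(const void *objp)
{
	struct page *page;
	kmem_cache_t *cachep;

	if (!objp)
		return;
	page = virt_to_page(objp);		/* the lookup in question */
	cachep = GET_PAGE_CACHE(page);		/* owner stored in struct page */
	__kmem_cache_free(cachep, (void *)objp);
}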
--
	Manfred