part 6:
- enable the cpu array for all caches
- remove the optimized implementations for quick list access - with cpu
arrays in all caches, the list access is now rare.
- make the cpu arrays mandatory; this removes 50% of the conditional
branches from the hot path of kmem_cache_alloc [1]
- poisoning for objects with constructors
Patch got a bit longer...
I forgot to mention this: head arrays mean that some pages can be
held back by objects sitting in the head arrays and not returned to
page_alloc.c. The current kernel never flushes the head arrays, which
might worsen the behaviour of low-memory systems. The hunk that flushes
the arrays regularly comes next.
Detailed changelog: [to be read side by side with the patch]
* docu update
* "growing" is not really needed: races between grow and shrink are
handled by retrying. [additionally, the current kernel never
shrinks]
* move the batchcount into the cpu array:
the old code contained a race during cpu cache tuning:
update batchcount [in cachep] before or after the IPI?
And NUMA will need it anyway.
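
  The per-cpu structure from the patch now carries all three values, so a
  do_ccupdate_local IPI swaps limit and batchcount together with the array
  itself:

	typedef struct cpucache_s {
		unsigned int avail;		/* objects currently in the array */
		unsigned int limit;		/* flush when avail reaches this */
		unsigned int batchcount;	/* objects moved per refill/flush */
	} cpucache_t;
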
* bootstrap support: the cpu arrays are really mandatory; nothing works
without them. Thus a statically allocated cpu array is needed
for starting the allocators.
* move the full, partial & free lists into a separate structure,
as a preparation for NUMA
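
  The new structure is deliberately tiny; a later NUMA patch can then
  instantiate one per node (excerpt from the patch):

	/*
	 * The slab lists of all objects.
	 * NUMA: the spinlock could move from kmem_cache_t into here, too.
	 */
	struct kmem_list3 {
		struct list_head slabs_partial;	/* partial list first, better asm code */
		struct list_head slabs_full;
		struct list_head slabs_free;
	};
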
* structure reorganization: now the cpu arrays are the most important
part, not the lists.
* dead code elimination: remove "failures", it is never read.
* dead code elimination: remove "OPTIMIZE": never fully implemented.
The idea is to skip the virt_to_page lookup for caches with
on-slab slab structures, and use (ptr&PAGE_MASK) instead.
The details are in Bonwick's paper.
* remove GROWN: the kernel never shrinks a cache, thus GROWN is meaningless.
* bootstrap: starting the slab allocator is now a 3-stage process:
- nothing works, use the statically allocated cpu arrays.
- the smallest kmalloc allocator works, use it to allocate
cpu arrays.
- all kmalloc allocators work, use the default cpu array size
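
  In the patch the three stages map to a small state variable plus
  statically allocated boot arrays (one for cache_cache, one for the
  first generic cache):

	/* bootstrap: the caches do not work without cpuarrays anymore,
	 * but the cpuarrays are allocated from the generic caches...
	 */
	#define BOOT_CPUCACHE_ENTRIES	1
	struct cpucache_int {
		cpucache_t cache;
		void *entries[BOOT_CPUCACHE_ENTRIES];
	};

	enum {
		NONE,		/* stage 1: static boot arrays only */
		PARTIAL,	/* stage 2: the smallest kmalloc cache is usable */
		FULL		/* stage 3: all kmalloc caches are usable */
	} g_cpucache_up;
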
* register a cpu notifier callback, and allocate the needed head arrays
when a new cpu arrives
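
  Condensed from the patch (the walk over the cache chain is trimmed to
  its core): on CPU_ONLINE a head array is kmalloc'ed and installed for
  every cache, for the incoming cpu:

	static int __devinit cpuup_callback(struct notifier_block *nfb,
					unsigned long action, void *hcpu)
	{
		int cpu = (int)hcpu;
		if (action == CPU_ONLINE) {
			cpucache_t *nc;

			down(&cache_chain_sem);
			/* ... for each cachep on the cache chain ... */
			nc = kmalloc(sizeof(void*)*cachep->limit + sizeof(cpucache_t),
					GFP_KERNEL);
			if (!nc)
				goto bad;
			nc->avail = 0;
			nc->limit = cachep->limit;
			nc->batchcount = cachep->batchcount;
			cachep->cpudata[cpu] = nc;
			/* ... */
			up(&cache_chain_sem);
		}
		return NOTIFY_OK;
	bad:
		up(&cache_chain_sem);
		return NOTIFY_BAD;
	}

	static struct notifier_block cpucache_notifier = { &cpuup_callback, NULL, 0 };
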
* always enable head arrays, even for DEBUG builds. Poisoning and
red-zoning now happen before an object is added to the arrays.
Fold enable_all_cpucaches into cpucache_init; there is no
need for a separate function.
* modifications to the debug checks due to the earlier calls of the
dtor for caches with poisoning enabled
* poison+ctor is now supported
* squeezing 3 objects into a cacheline is hopeless; the FIXME is
not solvable and can be removed.
* add additional debug tests: check_irq_off(), check_irq_on(),
check_spinlock_acquired().
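
  They are cheap enough to sprinkle around, since they compile away
  without DEBUG respectively CONFIG_SMP (from the patch):

	static inline void check_irq_off(void)
	{
	#if DEBUG
		BUG_ON(!irqs_disabled());
	#endif
	}

	static inline void check_irq_on(void)
	{
	#if DEBUG
		BUG_ON(irqs_disabled());
	#endif
	}

	static inline void check_spinlock_acquired(kmem_cache_t *cachep)
	{
	#ifdef CONFIG_SMP
		check_irq_off();
		BUG_ON(spin_trylock(&cachep->spinlock));
	#endif
	}
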
* move do_ccupdate_local nearer to do_tune_cpucache. Should have
been part of -04-drain.
* additional object checks. red-zoning is tricky: it's implemented
by increasing the object size by 2*BYTES_PER_WORD.
Thus BYTES_PER_WORD must be added to objp before calling
the destructor or constructor, or before returning the object
from alloc. The poison functions add BYTES_PER_WORD internally.
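
  For reference, this is how cache_init_objs lays it out in the patch -
  one guard word at each end, and objp is bumped past the first one
  before the ctor (and later the caller) sees it:

	if (cachep->flags & SLAB_RED_ZONE) {
		*((unsigned long*)(objp)) = RED_MAGIC1;
		*((unsigned long*)(objp + cachep->objsize -
				BYTES_PER_WORD)) = RED_MAGIC1;
		objp += BYTES_PER_WORD;	/* ctor/dtor/caller get this pointer */
	}
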
* create a flagcheck function; right now the tests are duplicated
in cache_grow [always] and alloc_debugcheck_before [DEBUG only]
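
  The shared helper is small (from the patch; it also picked up the
  might_sleep call, see below):

	static void kmem_flagcheck(kmem_cache_t *cachep, int flags)
	{
		if (flags & __GFP_WAIT)
			might_sleep();

		if (flags & SLAB_DMA) {
			if (!(cachep->gfpflags & GFP_DMA))
				BUG();
		} else {
			if (cachep->gfpflags & GFP_DMA)
				BUG();
		}
	}
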
* modify slab list updates: all allocs are now bulk allocs that try
to get multiple objects at once; the list pointers are updated
only at the end of a bulk alloc, not once per alloc.
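
  Trimmed from cache_alloc_refill in the patch: objects are pulled off
  one slab until either the slab or the batchcount runs out, and only
  then is the slab moved to the right list:

	spin_lock(&cachep->spinlock);
	while (batchcount > 0) {
		slab_t *slabp;

		/* ... pick a slab from slabs_partial, else slabs_free,
		 *     else goto must_grow ... */

		check_slabp(cachep, slabp);
		while (slabp->inuse < cachep->num && batchcount--)
			cc_entry(cc)[cc->avail++] =
				cache_alloc_one_tail(cachep, slabp);
		check_slabp(cachep, slabp);
		cache_alloc_listfixup(l3, slabp);	/* one list update per slab */
	}
	spin_unlock(&cachep->spinlock);
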
* might_sleep was moved into kmem_flagcheck.
* major hotpath change:
- cc always exists, no fallback
- cache_alloc_refill is called with disabled interrupts,
and does everything to recover from an empty cpu array.
Far shorter & simpler __cache_alloc [inlined in both
kmalloc and kmem_cache_alloc]
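
  The resulting fast path, as in the patch - a single likely() test on
  cc->avail, everything else lives out of line in cache_alloc_refill:

	static inline void * __cache_alloc (kmem_cache_t *cachep, int flags)
	{
		unsigned long save_flags;
		void* objp;
		cpucache_t *cc;

		cache_alloc_debugcheck_before(cachep, flags);

		local_irq_save(save_flags);
		cc = cc_data(cachep);
		if (likely(cc->avail)) {
			STATS_INC_ALLOCHIT(cachep);
			objp = cc_entry(cc)[--cc->avail];
		} else {
			STATS_INC_ALLOCMISS(cachep);
			objp = cache_alloc_refill(cachep, flags);
		}
		local_irq_restore(save_flags);
		objp = cache_alloc_debugcheck_after(cachep, flags, objp);
		return objp;
	}
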
* __free_block, free_block, cache_flusharray:
the main implementation of returning objects to the lists.
no big changes, the diff just lost track of the code movement.
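
  For completeness, cache_flusharray as in the patch (DEBUG check
  omitted): it hands the oldest batchcount objects back to the lists and
  slides the remainder down to the start of the array:

	static void cache_flusharray (kmem_cache_t* cachep, cpucache_t *cc)
	{
		int batchcount = cc->batchcount;

		check_irq_off();
		__free_block(cachep, &cc_entry(cc)[0], batchcount);

		cc->avail -= batchcount;
		memmove(&cc_entry(cc)[0], &cc_entry(cc)[batchcount],
				sizeof(void*)*cc->avail);
	}
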
* new debug check: too early kmalloc or kmem_cache_alloc
* slightly reduce the sizes of the cpu arrays: keep the total
size (including batchcount, avail and now limit) just below
a power of 2, for optimal kmalloc memory efficiency.
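
  A quick sanity check of the arithmetic, assuming 32-bit pointers,
  4-byte ints and no padding in cpucache_t (so sizeof(cpucache_t) == 12):
  the head array is allocated as sizeof(void*)*limit + sizeof(cpucache_t),
  and the limits chosen in enable_cpucache land just below a power of 2:

	limit = 120:  4*120 + 12 =  492 bytes  -> 512-byte generic cache
	limit = 248:  4*248 + 12 = 1004 bytes  -> 1024-byte generic cache
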
That's it.
I even found 2 bugs while reading: dtors and ctors used for verify were
called with the wrong parameters when RED_ZONE was enabled, and some
checks still assumed that POISON and a ctor are incompatible.
-- Manfred

[1] there were 2 branches, now only one is left. But 50% sounds better.
[Attachment: patch-slab-split-06-mand-cpuarray]
--- 2.5/mm/slab.c Fri Oct 4 21:38:29 2002 +++ build-2.5/mm/slab.c Sat Oct 5 00:50:40 2002 @@ -8,6 +8,9 @@ * Major cleanup, different bufctl logic, per-cpu arrays * (c) 2000 Manfred Spraul * + * Cleanup, make the head arrays unconditional, preparation for NUMA + * (c) 2002 Manfred Spraul + * * An implementation of the Slab Allocator as described in outline in; * UNIX Internals: The New Frontiers by Uresh Vahalia * Pub: Prentice Hall ISBN 0-13-101908-2 @@ -16,7 +19,6 @@ * Jeff Bonwick (Sun Microsystems). * Presented at: USENIX Summer 1994 Technical Conference * - * * The memory is organized in caches, one cache for each object type. * (e.g. inode_cache, dentry_cache, buffer_head, vm_area_struct) * Each cache consists out of many slabs (they are small (usually one @@ -38,12 +40,14 @@ * kmem_cache_destroy() CAN CRASH if you try to allocate from the cache * during kmem_cache_destroy(). The caller must prevent concurrent allocs. * - * On SMP systems, each cache has a short per-cpu head array, most allocs + * Each cache has a short per-cpu head array, most allocs * and frees go into that array, and if that array overflows, then 1/2 * of the entries in the array are given back into the global cache. - * This reduces the number of spinlock operations. + * The head array is strictly LIFO and should improve the cache hit rates. + * On SMP, it additionally reduces the spinlock operations. * - * The c_cpuarray may not be read with enabled local interrupts. + * The c_cpuarray may not be read with enabled local interrupts - + * it's changed with a smp_call_function(). * * SMP synchronization: * constructors and destructors are called without any locking. @@ -53,6 +57,10 @@ * and local interrupts are disabled so slab code is preempt-safe. * The non-constant members are protected with a per-cache irq spinlock. * + * Many thanks to Mark Hemment, who wrote another per-cpu slab patch + * in 2000 - many ideas in the current implementation are derived from + * his patch. + * * Further notes from the original documentation: * * 11 April '97. Started multi-threading - markhe @@ -61,10 +69,6 @@ * can never happen inside an interrupt (kmem_cache_create(), * kmem_cache_shrink() and kmem_cache_reap()). * - * To prevent kmem_cache_shrink() trying to shrink a 'growing' cache (which - * maybe be sleeping and therefore not holding the semaphore/lock), the - * growing field is used. This also prevents reaping from a cache. - * * At present, each engine can be growing a cache. This should be blocked. * */ @@ -77,6 +81,7 @@ #include <linux/init.h> #include <linux/compiler.h> #include <linux/seq_file.h> +#include <linux/notifier.h> #include <asm/uaccess.h> /* @@ -170,37 +175,91 @@ * cpucache_t * * Per cpu structures + * Purpose: + * - LIFO ordering, to hand out cache-warm objects from _alloc + * - reduce spinlock operations + * * The limit is stored in the per-cpu structure to reduce the data cache * footprint. + * On NUMA systems, 2 per-cpu structures exist: one for the current + * node, one for wrong node free calls. + * Memory from the wrong node is never returned by alloc, it's returned + * to the home node as soon as the cpu cache is filled + * */ typedef struct cpucache_s { unsigned int avail; unsigned int limit; + unsigned int batchcount; } cpucache_t; +/* bootstrap: The caches do not work without cpuarrays anymore, + * but the cpuarrays are allocated from the generic caches... 
+ */ +#define BOOT_CPUCACHE_ENTRIES 1 +struct cpucache_int { + cpucache_t cache; + void * entries[BOOT_CPUCACHE_ENTRIES]; +}; + #define cc_entry(cpucache) \ ((void **)(((cpucache_t*)(cpucache))+1)) #define cc_data(cachep) \ ((cachep)->cpudata[smp_processor_id()]) /* + * NUMA: check if 'ptr' points into the current node, + * use the alternate cpudata cache if wrong + */ +#define cc_data_ptr(cachep, ptr) \ + cc_data(cachep) + +/* + * The slab lists of all objects. + * Hopefully reduce the internal fragmentation + * NUMA: The spinlock could be moved from the kmem_cache_t + * into this structure, too. Figure out what causes + * fewer cross-node spinlock operations. + */ +struct kmem_list3 { + struct list_head slabs_partial; /* partial list first, better asm code */ + struct list_head slabs_full; + struct list_head slabs_free; +}; + +#define LIST3_INIT(parent) \ + { \ + .slabs_full = LIST_HEAD_INIT(parent.slabs_full), \ + .slabs_partial = LIST_HEAD_INIT(parent.slabs_partial), \ + .slabs_free = LIST_HEAD_INIT(parent.slabs_free) \ + } +#define list3_data(cachep) \ + (&(cachep)->lists) + +/* NUMA: per-node */ +#define list3_data_ptr(cachep, ptr) \ + list3_data(cachep) + +/* * kmem_cache_t * * manages a cache. */ - + struct kmem_cache_s { -/* 1) each alloc & free */ - /* full, partial first, then free */ - struct list_head slabs_full; - struct list_head slabs_partial; - struct list_head slabs_free; +/* 1) per-cpu data, touched during every alloc/free */ + cpucache_t *cpudata[NR_CPUS]; + /* NUMA: cpucache_t *cpudata_othernode[NR_CPUS]; */ + unsigned int batchcount; + unsigned int limit; +/* 2) touched by every alloc & free from the backend */ + struct kmem_list3 lists; + /* NUMA: kmem_3list_t *nodelists[NR_NODES] */ unsigned int objsize; unsigned int flags; /* constant flags */ unsigned int num; /* # of objs per slab */ spinlock_t spinlock; - unsigned int batchcount; -/* 2) slab additions /removals */ +/* 3) cache_grow/shrink */ /* order of pgs per slab (2^n) */ unsigned int gfporder; @@ -211,7 +270,6 @@ unsigned int colour_off; /* colour offset */ unsigned int colour_next; /* cache colouring */ kmem_cache_t *slabp_cache; - unsigned int growing; unsigned int dflags; /* dynamic flags */ /* constructor func */ @@ -220,13 +278,11 @@ /* de-constructor func */ void (*dtor)(void *, kmem_cache_t *, unsigned long); - unsigned long failures; - -/* 3) cache creation/removal */ +/* 4) cache creation/removal */ const char *name; struct list_head next; -/* 4) per-cpu data */ - cpucache_t *cpudata[NR_CPUS]; + +/* 5) statistics */ #if STATS unsigned long num_active; unsigned long num_allocations; @@ -243,14 +299,8 @@ /* internal c_flags */ #define CFLGS_OFF_SLAB 0x010000UL /* slab management in own cache */ -#define CFLGS_OPTIMIZE 0x020000UL /* optimized slab lookup */ - -/* c_dflags (dynamic flags). 
Need to hold the spinlock to access this member */ -#define DFLGS_GROWN 0x000001UL /* don't reap a recently grown */ #define OFF_SLAB(x) ((x)->flags & CFLGS_OFF_SLAB) -#define OPTIMIZE(x) ((x)->flags & CFLGS_OPTIMIZE) -#define GROWN(x) ((x)->dlags & DFLGS_GROWN) #if STATS #define STATS_INC_ACTIVE(x) ((x)->num_active++) @@ -376,11 +426,15 @@ }; #undef CN +struct cpucache_int cpuarray_cache __initdata = { { 0, BOOT_CPUCACHE_ENTRIES, 1} }; +struct cpucache_int cpuarray_generic __initdata = { { 0, BOOT_CPUCACHE_ENTRIES, 1} }; + /* internal cache of cache description objs */ static kmem_cache_t cache_cache = { - .slabs_full = LIST_HEAD_INIT(cache_cache.slabs_full), - .slabs_partial = LIST_HEAD_INIT(cache_cache.slabs_partial), - .slabs_free = LIST_HEAD_INIT(cache_cache.slabs_free), + .lists = LIST3_INIT(cache_cache.lists), + .cpudata = { [0] = &cpuarray_cache.cache }, + .batchcount = 1, + .limit = BOOT_CPUCACHE_ENTRIES, .objsize = sizeof(kmem_cache_t), .flags = SLAB_NO_REAP, .spinlock = SPIN_LOCK_UNLOCKED, @@ -400,10 +454,13 @@ * chicken and egg problem: delay the per-cpu array allocation * until the general caches are up. */ -static int g_cpucache_up; +enum { + NONE, + PARTIAL, + FULL +} g_cpucache_up; static void enable_cpucache (kmem_cache_t *cachep); -static void enable_all_cpucaches (void); /* Cal the num objs, wastage, and bytes left over for a given slab size. */ static void cache_estimate (unsigned long gfporder, size_t size, @@ -433,6 +490,56 @@ *left_over = wastage; } +#ifdef CONFIG_SMP +/* + * Note: if someone calls kmem_cache_alloc() on the new + * cpu before the cpuup callback had a chance to allocate + * the head arrays, it will oops. + * Is CPU_ONLINE early enough? + */ +static int __devinit cpuup_callback(struct notifier_block *nfb, + unsigned long action, + void *hcpu) +{ + int cpu = (int)hcpu; + if (action == CPU_ONLINE) { + struct list_head *p; + cpucache_t *nc; + + down(&cache_chain_sem); + + p = &cache_cache.next; + do { + int memsize; + + kmem_cache_t* cachep = list_entry(p, kmem_cache_t, next); + memsize = sizeof(void*)*cachep->limit+sizeof(cpucache_t); + nc = kmalloc(memsize, GFP_KERNEL); + if (!nc) + goto bad; + nc->avail = 0; + nc->limit = cachep->limit; + nc->batchcount = cachep->batchcount; + + cachep->cpudata[cpu] = nc; + + p = cachep->next.next; + } while (p != &cache_cache.next); + + if (g_cpucache_up == FULL) + do_cpucall(do_timerstart, NULL, cpu); + up(&cache_chain_sem); + } + + return NOTIFY_OK; +bad: + up(&cache_chain_sem); + return NOTIFY_BAD; +} + +static struct notifier_block cpucache_notifier = { &cpuup_callback, NULL, 0 }; +#endif + /* Initialisation - setup the `cache' cache. */ void __init kmem_cache_init(void) { @@ -448,6 +555,13 @@ cache_cache.colour = left_over/cache_cache.colour_off; cache_cache.colour_next = 0; + +#ifdef CONFIG_SMP + /* Register a cpu startup notifier callback + * that initializes cc_data for all new cpus + */ + register_cpu_notifier(&cpucache_notifier); +#endif } @@ -489,12 +603,46 @@ BUG(); sizes++; } while (sizes->cs_size); + /* + * The generic caches are running - time to kick out the + * bootstrap cpucaches. 
+ */ + { + void * ptr; + + ptr = kmalloc(sizeof(struct cpucache_int), GFP_KERNEL); + local_irq_disable(); + BUG_ON(cc_data(&cache_cache) != &cpuarray_cache.cache); + memcpy(ptr, cc_data(&cache_cache), sizeof(struct cpucache_int)); + cc_data(&cache_cache) = ptr; + local_irq_enable(); + + ptr = kmalloc(sizeof(struct cpucache_int), GFP_KERNEL); + local_irq_disable(); + BUG_ON(cc_data(cache_sizes[0].cs_cachep) != &cpuarray_generic.cache); + memcpy(ptr, cc_data(cache_sizes[0].cs_cachep), + sizeof(struct cpucache_int)); + cc_data(cache_sizes[0].cs_cachep) = ptr; + local_irq_enable(); + } } int __init cpucache_init(void) { - g_cpucache_up = 1; - enable_all_cpucaches(); + struct list_head* p; + + down(&cache_chain_sem); + g_cpucache_up = FULL; + + p = &cache_cache.next; + do { + kmem_cache_t* cachep = list_entry(p, kmem_cache_t, next); + enable_cpucache(cachep); + p = cachep->next.next; + } while (p != &cache_cache.next); + + up(&cache_chain_sem); + return 0; } @@ -574,37 +722,34 @@ */ static void slab_destroy (kmem_cache_t *cachep, slab_t *slabp) { - if (cachep->dtor #if DEBUG - || cachep->flags & (SLAB_POISON | SLAB_RED_ZONE) -#endif - ) { + int i; + for (i = 0; i < cachep->num; i++) { + void* objp = slabp->s_mem+cachep->objsize*i; + if (cachep->flags & SLAB_POISON) + check_poison_obj(cachep, objp); + + if (cachep->flags & SLAB_RED_ZONE) { + if (*((unsigned long*)(objp)) != RED_MAGIC1) + BUG(); + if (*((unsigned long*)(objp + cachep->objsize - + BYTES_PER_WORD)) != RED_MAGIC1) + BUG(); + objp += BYTES_PER_WORD; + } + if (cachep->dtor && !(cachep->flags & SLAB_POISON)) + (cachep->dtor)(objp, cachep, 0); + } +#else + if (cachep->dtor) { int i; for (i = 0; i < cachep->num; i++) { void* objp = slabp->s_mem+cachep->objsize*i; -#if DEBUG - if (cachep->flags & SLAB_RED_ZONE) { - if (*((unsigned long*)(objp)) != RED_MAGIC1) - BUG(); - if (*((unsigned long*)(objp + cachep->objsize - -BYTES_PER_WORD)) != RED_MAGIC1) - BUG(); - objp += BYTES_PER_WORD; - } -#endif - if (cachep->dtor) - (cachep->dtor)(objp, cachep, 0); -#if DEBUG - if (cachep->flags & SLAB_RED_ZONE) { - objp -= BYTES_PER_WORD; - } - if ((cachep->flags & SLAB_POISON) && - check_poison_obj(cachep, objp)) - BUG(); -#endif + (cachep->dtor)(objp, cachep, 0); } } - +#endif + kmem_freepages(cachep, slabp->s_mem-slabp->colouroff); if (OFF_SLAB(cachep)) kmem_cache_free(cachep->slabp_cache, slabp); @@ -670,11 +815,6 @@ flags &= ~SLAB_DEBUG_INITIAL; } - if ((flags & SLAB_POISON) && ctor) { - /* request for poisoning, but we can't do that with a constructor */ - printk("%sPoisoning requested, but con given - %s\n", func_nm, name); - flags &= ~SLAB_POISON; - } #if FORCED_DEBUG if ((size < (PAGE_SIZE>>3)) && !(flags & SLAB_MUST_HWCACHE_ALIGN)) /* @@ -682,8 +822,7 @@ * fragmentation. */ flags |= SLAB_RED_ZONE; - if (!ctor) - flags |= SLAB_POISON; + flags |= SLAB_POISON; #endif #endif @@ -735,7 +874,6 @@ if (flags & SLAB_HWCACHE_ALIGN) { /* Need to adjust size so that objs are cache aligned. */ /* Small obj size, can get at least two per cache line. 
*/ - /* FIXME: only power of 2 supported, was better */ while (size < align/2) align /= 2; size = (size+align-1)&(~(align-1)); @@ -802,19 +940,16 @@ cachep->colour_off = offset; cachep->colour = left_over/offset; - /* init remaining fields */ - if (!cachep->gfporder && !(flags & CFLGS_OFF_SLAB)) - flags |= CFLGS_OPTIMIZE; - cachep->flags = flags; cachep->gfpflags = 0; if (flags & SLAB_CACHE_DMA) cachep->gfpflags |= GFP_DMA; spin_lock_init(&cachep->spinlock); cachep->objsize = size; - INIT_LIST_HEAD(&cachep->slabs_full); - INIT_LIST_HEAD(&cachep->slabs_partial); - INIT_LIST_HEAD(&cachep->slabs_free); + /* NUMA */ + INIT_LIST_HEAD(&cachep->lists.slabs_full); + INIT_LIST_HEAD(&cachep->lists.slabs_partial); + INIT_LIST_HEAD(&cachep->lists.slabs_free); if (flags & CFLGS_OFF_SLAB) cachep->slabp_cache = kmem_find_general_cachep(slab_size,0); @@ -822,8 +957,27 @@ cachep->dtor = dtor; cachep->name = name; - if (g_cpucache_up) + if (g_cpucache_up == FULL) { enable_cpucache(cachep); + } else { + if (g_cpucache_up == NONE) { + /* Note: the first kmem_cache_create must create + * the cache that's used by kmalloc(24), otherwise + * the creation of further caches will BUG(). + */ + cc_data(cachep) = &cpuarray_generic.cache; + g_cpucache_up = PARTIAL; + } else { + cc_data(cachep) = kmalloc(sizeof(struct cpucache_int),GFP_KERNEL); + } + BUG_ON(!cc_data(cachep)); + cc_data(cachep)->avail = 0; + cc_data(cachep)->limit = BOOT_CPUCACHE_ENTRIES; + cc_data(cachep)->batchcount = 1; + cachep->batchcount = 1; + cachep->limit = BOOT_CPUCACHE_ENTRIES; + } + /* Need the semaphore to access the chain. */ down(&cache_chain_sem); { @@ -861,32 +1015,27 @@ return cachep; } - -#if DEBUG -/* - * This check if the kmem_cache_t pointer is chained in the cache_cache - * list. -arca - */ -static int is_chained_cache(kmem_cache_t * cachep) +static inline void check_irq_off(void) { - struct list_head *p; - int ret = 0; - - /* Find the cache in the chain of caches. */ - down(&cache_chain_sem); - list_for_each(p, &cache_chain) { - if (p == &cachep->next) { - ret = 1; - break; - } - } - up(&cache_chain_sem); +#if DEBUG + BUG_ON(!irqs_disabled()); +#endif +} - return ret; +static inline void check_irq_on(void) +{ +#if DEBUG + BUG_ON(irqs_disabled()); +#endif } -#else -#define is_chained_cache(x) 1 + +static inline void check_spinlock_acquired(kmem_cache_t *cachep) +{ +#ifdef CONFIG_SMP + check_irq_off(); + BUG_ON(spin_trylock(&cachep->spinlock)); #endif +} /* * Waits for all CPUs to execute func(). @@ -900,20 +1049,6 @@ if (smp_call_function(func, arg, 1, 1)) BUG(); } -typedef struct ccupdate_struct_s -{ - kmem_cache_t *cachep; - cpucache_t *new[NR_CPUS]; -} ccupdate_struct_t; - -static void do_ccupdate_local(void *info) -{ - ccupdate_struct_t *new = (ccupdate_struct_t *)info; - cpucache_t *old = cc_data(new->cachep); - - cc_data(new->cachep) = new->new[smp_processor_id()]; - new->new[smp_processor_id()] = old; -} static void free_block (kmem_cache_t* cachep, void** objpp, int len); @@ -922,6 +1057,7 @@ kmem_cache_t *cachep = (kmem_cache_t*)arg; cpucache_t *cc; + check_irq_off(); cc = cc_data(cachep); free_block(cachep, &cc_entry(cc)[0], cc->avail); cc->avail = 0; @@ -932,6 +1068,8 @@ smp_call_function_all_cpus(do_drain, cachep); } + +/* NUMA shrink all list3s */ static int __cache_shrink(kmem_cache_t *cachep) { slab_t *slabp; @@ -939,17 +1077,17 @@ drain_cpu_caches(cachep); + check_irq_on(); spin_lock_irq(&cachep->spinlock); - /* If the cache is growing, stop shrinking. 
*/ - while (!cachep->growing) { + for(;;) { struct list_head *p; - p = cachep->slabs_free.prev; - if (p == &cachep->slabs_free) + p = cachep->lists.slabs_free.prev; + if (p == &cachep->lists.slabs_free) break; - slabp = list_entry(cachep->slabs_free.prev, slab_t, list); + slabp = list_entry(cachep->lists.slabs_free.prev, slab_t, list); #if DEBUG if (slabp->inuse) BUG(); @@ -960,7 +1098,8 @@ slab_destroy(cachep, slabp); spin_lock_irq(&cachep->spinlock); } - ret = !list_empty(&cachep->slabs_full) || !list_empty(&cachep->slabs_partial); + ret = !list_empty(&cachep->lists.slabs_full) || + !list_empty(&cachep->lists.slabs_partial); spin_unlock_irq(&cachep->spinlock); return ret; } @@ -974,7 +1113,7 @@ */ int kmem_cache_shrink(kmem_cache_t *cachep) { - if (!cachep || in_interrupt() || !is_chained_cache(cachep)) + if (!cachep || in_interrupt()) BUG(); return __cache_shrink(cachep); @@ -1021,6 +1160,7 @@ int i; for (i = 0; i < NR_CPUS; i++) kfree(cachep->cpudata[i]); + /* NUMA: free the list3 structures */ } kmem_cache_free(&cache_cache, cachep); @@ -1039,10 +1179,6 @@ if (!slabp) return NULL; } else { - /* FIXME: change to - slabp = objp - * if you enable OPTIMIZE - */ slabp = objp+colour_off; colour_off += L1_CACHE_ALIGN(cachep->num * sizeof(kmem_bufctl_t) + sizeof(slab_t)); @@ -1062,34 +1198,35 @@ for (i = 0; i < cachep->num; i++) { void* objp = slabp->s_mem+cachep->objsize*i; #if DEBUG + /* need to poison the objs? */ + if (cachep->flags & SLAB_POISON) + poison_obj(cachep, objp); + if (cachep->flags & SLAB_RED_ZONE) { *((unsigned long*)(objp)) = RED_MAGIC1; *((unsigned long*)(objp + cachep->objsize - BYTES_PER_WORD)) = RED_MAGIC1; objp += BYTES_PER_WORD; } -#endif - /* * Constructors are not allowed to allocate memory from * the same cache which they are a constructor for. * Otherwise, deadlock. They must also be threaded. */ - if (cachep->ctor) + if (cachep->ctor && !(cachep->flags & SLAB_POISON)) cachep->ctor(objp, cachep, ctor_flags); -#if DEBUG - if (cachep->flags & SLAB_RED_ZONE) - objp -= BYTES_PER_WORD; - if (cachep->flags & SLAB_POISON) - /* need to poison the objs */ - poison_obj(cachep, objp); + if (cachep->flags & SLAB_RED_ZONE) { + objp -= BYTES_PER_WORD; if (*((unsigned long*)(objp)) != RED_MAGIC1) BUG(); if (*((unsigned long*)(objp + cachep->objsize - BYTES_PER_WORD)) != RED_MAGIC1) BUG(); } +#else + if (cachep->ctor) + cachep->ctor(objp, cachep, ctor_flags); #endif slab_bufctl(slabp)[i] = i+1; } @@ -1097,6 +1234,20 @@ slabp->free = 0; } +static void kmem_flagcheck(kmem_cache_t *cachep, int flags) +{ + if (flags & __GFP_WAIT) + might_sleep(); + + if (flags & SLAB_DMA) { + if (!(cachep->gfpflags & GFP_DMA)) + BUG(); + } else { + if (cachep->gfpflags & GFP_DMA) + BUG(); + } +} + /* * Grow (by 1) the number of slabs within a cache. This is called by * kmem_cache_alloc() when there are no active objs left in a cache. @@ -1109,7 +1260,6 @@ size_t offset; unsigned int i, local_flags; unsigned long ctor_flags; - unsigned long save_flags; /* Be lazy and only check for valid flags here, * keeping it out of the critical path in kmem_cache_alloc(). @@ -1119,15 +1269,6 @@ if (flags & SLAB_NO_GROW) return 0; - /* - * The test for missing atomic flag is performed here, rather than - * the more obvious place, simply to reduce the critical path length - * in kmem_cache_alloc(). If a caller is seriously mis-behaving they - * will eventually be caught here (where it matters). 
- */ - if (in_interrupt() && (flags & __GFP_WAIT)) - BUG(); - ctor_flags = SLAB_CTOR_CONSTRUCTOR; local_flags = (flags & SLAB_LEVEL_MASK); if (!(local_flags & __GFP_WAIT)) @@ -1138,7 +1279,8 @@ ctor_flags |= SLAB_CTOR_ATOMIC; /* About to mess with non-constant members - lock. */ - spin_lock_irqsave(&cachep->spinlock, save_flags); + check_irq_off(); + spin_lock(&cachep->spinlock); /* Get colour for the slab, and cal the next value. */ offset = cachep->colour_next; @@ -1146,19 +1288,20 @@ if (cachep->colour_next >= cachep->colour) cachep->colour_next = 0; offset *= cachep->colour_off; - cachep->dflags |= DFLGS_GROWN; - cachep->growing++; - spin_unlock_irqrestore(&cachep->spinlock, save_flags); + spin_unlock(&cachep->spinlock); - /* A series of memory allocations for a new slab. - * Neither the cache-chain semaphore, or cache-lock, are - * held, but the incrementing c_growing prevents this - * cache from being reaped or shrunk. - * Note: The cache could be selected in for reaping in - * cache_reap(), but when the final test is made the - * growing value will be seen. + if (local_flags & __GFP_WAIT) + local_irq_enable(); + + /* + * The test for missing atomic flag is performed here, rather than + * the more obvious place, simply to reduce the critical path length + * in kmem_cache_alloc(). If a caller is seriously mis-behaving they + * will eventually be caught here (where it matters). */ + kmem_flagcheck(cachep, flags); + /* Get mem for the objs. */ if (!(objp = kmem_getpages(cachep, flags))) @@ -1181,62 +1324,117 @@ cache_init_objs(cachep, slabp, ctor_flags); - spin_lock_irqsave(&cachep->spinlock, save_flags); - cachep->growing--; + if (local_flags & __GFP_WAIT) + local_irq_disable(); + check_irq_off(); + spin_lock(&cachep->spinlock); /* Make slab active. */ - list_add_tail(&slabp->list, &cachep->slabs_free); + list_add_tail(&slabp->list, &(list3_data(cachep)->slabs_free)); STATS_INC_GROWN(cachep); - cachep->failures = 0; - - spin_unlock_irqrestore(&cachep->spinlock, save_flags); + spin_unlock(&cachep->spinlock); return 1; opps1: kmem_freepages(cachep, objp); failed: - spin_lock_irqsave(&cachep->spinlock, save_flags); - cachep->growing--; - spin_unlock_irqrestore(&cachep->spinlock, save_flags); return 0; } /* * Perform extra freeing checks: - * - detect double free * - detect bad pointers. - * Called with the cache-lock held. + * - POISON/RED_ZONE checking + * - destructor calls, for caches with POISON+dtor */ - -#if DEBUG -static int extra_free_checks (kmem_cache_t * cachep, - slab_t *slabp, void * objp) +static inline void kfree_debugcheck(const void *objp) { - int i; - unsigned int objnr = (objp-slabp->s_mem)/cachep->objsize; +#if DEBUG + struct page *page; - if (objnr >= cachep->num) - BUG(); - if (objp != slabp->s_mem + objnr*cachep->objsize) + if (!virt_addr_valid(objp)) { + printk(KERN_ERR "kfree_debugcheck: out of range ptr %lxh.\n", + (unsigned long)objp); + BUG(); + } + page = virt_to_page(objp); + if (!PageSlab(page)) { + printk(KERN_ERR "kfree_debugcheck: bad ptr %lxh.\n", (unsigned long)objp); BUG(); - - /* Check slab's freelist to see if this obj is there. 
*/ - for (i = slabp->free; i != BUFCTL_END; i = slab_bufctl(slabp)[i]) { - if (i == objnr) - BUG(); } - return 0; +#endif } -#endif -static inline void cache_alloc_head(kmem_cache_t *cachep, int flags) +static inline void *cache_free_debugcheck (kmem_cache_t * cachep, void * objp) { - if (flags & SLAB_DMA) { - if (!(cachep->gfpflags & GFP_DMA)) +#if DEBUG + struct page *page; + unsigned int objnr; + slab_t *slabp; + + kfree_debugcheck(objp); + page = virt_to_page(objp); + + BUG_ON(GET_PAGE_CACHE(page) != cachep); + slabp = GET_PAGE_SLAB(page); + + if (cachep->flags & SLAB_RED_ZONE) { + objp -= BYTES_PER_WORD; + if (xchg((unsigned long *)objp, RED_MAGIC1) != RED_MAGIC2) + /* Either write before start, or a double free. */ BUG(); - } else { - if (cachep->gfpflags & GFP_DMA) + if (xchg((unsigned long *)(objp+cachep->objsize - + BYTES_PER_WORD), RED_MAGIC1) != RED_MAGIC2) + /* Either write past end, or a double free. */ BUG(); } + + objnr = (objp-slabp->s_mem)/cachep->objsize; + + BUG_ON(objnr >= cachep->num); + BUG_ON(objp != slabp->s_mem + objnr*cachep->objsize); + + if (cachep->flags & SLAB_DEBUG_INITIAL) { + /* Need to call the slab's constructor so the + * caller can perform a verify of its state (debugging). + * Called without the cache-lock held. + */ + if (cachep->flags & SLAB_RED_ZONE) { + cachep->ctor(objp+BYTES_PER_WORD, + cachep, SLAB_CTOR_CONSTRUCTOR|SLAB_CTOR_VERIFY); + } else { + cachep->ctor(objp, cachep, SLAB_CTOR_CONSTRUCTOR|SLAB_CTOR_VERIFY); + } + } + if (cachep->flags & SLAB_POISON && cachep->dtor) { + /* we want to cache poison the object, + * call the destruction callback + */ + if (cachep->flags & SLAB_RED_ZONE) + cachep->dtor(objp+BYTES_PER_WORD, cachep, 0); + else + cachep->dtor(objp, cachep, 0); + } + if (cachep->flags & SLAB_POISON) { + poison_obj(cachep, objp); + } +#endif + return objp; +} + +static inline void check_slabp(kmem_cache_t *cachep, slab_t *slabp) +{ +#if DEBUG + int i; + int entries = 0; + + check_spinlock_acquired(cachep); + /* Check slab's freelist to see if this obj is there. */ + for (i = slabp->free; i != BUFCTL_END; i = slab_bufctl(slabp)[i]) { + entries++; + BUG_ON(entries > cachep->num); + } + BUG_ON(entries != cachep->num - slabp->inuse); +#endif } static inline void * cache_alloc_one_tail (kmem_cache_t *cachep, @@ -1244,6 +1442,8 @@ { void *objp; + check_spinlock_acquired(cachep); + STATS_INC_ALLOCED(cachep); STATS_INC_ACTIVE(cachep); STATS_SET_HIGH(cachep); @@ -1253,11 +1453,83 @@ objp = slabp->s_mem + slabp->free*cachep->objsize; slabp->free=slab_bufctl(slabp)[slabp->free]; - if (unlikely(slabp->free == BUFCTL_END)) { - list_del(&slabp->list); - list_add(&slabp->list, &cachep->slabs_full); + return objp; +} + +static inline void cache_alloc_listfixup(struct kmem_list3 *l3, slab_t *slabp) +{ + list_del(&slabp->list); + if (slabp->free == BUFCTL_END) { + list_add(&slabp->list, &l3->slabs_full); + } else { + list_add(&slabp->list, &l3->slabs_partial); } +} + +static void* cache_alloc_refill(kmem_cache_t* cachep, int flags) +{ + int batchcount; + struct kmem_list3 *l3; + cpucache_t *cc; + + check_irq_off(); + cc = cc_data(cachep); +retry: + batchcount = cc->batchcount; + l3 = list3_data(cachep); + + BUG_ON(cc->avail > 0); + spin_lock(&cachep->spinlock); + while (batchcount > 0) { + struct list_head *entry; + slab_t *slabp; + /* Get slab alloc is to come from. 
*/ + entry = l3->slabs_partial.next; + if (entry == &l3->slabs_partial) { + entry = l3->slabs_free.next; + if (entry == &l3->slabs_free) + goto must_grow; + } + + slabp = list_entry(entry, slab_t, list); + check_slabp(cachep, slabp); + while (slabp->inuse < cachep->num && batchcount--) + cc_entry(cc)[cc->avail++] = + cache_alloc_one_tail(cachep, slabp); + check_slabp(cachep, slabp); + cache_alloc_listfixup(l3, slabp); + } + +must_grow: + spin_unlock(&cachep->spinlock); + + if (unlikely(!cc->avail)) { + int x; + x = cache_grow(cachep, flags); + + // cache_grow can reenable interrupts, then cc could change. + cc = cc_data(cachep); + if (!x && cc->avail == 0) // no objects in sight? abort + return NULL; + + if (!cc->avail) // objects refilled by interrupt? + goto retry; + } + return cc_entry(cc)[--cc->avail]; +} + +static inline void cache_alloc_debugcheck_before(kmem_cache_t *cachep, int flags) +{ #if DEBUG + kmem_flagcheck(cachep, flags); +#endif +} + +static inline void *cache_alloc_debugcheck_after (kmem_cache_t *cachep, unsigned long flags, void *objp) +{ +#if DEBUG + if (!objp) + return objp; if (cachep->flags & SLAB_POISON) if (check_poison_obj(cachep, objp)) BUG(); @@ -1271,246 +1543,128 @@ BUG(); objp += BYTES_PER_WORD; } + if (cachep->ctor && cachep->flags & SLAB_POISON) { + unsigned long ctor_flags = SLAB_CTOR_CONSTRUCTOR; + + if (!flags & __GFP_WAIT) + ctor_flags |= SLAB_CTOR_ATOMIC; + + cachep->ctor(objp, cachep, ctor_flags); + } #endif return objp; } -/* - * Returns a ptr to an obj in the given cache. - * caller must guarantee synchronization - * #define for the goto optimization 8-) - */ -#define cache_alloc_one(cachep) \ -({ \ - struct list_head * slabs_partial, * entry; \ - slab_t *slabp; \ - \ - slabs_partial = &(cachep)->slabs_partial; \ - entry = slabs_partial->next; \ - if (unlikely(entry == slabs_partial)) { \ - struct list_head * slabs_free; \ - slabs_free = &(cachep)->slabs_free; \ - entry = slabs_free->next; \ - if (unlikely(entry == slabs_free)) \ - goto alloc_new_slab; \ - list_del(entry); \ - list_add(entry, slabs_partial); \ - } \ - \ - slabp = list_entry(entry, slab_t, list); \ - cache_alloc_one_tail(cachep, slabp); \ -}) - -void* cache_alloc_batch(kmem_cache_t* cachep, int flags) -{ - int batchcount = cachep->batchcount; - cpucache_t* cc = cc_data(cachep); - - spin_lock(&cachep->spinlock); - while (batchcount--) { - struct list_head * slabs_partial, * entry; - slab_t *slabp; - /* Get slab alloc is to come from. 
*/ - slabs_partial = &(cachep)->slabs_partial; - entry = slabs_partial->next; - if (unlikely(entry == slabs_partial)) { - struct list_head * slabs_free; - slabs_free = &(cachep)->slabs_free; - entry = slabs_free->next; - if (unlikely(entry == slabs_free)) - break; - list_del(entry); - list_add(entry, slabs_partial); - } - - slabp = list_entry(entry, slab_t, list); - cc_entry(cc)[cc->avail++] = - cache_alloc_one_tail(cachep, slabp); - } - spin_unlock(&cachep->spinlock); - - if (cc->avail) - return cc_entry(cc)[--cc->avail]; - return NULL; -} static inline void * __cache_alloc (kmem_cache_t *cachep, int flags) { unsigned long save_flags; void* objp; + cpucache_t *cc; - if (flags & __GFP_WAIT) - might_sleep(); + cache_alloc_debugcheck_before(cachep, flags); - cache_alloc_head(cachep, flags); -try_again: local_irq_save(save_flags); - { - cpucache_t *cc = cc_data(cachep); - - if (cc) { - if (cc->avail) { - STATS_INC_ALLOCHIT(cachep); - objp = cc_entry(cc)[--cc->avail]; - } else { - STATS_INC_ALLOCMISS(cachep); - objp = cache_alloc_batch(cachep,flags); - local_irq_restore(save_flags); - if (!objp) - goto alloc_new_slab_nolock; - return objp; - } - } else { - spin_lock(&cachep->spinlock); - objp = cache_alloc_one(cachep); - spin_unlock(&cachep->spinlock); - } + cc = cc_data(cachep); + if (likely(cc->avail)) { + STATS_INC_ALLOCHIT(cachep); + objp = cc_entry(cc)[--cc->avail]; + } else { + STATS_INC_ALLOCMISS(cachep); + objp = cache_alloc_refill(cachep, flags); } local_irq_restore(save_flags); + objp = cache_alloc_debugcheck_after(cachep, flags, objp); return objp; -alloc_new_slab: - spin_unlock(&cachep->spinlock); - local_irq_restore(save_flags); -alloc_new_slab_nolock: - if (cache_grow(cachep, flags)) - /* Someone may have stolen our objs. Doesn't matter, we'll - * just come back here again. - */ - goto try_again; - return NULL; } -/* - * Release an obj back to its cache. If the obj has a constructed - * state, it should be in this state _before_ it is released. - * - caller is responsible for the synchronization +/* + * NUMA: different approach needed if the spinlock is moved into + * the l3 structure */ -#if DEBUG -# define CHECK_NR(pg) \ - do { \ - if (!virt_addr_valid(pg)) { \ - printk(KERN_ERR "kfree: out of range ptr %lxh.\n", \ - (unsigned long)objp); \ - BUG(); \ - } \ - } while (0) -# define CHECK_PAGE(addr) \ - do { \ - struct page *page = virt_to_page(addr); \ - CHECK_NR(addr); \ - if (!PageSlab(page)) { \ - printk(KERN_ERR "kfree: bad ptr %lxh.\n", \ - (unsigned long)objp); \ - BUG(); \ - } \ - } while (0) - -#else -# define CHECK_PAGE(pg) do { } while (0) -#endif - -static inline void cache_free_one(kmem_cache_t *cachep, void *objp) +static inline void __free_block (kmem_cache_t* cachep, void** objpp, int len) { - slab_t* slabp; - - CHECK_PAGE(objp); - /* reduces memory footprint - * - if (OPTIMIZE(cachep)) - slabp = (void*)((unsigned long)objp&(~(PAGE_SIZE-1))); - else - */ - slabp = GET_PAGE_SLAB(virt_to_page(objp)); + check_irq_off(); + spin_lock(&cachep->spinlock); + /* NUMA: move add into loop */ -#if DEBUG - if (cachep->flags & SLAB_DEBUG_INITIAL) - /* Need to call the slab's constructor so the - * caller can perform a verify of its state (debugging). - * Called without the cache-lock held. 
- */ - cachep->ctor(objp, cachep, SLAB_CTOR_CONSTRUCTOR|SLAB_CTOR_VERIFY); + for ( ; len > 0; len--, objpp++) { + slab_t* slabp; + void *objp = *objpp; - if (cachep->flags & SLAB_RED_ZONE) { - objp -= BYTES_PER_WORD; - if (xchg((unsigned long *)objp, RED_MAGIC1) != RED_MAGIC2) - /* Either write before start, or a double free. */ - BUG(); - if (xchg((unsigned long *)(objp+cachep->objsize - - BYTES_PER_WORD), RED_MAGIC1) != RED_MAGIC2) - /* Either write past end, or a double free. */ - BUG(); - } - if (cachep->flags & SLAB_POISON) - poison_obj(cachep, objp); - if (extra_free_checks(cachep, slabp, objp)) - return; -#endif - { - unsigned int objnr = (objp-slabp->s_mem)/cachep->objsize; + slabp = GET_PAGE_SLAB(virt_to_page(objp)); + list_del(&slabp->list); + { + unsigned int objnr = (objp-slabp->s_mem)/cachep->objsize; - slab_bufctl(slabp)[objnr] = slabp->free; - slabp->free = objnr; - } - STATS_DEC_ACTIVE(cachep); + slab_bufctl(slabp)[objnr] = slabp->free; + slabp->free = objnr; + } + STATS_DEC_ACTIVE(cachep); - /* fixup slab chains */ - { - int inuse = slabp->inuse; + /* fixup slab chains */ if (unlikely(!--slabp->inuse)) { - /* Was partial or full, now empty. */ - list_del(&slabp->list); - /* We only buffer a single page */ - if (list_empty(&cachep->slabs_free)) - list_add(&slabp->list, &cachep->slabs_free); - else + if (list_empty(&list3_data_ptr(cachep, objp)->slabs_free)) { slab_destroy(cachep, slabp); - } else if (unlikely(inuse == cachep->num)) { - /* Was full. */ - list_del(&slabp->list); - list_add_tail(&slabp->list, &cachep->slabs_partial); + } else { + list_add(&slabp->list, + &list3_data_ptr(cachep, objp)->slabs_free); + } + } else { + /* Unconditionally move a slab to the end of the + * partial list on free - maximum time for the + * other objects to be freed, too. + */ + list_add_tail(&slabp->list, &list3_data_ptr(cachep, objp)->slabs_partial); } } + spin_unlock(&cachep->spinlock); } -static inline void __free_block (kmem_cache_t* cachep, - void** objpp, int len) +static void free_block(kmem_cache_t* cachep, void** objpp, int len) { - for ( ; len > 0; len--, objpp++) - cache_free_one(cachep, *objpp); + __free_block(cachep, objpp, len); } -static void free_block (kmem_cache_t* cachep, void** objpp, int len) +static void cache_flusharray (kmem_cache_t* cachep, cpucache_t *cc) { - spin_lock(&cachep->spinlock); - __free_block(cachep, objpp, len); - spin_unlock(&cachep->spinlock); + int batchcount; + + batchcount = cc->batchcount; +#if DEBUG + BUG_ON(!batchcount || batchcount > cc->avail); +#endif + check_irq_off(); + __free_block(cachep, &cc_entry(cc)[0], batchcount); + + cc->avail -= batchcount; + memmove(&cc_entry(cc)[0], &cc_entry(cc)[batchcount], + sizeof(void*)*cc->avail); } /* * __cache_free - * called with disabled ints + * Release an obj back to its cache. If the obj has a constructed + * state, it must be in this state _before_ it is released. + * + * Called with disabled ints. 
*/ static inline void __cache_free (kmem_cache_t *cachep, void* objp) { - cpucache_t *cc = cc_data(cachep); + cpucache_t *cc = cc_data_ptr(cachep, objp); - CHECK_PAGE(objp); - if (cc) { - int batchcount; - if (cc->avail < cc->limit) { - STATS_INC_FREEHIT(cachep); - cc_entry(cc)[cc->avail++] = objp; - return; - } - STATS_INC_FREEMISS(cachep); - batchcount = cachep->batchcount; - cc->avail -= batchcount; - free_block(cachep, &cc_entry(cc)[cc->avail], batchcount); + check_irq_off(); + objp = cache_free_debugcheck(cachep, objp); + + if (likely(cc->avail < cc->limit)) { + STATS_INC_FREEHIT(cachep); cc_entry(cc)[cc->avail++] = objp; return; } else { - free_block(cachep, &objp, 1); + STATS_INC_FREEMISS(cachep); + cache_flusharray(cachep, cc); + cc_entry(cc)[cc->avail++] = objp; } } @@ -1555,6 +1709,13 @@ for (; csizep->cs_size; csizep++) { if (size > csizep->cs_size) continue; +#if DEBUG + /* This happens if someone tries to call + * kmem_cache_create(), or kmalloc(), before + * the generic caches are initialized. + */ + BUG_ON(csizep->cs_cachep == NULL); +#endif return __cache_alloc(flags & GFP_DMA ? csizep->cs_dmacachep : csizep->cs_cachep, flags); } @@ -1572,11 +1733,6 @@ void kmem_cache_free (kmem_cache_t *cachep, void *objp) { unsigned long flags; -#if DEBUG - CHECK_PAGE(objp); - if (cachep != GET_PAGE_CACHE(virt_to_page(objp))) - BUG(); -#endif local_irq_save(flags); __cache_free(cachep, objp); @@ -1598,7 +1754,7 @@ if (!objp) return; local_irq_save(flags); - CHECK_PAGE(objp); + kfree_debugcheck(objp); c = GET_PAGE_CACHE(virt_to_page(objp)); __cache_free(c, (void*)objp); local_irq_restore(flags); @@ -1629,47 +1785,54 @@ return (gfpflags & GFP_DMA) ? csizep->cs_dmacachep : csizep->cs_cachep; } -/* called with cache_chain_sem acquired. */ -static int tune_cpucache (kmem_cache_t* cachep, int limit, int batchcount) +struct ccupdate_struct { + kmem_cache_t *cachep; + cpucache_t *new[NR_CPUS]; +}; + +static void do_ccupdate_local(void *info) { - ccupdate_struct_t new; - int i; + struct ccupdate_struct *new = (struct ccupdate_struct *)info; + cpucache_t *old; - /* - * These are admin-provided, so we are more graceful. 
- */ - if (limit < 0) - return -EINVAL; - if (batchcount < 0) - return -EINVAL; - if (batchcount > limit) - return -EINVAL; - if (limit != 0 && !batchcount) - return -EINVAL; + check_irq_off(); + old = cc_data(new->cachep); + + cc_data(new->cachep) = new->new[smp_processor_id()]; + new->new[smp_processor_id()] = old; +} + + +static int do_tune_cpucache (kmem_cache_t* cachep, int limit, int batchcount) +{ + struct ccupdate_struct new; + int i; memset(&new.new,0,sizeof(new.new)); - if (limit) { - for (i = 0; i < NR_CPUS; i++) { - cpucache_t* ccnew; - - ccnew = kmalloc(sizeof(void*)*limit+ - sizeof(cpucache_t), GFP_KERNEL); - if (!ccnew) { - for (i--; i >= 0; i--) kfree(new.new[i]); - return -ENOMEM; - } - ccnew->limit = limit; - ccnew->avail = 0; - new.new[i] = ccnew; + for (i = 0; i < NR_CPUS; i++) { + cpucache_t* ccnew; + + ccnew = kmalloc(sizeof(void*)*limit+ + sizeof(cpucache_t), GFP_KERNEL); + if (!ccnew) { + for (i--; i >= 0; i--) kfree(new.new[i]); + return -ENOMEM; } + ccnew->avail = 0; + ccnew->limit = limit; + ccnew->batchcount = batchcount; + new.new[i] = ccnew; } new.cachep = cachep; + + smp_call_function_all_cpus(do_ccupdate_local, (void *)&new); + + check_irq_on(); spin_lock_irq(&cachep->spinlock); cachep->batchcount = batchcount; + cachep->limit = limit; spin_unlock_irq(&cachep->spinlock); - smp_call_function_all_cpus(do_ccupdate_local, (void *)&new); - for (i = 0; i < NR_CPUS; i++) { cpucache_t* ccold = new.new[i]; if (!ccold) @@ -1682,48 +1845,25 @@ return 0; } -/* - * If slab debugging is enabled, don't batch slabs - * on the per-cpu lists by defaults. - */ + static void enable_cpucache (kmem_cache_t *cachep) { -#ifndef CONFIG_DEBUG_SLAB int err; int limit; - /* FIXME: optimize */ if (cachep->objsize > PAGE_SIZE) - return; - if (cachep->objsize > 1024) - limit = 60; + limit = 8; + else if (cachep->objsize > 1024) + limit = 54; else if (cachep->objsize > 256) - limit = 124; + limit = 120; else - limit = 252; + limit = 248; - err = tune_cpucache(cachep, limit, limit/2); + err = do_tune_cpucache(cachep, limit, limit/2); if (err) printk(KERN_ERR "enable_cpucache failed for %s, error %d.\n", cachep->name, -err); -#endif -} - -static void enable_all_cpucaches (void) -{ - struct list_head* p; - - down(&cache_chain_sem); - - p = &cache_cache.next; - do { - kmem_cache_t* cachep = list_entry(p, kmem_cache_t, next); - - enable_cpucache(cachep); - p = cachep->next.next; - } while (p != &cache_cache.next); - - up(&cache_chain_sem); } /** @@ -1762,12 +1902,6 @@ if (searchp->flags & SLAB_NO_REAP) goto next; spin_lock_irq(&searchp->spinlock); - if (searchp->growing) - goto next_unlock; - if (searchp->dflags & DFLGS_GROWN) { - searchp->dflags &= ~DFLGS_GROWN; - goto next_unlock; - } { cpucache_t *cc = cc_data(searchp); if (cc && cc->avail) { @@ -1777,8 +1911,8 @@ } full_free = 0; - p = searchp->slabs_free.next; - while (p != &searchp->slabs_free) { + p = searchp->lists.slabs_free.next; + while (p != &searchp->lists.slabs_free) { slabp = list_entry(p, slab_t, list); #if DEBUG if (slabp->inuse) @@ -1808,7 +1942,6 @@ goto perfect; } } -next_unlock: spin_unlock_irq(&searchp->spinlock); next: searchp = list_entry(searchp->next.next,kmem_cache_t,next); @@ -1827,10 +1960,8 @@ for (scan = 0; scan < best_len; scan++) { struct list_head *p; - if (best_cachep->growing) - break; - p = best_cachep->slabs_free.prev; - if (p == &best_cachep->slabs_free) + p = best_cachep->lists.slabs_free.prev; + if (p == &best_cachep->lists.slabs_free) break; slabp = list_entry(p,slab_t,list); #if DEBUG @@ -1913,23 
+2044,24 @@ return 0; } + check_irq_on(); spin_lock_irq(&cachep->spinlock); active_objs = 0; num_slabs = 0; - list_for_each(q,&cachep->slabs_full) { + list_for_each(q,&cachep->lists.slabs_full) { slabp = list_entry(q, slab_t, list); if (slabp->inuse != cachep->num) BUG(); active_objs += cachep->num; active_slabs++; } - list_for_each(q,&cachep->slabs_partial) { + list_for_each(q,&cachep->lists.slabs_partial) { slabp = list_entry(q, slab_t, list); BUG_ON(slabp->inuse == cachep->num || !slabp->inuse); active_objs += slabp->inuse; active_slabs++; } - list_for_each(q,&cachep->slabs_free) { + list_for_each(q,&cachep->lists.slabs_free) { slabp = list_entry(q, slab_t, list); if (slabp->inuse) BUG(); @@ -2050,7 +2182,13 @@ kmem_cache_t *cachep = list_entry(p, kmem_cache_t, next); if (!strcmp(cachep->name, kbuf)) { - res = tune_cpucache(cachep, limit, batchcount); + if (limit < 1 || + batchcount < 1 || + batchcount > limit) { + res = -EINVAL; + } else { + res = do_tune_cpucache(cachep, limit, batchcount); + } break; } }