You don't strictly even need to LRU it - you could just keep a pte count
around, and when it goes to zero you zap the pmd. You can use the normal
page_count() thing for it.
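
As a rough user-space illustration of that counting idea (not kernel code):
keep a count of present ptes per page table and drop the table when the
count hits zero. The struct, the helper names and the 1024-entry table size
are all made up for the sketch; in the kernel the count would live in the
page-table page's normal page count rather than in a separate field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTES_PER_TABLE 1024	/* assumed: entries in one non-PAE x86 page table */

/* Hypothetical stand-in for one page table hanging off a pmd entry. */
struct pte_table {
	unsigned long pte[PTES_PER_TABLE];	/* 0 means "not present" */
	unsigned int count;			/* number of present ptes */
	bool attached;				/* does the pmd still point here? */
};

static void table_init(struct pte_table *t)
{
	for (size_t i = 0; i < PTES_PER_TABLE; i++)
		t->pte[i] = 0;
	t->count = 0;
	t->attached = true;
}

/* Install a pte; the count only moves on a not-present -> present change. */
static void table_set(struct pte_table *t, size_t idx, unsigned long val)
{
	assert(idx < PTES_PER_TABLE && val != 0);
	if (t->pte[idx] == 0)
		t->count++;
	t->pte[idx] = val;
}

/*
 * Clear a pte. When the last present pte goes away the table is
 * detached ("zap the pmd"). Returns true if that happened.
 */
static bool table_clear(struct pte_table *t, size_t idx)
{
	assert(idx < PTES_PER_TABLE);
	if (t->pte[idx] == 0)
		return false;
	t->pte[idx] = 0;
	if (--t->count == 0) {
		t->attached = false;
		return true;
	}
	return false;
}
```

Overwriting an already-present pte leaves the count alone, so the count
always equals the number of live entries and no LRU is needed to decide
when the table can go.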
HOWEVER, I'm rather certain that this won't actually help in real life,
and it does add complexity.
The solution really is a "don't do it then" kind of thing. If you have
5000 processes that all want to map a big shared memory area, and you
don't want to upgrade your CPUs, it's a _whole_ lot easier to just have a
magic "map_large_page()" system call, and start using the 2MB page support
of the x86.
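
To put rough numbers on that, here is a small sketch of the arithmetic. The
1GB size of the shared area is an assumption picked for illustration, and
the 8-byte entry size assumes PAE (which is where x86 large pages are 2MB).
Only the last-level entries needed to map the area are counted.

```c
#include <assert.h>
#include <stdint.h>

#define KB 1024ULL
#define MB (1024ULL * KB)
#define GB (1024ULL * MB)

/* Assumed for illustration only */
#define SHARED_AREA	(1 * GB)	/* size of the shared mapping */
#define ENTRY_SIZE	8ULL		/* bytes per page table entry with PAE */
#define NPROC		5000ULL		/* processes mapping the area */

/* Bytes of mapping entries one process needs to cover `area`. */
static uint64_t mapping_bytes(uint64_t area, uint64_t page_size,
			      uint64_t entry_size)
{
	assert(page_size != 0 && area % page_size == 0);
	return (area / page_size) * entry_size;
}

/* The same, summed over every process that maps the area. */
static uint64_t total_mapping_bytes(uint64_t nproc, uint64_t area,
				    uint64_t page_size, uint64_t entry_size)
{
	return nproc * mapping_bytes(area, page_size, entry_size);
}
```

Under those assumptions, 4KB pages cost 2MB of ptes per process, or 10000MB
across 5000 processes. With 2MB pages each process needs 4KB of entries,
about 20MB in total.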
And no, this should NOT be a mmap.
It's a magic x86-only system call, for the express purpose of adding
something well-contained (but really ugly) for Oracle or other similar
users. I don't mind "really ugly" as long as it doesn't have any impact
on the rest of the system.
It should be less than a few hundred lines of code. Suggested starting
point appended.
Making the _generic_ code jump through hoops because of some stupid special
case that nobody else is interested in is bad.
Linus
--- don't look too closely or you'll go blind! ---
static unsigned long get_magic_bigpage(int idx)
{
	unsigned long bigpage;

	/* idx arrives as an unsigned long, so guard against wraparound too */
	if (idx < 0 || idx >= MAXBIGPAGES)
		return 0;

	down(&bigpage_sem);
	bigpage = bigpage_array[idx];
	if (bigpage)
		goto out;
	bigpage = alloc_bigpage_from_magic_zone();
	if (bigpage) {
		/* Remember the page so later callers share it */
		bigpage_array[idx] = bigpage;
		bigpage_users[idx]++;
	}
out:
	up(&bigpage_sem);
	return bigpage;
}

asmlinkage unsigned long sys_map_ugly_big_page(
	unsigned long address,
	unsigned long size,
	unsigned long idx)
{
	struct mm_struct *mm = current->mm;
	struct vm_area_struct *vma;
	unsigned long bigpage;
	unsigned long retval;

	/*
	 * Only root can do this, because the
	 * allocation will be non-pageable.
	 */
	if (!capable(CAP_SYS_ADMIN))
		return -EPERM;

	/*
	 * We require the user to give us the exact
	 * address, and it has to be PMD_SIZE-aligned
	 */
	if ((address|size) & (PMD_SIZE-1))
		return -EINVAL;
	if (size > TASK_SIZE || TASK_SIZE - size < address)
		return -EINVAL;
	if (!size)
		return 0;

	down_write(&mm->mmap_sem);
	vma = find_vma(mm, address);
	retval = -ENOMEM;
	/* We won't unmap any existing pages */
	if (vma && vma->vm_start < address + size)
		goto out;
	vma = kmem_cache_alloc(&vma_slab, GFP_KERNEL);
	if (!vma)
		goto out;
	/* TODO: fill in vm_start/vm_end and link the vma into the mm */
	vma->vm_flags = VM_MAGIC;
	retval = 0;
	do {
		bigpage = get_magic_bigpage(idx);
		if (!bigpage)
			break;
		set_pmd(pmd_offset(pgd_offset(mm, address), address),
			pmd_bigpage(bigpage));
		idx++;
		retval += PMD_SIZE;
		address += PMD_SIZE;
		size -= PMD_SIZE;
	} while (size);
out:
	up_write(&mm->mmap_sem);
	return retval;
}
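
The argument checks are the one part of the sketch that can be tried out
in user space. The following is a stand-alone version of just those checks.
PMD_SIZE and TASK_SIZE are given assumed values here (2MB and a 3GB user
address space), and check_bigpage_range() and its self-test are
hypothetical names, not existing kernel functions.

```c
#include <assert.h>
#include <errno.h>

#define PMD_SIZE	(2UL * 1024 * 1024)	/* assumed: 2MB large pages */
#define TASK_SIZE	0xC0000000UL		/* assumed: 3GB of user space */

/* Returns 0 if [address, address + size) is acceptable, -EINVAL if not. */
static int check_bigpage_range(unsigned long address, unsigned long size)
{
	/* OR-ing first tests both values for alignment with one mask */
	if ((address | size) & (PMD_SIZE - 1))
		return -EINVAL;
	/*
	 * Comparing against TASK_SIZE - size rather than computing
	 * address + size means the check itself cannot overflow.
	 */
	if (size > TASK_SIZE || TASK_SIZE - size < address)
		return -EINVAL;
	return 0;
}

/* A few cases that show which requests pass and which do not. */
static void check_bigpage_range_selftest(void)
{
	assert(check_bigpage_range(0x40000000UL, 2 * PMD_SIZE) == 0);
	assert(check_bigpage_range(0x40001000UL, PMD_SIZE) == -EINVAL);
	assert(check_bigpage_range(0x40000000UL, PMD_SIZE + 4096) == -EINVAL);
	assert(check_bigpage_range(TASK_SIZE - PMD_SIZE, PMD_SIZE) == 0);
	assert(check_bigpage_range(TASK_SIZE - PMD_SIZE, 2 * PMD_SIZE) == -EINVAL);
}
```

A request that ends exactly at TASK_SIZE is accepted, and one that runs a
single large page past it is refused.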