obscure bug in vmalloc

duane voth (duanev@interactivesi.com)
Thu, 01 Feb 2001 17:04:53 -0600


I've been abusing vmalloc() recently and stumbled across a rather obscure
bug. The code is nearly identical for 2.2 and 2.4 so I suspect it exists
in both trees and has been there for quite some time.

Under a heavy memory load (when __get_free_page(GFP_KERNEL) is occasionally
returning NULL) vmalloc can sometimes return a virtual address that is not
valid in all process contexts. The resulting address will work in the
immediate context of the vmalloc() but can cause a fault if the address
is used after the scheduler has run (or within arbitrary interrupt contexts).

The problem occurs when vmalloc_area_pages() gets an -ENOMEM back from
alloc_area_pmd() and leaves the current process with a new page global
directory entry which is not replicated through all other process' pgds.
This isn't a problem for the majority of vmallocs (why, I don't quite
understand yet) but it is a problem when the area that was going to be
used spans *two* or more pgd entries. Should a second vmalloc occur in
the near term and happen to get the same page for pte that was returned
after the error during the first, and use the second pgd slot, the
if (pgd_val(olddir) != pgd_val(*dir)) optimisation in vmalloc_area_pages()
will happily leave all other process pgds empty.

Anyway, it's too hard to see via inspection so I've attached a patch that
can be used to watch for the problem. The following is a 2.2 patch (2.4
will follow in a few minutes). If you see "vmalloc error" you might want
to uncomment the look_for_dangling_pgds function to get a better feel for
the pgds that cause the probem.

A brute force fix is to remove the if (pgd_val(olddir) != pgd_val(*dir))
optimisation and always update all pgds for every vmalloc (and swallow
the system wide set_pgdir() for all processors!). This fix works but is
clearly non-optimal.

discussion?

duane

duanev@io.com
duanev@interactivesi.com

--- mm/vmalloc.c.orig Sat Jan 27 13:21:53 2001
+++ mm/vmalloc.c Tue Jan 30 19:00:18 2001
@@ -196,6 +196,53 @@
printk("Trying to vfree() nonexistent vm area (%p)\n", addr);
}

+
+int
+duanev_ensure_pgds_same(unsigned long addr, unsigned long size)
+{
+ struct task_struct * p;
+ unsigned long va;
+ int errs = 0;
+
+ for_each_task(p) {
+ for (va = addr; va < addr + size; va = (va + PGDIR_SIZE) & PGDIR_MASK) {
+ pgd_t kpgd = *pgd_offset_k(va);
+ if (pgd_val(*pgd_offset(p->mm, va)) != pgd_val(kpgd)) {
+ printk("vmalloc error: pgd %03lx is %08lx not %08lx in %s\n",
+ va >> PGDIR_SHIFT,
+ pgd_val(*pgd_offset(p->mm, va)),
+ pgd_val(kpgd),
+ p->comm);
+ errs++;
+ }
+ }
+ }
+
+ return errs;
+}
+
+void
+duanev_look_for_dangling_pgds(unsigned long addr, unsigned long size)
+{
+ struct task_struct * p;
+ unsigned long va;
+
+ /* can't use for_each_task: it skips the swapper! */
+ p = &init_task;
+ do {
+ if (pgd_val(*pgd_offset(p->mm, addr))) {
+ for (va = (addr + PGDIR_SIZE) & PGDIR_MASK; va < addr + size;
+ va = (va + PGDIR_SIZE) & PGDIR_MASK) {
+ if (pgd_val(*pgd_offset(p->mm, va)) == NULL)
+ printk("vfree: pgd %03lx is NULL but %03lx is not (in %s)\n",
+ va >> PGDIR_SHIFT,
+ addr >> PGDIR_SHIFT,
+ p->comm);
+ }
+ }
+ } while ((p = p->next_task) != &init_task);
+}
+
void * vmalloc(unsigned long size)
{
void * addr;
@@ -210,8 +257,11 @@
addr = area->addr;
if (vmalloc_area_pages(VMALLOC_VMADDR(addr), size)) {
vfree(addr);
+// duanev_look_for_dangling_pgds((unsigned long)addr, size);
return NULL;
}
+ if (duanev_ensure_pgds_same((unsigned long)addr, size))
+ printk(" size = 0x%08lx\n", size);
return addr;
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/