Unfortunately I was way too optimistic about this, and I also misread
part of the code while writing the above. Looking more closely,
expand_stack is a race condition in itself.
Nobody is allowed to change vm_pgoff or vm_start without holding _both_
the mm sem in _write_ mode _and_ the spinlock.
expand_stack holds the mm sem in _read_ mode and the spinlock, so it is
totally broken.
All the readers assume that holding only the read semaphore is
enough to get coherent data, but expand_stack is breaking this rule and
so all the readers can race.
To fix this problem we simply need to convert all the callers of
expand_stack to hold the write semaphore instead of the read semaphore
(this will have to be propagated to all architectures). I just checked
all the callers and they're all convertible without any real pain. We
just need to do a second lookup after taking the write lock, because we
don't have a primitive to upgrade the lock from "read" to "write"
atomically: the lock has to be released for some time in the middle, so
the first lookup may be stale. expand_stack is a slow path, so the
second lookup is not a showstopper.
Andrea