We just take the pagetable spinlock there, the race is all about the
pgoff.
In short you agree that the current locking is broken but you propose to
limit the usability of GROWSDOWN and GROWSUP solely to the anonymous
vmas instead of fixing the pgoff race with proper locking as I did.
My fix for the race doesn't drop the usability of GROWSDOWN that could
otherwise break userspace programs. I guess at least uml uses growsdown
vma file backed. Jeff?
Assuming it's acceptable to break GROWSDOWN on file backed vmas as you
suggested, the new locking rules with your approch would be:
1) pgoff is garbage as usual for any anon vma so don't even try to
mess with it in expand_stack
2) with only the read semaphore acquired the vm_start/vm_end of the
vmas can expand under us but the vma tree not
3) with only the read semaphore acquired to get a consistent
vm_start/vm_end of all vmas in the tree (like we need to in
expand_stack) also the page_table_lock of the mm must be
acquired
4) with the write lock vm_start/vm_end are consistent always as
well as tree so no change here
> > I considered implementing a read->write semaphore upgrade primitive but
> > it cannot be reliable
>
> There is no such thing. Never has been. It's a fundamentally impossible
> operation. We may, at some point, decide to have a "read_for_write()" and
Yes, of course with "it cannot be reliable" I meant it cannot work out
always, it would be an optimization for the common case:
spin_lock(&rwsem->lock)
if (only one reader)
gain write privilegies
err = success
else
err = faliure
spin_unlock(&rwsem->lock)
return err
it would still require the fail path in case of the faliure (multiple
readers potentially all trying to upgrade the lock) so I ignored the
optimization (expand_stack isn't a very fast path anyways).
> However, having a finer-granularity spinlock _inside_ the semaphore (see
> above suggestion) is a perfectly valid approach.
if you are 100% sure it's acceptable to break the kernel API for the
GROWSDOWN file backed vmas (which I don't think it's the case) I can
switch to your suggested fix (otherwise I prefer to keep upgrading the
semaphore in expand_stack rather to have to spinlocking at every nopage
private/shared page fault).
Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/