The rename race is fixed now. Yes, it was unfixable using *existing* RCU
techniques, but one has to invent new tricks when the old bag of
tricks is empty :)
Fundamentally what happens is that rename may be *two* updates - delete
from one hash chain and insert into another hash chain. In order for
lockfree traversal to work correctly, you must have a grace period after
each update. If we do a grace period between these two updates in a rename,
it slows down renames to unacceptable levels. So we had a problem there.
The solution lies in the dcache itself - it has a fast path (cached_lookup)
and a slow path (real_lookup). So all we had to do was to detect that a
rename had happened to the dentry while we looked it up lockfree. This
is done with a generation counter (d_move_count) in the dentry. The
counter is protected by the per-dentry spinlock, which we take during
rename and after a successful cache lookup.
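A minimal userspace sketch of the rename side, under stated assumptions: the mv_* names, the two-bucket table and the atomic_flag standing in for the per-dentry spinlock are all hypothetical, and RCU and any chain-level locking are left out. It only illustrates that the two chain updates and the d_move_count bump happen together under the dentry lock; it is not the actual dcache code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NBUCKETS 2                      /* hypothetical: two toy hash chains */

/* Hypothetical stand-in for a dentry; not the kernel structure. */
struct mv_dentry {
    atomic_flag d_lock;                 /* stands in for the per-dentry spinlock */
    atomic_ulong d_move_count;          /* generation counter, bumped by rename */
    struct mv_dentry *next;             /* next entry on the same hash chain */
    unsigned int bucket;                /* chain the entry currently sits on */
};

static struct mv_dentry *chains[NBUCKETS];

/* Update 1 of a rename: delete the entry from its current chain. */
static void mv_unlink(struct mv_dentry *d)
{
    struct mv_dentry **pp = &chains[d->bucket];

    while (*pp && *pp != d)
        pp = &(*pp)->next;
    if (*pp)
        *pp = d->next;
}

/* Update 2 of a rename: insert the entry at the head of another chain. */
static void mv_link(struct mv_dentry *d, unsigned int bucket)
{
    d->bucket = bucket;
    d->next = chains[bucket];
    chains[bucket] = d;
}

static void mv_init(struct mv_dentry *d, unsigned int bucket)
{
    atomic_flag_clear(&d->d_lock);
    atomic_init(&d->d_move_count, 0);
    mv_link(d, bucket);
}

/* Rename: bump the generation counter and do both updates under d_lock. */
static void mv_rename(struct mv_dentry *d, unsigned int new_bucket)
{
    assert(d != NULL && new_bucket < NBUCKETS);

    while (atomic_flag_test_and_set_explicit(&d->d_lock, memory_order_acquire))
        ;                               /* spin until the lock is ours */
    atomic_fetch_add(&d->d_move_count, 1);
    mv_unlink(d);
    mv_link(d, new_bucket);
    atomic_flag_clear_explicit(&d->d_lock, memory_order_release);
}
```

Because the counter changes on every rename, a lookup that sampled it before walking the chain can tell afterwards whether the dentry moved underneath it.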
Two things can happen due to the rename race - lookup incorrectly succeeds
or lookup incorrectly fails. The success case is easily handled by
the lockfree lookup code that looks like this -

    for the dentries in the hash chain {
        ... More stuff....
        move_count = dentry->d_move_count;
        if (dentry name matches) {
            /* lookup succeeds */
            spin_lock(&dentry->d_lock);
            if (move_count != dentry->d_move_count) {
                /*
                 * A rename happened while looking up lockfree
                 * and we now cannot guarantee that the lookup
                 * is correct
                 */
                spin_unlock(&dentry->d_lock);
                return slow_lookup();
            }
            ....
            ....
        }
        ... More stuff....
    }

If the lookup instead fails because of the rename race, we fall through
to the slow real_lookup anyway, which is serialized with rename.
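The success-case check can also be tried out in userspace. The following is only a sketch under assumptions: the toy_* names, the result enum and the race hook (which forces a rename between the name match and taking the lock) are hypothetical, an atomic_flag stands in for d_lock, and the lock is dropped on a fast hit where real code would presumably keep working under it. It runs single-threaded and does not model RCU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dentry; not the kernel structure. */
struct toy_dentry {
    atomic_flag d_lock;                 /* stands in for the per-dentry spinlock */
    atomic_ulong d_move_count;          /* generation counter, bumped by rename */
    char name[32];
};

enum toy_result { TOY_FAST_HIT, TOY_MISS, TOY_NEED_SLOW };

/* Hypothetical hook, called between the name match and taking d_lock. */
typedef void (*toy_race_hook)(struct toy_dentry *);

static void toy_lock(struct toy_dentry *d)
{
    while (atomic_flag_test_and_set_explicit(&d->d_lock, memory_order_acquire))
        ;                               /* spin until the lock is ours */
}

static void toy_unlock(struct toy_dentry *d)
{
    atomic_flag_clear_explicit(&d->d_lock, memory_order_release);
}

static void toy_set_name(struct toy_dentry *d, const char *name)
{
    strncpy(d->name, name, sizeof(d->name) - 1);
    d->name[sizeof(d->name) - 1] = '\0';
}

static void toy_dentry_init(struct toy_dentry *d, const char *name)
{
    atomic_flag_clear(&d->d_lock);
    atomic_init(&d->d_move_count, 0);
    toy_set_name(d, name);
}

/* Rename side: bump the counter and change the name under d_lock. */
static void toy_rename(struct toy_dentry *d, const char *new_name)
{
    toy_lock(d);
    atomic_fetch_add(&d->d_move_count, 1);
    toy_set_name(d, new_name);
    toy_unlock(d);
}

/* Lookup side: sample the counter, match the name, then recheck under d_lock. */
static enum toy_result toy_fast_lookup(struct toy_dentry *d, const char *name,
                                       toy_race_hook hook)
{
    unsigned long move_count;

    assert(d != NULL && name != NULL);
    move_count = atomic_load(&d->d_move_count);
    if (strcmp(d->name, name) != 0)
        return TOY_MISS;                /* caller would try the slow path */
    if (hook)
        hook(d);                        /* simulate a rename racing with us */
    toy_lock(d);
    if (move_count != atomic_load(&d->d_move_count)) {
        toy_unlock(d);
        return TOY_NEED_SLOW;           /* match can no longer be trusted */
    }
    toy_unlock(d);
    return TOY_FAST_HIT;
}

/* Example hook: rename the dentry to "b" in the middle of a lookup. */
static void toy_rename_to_b(struct toy_dentry *d)
{
    toy_rename(d, "b");
}
```

Passing toy_rename_to_b as the hook makes a lookup of the old name return TOY_NEED_SLOW, which corresponds to falling back to the slow path; with no hook the same lookup is a fast hit.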
Maneesh did a lot of testing using many ramfs and many millions of renames
with millions of lookups going on at the same time, and the slow path was
hit only 100 times or so. For practical workloads, this should have
absolutely no performance impact.
Thanks
Dipankar