Ted is correct on this one - if we hit this bug because both the source
and target names were in the same block before the split, then after the
split we will still need to dirty only 2 blocks because of the rename
(the split blocks are accounted for in EXT3_INDEX_EXTRA_TRANS_BLOCKS).
> > Simply retrying the ext3_delete_entry() isn't sufficient, since
> > another ext3_add_entry() could move the directory entry again while
> > you're reading in the blocks as part of ext3_find_entry(). OK, that
> > would be pretty rare, since enough other directory adds would have
> > to fill up enough that another split could happen, but *is* possible.
> > (Surely our scheduler isn't that unfair....)
>
> I think we have the lock on ext3_rename. I might be wrong.
> If other process can change the dir between the add new entry
> and delete old entry. We should also need to check the entry have
> been delete from other process in case fall into dead loop?
Chris is right on this one. Like Al said, the VFS holds i_sem on
"both" directories (or the single directory if src_dir & tgt_dir are
the same). We don't need additional locking within ext3...(yet)
<aside>
For _real_ scaling of large directories, it would be good to allow
locking just individual leaf blocks of the directories instead of the
entire directory. Since the source and target directory leaf blocks
are the only possible locations for those filenames, we do not have
any races w.r.t. other threads adding/deleting files of the same name
if we lock those directory blocks.
We will probably need to do this in the next year or so for Lustre where
we have a requirement for millions of files being created/renamed/deleted
in the same directory by thousands of clients, and tens of metadata
servers load-balancing those requests.
</aside>
Cheers, Andreas
-- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/