The NFS client hasn't been interrupted, and its filehandle will still
identify the file object by device/inode.
> > Name to file object mapping is not part of the metadata associated with
> > a file. It is the contents of the directory and can only be modified by
> > directory operations, not operations on the file or filehandle.
>
> SUS doesn't just pronounce on the file metadata. Quoting from earlier
> in the thread:
>
> ----------
> > "synchronised I/O data integrity completion
> >
> > [...]
> >
> > * For write, when the operation has been completed or diagnosed if
> > unsuccessful. The write is complete only when the data specified in
> > the write request is successfully transferred and all file system
> > information required to retrieve the data is successfully transferred.
> ----------
So that would be the file data and its on-disk inode information,
indirect blocks, etc. All the information the file system needs to
retrieve the data is then available, i.e. what is required for iget()
to succeed.
OK, iget() isn't exported to userspace, but fsck will place the file in
a user-reachable location.
> > I also don't see why a rename operation, which operates on the source
> > and destination parent directories would have to not only look up the
> > file object but also somehow register with all open filehandles for that
> > object that both olddir and newdir need to be written back to disk
> > during the fsync as well.
>
> They don't both have to, either one will be good enough. However,
> "neither" is not good enough, according to SUS.
Ehh, sync only olddir and you just lost any path leading to the file.
Sync only newdir and the file is reachable from two locations, but its
link count is too low.
> > Using the dentry chain is not reliable, for instance instead of moving
> > dentries around Coda simply unhashes dentries when state on the server
> > changes.
>
> Could you be more specific about this, are you saying there are cases
> where there is no valid parent link from a dcache entry?
No, the dcache entry could have a 'stale' file object associated with it
that has been superseded by a different object. This dentry is unhashed,
so that the next lookup will instantiate a new dentry which references
the new object. So syncing the stale object is useless, because it
doesn't really exist anymore, but the kernel (and actually the userspace
daemon on the client) doesn't know what the new object is until it is
accessed.
> > Working on a distributed filesystem with somewhat weaker than UNIX
> > semantics might have skewed my vision. In Coda not every client will be
> > able to figure out which are all of the possible paths that can lead to
> > a file object. And although we currently try our best to block
> > hardlinked directories they could possibly exist, making the problems
> > even worse.
>
> We don't need all the paths, and not any specific path, just a path.
Even if that path leads to a name that got removed, thereby forcing the
object into lost+found? I thought the MTA did something like:

    fd = open("tmp/file", O_WRONLY | O_CREAT);
    write(fd, data, len);
    fsync(fd);
    link("tmp/file", "new/file");
    fsync(fd);                       /* *1 */
    unlink("tmp/file");
*1 If this fsync only syncs the path leading to tmp/file, the unlink of
tmp/file may still be written back to disk first, which is likely
because we're only creating/syncing stuff in tmp. Then, until new/file
is written, there is no path information leading to the file anymore,
which makes this as 'safe' as not syncing path name information at all.
Now if the application could use a directory sync, it could actually
tell the kernel that the new/file name is the interesting one to keep,
and that tmp doesn't even need to be written to disk at all.
Jan