> On Linux, flushing a rename() means calling fsync() on the directory
> instead of the file. That's it. Doing that instead of fsync'ing the
> file adds at most two system calls (to open and close the directory),
> and those can be amortized over many operations on that directory
> (think "mail spool"). So the system call overhead is non-existent.
Indeed, but I can also leave the file descriptor open on any file system
on any system except SOME of Linux'. (Ok, this precludes systems that
don't offer POSIX synchronous completion semantics, but these systems
don't nessarily have fsync() either).
> ordering required for reliable operation; no more, no less. Relying
> on mount options, "chattr +S", or journaling artifacts for your
> ordering is the inefficient approach; since they impose extra
> ordering, they can never be faster and will usually be slower.
It is sometimes the only way, if the application is unaware. I hope I'm
not loosening a flame war if I mention qmail now, which is not even
softupdates aware. Without chattr +S or mount -o sync, nothing is to be
gained. OTOH, where mount -o sync only makes directory updates
synchronous, it's not too expensive, which is why the +D approach is
still useful there.
> > It's only necessary for ext2. Modern Linux filesystems (such as ext3
> > or reiserfs) don't require it.
>
> Only because they take the performance hit of flushing the whole log
> to disk on every fsync(). Combine that with "data=ordered" and see
> what happens to your performance. (Perhaps "data=ordered" should be
> called "fsync=sync".) I would rather get back the performance and
> convince application authors to understand what they are doing.
1. data=ordered is more than fsync=sync. It guarantees that data blocks
are flushed before flushing the meta data blocks that reference the data
blocks. Try this on ext2fs and lose.
2. sync() is unreliable, it can return control to the caller earlier
than what is sound. It can "complete" at any time it desires without
having completed.
(Probably so it can ever return as new blocks are written by another
process, but at least SUS v2 did not detail on this).
3. Application authors do not desire fsync=sync semantics, but they want
to rely on "fsync(fd) also syncs recent renames". It comes as a
now-guaranteed side effect of how ext3fs works, so I am told.
I'm not sure how the ext3fs journal works internally, but it'd fine with
all applications if only that part of a file system be synched that is
really relevant to the current fsync(fd). No more. It seems as though
fsync==sync is an artifact that ext2 also suffers from.
-- Matthias Andree - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/