> I do not recall anything about data=ordered or data=journal mode being
> required. I thought someone authoritative (Stephen Tweedie?) said
> that ext3 happens to commit the journal on fsync(), independent of the
> journaling mode, but that this behavior was an implementation
> coincidence and not guaranteed. (Unfortunately, I am having trouble
> finding that message... Can someone familiar with the source confirm
> or deny this?)
I know about the "happens to...", but I think after that discussion,
they'd keep it that way.
The data= mode was not part of the past discussion; that's why I brought
it up now. However, reiserfs or ext3fs with data=writeback journal only
the metadata involved in the fsync(), not the order of data (file
contents) versus directory contents, so you can end up with a "crash -
journal replay - file with bogus contents" scenario. I've seen this
happen on ReiserFS and I was not too fond of it, particularly since I
don't have "fast-access" backups: I have to read a full file from SLR
tape up to the point where the desired file is stored.
> I would love to know what IS guaranteed. This fsync() question keeps
> cropping up, and as far as I know there is no authoritative statement
> anywhere about what Linux promises. "Read the source code" is the
Indeed not. A "file system codex" that documents these guarantees with
respect to path names (link, rename, directory updates) should be
written authoritatively, and it should remain valid from one kernel
series to the next (i.e. if documented for 2.4.18+, it must not change
before 2.5.x).
> > That aside, it would be really useful to get this "hog a writer" issue
> > ironed out either way, and that the illogical "fsync() a O_RDONLY"
> > file be resolved somehow.
>
> It is a non-issue; no resolution is necessary. If I can even read or
> write a single file on the same DISK (or bus) that some server process
> uses, I can "hog its resources" and slow it down. Horrors! Is there
> any solution??? Oh yeah, don't let me do that.
[IRONY DETECTED]
Seriously: imagine another process that opens the file your process
is writing into, but has no write permission itself, and busy-loops
on fsync(). If that process really triggers flushing of your blocks
although it has no write permission, this _is_ a problem unless
you have some decent tagged queueing in place.
fsync() as per The Open Group Base Specifications Issue 6 is allowed to
return EBADF, EINTR, EINVAL or EIO. Returning EINVAL for fsync(fd) after
fd = open("blah", O_RDONLY) does not sound unreasonable. You have nothing
to write with O_RDONLY; use O_RDWR or O_WRONLY instead.
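To see what a given kernel actually does with fsync() on a read-only
descriptor, something like the following sketch should do. The helper
name fsync_readonly() is made up for illustration; it only reports the
errno value and makes no claim about what the result ought to be.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not from any library): open `path` read-only,
 * call fsync() on the descriptor, and return 0 on success or the
 * errno value of the call that failed. */
int fsync_readonly(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;            /* open() itself failed */

    int result = 0;
    if (fsync(fd) != 0)
        result = errno;          /* the spec lists EBADF, EINTR, EINVAL, EIO */

    close(fd);
    return result;
}
```

Call it on a file you can read but not write, and print the return
value with strerror() if it is not 0.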
> The only interesting question here is what the relevant standards say.
> And if they allow fsync() at all on a read-only descriptor, then there
> is pretty clearly only one thing that can mean. If you have a problem
> with this behavior, then configure your precious servers to keep their
> data unreadable by untrusted parties.
Or make fsync() a no-op, meaning "your process (group) has no data to
write", or return an error... EINVAL.
> > Is fsync()ing directories any portable?
>
> No, but apparently it is what Linux supports. If this were documented
> clearly somewhere, maybe application authors could be convinced to
> support it.
I don't think so. They'd rather declare ReiserFS unsupported and go with
chattr +S. Seen that.
New implementations (Courier's maildrop) still rely on BSD FFS
"synchronous directory" semantics.
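For reference, the Linux-style approach discussed here (fsync() the
file, then fsync() the directory that holds its entry) might look
roughly like the sketch below. The function name write_and_sync() and
its arguments are assumptions for illustration only, and whether
fsync() on a directory descriptor works at all is exactly the
non-portable part.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `data` to `path`, fsync() the file, then
 * fsync() the containing directory `dir` so that the directory entry
 * is flushed as well. Returns 0 on success, -1 on any error. */
int write_and_sync(const char *dir, const char *path, const char *data)
{
    assert(dir != NULL && path != NULL && data != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = strlen(data);
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted, try again */
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Flush the file contents (and its own metadata) first. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* Then flush the directory; this step is the Linux-specific one. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc == 0 ? 0 : -1;
}
```

On systems with BSD FFS "synchronous directory" semantics the second
fsync() should be unnecessary, which is why applications written for
those semantics skip it.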
-- Matthias Andree