snip
>
> Because
>
> L710: err = file.f_op->write(&file, buf, cnt, &file.f_pos);
>
> does not store dirty pages to storage in case of async, we need to
> call fsync() in case if variable stable is non 0.
>
>
> Whether we should call nfsd_sync() or not, should not depend on
> whether EX_WGATHER(exp) is true or not. So, calling nfsd_sync()
> should be outside
>
> if ( EX_WGATHER(exp) ) {...}
>
> statement.
>
You've missed an important fact.
A few lines earlier is the code:
if (stable && !EX_WGATHER(exp))
file.f_flags |= O_SYNC;
so if ! EX_WGATHER, the file has the O_SYNC flag set and all IO to
that file is synchronous (whether the fs was mounted synchronous or
not).
If EX_WGATHER, the flag isn't set, and explicitly call fsync after a
short delay.
>
> Also, it should not depend on whether (inode->i_state & I_DIRTY) is
> true or not, because (inode->i_state & I_DIRTY) is true only when
> inode is being changed ( i.e. when append write is being held ),
> while we need to sync written data to filesystem even when we only
> over wrote already existing data area.
Whenever you write to a file you change the inode - by changing the
modify time at least. Look at generic_file_write in mm/filemap.c.
Notice the code:
if (count) {
remove_suid(inode);
inode->i_ctime = inode->i_mtime = CURRENT_TIME;
mark_inode_dirty_sync(inode);
}
If a non-zero amount of data is being written, the inode is marked dirty.
If don't think the test actually serves the purpose as an fsync
doesn't cause the inode to be marked clean. I'm not sure when that happens.
>
> P.S. I don't really think we should wait for 10msec At point of
> Gathered writes. I don't think this will be of any help.
> It's because fsync() have locking of it's own.
The theory is that you might get 4 write requests inside 10msec.
Each of them enter data into the cache, but don't flush it to disc.
10ms after the first arrives, it causes a fsync on the file. This
writes out all the data for the 4 requests. The other requests
eventually finish their 10ms wait, and fsync finds that there is
nothing to write out and so complete quickly.
I didn't write this code and I don't know how well this code actually
works, but it doesn't seem *wrong* or harmful in any way.
NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/