I did, it's the LCSG (Linux Cabal Standards Group) :-)
> In the meantime, I agree we have
> a duty to fix it, since the current implementation can hang forever
> without improving the security of the data one bit, therefore sync(2)
> should return after all data generated before the sync has been written
> and not wait for all data written by all processes in the system to
> complete.
Yes, absolutely, that's a bug.
> BTW: I think users would expect the system call to work as the standard
> specifies, not some better way which would break on non-Linux systems. Of
> course now working programs which conform to the standard DO break on
> Linux.
No, it should work in the _best_ way, and if the standard got it wrong then
the standard has to change.
> > For dumb filesystems, this can degenerate to 'just try to write all the dirty
> > blocks', the traditional Linux interpretation, but for journalling filesystems
> > we can do the job properly.
>
> It doesn't matter, if you write the existing dirty buffers the filesystem
> type is irrelevant.
Incorrect. The modern crop of filesystems has the concept of consistency
points, and data written after a consistency point is irrelevant except to the
next consistency point. IOW, it's often ok to leave some buffers dirty on a
sync. But for a dumb filesystem you just have to guess at what's needed for
a consistency point, and the best guess is 'whatever's dirty at the time of
sync'.
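
To make the 'whatever's dirty at the time of sync' guess concrete, here is a
toy userspace model. It is only a sketch: ToyBufferCache, busy_writer and the
on_flush hook are names made up for illustration, and none of this is the
kernel's actual buffer cache code. It contrasts a sync that flushes a snapshot
of the dirty set with one that loops until nothing is dirty, which never
finishes while another process keeps writing.

```python
from collections import OrderedDict


class ToyBufferCache:
    """Toy model of a buffer cache (hypothetical, not kernel code)."""

    def __init__(self):
        self.dirty = OrderedDict()  # block number -> data awaiting writeback
        self.disk = {}              # block number -> data on stable storage

    def write(self, block, data):
        """Dirty a block; nothing reaches 'disk' until it is flushed."""
        self.dirty[block] = data

    def _flush_one(self, block):
        self.disk[block] = self.dirty.pop(block)

    def sync_snapshot(self, on_flush=None):
        """Flush only the blocks that were dirty when sync was called."""
        snapshot = list(self.dirty)  # the guessed consistency point
        flushed = 0
        for block in snapshot:
            if block in self.dirty:
                self._flush_one(block)
                flushed += 1
            if on_flush:
                on_flush(self)  # stands in for a concurrent writer
        return flushed

    def sync_until_clean(self, on_flush=None, max_steps=1000):
        """Loop until no block is dirty; returns None if it gives up."""
        steps = 0
        while self.dirty:
            if steps >= max_steps:
                return None  # would have hung; bail out for the demo
            self._flush_one(next(iter(self.dirty)))
            steps += 1
            if on_flush:
                on_flush(self)
        return steps


def busy_writer(start=100):
    """Return a hook that dirties one fresh block every time it is called."""
    next_block = [start]

    def dirty_one(cache):
        cache.write(next_block[0], b"new")
        next_block[0] += 1

    return dirty_one
```

With three dirty blocks and a busy writer attached, sync_snapshot() returns
after three flushes and leaves the writer's newer blocks dirty, while
sync_until_clean() hits its step limit and gives up.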
For metadata-only journalling the issues get more subtle and we need a ruling
from the ext3 guys.
> And if you have cache in your controller and/or drives
> the data might be there and not on the disk.
We're working on that; see Jens' recent series of patches re barriers.
> If you have those IBM drives
> discussed a few months ago and a bad sector, the drive may drop the data.
> The point I'm making is that doing it really right is harder than it
> seems.
That's being worked on too; see Andre Hedrick's linuxdiskcert.org.
> Also, there are applications which don't like journals because they create
> and delete lots of little files, or update the file information
> frequently, resulting in a write to the journal. Sendmail, web servers,
> and usenet news do this in many cases. That's why the noatime option was
> added.
Sorry, I don't see the connection to sync.
> > > while fsync(2) schedules I/O and waits for completion.
> >
> > Yes, right.
> >
> > > So we need to implement system call checkpoint(2) ? schedule I/O,
> > > introduce an I/O barrier, then sleep until that I/O barrier and all I/O
> > > scheduled before it occurs.
> >
> > How about adding: sync --old-broken-way
>
> The problem is that the system call should work in a way which doesn't
> violate the standard.
Waiting until the data is on the platter doesn't violate SuS.
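
For comparison, the schedule-and-wait behaviour for a single file is what
fsync(2) already provides. A minimal sketch from userspace, assuming a
POSIX-like system; write_durably is a made-up helper name, and as noted
elsewhere in this thread, a drive or controller write cache may still hold
the data after fsync returns.

```python
import os
import tempfile


def write_durably(path, data):
    """Write data to path and block until fsync(2) reports completion."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may be a short write
            view = view[written:]
        os.fsync(fd)  # waits for this file's data, not the whole system's
    finally:
        os.close(fd)
    return len(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_durably(os.path.join(tmp, "checkpoint.dat"), b"state")
```

The point of the contrast: the caller waits only for its own data, which is
the same property being asked of sync.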
> I think waiting for all existing dirty buffers is
> conforming, waiting until hell freezes over isn't,
Where does it say that in SuS? I'm not arguing in favor of waiting longer
than necessary, mind you.
> nor does it have any
> benefit to the user, since the sync is either an end of the execution
> safety net or a checkpoint. In either case the user doesn't expect to have
> the program hang after *his/her* data is safe.
Have you asked any users about that?
--
Daniel