Re: ext3-2.4-0.9.4

Daniel Phillips (phillips@bonn-fries.net)
Fri, 27 Jul 2001 23:16:30 +0200


On Friday 27 July 2001 19:41, Lawrence Greenfield wrote:
> From: Alan Cox <alan@lxorguk.ukuu.org.uk>
> > Lawrence Greenfield wrote:
> > > You want to help performance? Give us an fsync() that works on
> > > multiple file descriptors at once, or an async fsync() call.
> > > Don't make us fight the OS on getting data to disk.
> >
> > And what pray does an asynchronous fsync do. It seems to be a nop
> to me.
>
> An async fsync allows me to issue multiple fsyncs and then wait for
> all of them to complete, hopefully in the same framework that I would
> do async I/O (but that's an argument for another day).

I'll say. While it's truly desirable, all known filesystems are *far*
from being able to do that. An efficient, reliable fsync would do the
trick for you, or even an efficient sync. And somewhere in Andrew
Morton's bag of tricks is something to fix you up too, read his
comments carefully.

Looking forward, a sanely defined filesystem transaction interface
from userland would give the best possible combination of performance
and reliability.[1] Since we now have four filesystems (five if you
count JFFS) that could implement such a transaction interface, now is
the time to figure out what it would look like. That would include
accomodating the needs of MTA developers. It would be Linux-specific
for sure. It would also be progress. If it turned out to be the
fastest way to run a mailer we'd see it migrate to other nixes soon
enough.

> Doing reliabile transactions on disk is a hard problem. That is
> why oracle and friends have spent many man years of research on this
> kind of problem.

Tell me about it ;-)

> Current unix mailers do the smoke mirrors and prayer
> bit to reduce the probability a little that is all, regardless of fs
> and os.
>
> Isn't the point of the operating system to try to make it as easy as
> possible to do these things correctly?

begin_transaction (filesystem_handle);
<send the mail>;
if (!end_transaction (filesystem_handle))
<confirm sent>;

Something like that.[2] Caveat: this is blue-sky stuff, it is not
going to solve your problem today. Andrew Morton and Hans Reiser are
working on solving the problem today by giving you at least one mode
where rename is synchronous, or at least giving you a fast fsync.

I'm with those who think that a little short-term pain is worth it if
the final result is superior.

> Otherwise you force anyone who wants to write a reliable application
> (be it e-mail or not) to go to Oracle and one wonders why fsync() is
> even implemented.

[1] Al Viro pointed out that such a transaction interface could open up
new possibilities for DOS attacks, something that has to be anticipated
in the design.

[2] I see Alan suggested essentially the same thing in another branch
of this thread. Then by the "million flies" theorum...

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/