I believe that we would need callbacks for DFSes because many of them
have different locking granularities. For example, a number of them allow
writes from different clients to different pages of the same file
fully in parallel, with no communication between the clients required.
Communication is required only when one client attempts to write to
a page most recently written to by another client.
The locking required to support this is more complex than would
be justified in most local-filesystem cases, I suspect. ;-)
> > This sounds good to me, though am checking with some DFS people.
>
> cool thanks!
Good news, they are OK with your approach for handling the truncation
race! They still believe they need the callback for handling fine-grained
locking required for invalidations.
> > > And w/o proper high level locking, the non distributed filesystems will
> > > corrupt the VM too with truncate against nopage. I already fixed this in
> > > my tree. (see the attachment) So I wonder if the fixes could be shared.
> > > I mean, you definitely need my fixes even when using the DFS on a
> > > isolated box, and if you don't need them while using the fs locally, it
> > > means we're duplicating effort somehow.
> >
> > True -- my patches simply provided hooks to allow DFSs and local
> > filesystems to fix the problem.
> >
> > So, the idea is for the DFS to hold a fr_write_lock on the
> > truncate_lock across the invalidate_mmap_range() call, thus
> > preventing the PTEs from any racing pagefaults from being
> > installed? This seems plausible at first glance, but need
> > to stare at it some more. This might permit the current
> > do_no_page(), do_anonymous_page(), and ->nopage APIs to
> > be used, but again, need to stare at it some more.
>
> btw, we can discuss this some more next month at OLS too, if we didn't
> clear all the issues first.
Sounds great!!! Of course, if we get it all agreed to before then,
we will just have to find other things to talk about at OLS. ;-)
Hopefully by that time, one of the DFSes will be released under GPL.
Don't ask. :-/
> > (If I am not too confused, fr_write_lock() became
> > write_seqlock() in the 2.5 tree...)
>
> exactly, I didn't rename it yet in 2.4 since it would provide no runtime
> benefit, but it is exactly the same thing ;).
Cool! Then there is no need to worry about fr_lock getting accepted
into 2.5. ;-)
> > > Since I don't see the users of the new hook, it's a bit hard to judje if
> > > the duplication is legitimate or not. So overall I'd agree with Andrew
> > > that to judje the patch it'd make sense to see (or know more) about the
> > > users of the hook too.
> >
> > A simple change that takes care of all the cases certainly does
> > seem better than a more complex change that only takes care of
> > distributed filesystems!
>
> agreed.
Unless you have other thoughts, I will take a stab at making
the patches safe for each other.
Thanx, Paul
> thanks,
>
> Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/