Re: fadvise syscall?

Andrew Morton (akpm@zip.com.au)
Sun, 17 Mar 2002 23:55:01 -0800


Jeff Garzik wrote:
>
> * fadvise(2) usefulness extends past open(2). It may be useful to call
> it at various points during runtime.
>
> * I think putting hints in open(2) is the wrong direction to go. Hints
> have a potential to be very flexible. open(2) O_xxx bits are not to be
> squandered lightly, while I see a lot more value in being a little more
> loose and free with the bit assignment for an "fadvise mask" (just a
> list of hint bits). IMO it should be easier to introduce and retire
> hints, far easier than O_xxx flags.
>

Yup.

posix_fadvise() looks to be a fine interface:

int posix_fadvise(int fd, off_t offset, size_t len, int advice);

DESCRIPTION

The posix_fadvise() function shall advise the implementation on
the expected behavior of the application with respect to the data in
the file associated with the open file descriptor, fd, starting at offset
and continuing for len bytes. The specified range need not currently
exist in the file. If len is zero, all data following offset is specified.
The implementation may use this information to optimize handling
of the specified data. The posix_fadvise() function shall have no
effect on the semantics of other operations on the specified data,
although it may affect the performance of other operations.

The advice to be applied to the data is specified by the advice
parameter and may be one of the following values:

POSIX_FADV_NORMAL

Specifies that the application has no advice to give on its
behavior with respect to the specified data. It is the default
characteristic if no advice is given for an open file.
POSIX_FADV_SEQUENTIAL

Specifies that the application expects to access the specified
data sequentially from lower offsets to higher offsets.
POSIX_FADV_RANDOM

Specifies that the application expects to access the specified
data in a random order.
POSIX_FADV_WILLNEED

Specifies that the application expects to access the specified
data in the near future.
POSIX_FADV_DONTNEED

Specifies that the application expects that it will not access
the specified data in the near future.
POSIX_FADV_NOREUSE

Specifies that the application expects to access the specified
data once and then not reuse it thereafter.

We can usefully implement all of these. FADV_WILLNEED obsoletes
sys_readahead().

We'll need to cheat a bit on the offset/len thing for NORMAL and
SEQUENTIAL - just apply it to the whole file - we don't want to have to
attach an arbitrary number of silly range objects to each file for this.
(We already cheat a bit this way with msync).

Note that it applies to a file descriptor. If posix_fadvise(FADV_DONTNEED) is
called against a file descriptor, and someone else has an fd open
against the same file, that other user gets their foot shot off. That's
OK.

Given this, I don't see a persuasive need to implement a non-standard
interface. It takes an off_t, so posix_fadvise64() is also needed.

The presence of this interface doesn't imply that we don't need
good dropbehind heuristics for streaming reads and writes. We
do need those.

I wouldn't suggest that anyone rush out and implement this stuff for 2.5.
There's some decrudding needed in filemap.c first, and many of these
hints need to interact with the 2.6 VM. Whatever that will be.

A 2.4 implementation could be done any time. If anyone decides to
do this, please let me know...

-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/