Re: fadvise syscall?

Anton Altaparmakov (aia21@cam.ac.uk)
Sun, 17 Mar 2002 13:41:37 +0000


At 09:10 17/03/02, Jeff Garzik wrote:
>Andrew Morton wrote:
>>Jeff Garzik wrote:
>>>So... we have madvise, why not fadvise? I would love the capability for
>>>applications to provide hints to the OS like madvise, but for file
>>>descriptors...
>>
>>The one hint which I can think of which would be beneficial would
>>be an equivalent to MADV_SEQUENTIAL. Something which says "this
>>is a big streaming read/write - don't go and evict other stuff because
>>of it". O_STREAMING perhaps. Or working dropbehind heuristics,
>>although I suspect that explicit controls will always do better.
>>
>>For MADV_RANDOM, readahead window scaling should get that right.
>>
>>What else were you thinking of?
>
>Hints for,
>* sequential read
>* sequential write
>* sequential write, where the application considers the data it's writing
>to be unlikely to be read again any time soon (hopefully implying to the
>page cache that these pages have low value as cacheable objects)
>* some sort of streaming hints, implying that the application cares a lot
>about maintaining some minimum i/o rate. note I said hint, not
>requirement. -not- guaranteed-rate-IO.
>
>I might even go so far as to advocate identifying common usage patterns,
>and creating hint constants for them, even if we don't support them in the
>kernel immediately (if ever). Makes the interface much more future-proof,
>at the expense of a few integers in a 32-bit numberspace, and a few more
>bytes in the C compiler's symbol table.

We don't need fadvise IMHO. That is what open(2) is for. The streaming
request you are asking for is just a normal open(2). It will do read ahead
which is perfect for streaming (of data size << RAM size in its current form).

When you want large data streaming, i.e. you start getting worried about
memory pressure, then you want open(2) + O_DIRECT. No caching done. Perfect
for large data streams and we have that already. I agree that you may want
some form of asynchronous read ahead with passed pages being dropped from
the cache but that could be just a open(2) + O_SEQUENTIAL (doesn't exist yet).

All of what you are asking for exists in Windows and all the semantics are
implemented through a very powerful open(2) equivalent. I don't see why we
shouldn't do the same. It makes more sense to me than inventing yet another
system call...

The Windows NT/2k/XP CreateFile() call is documented at below URL. Search
for FILE_FLAG_* and there is a nice big table with all the possible access
method hints one can give when opening or creating a file. Many of those
make perfect sense to have in the Linux kernel, too and in fact with
O_DIRECT we already have some of the functionality Windows offers (there it
would be FILE_FLAG_NO_BUFFERING)

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/filesio_7wmd.asp

Best regards,

Anton

-- 
   "I've not lost my mind. It's backed up on tape somewhere." - Unknown
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/