Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE

Andrew Morton (akpm@zip.com.au)
Sat, 11 May 2002 19:40:20 -0700


yodaiken@fsmlabs.com wrote:
>
> We did some i/o profiling about 6 years ago on a big scientific
> app that had started in fortran and had been rewritten in c++
> the fortran code used r/w on files and used temp files
> the c++ did memmaps and had big data structures - taking advantage of
> memory management.
> one thing I thought was interesting is that it was easy to see how a smart
> algorithm, not even such a smart one, could adapt i/o to the patterns of
> i/o in the fortran code, but the c++ i/o patters were really complex.
>
> when everything goes into the page cache, it seems like you will loose
> information.
>

That is certainly the case. If the application is seekily writing
to a file then we currently lay the file out on-disk in the order
in which the application seeked. So reading the file back
linearly is very slow.

Now this is not necessarily a bad thing - if the file was created
seekily then it will probably be _used_ seekily so no big
loss probably.

This problem is pretty unsolvable for filesystems which map blocks
to their disk address at write(2) time. It can be solved for
allocate-on-flush filesystems via a sort of the dirty page list,
or by maintaining ->dirty_pages in a tree or whatever.

There is one "file" where this problem really does matter - the
blockdev mapping "/dev/hda1". It is both highly fragmented and
poorly sorted on the dirty_pages list.

It's pretty trivial to perform a sillysort at writeout time:
if we just wrote page N and the next page isn't N+1 then do a
pagecache probe for "N+1". That's probably sufficient. If
not, there's a simple little sort routine over at
http://www.chiark.greenend.org.uk/~sgtatham/algorithms/listsort.html
which is appropriate to our lists.

I'll be taking a look at the sillysort option once I've cleared away
some other I/O scheduling glitches.

-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/