XFS does something similar to this: the filesystem is split into
'allocation groups', independent chunks of up to 4 Gbytes of disk
space. By default a directory is placed in a different allocation
group than its parent; file inodes are placed in the same allocation
group as their directory, preferably 'close' to the directory inode;
and file data also prefers the same allocation group as the inode.
This all works OK when the filesystem is not bursting at the seams,
and when there is some correspondence between logical block numbers
in the filesystem and the physical drives underneath. I believe that
once you get into complex setups using RAID devices and smart caching
drives, it gets very hard for the filesystem to predict anything
about the latency of access for two blocks which it thinks are next
to each other in the filesystem.
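As a rough illustration (mine, not from the benchmark below, and the
device and mount point are made up), on Linux the allocation group
layout is fixed when the filesystem is created and can be inspected
later:

  # split the filesystem into 8 allocation groups at mkfs time
  mkfs.xfs -d agcount=8 /dev/sdb1

  # report agcount and agsize for an existing, mounted filesystem
  xfs_info /mnt/scratch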
>
> > All File Systems I've used have this problem. XFS is
> > just supposedly high performance. It offers some improvement, but is still
> > off by a factor of 4.
>
XFS on Irix can reach disk speeds, but only for a limited set of
workloads, i.e. direct I/O into preallocated space.
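On Linux that kind of workload might look roughly like this (a hedged
sketch of mine, assuming xfsprogs and GNU dd are available; the file
name is made up):

  # reserve 64 MB for the file up front, without writing anything
  xfs_io -f -c "resvsp 0 64m" /mnt/xfs/big.dat

  # write into the preallocated space with O_DIRECT, bypassing the
  # page cache; conv=notrunc keeps dd from truncating the reservation
  dd if=/dev/zero of=/mnt/xfs/big.dat bs=1M count=64 oflag=direct conv=notrunc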
Looking at your original benchmark:

  hdparm -t /dev/hda
  /dev/hda:
   Timing buffered disk reads: 64 MB in 2.71 seconds = 23.62 MB/sec

  time cp -r /usr/src/linux-2.4.6 tst
  real    0m47.376s
  user    0m0.180s
  sys     0m2.710s

  du -bs tst
  144187392  tst

Actual performance: 2 * 144 MB / 48 s = 6 MB/sec (the factor of two
is there because the copy both reads and writes every byte).
There is a lot of difference between doing a raw read of data from
the device and copying a large directory tree. The raw read is
purely sequential; the tree copy is basically random access, since
the kernel is being asked to read and then write a lot of small
chunks of disk space.
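The contrast is easy to reproduce (an illustrative sketch, using the
same paths as above; on a current kernel you would also want to drop
the page cache between runs so caching does not skew the numbers):

  # purely sequential: read 64 MB straight off the device, like hdparm -t
  dd if=/dev/hda of=/dev/null bs=1M count=64

  # mostly random: thousands of small files scattered over the disk
  find /usr/src/linux-2.4.6 -type f -print0 | xargs -0 cat > /dev/null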
Steve
--
Steve Lord                                    voice: +1-651-683-3511
Principal Engineer, Filesystem Software       email: lord@sgi.com