I think that it doesn't matter what the starting directory is for the
creation of directories - we always scan all groups and will always
pick the group with the fewest blocks in use, as long as it doesn't
have too many inodes (i.e. more than average, like /dev ;-).
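Roughly, I mean something like the sketch below - user-space pseudo-C
with made-up struct and field names, not the actual find_group_dir()
from fs/ext2/ialloc.c:

/* Illustrative only: pick the group with the most free blocks (i.e.
 * fewest blocks in use), skipping any group whose free inode count is
 * below the filesystem-wide average - the "too many inodes, like
 * /dev" case. */
struct grp {
        unsigned int free_blocks;       /* free blocks in this group */
        unsigned int free_inodes;       /* free inodes in this group */
};

int pick_dir_group(const struct grp *g, int ngroups)
{
        unsigned long total_free_inodes = 0;
        unsigned int avefreei;
        int i, best = -1;

        for (i = 0; i < ngroups; i++)
                total_free_inodes += g[i].free_inodes;
        avefreei = total_free_inodes / ngroups;

        for (i = 0; i < ngroups; i++) {
                /* skip groups with below-average free inodes */
                if (g[i].free_inodes < avefreei)
                        continue;
                if (best < 0 || g[i].free_blocks > g[best].free_blocks)
                        best = i;
        }
        /* at least one group is always at/above the average */
        return best;
}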
> Addendum to my scheme: leaf nodes should be created in their
> directories, not in the base directory. IOW, it's only directories
> that should use this trick.
That is the current algorithm anyway - create leaf nodes in the same
group as the parent directory.
Now, if there _were_ a way to influence the creation of directories in
the same group, then this would do what Andrew wants - put all of the
directories, inodes, and blocks into the same group, for fast access.
Maybe if there were some hysteresis in group selection it would be
enough - pick the best group, and then continue to create subdirs
(and files, implicitly) there until we pass some threshold(s); a rough
sketch follows the list. Some thresholds are:
1) Block count too high. We don't want to be totally wimpy here (e.g.
bailing out as soon as we hit the "average"), but we (probably, maybe)
also don't want to untar a kernel in a single group, fill it up, and
then have to put all of the object files in another group. Maybe fill
until we hit (1 + average)/2 or (3 + average)/4 (i.e. halfway or
three-quarters of the way between the average fill and completely full)?
2) Inode count/directory count too high. See #1.
3) Time (yucky, but reasonable if it is long enough, i.e. ten(s) of
seconds), so we reset the metrics, possibly in conjunction with...
4) Process ID (also yucky, but possible, IF we want "tar" to put
everything into a single group), but not if the process is long-lived
(e.g. nfs, so we need (3) as well).
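To make the hysteresis idea concrete, here is the sketch promised
above - plain pseudo-C, every name and constant invented for
illustration, and with the time/pid thresholds (#3/#4) left out:

#define BLOCKS_PER_GROUP        32768U  /* example value only */
#define INODES_PER_GROUP        16384U  /* example value only */

struct grp {
        unsigned int free_blocks;
        unsigned int free_inodes;
};

int sticky_group = -1;                  /* the per-filesystem "state" */

/* percent used, given a free count and a fixed per-group total */
unsigned int used_pct(unsigned int nfree, unsigned int total)
{
        return (total - nfree) * 100 / total;
}

int pick_group_sticky(const struct grp *g, int ngroups)
{
        unsigned long sum_free = 0;
        unsigned int avg, limit;
        int i;

        for (i = 0; i < ngroups; i++)
                sum_free += g[i].free_blocks;
        avg = used_pct(sum_free / ngroups, BLOCKS_PER_GROUP);

        /* threshold #1: fill until (1 + average)/2, i.e. halfway
         * between the average fill and 100% - with avg = 40% this is
         * 70%; the (3 + average)/4 variant would give 85%.  The same
         * limit is (sloppily) reused for the inode count, i.e.
         * threshold #2. */
        limit = (100 + avg) / 2;

        if (sticky_group >= 0 &&
            used_pct(g[sticky_group].free_blocks, BLOCKS_PER_GROUP) < limit &&
            used_pct(g[sticky_group].free_inodes, INODES_PER_GROUP) < limit)
                return sticky_group;    /* keep filling the same group */

        /* no state yet, or a threshold tripped: rescan for the group
         * with the most free blocks, as the current allocator does */
        sticky_group = 0;
        for (i = 1; i < ngroups; i++)
                if (g[i].free_blocks > g[sticky_group].free_blocks)
                        sticky_group = i;
        return sticky_group;
}

The point is just that the expensive "find the best group" scan runs
once per episode instead of once per mkdir.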
Using (3) and (4) is ugly, because it is somewhat non-deterministic,
but (4) at least is repeatable in that a single process will continue
to write into a single group, while a multiple-writer scenario will
spread the files around (though an occasional write to a log file
might screw it up).
Also, if we have "state" in selecting a group to create files in, we
want to periodically pick a new group which is "best" for creating new
directories/files, based on criteria like #1 and/or #2. Andrew showed
that the Orlov allocator is an improvement for writing a tarball, but
with the "aging" tool he used, it didn't work as well as the current
algorithm. Unfortunately, the weights/parameters of the Orlov code are
hard to deduce from the code, so I don't know how it would be tuned.
Also, it is not certain that the aging tool is valid for our needs.
So, the difference between the current allocator and the Orlov allocator
is that the current allocator always tries to find the "best" group for
each new directory, while the Orlov allocator will pick the parent's
group unless it is "bad" (i.e. too few free inodes/blocks, too many
directories). It may be that "too few" and "too many" are not tuned
correctly yet. Given that tarballs and other archives are normally
created with a depth-first traversal, we may not need to use the number
of directories as a selection criterion, and clustering many small
directories is not in itself a bad thing. I'm still not sure about the
"debt" thing going on there...
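For what it's worth, my (possibly wrong) reading of the Orlov-style
decision looks like the sketch below, with invented cutoffs - the real
weights and the "debt" accounting are exactly the parts I can't deduce:

struct grp {
        unsigned int free_blocks;
        unsigned int free_inodes;
        unsigned int dir_count; /* directories allocated in this group */
};

/* Orlov-flavoured choice for a new (non top-level) directory: stay in
 * the parent's group unless it looks "bad" - too few free inodes or
 * blocks, or too many directories - and only then fall back to the
 * full scan.  The cutoffs below are invented; the real allocator also
 * keeps a per-group "debt", which this sketch ignores. */
int pick_group_orlov_ish(const struct grp *g, int ngroups, int parent)
{
        unsigned long freei = 0, freeb = 0, dirs = 0;
        unsigned int avefreei, avefreeb, avedirs;
        int i, best;

        for (i = 0; i < ngroups; i++) {
                freei += g[i].free_inodes;
                freeb += g[i].free_blocks;
                dirs  += g[i].dir_count;
        }
        avefreei = freei / ngroups;
        avefreeb = freeb / ngroups;
        avedirs  = dirs  / ngroups;

        if (g[parent].free_inodes >= avefreei / 2 &&  /* not "too few" inodes */
            g[parent].free_blocks >= avefreeb / 2 &&  /* not "too few" blocks */
            g[parent].dir_count   <= 2 * avedirs + 1) /* not "too many" dirs  */
                return parent;

        /* parent's group is "bad": do the best-group scan instead */
        best = parent;
        for (i = 0; i < ngroups; i++)
                if (g[i].free_blocks > g[best].free_blocks)
                        best = i;
        return best;
}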
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/