Re: [rfc] Near-constant time directory index for Ext2

Daniel Phillips (phillips@innominate.de)
Thu, 22 Feb 2001 00:42:42 +0100


"H. Peter Anvin" wrote:
>
> Martin Mares wrote:
> >
> > > True. Note too, though, that on a filesystem (which we are, after all,
> > > talking about), if you assume a large linear space you have to create a
> > > file, which means you need to multiply the cost of all random-access
> > > operations with O(log n).
> >
> > One could avoid this, but it would mean designing the whole filesystem in a
> > completely different way -- merge all directories to a single gigantic
> > hash table and use (directory ID,file name) as a key, but we were originally
> > talking about extending ext2, so such massive changes are out of question
> > and your log n access argument is right.
>
> It would still be tricky since you have to have actual files in the
> filesystem as well.

Have you looked at the structure and algorithms I'm using? I would not
call this a hash table, nor is it a btree. It's a 'hash-keyed
uniform-depth tree'. It never needs to be rehashed (though it might be
worthwhile compacting it at some point). It also never needs to be
rebalanced - it's only two levels deep for up to 50 million files.

This thing deserves a name of its own. I call it an 'htree'. The
performance should speak for itself - 150 usec/create across 90,000
files and still a few optmizations to go.

Random access runs at similar speeds too, it's not just taking advantage
of a long sequence of insertions into the same directory.

BTW, the discussion in this thread has been very interesting, it just
isn't entirely relevant to my patch :-)

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/