True. Note too, though, that on a filesystem (which we are, after all,
talking about), if you assume a large linear space you have to create a
file, which means you have to multiply the cost of all random-access
operations by O(log n).
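To put a number on that O(log n): here is a rough sketch of the block
indirection a classic UNIX-style inode walks on a random access (the
block size, pointer size and number of direct pointers below are all
illustrative, not taken from any particular filesystem):

/*
 * Each level of indirection multiplies the addressable range by the
 * number of block pointers per block; reaching a block deep in a big
 * file therefore costs one extra read per level, i.e. O(log n).
 */
#include <stdio.h>

#define BLOCK_SIZE	4096
#define PTRS_PER_BLOCK	(BLOCK_SIZE / 4)	/* 4-byte block pointers */
#define NDIRECT		12	/* direct pointers in the inode */

static int indirection_levels(unsigned long block)
{
	unsigned long reach = NDIRECT;	/* blocks reachable so far */
	unsigned long span = PTRS_PER_BLOCK;
	int level = 0;

	while (block >= reach) {
		level++;		/* one more disk read */
		reach += span;
		span *= PTRS_PER_BLOCK;
	}
	return level;
}

int main(void)
{
	unsigned long b;

	for (b = 1; b < (1UL << 30); b *= 64)
		printf("block %10lu: %d indirection level(s)\n",
		       b, indirection_levels(b));
	return 0;
}

Every level is an extra block read before you can even look at the
data, and a hash table built inside a file pays that on every probe.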
> > I suggested at one point to use B-trees with a hash value as the key.
> > B-trees are extremely efficient when used on a small constant-size key.
>
> Although from an asymptotic-complexity standpoint hashing is much
> better than B-trees, I'm not at all sure which will give the best
> performance for reasonable directory sizes. Maybe B-trees really are
> the better alternative: they are updated dynamically, and the costs
> of successive operations are similar, as opposed to hashing, which is
> occasionally very slow due to rehashing. You could avoid that by
> rehashing on-line, but I don't know of any algorithm for on-line
> rehashing with both inserts and deletes that wouldn't be awfully
> complex and slow (speaking of multiplicative constants, of course --
> it's still O(1) per operation, but "the big Oh is really big there").
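That occasional slowness is easy to see in a toy model of a doubling
hash table (the constants below are made up, nothing filesystem
specific). The amortized cost per insert is O(1), but the insert that
triggers a rehash moves every entry in the table at once:

/*
 * Count how many existing entries each insert has to move.  The
 * total stays linear (amortized O(1) per insert), but individual
 * inserts occasionally pay O(n) to rehash the whole table.
 */
#include <stdio.h>

int main(void)
{
	unsigned long capacity = 8, used = 0, total_moves = 0;
	unsigned long i;

	for (i = 1; i <= 1000000; i++) {
		if (used == capacity) {	/* table full: rehash everything */
			total_moves += used;
			printf("insert %8lu moved %8lu entries\n", i, used);
			capacity *= 2;
		}
		used++;
	}
	printf("total: %lu moves over %lu inserts (amortized %.2f)\n",
	       total_moves, i - 1, (double)total_moves / (i - 1));
	return 0;
}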
Well, once you multiply by O(log n) for the file indirection (which
B-trees don't need, since they inherently handle blocking and thus can
use block pointers directly), the asymptotic complexity is the same as
well, and I think B-trees are the overall winner.
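The structure suggested above would look something like the sketch
below. Everything here is illustrative (the field names, the 32-bit
hash, the fan-out); the point is that the keys are small and fixed
size, the children are raw block numbers, and each step of a lookup
is therefore exactly one block read:

#include <stdint.h>

#define BTREE_ORDER	255	/* illustrative fan-out */

struct btree_node {
	uint16_t nkeys;			/* keys currently in use */
	uint16_t is_leaf;
	uint32_t key[BTREE_ORDER - 1];	/* 32-bit hashes of the names */
	uint32_t child[BTREE_ORDER];	/* raw block numbers, not offsets */
};

/* One step of a lookup: pick the child whose key range covers the
 * hash.  A directory of n entries costs about log_255(n) such steps,
 * which is two or three block reads for any plausible directory. */
static uint32_t btree_descend(const struct btree_node *node, uint32_t hash)
{
	int i = 0;

	while (i < node->nkeys && hash >= node->key[i])
		i++;
	return node->child[i];
}

Hash collisions just mean comparing the full names once you reach the
leaf, and no insert ever rehashes the whole directory; the worst case
is a chain of node splits, which is still only O(log n).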
-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt