> A while ago I did some experimentation with simple bit-op based string
> hash functions. I.e., no multiplications / divides in the hash loop.
>
> The best I found was:
>
> int hash_fn (char * p)
> {
> int hash = 0;
> while (*p) {
> hash = hash + *p;
> // Rotate a 31 bit field 7 bits:
> hash = ((hash << 7) | (hash >> 24)) & 0x7fffffff;
> }
> return hash;
> }
Hmm. This one goes in the "catastrophic" category.
For actual names:
N=557398, m=51 sz=2048, min 82, max 4002, av 272.17, stddev 45122.99
For generated names:
N=557398, m=51 sz=2048, min 0, max 44800, av 272.17, stddev 10208445.83
A very non-uniform distribution.
> The rotate is equivalent to a multiplication by x**7 in Z_2[P=0],
> where P is the polynomial x**31 - 1 (over Z_2).
> Presumably the "best" P would be irreducible - but that would have more
> bits set in the polynomial, making reduction harder. A compromise is to
> choose P in the form x**N - 1 but with relatively few factors.
> X**31 - 1 is such a P.
It has seven irreducible factors. Hardly "almost irreducible".
Shifting the 7-bit ASCII characters over 7 bits makes sure that there
is very little interaction to start with. And the final AND that truncates
to the final size of the hash chain kills the high order bits.
No good.
Andries
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/