> But the current hash function is just a place holder, waiting for
> an better version based on some solid theory. I currently favor the
> idea of using crc32 as the default hash function, but I welcome
> suggestions.
I once liked those things, too - but I've learned better since.
Quoting _Handbook_of_Algorithms_and_Data_Structures_ (Gonnet/Baeza-Yates,
ISBM 0-201-41607-7, Addison-Wesley):
--- snip ---
3.3.1 Practical hashing functions
[...]
A universal class of hashing functions is a class with the property that
given any input, the average performance of all the functions is good.
[...] For example, h(k) = (a * k + b) mod m with integers a != 0 and b is
a universal class of hash functions.
[...]
Keys which are strings or sequences of words (including those which are of
variable length) are best treated by considering them as a number base b.
Let the string s be composed of k characters s1s2...sk. Then
h(s) = ( sum(i=0..k-1) B^i*s(k-i) ) mod m
To obtain a more efficient version of this function we can compute
h(s) = ( ( sum(i=0..k-1) B^i*s(k-i) ) mod 2^w ) mod m
where w is the number of bits in a computer word, and the mod 2^w
operation is done by the hardware. For this function the value B = 131 is
recommended, as B^i has a maximum cycle mod 2^k for 8<=k<=64.
Hashing function for strings
int hashfunction(s)
char *s;
{ int i;
for(i=0; *s; s++) i = 131*i + *s;
return(i % m);
}
--- snip ---
I've actually used that function for a hash containing something like a
million phone numbers as keys, and there were *very* few collisions.
Similarly for another hash containgng megabytes of RFC 822 message-ids.
MfG Kai
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/