Re: 2.4.19pre2aa1

wli@holomorphy.com
Wed, 13 Mar 2002 22:10:22 +0000

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Martin Dalecki: "Re: 2.5.6: ide driver broken in PIO mode"
Previous message: Andreas Dilger: "Re: mke2fs (and mkreiserfs) core dumps"

On Wed, Mar 13, 2002 at 02:06:48PM -0500, Richard B. Johnson wrote:
> [SNIPPED..]

Nice move. No one will have the foggiest idea without hunting for
my prior message whether your comments on what I said are accurate.

On Wed, Mar 13, 2002 at 02:06:48PM -0500, Richard B. Johnson wrote:
> You might want to look at www.eece.unm.edu/faculty/heileman/hash/hash.html
> rather than assuming everyone is uninformed. Source-code is provided
> for several hashing functions as well as source-code for tests. This
> is a relatively old reference although it addresses both the chaos
> and fractal methods discussed here, plus chaotic probe strategies
> in address hashing.

You've presented a paper that attempts to establish a connection
between chaos theory and hashing by open addressing. I'm not convinced
chaos theory is all that, but some of their measurement techniques
appear useful, and I'll probably try using them, despite the fact Linux
appears to favor hashing by separate chaining.

This URL has not convinced me you are informed, and I don't really want
to be convinced whether you're informed or not. Your contributions to
these threads have been far less than enlightening or helpful thus far.

On Wed, Mar 13, 2002 at 02:06:48PM -0500, Richard B. Johnson wrote:
> A fast random number generator is essential to produce meaningful
> tests within a reasonable period of time. It is also used within one
> of the hashing functions to 'guess' at the direction of displacement
> when an insertion slot is not immediately located, as well as the
> displacement mechanism for several chaotic methods discussed.

I don't buy this, and the reason why is that hashing is obviously
sensitive to the input distribution. To defeat any hash function,
you need only produce a distribution concentrated on a set hashing
to the same number. If your distribution is literally uniform, then
you'll never do better (by one measure) than least residue mod hash
table size. You need to produce a realistic set of test keys. The key
distributions you actually want to hash will be neither of the above.
(Though one should verify it doesn't do poorly on uniform input, it's
little more than a sanity check.)

On Wed, Mar 13, 2002 at 02:06:48PM -0500, Richard B. Johnson wrote:
> Using your own hash-function as a template for tests of the same
> hash-function, as you propose, is unlikely to provide meaningful
> results.

I'm not proposing that. I have no idea why you think I am.

If I may summarize, if you're going to use random number generation
to simulate test inputs, verify the distribution of the test inputs
you generate actually resembles the thing you're trying to simulate.

It's far more productive to get samples of kernel virtual addresses
of the objects whose addresses you're hashing, or inode numbers from
real filesystems, or filenames, or whatever, than to just feed it a
stream of numbers spewed by some random number generator no one's
heard of, and that's not going to change in the least. For a Monte
Carlo simulation of its usage in the kernel, you're going to need to
actually figure out how to simulate those things' distributions.

Hmm, I wouldn't be suprised to get a bunch of "YHBT" messages after
this but the chaos theory paper does have some useful stuff for
analyzing hash table performance. Distributions' entropies and Lyapunov
exponents of hash functions should be easy enough to compute.

Bill
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Martin Dalecki: "Re: 2.5.6: ide driver broken in PIO mode"
Previous message: Andreas Dilger: "Re: mke2fs (and mkreiserfs) core dumps"