Re: Entropy from disks

Oliver Xymoron (oxymoron@waste.org)
Tue, 29 Oct 2002 13:08:31 -0600


On Tue, Oct 29, 2002 at 01:02:39PM -0500, Chris Friesen wrote:
> Oliver Xymoron wrote:
>
> I'm not an expert in this field, so bear with me if I make any blunders
> obvious to one trained in information theory.
>
> >The current Linux PRNG is playing fast and loose here, adding entropy
> >based on the resolution of the TSC, while the physical turbulence
> >processes that actually produce entropy are happening at a scale of
> >seconds. On a GHz processor, if it takes 4 microseconds to return a
> >disk result from on-disk cache, /dev/random will get a 12-bit credit.
>
> In the paper the measurement accuracy is 1 ms. Current hardware has
> TSC precision of nanoseconds, or about 6 orders of magnitude more
> accuracy. Doesn't this mean that we can pump many more bits into the
> algorithm and get out many more than the 100 bits/min that the setup
> in the paper achieves?

No, it doesn't scale quite like that. The autocorrelation property
says such bits are poorly distributed - the turbulence has an inherent
timescale at which things become hard to predict, and below that
timescale successive samples become easier to predict. You could posit
that increasing the measurement accuracy far enough would reveal some
underlying noise from Brownian motion and similar physical processes,
but then you'd have to sit down and model it the way this paper has,
rather than just handwaving and declaring 12 LSBs of the TSC to be
real noise.
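
For concreteness, here's where that 12 comes from - a back-of-the-envelope
sketch, not the actual drivers/char/random.c accounting, with a 1 GHz TSC
assumed:

#include <stdio.h>

/* Back-of-the-envelope version of the credit arithmetic above:
 * count the bits needed to represent the timing delta.  Not the
 * kernel's actual logic, just the rough idea. */
static int bits_in(unsigned long long delta)
{
        int bits = 0;

        while (delta) {
                delta >>= 1;
                bits++;
        }
        return bits;
}

int main(void)
{
        unsigned long long tsc_hz = 1000000000ULL;          /* assume 1 GHz */
        unsigned long long delta = 4ULL * tsc_hz / 1000000; /* 4 us = 4000 ticks */

        printf("%llu ticks -> %d bits credited\n", delta, bits_in(delta));
        /* prints: 4000 ticks -> 12 bits credited */
        return 0;
}

Counting bits in the raw delta is exactly the over-crediting I'm talking
about: most of those 12 bits are predictable.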

But that's all irrelevant, because the TSC resolution doesn't improve
our measurement accuracy anything like that much. Nothing the
processor can measure comes in faster than the FSB clock, which is
locked in a fixed ratio to the processor clock and is itself generally
derived in lock-step from a 33.3 MHz crystal somewhere. The disk's
microcontrollers, if not taking a clock signal directly from the
motherboard, are likely in sympathetic synchrony with its clocks, and
even then are likely divided down a bit. Framing requirements for disk
and bus protocols lower the effective resolution further.
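
If you want to see that empirically, something like this would do - the
deltas here are made-up numbers, and on real hardware you'd collect them
with rdtsc around disk completions:

#include <stdio.h>

/* If every delta shares a common divisor, that divisor - not one
 * TSC tick - is the real measurement resolution.  The deltas below
 * are made-up numbers purely for illustration. */
static unsigned long long gcd(unsigned long long a, unsigned long long b)
{
        while (b) {
                unsigned long long t = a % b;

                a = b;
                b = t;
        }
        return a;
}

int main(void)
{
        unsigned long long deltas[] = { 3990, 4020, 7980, 12030, 3960 };
        unsigned long long g = deltas[0];
        unsigned int i;

        for (i = 1; i < sizeof(deltas) / sizeof(deltas[0]); i++)
                g = gcd(g, deltas[i]);

        printf("effective tick size: %llu TSC cycles\n", g);
        /* prints 30 for these samples: the bottom ~5 "bits" of the
         * TSC carry no information about the event being timed */
        return 0;
}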

> >My entropy patches had each entropy source (not just disks) allocate
> >its own state object, and declare its timing "granularity".
>
> Is it that straightforward? In the paper they go through a number of
> stages designed to remove correlated data. From what I remember of the
> linux prng disk-related stuff it is not that sophisticated.

No, it's not that straightforward. And no, the current code doesn't
even address cache effects. Hell, it doesn't even know about
solid-state devices. Note that SHA will do just as good a job of
removing correlation as an FFT; the problem is overestimating the
input entropy.
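
For what it's worth, the per-source granularity declaration I mentioned
has roughly this shape - names and the shift value are illustrative, not
copied from the patch:

/* Rough shape of the per-source idea - illustrative only. */
struct entropy_source {
        const char *name;
        unsigned int granularity_shift; /* log2 of the trusted tick size */
        /* per-source pool state would live here */
};

/* Mixing (SHA or whatever) always happens; only the *credit* is
 * capped by the declared granularity. */
static int credit_bits(const struct entropy_source *src,
                       unsigned long long delta)
{
        int bits = 0;

        delta >>= src->granularity_shift;
        while (delta) {
                delta >>= 1;
                bits++;
        }
        return bits;
}

The point is that the hashing stage is never the weak link; the
accounting is.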

I think the prudent approach here is to punt and declare disks
untrusted, especially in light of the headless server attack model
where we intentionally leak entropy. It's still possible to use this
data for /dev/urandom (which my patches allow).

> >There's a further problem with disk timing samples that make them less
> >than useful in typical headless server use (ie where it matters): the
> >server does its best to reveal disk latency to clients, easily
> >measurable within the auto-correlating domain of disk turbulence.
>
> If this is truly a concern, what about having a separate disk used for
> nothing but generating randomness?

Doable, and probably best done with a userspace app feeding /dev/random.
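
Roughly like this - RNDADDENTROPY is the real ioctl, but the sample
size, the credit, and the missing sampling loop against the scratch
disk are all placeholders:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/random.h>

#define SAMPLE_BYTES 32

int main(void)
{
        union {
                struct rand_pool_info info;
                unsigned char space[sizeof(struct rand_pool_info) + SAMPLE_BYTES];
        } u;
        int fd = open("/dev/random", O_WRONLY);

        if (fd < 0) {
                perror("open /dev/random");
                return 1;
        }

        /* Fill u.info.buf with timing samples taken against the
         * dedicated disk; zeroed here because the sampling loop is
         * left out of this sketch. */
        memset(u.info.buf, 0, SAMPLE_BYTES);

        u.info.entropy_count = 8;           /* bits to credit - placeholder */
        u.info.buf_size = SAMPLE_BYTES;     /* bytes of sample data */

        if (ioctl(fd, RNDADDENTROPY, &u.info) < 0) {
                perror("RNDADDENTROPY");
                close(fd);
                return 1;
        }

        close(fd);
        return 0;
}

Crediting entropy needs root, which is fine for a dedicated daemon.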

-- 
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.." 