Re: Linux iSCSI Initiator, OpenSource (fwd) (Re: Gauntlet Set NOW!)

Andrew McGregor (andrew@indranet.co.nz)
Wed, 08 Jan 2003 00:24:24 +1300


No no, that's a 1% chance that one packet in the terabyte is broken.

But actually, it's not that hard to construct a peturbation to the packet
that will beat both the ethernet and TCP checksums (I gave an example that
beats TCP before). That kind of change is not likely for random bit
errors, but is quite likely to occur in just slightly marginal hardware.
Partial packet duplication or byte reordering on the highly ordered data
patterns you find in filesystem metadata could be really bad.

Like I say, debugging one crypto protocol I've seen this happen for real.
Twice in about 10000 packets, on an otherwise apparently perfectly fine
LAN. I suspect bad cabling, and changed it, but it's hard to tell that
anything has changed. That shows that my 1% is probably quite conservative
for that particular link.

Internet protocols (changing to IETF hat now) are supposed to work on the
global internet, and that means iSCSI has to be engineered to work on the
worst links imaginable, because sometime, somewhere, someone's data is
going to cross a really broken backup link that they have no way of knowing
has just come on. Possibly it's wireless, where packet corruption due to
undetected collisions happens quite frequently.

Andre routinely tests it with the IBM team in Israel, with his end in
California.

Andrew

--On Monday, January 06, 2003 22:20:46 -0600 Oliver Xymoron
<oxymoron@waste.org> wrote:

> On Tue, Jan 07, 2003 at 01:39:38PM +1300, Andrew McGregor wrote:
>> Hmm. The problem here is that there is a nontrivial probability that a
>> packet can pass both ethernet and TCP checksums and still not be right,
>> given the gigantic volumes of data that iSCSI is intended to be used
>> with. Back up a 100 terabyte array and it's more than 1%, back of the
>> envelope.
>
> What was the underlying error rate and distribution you assumed? I
> figure if it were high enough to get to your 1%, you'd have such high
> retry rates (and resulting throughput loss) that the operator would
> notice his LAN was broken weeks before said transfer completed.
>
> --
> "Love the dolphins," she advised him. "Write by W.A.S.T.E.."
>
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/