Re: SNACK and recovery

To: ips@ece.cmu.edu
Subject: Re: SNACK and recovery
From: Stephen Bailey <steph@cs.uchicago.edu>
Date: Mon, 09 Apr 2001 12:24:09 -0400
In-Reply-To: Message from Black_David@emc.com of "Thu, 05 Apr 2001 21:47:55 EDT." <0F31E5C394DAD311B60C00E029101A07080153CD@corpmx9.isus.emc.com>
References: <0F31E5C394DAD311B60C00E029101A07080153CD@corpmx9.isus.emc.com>
Sender: owner-ips@ece.cmu.edu

> - Does a 16-bit TCP checksum catch enough of
> the corruption events to make it acceptable to
> take drastic measures like aborting a backup
> when a 32 bit CRC fails on a response that
> made it through the 16 bit checksum?

Absolutely.

Events which create end-to-end integrity check errors are caught as
handily by the TCP checksum as by a CRC.  Link errors are caught by
link integrity checks, so those are not for the e2e check to protect
against.  The remaining errors which are detectable by an e2e check
have a signature that most any check that's not blind stupid will
detect.  For example,
back in the day, VMS's clustering software ran on Ethernet, and there
were many problems as a result of an early generation Ethernet
controller (my group...) corrupting data.  So, the VMS folks said, to
heck with performance, we're going to put a checksum on every cluster
packet.  Problem absolutely solved.  I don't know what the checksum
algorithm was, but it was not a CRC.  It was more like the TCP
checksum.
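
As a rough illustration of the comparison above, here is a minimal
Python sketch that runs a TCP-style 16-bit ones'-complement sum and a
32-bit CRC over the same buffer and reports which one notices a given
corruption.  The payload, the two fault patterns, and the helper names
(internet_checksum, detects) are assumptions made up for the example.
The TCP pseudo-header is left out, and zlib's CRC-32 stands in for
whatever CRC is eventually chosen.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement sum of 16-bit words, as TCP uses.

    Simplified: no pseudo-header, big-endian words, zero padding.
    """
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def detects(original: bytes, corrupted: bytes) -> dict:
    """Report which check value changes between the two buffers."""
    return {
        "tcp_checksum": internet_checksum(original)
        != internet_checksum(corrupted),
        "crc32": zlib.crc32(original) != zlib.crc32(corrupted),
    }


# Hypothetical 1 KiB payload.
payload = bytes(range(256)) * 4

# Hypothetical fault 1: a controller overwrites 4 bytes with zeros.
overwritten = bytearray(payload)
overwritten[100:104] = b"\x00\x00\x00\x00"

# Hypothetical fault 2: two adjacent 16-bit words change places.
swapped = bytearray(payload)
swapped[0:2], swapped[2:4] = payload[2:4], payload[0:2]

print("overwrite:", detects(payload, bytes(overwritten)))
print("word swap:", detects(payload, bytes(swapped)))
```

In this sketch both checks notice the overwritten bytes.  The swapped
16-bit words are a known blind spot of a plain sum, since addition does
not care about order, and only the CRC notices them.  The sketch says
nothing about how often either kind of fault shows up on a real path.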

The evidence of TCP checksum escapes in the papers seems to come
primarily from paths which are not actually protected by it (host end
points).

Looking at it from the other direction, backups have historically
always had to handle occasional problems, which has resulted in the
implementation of high-level recovery mechanisms.

Who can say with absolute certainty, and from first-hand experience,
that there WILL be a high frequency of checksum escapes which don't
also escape a CRC?  It seems a somewhat unlikely scenario, and my
concern is that we're making complicated, incremental improvements for
handling a situation which will not occur.

It would be one thing if there were NO e2e check, or if the e2e check
also had to protect against link errors, or if the existing e2e check
were completely trivial, but that is just not the case here.

Steph
