Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"

To: Stephen Bailey <steph@cs.uchicago.edu>
Subject: Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"
From: Michael Krause <krause@cup.hp.com>
Date: Thu, 05 Apr 2001 10:09:53 -0700
Cc: ips@ece.cmu.edu
Content-Type: text/plain; charset="us-ascii"; format=flowed
In-Reply-To: <20010403131353.B6D6494006@sandmail.sandburst.com>
References: <Message from Robert Snively <rsnively@Brocade.COM><FFD40DB4943CD411876500508BAD02797D467D@sj5-ex2.brocade.com><FFD40DB4943CD411876500508BAD02797D467D@sj5-ex2.brocade.com>
Sender: owner-ips@ece.cmu.edu

At 09:12 AM 4/3/2001 -0400, Stephen Bailey wrote:
> > The Stone and Partridge paper is mostly not applicable to an iSCSI
> > environment.  The principal failure mechanisms were major software
> > bugs in the driver stack of PC-oriented machines.

People make mistakes in all implementations.  Examining similar 
packet-processing technology for mistakes is applicable to any effort, and 
one should perform a risk assessment of the probability that those mistakes 
will be repeated here.  That the mistakes were in PC-oriented machines is 
basically irrelevant; storage is not immune to similar mistakes (I have 
seen storage implementations that were just as poor in quality as those in 
any other segment of the industry).

>I'm in complete agreement with Bob.
>
>I haven't seen a good analysis of TCP checksum escapes which resulted
>from intermediary manipulation (I haven't read the papers, but
>hopefully soon), but my hunch is that it's incredibly rare.
>
>An endpoint precipitated TCP checksum `escape' would also escape a CRC or any
>other similar integrity check.  That is why I think all this
>additional integrity checking (on iSCSI headers & data), is an
>incredible amount of extra work (not just in computing the CRCs, but
>also in designing the SACK mechanism and recovery for digest failures)
>for no real gain.

I agree that some of the recovery is overkill but disagree that error 
detection is as well.  At a minimum, one needs to have a strong end-to-end 
error detection mechanism.  Many believe a 16-bit checksum is not adequate 
to protect their data and given the importance of this data to our 
customers, most feel the specification must define such a mechanism (with 
some having strong feelings that this mechanism should NOT be 
optional).  Whether we need two CRCs, etc. is a separate debate, but strong 
error detection needs to be there, and most of us will require that it be 
used in any product / solution delivered to the customer.
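
To illustrate why a 16-bit checksum is considered weak, here is a minimal 
sketch in Python.  It assumes the TCP checksum is computed as the usual 
16-bit one's-complement sum and uses zlib's CRC-32 purely as a stand-in for 
"a CRC"; nothing here says which polynomial iSCSI should adopt, and the 
sample bytes are made up.  Because the checksum is a sum, swapping two 
16-bit words leaves it unchanged, while the CRC notices.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum of the kind TCP uses."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Made-up payload; the second copy has its first two 16-bit words swapped.
original = b"\x12\x34\x56\x78\xab\xcd\xef\x01"
reordered = original[2:4] + original[0:2] + original[4:]

# Addition is commutative, so the checksum cannot see the reordering.
same_checksum = internet_checksum(original) == internet_checksum(reordered)
# The CRC depends on position, so the two payloads differ.
same_crc = zlib.crc32(original) == zlib.crc32(reordered)

print("checksum unchanged:", same_checksum)
print("CRC unchanged:", same_crc)
```

This shows only one class of undetected error (word reordering); it is not 
an estimate of how often such errors occur in practice.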

>The real loss is that it's immensely slowing time-to-market for iSCSI 
>(both in the front end specification and the back end implementation).

A fast TTM solution that is not the highest quality (i.e., one that does 
not prevent silent data corruption) will lead to customer distrust and a 
repeat of the FC adoption rate: only 10 years later has it really started 
to penetrate customer solutions.

>A straw-man proposal (very unpopular given where we are, I know) would
>be to specify iSCSI without additional integrity checks (other than
>what you can get through security mechanisms, which is probably not
>visible to iSCSI anyway), and if that `fails' (I'm sure it won't), we
>can put an integrity shim between iSCSI and the transport.
>
>One example of how to do this would be Julian's TAF.  Another would be
>the WARP RDMA layer.

If another layer is put in place that provides data integrity, then it is 
redundant to do this at the iSCSI layer as well, and this is one place where 
an option can be used: one negotiates the underlying framing mechanism 
(e.g. WARP), and if it is present, iSCSI does not activate the CRC 
services.  If it is not present, iSCSI does, thereby ensuring that there is 
always end-to-end data integrity in the solution.

>We don't have to specify how to do this now

If this is to be supported, then it should be specified now (this can be 
done rather opaquely by just setting a "transport services" attribute for 
strong end-to-end data integrity protection).
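
As a rough sketch of that rule in Python: the names TransportServices, 
strong_end_to_end_integrity and iscsi_crc_enabled are hypothetical and come 
from no draft; they only illustrate the idea that iSCSI turns its own CRC 
off solely when both ends report that a lower layer already provides strong 
end-to-end integrity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransportServices:
    # Hypothetical attribute: True when an underlying framing layer
    # (e.g. something like WARP) provides strong end-to-end integrity.
    strong_end_to_end_integrity: bool = False


def iscsi_crc_enabled(initiator: TransportServices,
                      target: TransportServices) -> bool:
    """Return True when iSCSI must activate its own CRC services."""
    lower_layer_covers_it = (initiator.strong_end_to_end_integrity
                             and target.strong_end_to_end_integrity)
    # Default to protecting the data: the CRC stays on unless both
    # sides agree that the layer below already does the job.
    return not lower_layer_covers_it


plain_tcp = TransportServices()
framed = TransportServices(strong_end_to_end_integrity=True)

print(iscsi_crc_enabled(plain_tcp, plain_tcp))
print(iscsi_crc_enabled(framed, plain_tcp))
print(iscsi_crc_enabled(framed, framed))
```

The default of "CRC on" is the design choice that matters here: a missing 
or mismatched attribute never leaves the data without end-to-end 
protection.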

>, and the point is that
>it's hard to do so, because we really don't know what problem we're
>solving with it.  We're OK as long as we have a way to address it in
>the future without completely chucking what already exists.
>
>The other point to remember is that iSCSI still has to make the
>ID->Proposed->Draft->Internet traversal, and anybody that thinks it's
>going to do that on the first try is kidding themselves.  It's more
>important to get SOMETHING out there that exposes the implementation
>holes than to design a cathedral on paper.

Nothing is perfect the first time out, but with a tightening economy and 
customers demanding higher quality from the get-go, trading quality / 
reliability for TTM is not something people should rush to do.  The market 
is no longer one where good enough was all right; customers expect more 
today, and with good cause.

Mike

References:
- RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"
  - From: Robert Snively <rsnively@Brocade.COM>
- Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"
  - From: Stephen Bailey <steph@cs.uchicago.edu>
