RE: iSCSI: remove recovery from transport-layer connection failure(?)

To: "GUPTA,SOMESH (HP-Cupertino,ex1)" <somesh_gupta@am.exch.hp.com>, <julian_satran@il.ibm.com>, <ips@ece.cmu.edu>
Subject: RE: iSCSI: remove recovery from transport-layer connection failure(?)
From: "Douglas Otis" <dotis@sanlight.net>
Date: Mon, 2 Oct 2000 15:25:15 -0700
In-reply-to: <A5374D237E78D41195810090279CC91A539AF2@xcup04.cup.hp.com>

Somesh,

Are you referring to 802.1Q, 802.1D, HSRP, or just EtherChannel in
general?

Doug


> -----Original Message-----
> From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu] On Behalf Of
> GUPTA,SOMESH (HP-Cupertino,ex1)
> Sent: Monday, October 02, 2000 1:48 PM
> To: julian_satran@il.ibm.com; ips@ece.cmu.edu
> Subject: RE: iSCSI: remove recovery from transport-layer connection
> failure(?)
>
>
> Julian,
>
> If the scenario you point out is correct (a single command lasting
> for such a long time), then of course we need a mechanism where
> we can restart the command from the approximate point of failure.
> However, that would only matter for failures lasting "more than a
> fraction of a second".
>
> First of all, a TCP connection does not indicate a failure that
> quickly. Secondly, there are ways to recover from a path failure
> and still preserve a TCP connection in High-Availability environments.
> I am sure most system vendors will implement such techniques.
>
> Somesh
>
> -----Original Message-----
> From: julian_satran@il.ibm.com [mailto:julian_satran@il.ibm.com]
> Sent: Sunday, October 01, 2000 1:52 AM
> To: ips@ece.cmu.edu
> Subject: Re: iSCSI: remove recovery from transport-layer connection
> failure(?)
>
>
>
>
> Steph,
>
> Assume that in the wonderful new SAN world you have started a long
> disk-to-tape (or disk-to-disk) third-party copy. The SAN is fine and
> the copy proceeds for an hour, but the lousy initiator-to-copy-manager
> link (on which, as it happens, no data transfer took place) fails for
> a fraction of a second.
> Should we restart the command under the covers, drop it, or ask the
> parties to provide state information to a specific SCSI restart driver?
>
> And we can build many similar scenarios.
>
> I think that whatever we can do to simplify exception handling, we
> should do (the same arguments that hold for multiple connections hold
> here too).
>
> I would add that in an ideal world I would like the transport to
> "splice" a new TCP connection onto an old TCP connection, but failing
> that (again, isn't SCTP doing it already?) we should make sure that
> simple events like a cable pulled out in some obscure part of the
> network only seldom affect higher layers.
>
> Julo
>
> Stephen Bailey <steph@cs.uchicago.edu> on 27/09/2000 07:16:12
>
> Please respond to Stephen Bailey <steph@cs.uchicago.edu>
>
> To:   ips@ece.cmu.edu
> cc:    (bcc: Julian Satran/Haifa/IBM)
> Subject:  Re: iSCSI: remove recovery from transport-layer connection
>       failure(?)
>
>
>
>
> > Currently, iSCSI is spec'ed to recover from transport-layer
> > connection failures.
> >
> > The main motivation for this decision was to support tape backup
> > applications that are quite sensitive to any failures that get
> > propagated to their layer.
> >
> > So, perhaps we can remove the requirement of recovering from
> > transport-layer connection failures in iSCSI. This would simplify
> > the protocol somewhat.
> >
> > Thoughts?
>
> I'm all for eliminating command recovery.
>
> There seem to be several reasons advanced for command recovery.
>
> The first seems to be based upon an inappropriate analogy to FCP.
> Command recovery had to be added to FCP-2 because the FC layer is
> unreliable.  A single dropped FC frame leads to a failed FCP command.
> This clearly upsets tape operation even when the link is performing
> nominally.  In FCP without command recovery, you will, with some
> observable frequency, get an error that leads to complete,
> irrecoverable failure of a transfer stream.  The other thing that
> makes FCP-2 command recovery work well is that during a write, which
> is 90% (maybe 99%?) of tape operation, the target can return an
> early indication of most frame drops, rather than waiting for a
> timer to expire.
>
> TCP's reliability solves this problem in another way.  By the time you
> get a TCP connection failure, you have already exhausted a set of
> reliability mechanisms which guarantee, with high certainty, that
> further data cannot be transferred between the two endpoints.
>
> The phrase `the two endpoints' suggests the other reason advanced for
> command recovery: to permit path failover for commands that are not
> idempotent, such as sequential tape writes.  The problem with this is
> that it is not clear HOW iSCSI command recovery can actually work
> properly, given a TCP connection failure indication.  It takes a long
> time for a TCP connection to fail, and by that time, I'm not sure
> recovery would reasonably be possible.  Perhaps I'm wrong in this
> assumption.  Can a tape guru (Joe from Exabyte?) comment on whether
> recovery would be possible after many seconds (tens, hundreds) have
> elapsed?
>
> The SCSI layer has never been solely responsible for ensuring reliable
> backup.  Macro-scale things go wrong with tape (run off the end, get
> eaten, etc.) with relatively high frequency.  A low-level backup
> engine like tar or dump will fail on a SCSI error, and that's OK.
> There must also be a higher-level software component like Amanda,
> which manages retries, including operator intervention, to ensure
> reliable backup.
>
> It seems that whether iSCSI has a command recovery mechanism should
> depend on whether somebody can stand up and say for sure that it
> solves a real problem.  So far it only seems like it MIGHT solve a
> problem.  Who can say `this solves MY problem!'?
>
> Steph
>
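
The point made by Steph and Somesh, that TCP takes a long time to declare a connection dead, can be made concrete with a rough calculation. This is only a model under stated assumptions: the retransmission timer starts at the connection's current RTO and doubles after each expiry (the backoff RFC 6298 prescribes), it is capped at a maximum, and the sender gives up after a fixed number of retransmissions. The function name `tcp_give_up_time` and the retry count used below are hypothetical; real stacks pick their own limits (RFC 1122 only asks that the give-up threshold correspond to at least 100 seconds).

```python
def tcp_give_up_time(start_rto: float, max_rto: float, retries: int) -> float:
    """Seconds from the first transmission of a segment until the sender
    gives up on the connection, under a simple backoff model.

    Assumptions (illustrative, not any particular stack's behavior):
    - the timer starts at start_rto and doubles after every expiry,
    - it never exceeds max_rto,
    - the segment is retransmitted `retries` times, and the connection is
      declared dead when the timer after the last retransmission expires.
    """
    total = 0.0
    rto = min(start_rto, max_rto)
    # One timer period for the original send plus one per retransmission.
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, max_rto)  # exponential backoff, capped
    return total


if __name__ == "__main__":
    # Hypothetical parameters: a 1 s RTO (RFC 6298's minimum), the lowest
    # cap RFC 6298 permits (60 s), and 8 retransmissions.
    print(tcp_give_up_time(1.0, 60.0, 8))  # 243.0
```

Even with these fairly aggressive settings, the connection rides out about four minutes of silence before failing, which matches the "tens, hundreds" of seconds Steph asks about. Conversely, a sub-second outage like the one in Julian's example would normally be absorbed by retransmission, provided the path comes back, without the connection failing at all.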

