RE: iSCSI: remove recovery from transport-layer connection failure(?)

To: "Stephen Bailey" <steph@cs.uchicago.edu>, <ips@ece.cmu.edu>
Subject: RE: iSCSI: remove recovery from transport-layer connection failure(?)
From: "Douglas Otis" <dotis@sanlight.net>
Date: Wed, 27 Sep 2000 09:54:40 -0700
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;charset="iso-8859-1"
Importance: Normal
In-Reply-To: <10009270415.AA03051@candide.cs.uchicago.edu>
Sender: owner-ips@ece.cmu.edu

Steph,

Any IP-SCSI spec could be seen both as a connection to a controller and as a
bridge to existing drives.  With 5+ millisecond network latency,
controllers will remain adjacent to the client as a means of protection
against this latency in much the same manner a controller protects from
drive latency.  As both modes of operation are legitimate, assumptions about
transport should be tempered by these possibilities.  SCTP does provide for
a more immediate recovery.  It would also be irresponsible to promote
modification to TCP to support features already found within SCTP.

Doug

> -----Original Message-----
> From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
> Stephen Bailey
> Sent: Tuesday, September 26, 2000 9:16 PM
> To: ips@ece.cmu.edu
> Subject: Re: iSCSI: remove recovery from transport-layer connection
> failure(?)
>
>
> > Currently, iSCSI is spec'ed to recover from transport-layer
> > connection failures.
> >
> > The main motivation for this decision was to support tape backup
> > applications that are quite sensitive to any failures that get
> > propagated to their layer.
> >
> > So, perhaps we can remove the requirement of recovering from
> > transport-layer connection failures in iSCSI. This would simplify
> > the protocol somewhat.
> >
> > Thoughts?
>
> I'm all for eliminating command recovery.
>
> There seem to be several reasons advanced for command recovery.
>
> The first seems to be based upon an inappropriate analogy to FCP.
> Command recovery had to be added to FCP-2 because the FC layer is
> unreliable.  A single dropped FC frame leads to a failed FCP command.
> This clearly upsets tape operation even when the link is performing
> nominally.  In FCP, without command recovery, with some observable
> frequency, you will get an expected error that leads to complete,
> irrecoverable failure of a transfer stream.  The other thing that
> makes FCP-2 command recovery work well is that, when you are doing a
> write, which is 90% (maybe it's 99%?) of tape operation, the target
> can return an early indication of most frame drops, rather than
> waiting for a timer to expire.
>
> TCP's reliability solves this problem in another way.  By the time you
> get a TCP connection failure, you have already exhausted a set of
> reliability mechanisms which guarantee, with high certainty, that
> further data cannot be transferred between the two endpoints.
>
> The phrase `the two endpoints' suggests the other reason advanced for
> command recovery.  That is, to permit path failover for commands which
> are not idempotent, such as tape write sequential.  The
> problem with this is that it is not clear HOW iSCSI command recovery
> can actually work properly, given a TCP connection failure indication.
> It takes a long time for a TCP connection to fail, and by that time,
> I'm not sure recovery would reasonably be possible.  Perhaps I'm in
> error on this assumption.  Can a tape guru (Joe from Exabyte?) comment
> on whether recovery would be possible after many seconds (tens,
> hundreds) have elapsed?
>
> The SCSI layer has never been solely responsible for ensuring reliable
> backup.  Macro scale things go wrong with tape (run off the end, get
> eaten, etc.) with relatively high frequency.  A low level backup
> engine like tar or dump will fail on a SCSI error, and that's OK.
> There must also be a higher level software component like Amanda,
> which manages retries, including operator intervention, to ensure
> reliable backup.
>
> It seems like whether iSCSI has a command recovery mechanism should be
> a function of whether somebody can stand up and say for sure that it
> solves a real problem.  So far it only seems like it MIGHT solve a
> problem.  Who can say `this solves MY problem!'?
>
> Steph
>
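To put rough numbers on the quoted point that a TCP connection takes a long time to fail: a sender keeps retransmitting with a timeout that doubles after each attempt (exponential backoff) up to a ceiling, and only declares the connection dead after a fixed number of retransmissions. The sketch below just sums those waits. The function name and all parameter values are hypothetical and chosen for illustration only; real stacks derive the initial timeout from measured round-trip times and make the retry count tunable (Linux, for example, exposes it as net.ipv4.tcp_retries2), so treat the results as orders of magnitude, not as what any particular implementation does.

```python
def time_to_give_up(initial_rto: float, max_rto: float, retries: int) -> float:
    """Seconds a sender waits before abandoning an unresponsive connection.

    Simplified model (an assumption, not any specific TCP stack): the
    original segment plus `retries` retransmissions are each followed by a
    timeout; the timeout starts at `initial_rto` and doubles after every
    attempt, but never exceeds `max_rto`.
    """
    total = 0.0
    rto = initial_rto
    for _ in range(retries + 1):  # original send + each retransmission
        total += rto
        rto = min(rto * 2, max_rto)  # exponential backoff, capped
    return total


if __name__ == "__main__":
    # Hypothetical (initial_rto, max_rto, retries) settings for illustration.
    for params in [(0.2, 120.0, 15), (1.0, 64.0, 8)]:
        print(params, f"{time_to_give_up(*params):.1f} s")
```

With the first setting the waits add up to roughly 924.6 seconds, and with the second to 255 seconds, i.e. the "tens, hundreds" of seconds range the quoted message asks the tape experts about.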

