Re: Towards Consensus on TCP Connections

To: ips@ece.cmu.edu
Subject: Re: Towards Consensus on TCP Connections
From: Stephen Bailey <steph@cs.uchicago.edu>
Date: Thu, 10 Aug 2000 11:41:50 -0500
In-Reply-To: Message from csapuntz@cisco.com of "10 Aug 2000 13:58:14 PDT." <m3itt8q1eh.fsf@csapuntz-u1.cisco.com>
Sender: owner-ips@ece.cmu.edu

> On recovery, a big concern was the tape backup issue. Do SCSI
> applications recover gracefully today from failed SCSI connections?
> My understanding was that many tape backup programs abort the backup.

Tape is hard.  The main reason is that when an error occurs on a
{READ, WRITE} SEQUENTIAL, you don't really know what state the tape is
in.  Maybe the tape has advanced by the length of the failed operation.
Maybe not.  Maybe the tape has been eaten.  Maybe the tape has been
ejected.  It's really hard to do anything at ANY layer except go into
heavy-duty recovery (rewind and try again).

There are two distinct applications of tape with different
requirements: backup and streaming data recording.  Backup is by far
the most common application.

Many backup applications don't attempt recovery because they assume
that correcting the problem will probably require operator
intervention.  Amanda is an example of a backup application which does
recovery correctly, and in essence, it operates a layer above the
backup applications that actually touch the tape.  It is responsible
for buffering the data (on a disk), notifying the operator of the
failure, and trying it again on some arranged schedule, or on operator
request.
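
The Amanda-style layering described above can be sketched roughly as
follows.  This is only an illustration, not Amanda's actual code: the
names run_backup, write_to_tape, notify_operator and TapeError are all
hypothetical, and the backup image is assumed to be already spooled to
disk.  Because the tape position is unknown after a failure, every
attempt starts over from the beginning of the spooled image.

```python
import time


class TapeError(Exception):
    """Hypothetical error raised by the layer that actually touches the tape."""


def run_backup(spool_path, write_to_tape, notify_operator,
               max_attempts=3, retry_delay=0.0, sleep=time.sleep):
    """Write a disk-spooled backup image to tape, retrying on failure.

    Returns the number of the attempt that succeeded, or None if every
    attempt failed.  write_to_tape and notify_operator are callables
    supplied by the caller (assumptions made for this sketch).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # The tape state after an error is unknown, so each attempt
            # rewrites the whole image from the spool file.
            write_to_tape(spool_path)
            return attempt
        except TapeError as err:
            # Tell the operator, who may need to intervene physically.
            notify_operator(f"attempt {attempt} failed: {err}")
            if attempt < max_attempts:
                # Wait for the arranged retry interval before trying again.
                sleep(retry_delay)
    return None
```

The point of the design is that the data stays safe on disk, so a tape
failure costs time rather than the backup itself.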

The best thing you can do to improve tape behavior in either the
backup or streaming application is to improve the reliability of the
data transport, which is exactly what iSCSI does simply by using TCP.

The problem FC had was that when you write an arbitrary amount of
data, eventually you WILL get a media layer error and then you're
lost.  With FC error rates, this is usually only a problem for the
streaming data application.  Nonetheless, although the streaming data
application is the minority, the customers are high profile and have
huge installations.

> Related to recovery, when a TCP/SCSI connection closes, what ramifications
> does it have on device state (like mode pages, PREVENT/ALLOW REMOVAL,
> RESERVE/RELEASE, etc.)? Where does SCSI specify this?

This is a good question.  FC sorta blew this one originally.
Reservations did not even persist across hot plugs of uninvolved
equipment in FC-AL.  As a result, you have clustering software that
rereserves every few seconds `just in case'.  In the case of FC, the
obvious solution was to reserve by node name (WWN, which is unique to
the device, as opposed to port name, which is unique to the attachment
point).  The mistake was drawing too direct an analogy between
parallel SCSI and FC.  Parallel SCSI had limited addressing and
relatively stable topology, and FC had wider addressing and a much
more dynamic topology.  Currently FC is somewhat mired in backward
compatibility issues with respect to these recovery topics.
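
To make the node name versus port name distinction concrete, here is a
minimal sketch of a target-side reservation table keyed by node name.
It is a toy model, not anything taken from the SCSI or FC standards:
the Initiator and ReservationTable classes and their fields are
assumptions made for illustration.  The point is only that a
reservation keyed this way survives the initiator showing up at a
different attachment point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Initiator:
    """Hypothetical initiator identity as seen by the target."""
    node_name: str  # stable, unique to the device
    port_id: int    # attachment point, may change when the topology changes


class ReservationTable:
    """Toy reservation table keyed by node name rather than port."""

    def __init__(self):
        self._holder = {}  # logical unit number -> node name holding it

    def reserve(self, lun, initiator):
        # Succeeds if the unit is free or already held by the same node.
        holder = self._holder.setdefault(lun, initiator.node_name)
        return holder == initiator.node_name

    def release(self, lun, initiator):
        # Only the node that holds the reservation may release it.
        if self._holder.get(lun) == initiator.node_name:
            del self._holder[lun]
            return True
        return False

    def may_access(self, lun, initiator):
        # port_id is deliberately ignored: a changed attachment point
        # does not invalidate the reservation.
        holder = self._holder.get(lun)
        return holder is None or holder == initiator.node_name
```

With this keying, Initiator("node-A", 1) can reserve a unit and still
pass may_access after reappearing as Initiator("node-A", 7), which is
what would remove the need to rereserve every few seconds.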

Hopefully iSCSI will follow a more enlightened course.

Steph
