
    Re: Towards Consensus on TCP Connections



    > On recovery, a big concern was the tape backup issue. Do SCSI
    > applications recover gracefully today from failed SCSI connections?
    > My understanding was that many tape backup programs abort the backup.
    
    Tape is hard.  The main reason is that when an error occurs on a
    {READ, WRITE} SEQUENTIAL, you don't really know what state the tape is
    in.  Maybe the tape has advanced by the length of the failed operation.
    Maybe not.  Maybe the tape has been eaten.  Maybe the tape has been
    ejected.  It's really hard to do anything at ANY layer except go into
    heavy-duty recovery (rewind and try again).
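
To make the "rewind and try again" point concrete, here is a minimal Python sketch.
It is a toy model, not a real tape driver: SimulatedTape, TapeError and
write_all_with_rewind are hypothetical names invented for illustration. A failed
write leaves the position unknown to the caller, so the only safe recovery in this
model is to rewind and rewrite everything.

```python
import random


class TapeError(Exception):
    """Raised when a simulated sequential write fails."""


class SimulatedTape:
    """Toy tape model (hypothetical; not a real driver API)."""

    def __init__(self, fail_on_calls=(), seed=0):
        self.blocks = []       # what is physically on the tape
        self.position = 0      # true head position; unknown to the caller after an error
        self.calls = 0
        self.fail_on_calls = set(fail_on_calls)
        self.rng = random.Random(seed)

    def rewind(self):
        self.position = 0

    def write(self, block):
        self.calls += 1
        if self.calls in self.fail_on_calls:
            # The failed operation may or may not have moved the tape.
            if self.rng.random() < 0.5:
                del self.blocks[self.position:]
                self.blocks.append(None)   # garbage left behind by the failed write
                self.position += 1
            raise TapeError("write failed; tape position is now uncertain")
        del self.blocks[self.position:]    # writing truncates whatever followed
        self.blocks.append(block)
        self.position += 1


def write_all_with_rewind(tape, data, max_attempts=3):
    """Heavy-duty recovery: on any error, rewind and rewrite from the start."""
    for attempt in range(1, max_attempts + 1):
        tape.rewind()
        try:
            for block in data:
                tape.write(block)
            return attempt
        except TapeError:
            continue
    raise TapeError("giving up; operator intervention is probably required")
```

Note that the retry loop never tries to resume from the failed block, because in
this model it cannot tell whether that block reached the tape.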
    
    There are two distinct applications of tape with different
    requirements, backup and streaming data recording.  Backup is by far
    the most common application.
    
    Many backup applications don't attempt recovery because they assume
    that correcting the problem will probably require operator
    intervention.  Amanda is an example of a backup application that does
    recovery correctly; in essence, it operates a layer above the backup
    programs that actually touch the tape.  It is responsible for
    buffering the data (on a disk), notifying the operator of the
    failure, and retrying on some arranged schedule, or on operator
    request.
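
The layering described above (buffer on disk, notify the operator, retry later)
might be sketched roughly as follows. This is not Amanda's actual code or
interface; BackupSpooler and its callbacks are hypothetical names, and the tape
writer is assumed to signal failure by raising OSError.

```python
import os
import tempfile


class BackupSpooler:
    """Hypothetical holding-disk layer that sits above the tape writer."""

    def __init__(self, write_to_tape, notify_operator, spool_dir=None):
        self.write_to_tape = write_to_tape      # callable(name, data); raises OSError on failure
        self.notify_operator = notify_operator  # callable(message)
        self.spool_dir = spool_dir or tempfile.mkdtemp(prefix="spool-")

    def spool(self, name, data):
        """Buffer a backup image on disk before any tape I/O is attempted."""
        with open(os.path.join(self.spool_dir, name), "wb") as f:
            f.write(data)

    def flush(self):
        """Try to move every spooled image to tape; return the names still pending."""
        pending = []
        for name in sorted(os.listdir(self.spool_dir)):
            path = os.path.join(self.spool_dir, name)
            with open(path, "rb") as f:
                data = f.read()
            try:
                self.write_to_tape(name, data)
            except OSError as exc:
                # Keep the image on disk and tell a human; a later flush retries it.
                self.notify_operator(f"{name}: {exc}")
                pending.append(name)
            else:
                os.remove(path)
        return pending
```

Calling flush() from a scheduler, or by hand once the operator has fixed the
drive, gives the "arranged schedule, or on operator request" behavior; the data
stays safe on disk in between.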
    
    The best thing you can do to improve tape behavior in either the
    backup or streaming application is to improve the reliability of the
    data transport, which is exactly what iSCSI does simply by using TCP.
    
    The problem FC had was that when you write an arbitrary amount of
    data, eventually you WILL get a media layer error and then you're
    lost.  With FC error rates, this is usually only a problem for the
    streaming data application.  Nonetheless, although the streaming data
    application is the minority, the customers are high profile and have
    huge installations.
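
The "eventually you WILL get an error" argument is just compounding probability.
A small sketch, assuming independent bit errors; the error rate and transfer
sizes below are illustrative assumptions, not measured FC figures.

```python
import math


def prob_at_least_one_error(bits_transferred, bit_error_rate):
    """P(at least one bit error) = 1 - (1 - p)^n, computed stably for tiny p."""
    return -math.expm1(bits_transferred * math.log1p(-bit_error_rate))


ASSUMED_BER = 1e-15      # illustrative assumption only
BACKUP_BITS = 1e12       # a bounded backup job (assumed size)
STREAM_BITS = 1e16       # a long-running streaming recording (assumed size)

p_backup = prob_at_least_one_error(BACKUP_BITS, ASSUMED_BER)
p_stream = prob_at_least_one_error(STREAM_BITS, ASSUMED_BER)
print(f"backup: {p_backup:.4f}  stream: {p_stream:.4f}")
```

Under these assumptions the bounded backup rarely sees an error, while the
open-ended stream is almost certain to hit one, which matches the point that the
streaming application is the one that suffers.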
    
    > Related to recovery, when a TCP/SCSI connection closes, what ramifications
    > does it have on device state (like mode pages, PREVENT/ALLOW REMOVAL,
    > RESERVE/RELEASE, etc.)? Where does SCSI specify this?
    
    This is a good question.  FC sorta blew this one originally.
    Reservations did not even persist across hot plugs of uninvolved
    equipment in FC-AL.  As a result, you have clustering software that
    re-reserves every few seconds "just in case".  In the case of FC, the
    obvious solution was to reserve by node name (WWN, which is unique to
    the device, as opposed to port name, which is unique to the attachment
    point).  The mistake was drawing too direct an analogy between
    parallel SCSI and FC.  Parallel SCSI had limited addressing and a
    relatively stable topology, while FC had wider addressing and a much
    more dynamic topology.  Currently FC is somewhat mired in backward
    compatibility issues with respect to these recovery topics.
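
The difference between keying a reservation on the attachment point and keying
it on the device can be shown with a toy model. The classes and names below are
hypothetical and do not correspond to any SCSI or FC API; they only illustrate
why a reservation keyed by node name survives a topology change.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Initiator:
    node_name: str   # identifies the device itself (stable)
    port_name: str   # identifies the attachment point (may change)


class Reservation:
    """Toy reservation on one logical unit, keyed by a chosen initiator field."""

    def __init__(self, key_field):
        if key_field not in ("node_name", "port_name"):
            raise ValueError("key_field must be 'node_name' or 'port_name'")
        self.key_field = key_field
        self.holder = None

    def reserve(self, initiator):
        key = getattr(initiator, self.key_field)
        if self.holder is None or self.holder == key:
            self.holder = key
            return True
        return False          # someone else holds the reservation

    def holds(self, initiator):
        return self.holder == getattr(initiator, self.key_field)


host = Initiator(node_name="node-A", port_name="port-1")
# Simulate a topology change: same device, different attachment point.
moved = replace(host, port_name="port-2")

by_port = Reservation("port_name")
by_node = Reservation("node_name")
by_port.reserve(host)
by_node.reserve(host)
```

After the move, by_port.holds(moved) is False, so the host has silently lost its
reservation and would need the periodic re-reserve workaround, whereas
by_node.holds(moved) is still True.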
    
    Hopefully iSCSI will follow a more enlightened course.
    
    Steph
    
    
    



Last updated: Tue Sep 04 01:07:55 2001