RE: Comments on the Draft (sync loss)

To: "'David Robinson'" <David.Robinson@EBay.Sun.COM>, ips@ece.cmu.edu
Subject: RE: Comments on the Draft (sync loss)
From: "Hall, Howard" <howard@pirus.com>
Date: Wed, 13 Sep 2000 15:25:39 -0400
Content-Type: text/plain;charset="iso-8859-1"
Sender: owner-ips@ece.cmu.edu

David, 

There is precedent for this type of framing recovery:

From RFC 1831 (RPC), August 1995:
10. RECORD MARKING STANDARD

   When RPC messages are passed on top of a byte stream transport
   protocol (like TCP), it is necessary to delimit one message from
   another in order to detect and possibly recover from protocol errors.
   This is called record marking (RM).  One RPC message fits into one RM
   record.

   A record is composed of one or more record fragments.  A record
   fragment is a four-byte header followed by 0 to (2**31) - 1 bytes of
   fragment data.  The bytes encode an unsigned binary number; as with
   XDR integers, the byte order is from highest to lowest.  The number
   encodes two values -- a boolean which indicates whether the fragment
   is the last fragment of the record (bit value 1 implies the fragment
   is the last fragment) and a 31-bit unsigned binary value which is the
   length in bytes of the fragment's data.  The boolean value is the
   highest-order bit of the header; the length is the 31 low-order bits.
   (Note that this record specification is NOT in XDR standard form!)
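
To make the quoted text concrete, here is a minimal Python sketch of record marking as I read the section above. The function names (encode_fragment, decode_records) are my own for illustration and do not come from the RFC or from any RPC library, and the decoder assumes the whole byte stream is already in memory rather than arriving off a socket.

```python
import struct
from typing import List

LAST_FRAGMENT_FLAG = 0x80000000  # highest-order bit: last fragment of the record
LENGTH_MASK = 0x7FFFFFFF         # 31 low-order bits: fragment data length in bytes


def encode_fragment(data: bytes, last: bool) -> bytes:
    """Prefix data with the four-byte, big-endian record-marking header."""
    if len(data) > LENGTH_MASK:
        raise ValueError("fragment data exceeds (2**31) - 1 bytes")
    header = len(data) | (LAST_FRAGMENT_FLAG if last else 0)
    return struct.pack(">I", header) + data


def decode_records(stream: bytes) -> List[bytes]:
    """Split a complete byte stream into records, joining their fragments."""
    records = []
    fragments = []
    offset = 0
    while offset < len(stream):
        if len(stream) - offset < 4:
            raise ValueError("truncated fragment header")
        (header,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        length = header & LENGTH_MASK
        if len(stream) - offset < length:
            raise ValueError("truncated fragment data")
        fragments.append(stream[offset:offset + length])
        offset += length
        if header & LAST_FRAGMENT_FLAG:
            # The last-fragment bit closes the current record.
            records.append(b"".join(fragments))
            fragments = []
    if fragments:
        raise ValueError("stream ended in the middle of a record")
    return records
```

A record split across two fragments would be sent as encode_fragment(part1, False) followed by encode_fragment(part2, True), and decode_records would hand back the two parts joined as one record.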

-Howard

-----Original Message-----
From: David Robinson [mailto:David.Robinson@EBay.Sun.COM]
Sent: Tuesday, September 12, 2000 7:29 PM
To: ips@ece.cmu.edu
Subject: Re: Comments on the Draft (sync loss)


> Section: General
> There's no framing of the headers and data on the buffers from TCP. If
> anything goes wrong with the parsing, it's difficult if not impossible to
> recover. It only takes one length field to be 'off'. If this happens the
> target will probably generate lots of "Opcode not understood" messages. We
> suggest one of two methods: 1) after seeing consecutive "Opcode not
> understood" messages it should shut down the connection; if this doesn't
> solve the problem then reset the target, or 2) when the target finds that
> it is out of sync with the initiator (on receipt of an "Opcode not
> understood"), it will send a new iSCSI "Out of Sync" command to the
> initiator. The initiator will assume at the reception of the "Out of Sync"
> command that all unacknowledged outstanding requests have been dropped. The
> initiator then sends the next command with the OOB (out of band) bit set,
> and with the OOB offset pointing to the beginning of the iSCSI header. The
> target, after sending the "Out of Sync" command, should ignore everything
> on that connection and wait for the OOB data to re-sync again. This
> exchange could also work if sent from the initiator to the target.

Loss of sync is an extremely rare event: either the sender sent a
request shorter than it indicated, the TCP stack caused corruption,
or the receiver misinterpreted the request.  All of these are severe bugs
somewhere.  In practice, with NFS over TCP, which has no framing, sync
errors are so rare that they are ignored.  Instead of going through a
complex process of setting OOB to resync and then determining which
messages need to be resent, why not simply drop the connection and do the
common lost-connection recovery?  The symptoms and the state that needs
to be recovered are virtually identical.
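
As a rough illustration of the drop-the-connection approach, here is a short Python sketch. The wire format is hypothetical and deliberately simple (a 1-byte opcode, a 4-byte big-endian length, then the payload); it is NOT the iSCSI header. The names SyncLost, parse_messages, and handle_connection are made up for this example, and the parser assumes the whole buffer is in memory.

```python
import struct

# Hypothetical format for illustration only (not the iSCSI header):
# 1-byte opcode, 4-byte big-endian payload length, then the payload.
HEADER = struct.Struct(">BI")
KNOWN_OPCODES = {0x01, 0x02}  # made-up opcode values


class SyncLost(Exception):
    """Raised when the byte stream can no longer be parsed reliably."""


def parse_messages(stream: bytes):
    """Parse a complete buffer into (opcode, payload) pairs."""
    messages = []
    offset = 0
    while offset < len(stream):
        if len(stream) - offset < HEADER.size:
            raise SyncLost("truncated header")
        opcode, length = HEADER.unpack_from(stream, offset)
        if opcode not in KNOWN_OPCODES:
            # One length field being 'off' typically shows up here.
            raise SyncLost("opcode not understood: 0x%02x" % opcode)
        offset += HEADER.size
        if len(stream) - offset < length:
            raise SyncLost("truncated payload")
        messages.append((opcode, stream[offset:offset + length]))
        offset += length
    return messages


def handle_connection(stream: bytes, outstanding):
    """Return (status, requests_to_retry) without attempting any resync."""
    try:
        parse_messages(stream)
        return "ok", []
    except SyncLost:
        # Treat sync loss exactly like a lost connection: drop it and let
        # the usual recovery path reissue every outstanding request.
        return "dropped", list(outstanding)
```

The point of the sketch is that handle_connection has no resync path at all: a parse failure and a dead connection leave the caller with the same list of requests to retry.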

Let's try to stay simple and not overcomplicate the recovery process.

	-David
