Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues

To: ips@ece.cmu.edu
Subject: Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues
From: Matt Wakeley <matt_wakeley@agilent.com>
Date: Tue, 30 Jan 2001 12:37:32 -0800
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
Organization: Agilent Technologies
References: <C12569E4.003EF050.00@d12mta02.de.ibm.com>
Reply-To: Matt Wakeley <matt_wakeley@agilent.com>
Sender: owner-ips@ece.cmu.edu

Julian,

I do not believe that a MUST requirement of a sender "MUST" be verified by the
receiver.  For example, if reserved fields MUST be zero, I am certainly not
going to verify that reserved fields are zero - I don't care about them -
they're reserved!

You are trying to detect a misbehaving target.  Well, I could build a
misbehaving target that sends your sequence numbers, but not enough of them,
and indicates no residual in the status, and you still wouldn't be able to
detect an error...

-Matt
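
A minimal Python sketch of the scenario above, for illustration only. It
assumes a per-command DataSN that starts at 0 and increments by one per
Data PDU, and that the initiator has no separate count of how many PDUs to
expect; the function name is hypothetical and not taken from any draft.

```python
def missing_data_sns(received_sns):
    """Return the DataSNs missing below the highest one received.

    Assumption: numbering starts at 0 and has no intended gaps.
    """
    if not received_sns:
        return []
    seen = set(received_sns)
    return [sn for sn in range(max(seen) + 1) if sn not in seen]


# A Data PDU dropped in the middle of the transfer shows up as a gap.
print(missing_data_sns([0, 1, 3]))  # [2]

# A target that stops early (and reports no residual) leaves no gap,
# so a sequence-only check has nothing to flag.
print(missing_data_sns([0, 1]))     # []
```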

julian_satran@il.ibm.com wrote:
> 
> Santosh,
> 
> By enforce I meant - enforce like in the legalese - i.e., police.
> If you have a MUST that you are never going to check better find a better
> solution.
> Checking it entails scoreboarding that no other SCSI protocol does or
> needs.
> 
> Sequencing is simple (and that is what FC does) and lets the target master
> the transfer the way it usually does for all SCSI protocols.
> 
> I feel we have spent already too much time on this single issue -:)
> 
> Julo
> 
> Santosh Rao <santoshr@cup.hp.com> on 30/01/2001 10:25:45
> 
> Please respond to Santosh Rao <santoshr@cup.hp.com>
> 
> To:   ips@ece.cmu.edu (ips)
> cc:
> Subject:  Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues
> 
> > julian_satran@il.ibm.com wrote:
> > >
> > > Santosh,
> > >
> > > The trouble with forbidding a certain behavior is that you have to
> > > enforce it (i.e., check and signal errors for units that do not
> > > behave).
> >
> > No, you don't have to "enforce" it at all.  Using Santosh's example, if a
> > target broke the rules and performed data overlay, then the initiator
> > will always mark the I/O as bad, and the market forces will take over
> > (customer will buy a compliant target).
> 
> Julian,
> 
> Removing support for overlapped data xfer's has multiple benefits:
> 1)   It provides initiators a reliable way of ensuring
>      that the I/O did complete without any underrun.
> 
> 2)   It simplifies SCSI Assist implementations that no longer
>      need to deal with overlapped data xfer conditions.
> 
> 3)   It is simpler than performing book-keeping
>      on DataSN to ensure that all DataSNs have been
>      received. (IOW, score-boarding at a DataSN level,
>      instead of at a byte level.)
> 
> I'd say the count based solution is preferable, given the above.
> All protocols inherently enforce certain behaviour by mandating
> features (the use of MUST, shall). I don't think that's a
> strong enough reason to reject this proposal.
> 
> To summarize :
> o    Dis-allow overlapped data xfer's.
> 
> o    Initiators to perform a count check as is done in FC.
> 
> o    On detecting an underrun, the command may be retried
>      BUT WITHOUT SETTING the "retry" bit. This is
>      particularly important because targets that implement
>      status recovery may be ignorant of the fact that the
>      initiator encountered a digest error [which caused
>      the underrun] and so, they just send back a Status
>      PDU under the belief that the command is complete,
>      whereas, the initiator wants all the Data PDUs
>      to be re-sent.
> 
> Regards,
> Santosh
> 
> >
> > > Besides - the whole philosophy of the SCSI set of protocols is that
> > > the target is the master and the initiator should let the target
> > > decide how to fulfill the command.  That is why we chose not to impose
> > > restrictions above those imposed by SCSI.  The whole set of issues is
> > > also raised only because we provide also for storage proxies -
> > > otherwise a stronger checksum at TCP level and recovery at TCP level
> > > would have done what we wanted, and recovery of the type we are now
> > > dealing with would have been done at TCP level.
> > > I am confident that we can reinstate DataSN as a simple means to
> > > sequence (not ack) data packets and considerably simplify recovery.
> >
> > If there is a data digest failure, the iSCSI PDU is discarded, and the
> > test that Santosh describes will fail.  At that point, the command is
> > "retried", and using my example of the retry implementation in the
> > thread "iSCSI: I/O (command) recovery" error recovery is performed.  No
> > need for DataSN...
> >
> > > And do not forget that raising the error up to ULP with a service
> > > response will make the recovery far more expensive (as Prasenjit has
> > > already stated) - far more than current wedge drivers do as these
> > > rarely consider commands in flight and the need to keep order in a
> > > target that is not yet aware that something went wrong.
> >
> > I agree that errors due to the transport should not be propagated to
> > the ULP.  In the case of a digest failure, this means that the TCP
> > checksum indicated the segment was good, meaning that a middle box
> > corrupted the TCP segment and sent it out with a "fixed" TCP checksum.
> >
> > The simple iSCSI error recovery using the retry should handle this
> > corner case very well.
> >
> > -Matt
> >
> >
> > >
> > > Julo
> > >
> > > Santosh Rao <santoshr@cup.hp.com> on 28/01/2001 00:07:08
> > >
> > > Please respond to Santosh Rao <santoshr@cup.hp.com>
> > >
> > > To:   Julian Satran/Haifa/IBM@IBMIL
> > > cc:   ips@ece.cmu.edu
> > > Subject:  Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues
> > >
> > > Julian,
> > >
> > > The missing Data PDU could be detected if the initiator were to
> > > perform a count check operation upon receiving SCSI Response PDU,
> > > along the lines of :
> > >
> > > no. of bytes xfer'ed =
> > >      (Expected Data Xfer Length) - (Basic Residual Count)
> > >
> > > where,
> > > Expected Data Xfer Length -> as specified in SCSI Command PDU
> > > Basic Residual Count -> as specified in SCSI Response PDU
> > >
> > > However, this is currently not possible due to overlapped data
> > > transfers being allowed by iSCSI. If iSCSI were to dis-allow
> > > overlapping data xfer's and initiators used a count check
> > > [as is done in FC], this would also address the problem.
> > >
> > > Regards,
> > > Santosh
> > >
> > > >
> > > >
> > > >
> > > > If the header is a data header we can hardly trust the ULP to
> > > > recognize the error (it might be unaware of a missing packet).  With
> > > > data numbering this situation could have been discovered at "status
> > > > time".
> > > > The only thing we could do is restart all commands but this is
> > > > equivalent to a connection restart for all practical purposes.
> > > > Dropping data numbering might have some more "side-effects" like
> > > > this.
> > > > As the combination of values - tag, address, offset - may still let
> > > > some implementations assume that they have a correct task
> > > > identifier, I don't see a point in mandating a recovery behavior and
> > > > the implementer may choose to:
> > > >
> > > > -retry/restart command
> > > > -logout, drop and rebuild connection, login and restart/retry
> > > > -abort all task sets (practically reset the target!) and report for
> > > >  all commands a "delivery system failure" (kick-in the ULP recovery)
> > > >  and if you suspect the link quality rebuild it; this latter behavior
> > > >  means also that you have to stop delivering anything on any link to
> > > >  the target to avoid out of order execution until you have finished
> > > >  the cleanup - pretty drastic
> > > >
> > > > With data numbering recovery could have stayed within the confines
> > > > of a command even if a header was bad.
> > > > Perhaps we should leave the DataSN only as a sequencer so that at
> > > > status-time the initiator should be able to find if a data packet was
> > > > dropped (no ExpDataSN on a NOP).
> > > >
> > > > Regards,
> > > > Julo
> > > >
> > > >
> > > >
> > > >
> > > > Michael Krause <krause@cup.hp.com> on 27/01/2001 04:59:12
> > > >
> > > > Please respond to Michael Krause <krause@cup.hp.com>
> > > >
> > > > To:   Julian Satran/Haifa/IBM@IBMIL
> > > > cc:   ips@ece.cmu.edu
> > > > Subject:  Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues
> > > >
> > > >
> > > >
> > > >
> > > > At 07:40 PM 1/25/2001 +0200, julian_satran@il.ibm.com wrote:
> > > >
> > > >
> > > > >1) The initiator task tag cannot be trusted when a header digest
> > > > >error is seen. What does the phrase "provided it can recognize the
> > > > >initiator task tag" mean ?
> > > > >How can an initiator reliably claim that the initiator task tag is
> > > > >trustworthy ?
> > > > >
> > > > ><js> an initiator may choose to provide some redundancy in the tag
> > > > >itself </js>
> > > >
> > > > I'm aware of some techniques for inserting redundant information in
> > > > tags which limits the potential error exposure when a multi-bit error
> > > > occurs, however these are not fail-safe leading to potential
> > > > incorrect operation - perhaps benign in many cases; perhaps not in
> > > > others. As such, if a header digest error occurs, the PDU should be
> > > > silently discarded and recovery should be left to the ULP.  There is
> > > > little to no value having two mechanisms to solve the same problem.
> > > >
> > > > Mike
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > > --
> > > #################################
> > > Santosh Rao
> > > Software Design Engineer,
> > > HP, Cupertino.
> > > email : santoshr@cup.hp.com
> > > Phone : 408-447-3751
> > > #################################
> >
> 
> --
> #################################
> Santosh Rao
> Software Design Engineer,
> HP, Cupertino.
> email : santoshr@cup.hp.com
> Phone : 408-447-3751
> #################################
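
The count check described in the thread (bytes transferred = Expected Data
Xfer Length - Basic Residual Count) can be sketched as below. This is a
rough illustration, not an implementation from any draft: the function
names and the (offset, length) representation of received Data PDUs are
assumptions made for the example.

```python
def total_bytes(data_pdus):
    """Sum the payload lengths of the Data PDUs received for one command.

    data_pdus is a list of (offset, length) tuples (hypothetical layout).
    The sum only equals the bytes actually covered if no PDUs overlap.
    """
    return sum(length for _offset, length in data_pdus)


def count_check(expected_xfer_len, basic_residual_count, bytes_received):
    """True when the received byte count matches what the status reports."""
    return bytes_received == expected_xfer_len - basic_residual_count


# All data arrived: the check passes.
print(count_check(8192, 0, total_bytes([(0, 4096), (4096, 4096)])))  # True

# One Data PDU was discarded (e.g. after a digest error): the check fails.
print(count_check(8192, 0, total_bytes([(0, 4096)])))                # False

# Overlapped transfers defeat the check: the same 4096 bytes sent twice
# add up to 8192 even though the second half never arrived.
print(count_check(8192, 0, total_bytes([(0, 4096), (0, 4096)])))     # True
```

The last case is the reason the proposal pairs the count check with
disallowing overlapped data transfers.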
