
    Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues



    Julian,
    
    I do not believe that a MUST requirement of a sender "MUST" be verified by the
    receiver.  For example, if reserved fields MUST be zero, I am certainly not
    going to verify that reserved fields are zero - I don't care about them -
    they're reserved!
    
    You are trying to detect a misbehaving target.  Well, I could build a
    misbehaving target that sends your sequence numbers, but not enough of them,
    and indicate no residual in the status, and you still wouldn't be able to
    detect an error...
    
    -Matt
    
    julian_satran@il.ibm.com wrote:
    > 
    > Santosh,
    > 
    > By enforce I meant - enforce like in the legalese - i.e., police.
    > If you have a MUST that you are never going to check, you had better find a
    > better solution.
    > Checking it entails scoreboarding that no other SCSI protocol does or
    > needs.
    > 
    > Sequencing is simple (and that is what FC does) and lets the target master
    > the transfer
    > the way it usually does for all SCSI protocols.
    > 
    > I feel we have spent already too much time on this single issue -:)
    > 
    > Julo
    > 
    > Santosh Rao <santoshr@cup.hp.com> on 30/01/2001 10:25:45
    > 
    > Please respond to Santosh Rao <santoshr@cup.hp.com>
    > 
    > To:   ips@ece.cmu.edu (ips)
    > cc:
    > Subject:  Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues
    > 
    > > julian_satran@il.ibm.com wrote:
    > > >
    > > > Santosh,
    > > >
    > > > The trouble with forbidding a certain behavior is that you have to
    > > > enforce it (i.e., check and signal errors for units that do not behave).
    > >
    > > No, you don't have to "enforce" it at all.  Using Santosh's example, if a
    > > target broke the rules and performed data overlay, then the initiator will
    > > always mark the I/O as bad, and the market forces will take over (customer
    > > will buy a compliant target).
    > 
    > Julian,
    > 
    > Removing support for overlapped data xfer's has multiple benefits:
    > 1)   It provides initiators a reliable way of ensuring
    >      that the I/O did complete without any underrun.
    > 
    > 2)   It simplifies SCSI Assist implementations that no longer
    >      need to deal with overlapped data xfer conditions.
    > 
    > 3)   It is simpler than performing book-keeping
    >      on DataSN to ensure that all DataSNs have been
    >      received. (IOW, score-boarding at a DataSN level,
    >      instead of at a byte level.)
    > 
    > I'd say the count based solution is preferable, given the above.
    > All protocols inherently enforce certain behaviour by mandating
    > features (the use of MUST, shall). I don't think that's a
    > strong enough reason to reject this proposal.
    > 
    > To summarize :
    > o    Dis-allow overlapped data xfer's.
    > 
    > o    Initiators to perform a count check as is done in FC.
    > 
    > o    On detecting an underrun, the command may be retried
    >      BUT WITHOUT SETTING the "retry" bit. This is
    >      particularly important because targets that implement
    >      status recovery may be ignorant of the fact that the
    >      initiator encountered a digest error [which caused
    >      the underrun] and so, they just send back a Status
    >      PDU under the belief that the command is complete,
    >      whereas, the initiator wants all the Data PDUs
    >      to be re-sent.
    > 
    > Regards,
    > Santosh
    > 
    > >
    > > > Besides - the whole philosophy of the SCSI set of protocols is that the
    > > > target is the master and the initiator should let the target decide how to
    > > > fulfill the command.  That is why we chose not to impose restrictions above
    > > > those imposed by SCSI.  The whole set of issues is also raised only because
    > > > we provide also for storage proxies - otherwise a stronger checksum at TCP
    > > > level and recovery at TCP level would have done what we wanted and recovery
    > > > of the type we are dealing now with would have been done at TCP level.
    > > > I am confident that we can reinstate DataSN as a simple means to sequence
    > > > (not ack) data packets and considerably simplify recovery.
    > >
    > > If there is a data digest failure, the iSCSI PDU is discarded, and the test
    > > that Santosh describes will fail.  At that point, the command is "retried",
    > > and using my example of the retry implementation in the thread "iSCSI: I/O
    > > (command) recovery" error recovery is performed.  No need for DataSN...
    > >
    > > > And do not forget that raising the error up to ULP with a service response
    > > > will make the recovery far more expensive (as Prasenjit has already stated)
    > > > - far more than current wedge drivers do as these rarely consider commands
    > > > in flight and the need to keep order in a target that is not yet aware that
    > > > something went wrong.
    > >
    > > I agree that errors due to the transport should not be propagated to the ULP.
    > > In the case of a digest failure, this means that the TCP checksum indicated
    > > the segment was good, meaning that a middle box corrupted the TCP segment and
    > > sent it out with a "fixed" TCP checksum.
    > >
    > > The simple iSCSI error recovery using the retry should handle this corner case
    > > very well.
    > >
    > > -Matt
    > >
    > >
    > > >
    > > > Julo
    > > >
    > > > Santosh Rao <santoshr@cup.hp.com> on 28/01/2001 00:07:08
    > > >
    > > > Please respond to Santosh Rao <santoshr@cup.hp.com>
    > > >
    > > > To:   Julian Satran/Haifa/IBM@IBMIL
    > > > cc:   ips@ece.cmu.edu
    > > > Subject:  Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues
    > > >
    > > > Julian,
    > > >
    > > > The missing Data PDU could be detected if the initiator were to
    > > > perform a count check operation upon receiving SCSI Response PDU,
    > > > along the lines of :
    > > >
    > > > no. of bytes xfer'ed =
    > > >      (Expected Data Xfer Length) - (Basic Residual Count)
    > > >
    > > > where,
    > > > Expected Data Xfer Length -> as specified in SCSI Command PDU
    > > > Basic Residual Count -> as specified in SCSI Response PDU
    > > >
    > > > However, this is currently not possible due to overlapped data
    > > > transfers being allowed by iSCSI. If iSCSI were to dis-allow
    > > > overlapping data xfer's and initiators used a count check
    > > > [as is done in FC], this would also address the problem.
    > > >
    > > > Regards,
    > > > Santosh
    > > >
    > > > >
    > > > >
    > > > >
    > > > > If the header is a data header we can hardly trust the ULP to recognize
    > > > > the error (he might be unaware of a missing packet).  With data numbering
    > > > > this situation could have been discovered at "status time".
    > > > > The only thing we could do is restart all commands but this is equivalent
    > > > > to a connection restart for all practical purposes.  Dropping data
    > > > > numbering might have some more "side-effects" like this.
    > > > > As the combination of values - tag, address, offset - may still let some
    > > > > implementations assume that they have a correct task identifier I don't
    > > > > see a point in mandating a recovery behavior and the implementer may
    > > > > choose to:
    > > > >
    > > > > - retry/restart command
    > > > > - logout, drop and rebuild connection, login and restart/retry
    > > > > - abort all task sets (practically reset the target!) and report for all
    > > > >   commands a "delivery system failure" (kick-in the ULP recovery) and, if
    > > > >   you suspect the link quality, rebuild it; this latter behavior means also
    > > > >   that you have to stop delivering anything on any link to the target to
    > > > >   avoid out of order execution until you have finished the cleanup - pretty
    > > > >   drastic
    > > > >
    > > > > With data numbering recovery could have stayed within the confines of a
    > > > > command even if a header was bad.
    > > > > Perhaps we should leave the DataSN only as a sequencer so that at
    > > > > status-time the initiator should be able to find if a data packet was
    > > > > dropped (no ExpDataSN on a NOP).
    > > > >
    > > > > Regards,
    > > > > Julo
    > > > >
    > > > >
    > > > >
    > > > >
    > > > > Michael Krause <krause@cup.hp.com> on 27/01/2001 04:59:12
    > > > >
    > > > > Please respond to Michael Krause <krause@cup.hp.com>
    > > > >
    > > > > To:   Julian Satran/Haifa/IBM@IBMIL
    > > > > cc:   ips@ece.cmu.edu
    > > > > Subject:  Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues
    > > > >
    > > > >
    > > > >
    > > > >
    > > > > At 07:40 PM 1/25/2001 +0200, julian_satran@il.ibm.com wrote:
    > > > >
    > > > >
    > > > > >1) The initiator task tag cannot be trusted when a header digest error
    > > > > >is seen. What does the phrase "provided it can recognize the initiator
    > > > > >task tag" mean ?
    > > > > >How can an initiator reliably claim that the initiator task tag is
    > > > > >trustworthy ?
    > > > > >
    > > > > ><js> an initiator may choose to provide some redundancy in the tag
    > > > > >itself </js>
    > > > >
    > > > > I'm aware of some techniques for inserting redundant information in tags
    > > > > which limits the potential error exposure when a multi-bit error occurs,
    > > > > however these are not fail-safe leading to potential incorrect operation -
    > > > > perhaps benign in many cases; perhaps not in others. As such, if a header
    > > > > digest error occurs, the PDU should be silently discarded and recovery
    > > > > should be left to the ULP.  There is little to no value having two
    > > > > mechanisms to solve the same problem.
    > > > >
    > > > > Mike
    > > > >
    > > > >
    > > > >
    > > > >
    > > > >
    > > >
    > > > --
    > > > #################################
    > > > Santosh Rao
    > > > Software Design Engineer,
    > > > HP, Cupertino.
    > > > email : santoshr@cup.hp.com
    > > > Phone : 408-447-3751
    > > > #################################
    > >
    > 
    > --
    > #################################
    > Santosh Rao
    > Software Design Engineer,
    > HP, Cupertino.
    > email : santoshr@cup.hp.com
    > Phone : 408-447-3751
    > #################################
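
Illustration (not part of the original messages): the two detection schemes
debated in this thread can be sketched side by side. This is a minimal sketch
under stated assumptions, not an implementation from any draft. ReadTracker and
its methods are hypothetical names. It assumes overlapped data transfers are
disallowed (otherwise a byte count cannot be trusted, as Santosh notes), that
DataSN starts at 0 for each command, and that the initiator learns at status
time how many Data PDUs the target sent (the pdus_sent argument is an
assumption; the thread does not say how that number is conveyed).

```python
from dataclasses import dataclass, field


@dataclass
class ReadTracker:
    """Initiator-side bookkeeping for one read command (hypothetical type)."""

    expected_xfer_len: int   # Expected Data Xfer Length from the SCSI Command PDU
    bytes_received: int = 0  # payload bytes accepted so far
    seen_data_sns: set = field(default_factory=set)

    def on_data_pdu(self, data_sn: int, payload_len: int, digest_ok: bool) -> None:
        # A PDU that fails its digest is discarded, so it is never recorded.
        if not digest_ok:
            return
        self.bytes_received += payload_len
        self.seen_data_sns.add(data_sn)

    def count_check_fails(self, residual_count: int) -> bool:
        # Count check: bytes the target reports having transferred
        # (expected length minus Basic Residual Count) versus bytes accepted.
        reported = self.expected_xfer_len - residual_count
        return self.bytes_received != reported

    def missing_data_sns(self, pdus_sent: int) -> list:
        # Sequencing check: assumes DataSN runs 0..pdus_sent-1 in this command.
        return [sn for sn in range(pdus_sent) if sn not in self.seen_data_sns]


# Three 4096-byte Data PDUs; the second one fails its data digest.
tracker = ReadTracker(expected_xfer_len=12288)
tracker.on_data_pdu(0, 4096, True)
tracker.on_data_pdu(1, 4096, False)
tracker.on_data_pdu(2, 4096, True)
```

In this scenario both checks flag the command at status time. The count check
only says that something is missing, while the sequencing check also says which
Data PDU was lost.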
    

