
    Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues



    > julian_satran@il.ibm.com wrote:
    > > 
    > > Santosh,
    > > 
    > > The trouble with forbidding a certain behavior is that you have to enforce
    > > it (i.e.,  check and signal errors for units that do not behave).
    > 
    > No you don't have to "enforce" it at all.  Using Santosh's example, if a
    > target broke the rules and performed data overlay, then the initiator will
    > always mark the I/O as bad, and the market forces will take over (customer
    > will buy a compliant target).
    
    Julian,
    
    Removing support for overlapped data xfer's has multiple benefits:
    1)	It provides initiators a reliable way of ensuring 
    	that the I/O did complete without any underrun.
    
    2) 	It simplifies SCSI Assist implementations that no longer
    	need to deal with overlapped data xfer conditions.
    
    3)	It is simpler than performing book-keeping
    	on DataSN to ensure that all DataSNs have been
    	received. (IOW, score-boarding at a DataSN level,
    	instead of at a byte level.)
    
    I'd say the count-based solution is preferable, given the above.
    All protocols inherently enforce certain behaviour by mandating
    features (the use of MUST, shall). I don't think that's a 
    strong enough reason to reject this proposal.
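
    A minimal sketch of why the count check only works once overlapped
    data xfers are disallowed. Each Data PDU is modelled as a
    hypothetical (buffer offset, length) pair; the function names are
    illustrative and not taken from the draft. With overlap allowed,
    a raw byte count can reach the expected length while part of the
    buffer was never written.

```python
def raw_byte_count(pdus):
    """Sum of payload lengths: what a simple count check sees."""
    return sum(length for _offset, length in pdus)


def distinct_bytes_covered(pdus):
    """Bytes of the buffer actually written, counting overlapped ranges once."""
    covered = 0
    end_of_last = 0
    for offset, length in sorted(pdus):
        start = max(offset, end_of_last)  # skip any part already covered
        end = offset + length
        if end > start:
            covered += end - start
            end_of_last = end
    return covered


# Hypothetical 8192-byte read: the target sends bytes 0..4095 twice
# (overlap) and the PDU carrying bytes 4096..8191 is lost to a digest error.
overlapped = [(0, 4096), (0, 4096)]
# Same read with no overlap and no loss.
clean = [(0, 4096), (4096, 4096)]
```

    Here raw_byte_count(overlapped) equals the expected 8192 bytes
    although only 4096 distinct bytes arrived, so the count check
    passes wrongly. Without overlap the two figures always agree.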
    
    To summarize :
    o	Dis-allow overlapped data xfer's.
    
    o	Initiators to perform a count check as is done in FC.
    
    o	On detecting an underrun, the command may be retried
    	BUT WITHOUT SETTING the "retry" bit. This is 
    	particularly important because targets that implement 
    	status recovery may be ignorant of the fact that the
    	initiator encountered a digest error [which caused
    	the underrun] and so, they just send back a Status
    	PDU under the belief that the command is complete,
    	whereas, the initiator wants all the Data PDUs
    	to be re-sent.
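
    The summary above could be sketched as follows for a read
    command, assuming overlapped data xfers are disallowed. The class,
    function names and return strings are hypothetical, chosen only
    for illustration; this is not an implementation of any draft text.

```python
from dataclasses import dataclass


@dataclass
class ReadCommand:
    # Expected Data Xfer Length, as specified in the SCSI Command PDU.
    expected_data_xfer_length: int
    # Payload bytes of Data PDUs accepted so far (digest was good).
    bytes_received: int = 0


def on_data_pdu(cmd, payload_len, digest_ok):
    """Count the payload only if the PDU passed its digest check.

    A PDU with a digest error is discarded, so its bytes are not counted.
    """
    if digest_ok:
        cmd.bytes_received += payload_len


def on_scsi_response(cmd, basic_residual_count):
    """Count check at status time, along the lines of what is done in FC."""
    bytes_target_sent = cmd.expected_data_xfer_length - basic_residual_count
    if cmd.bytes_received == bytes_target_sent:
        return "complete"
    # Underrun seen by the initiator: reissue the command as a new one,
    # WITHOUT the retry bit, so the target re-sends all the Data PDUs
    # instead of only replaying the Status PDU.
    return "reissue_without_retry_bit"
```

    For example, an 8192-byte read where one 4096-byte Data PDU fails
    its digest ends with bytes_received at 4096, and a Response PDU
    carrying a residual of 0 then yields "reissue_without_retry_bit".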
    
    Regards,
    Santosh
    
    
    > 
    >    Besides
    > > - the whole philosophy of the SCSI set of protocols is that the target is
    > > the master and the initiator should let the target decide how to fulfill
    > > the command.   That is why we chose not to impose restrictions above those
    > > imposed by SCSI.  The whole set of issues is also raised only because we
    > > provide also for storage proxies - otherwise a stronger checksum at TCP
    > > level and recovery at TCP level would have done what we wanted and recovery
    > > of the type we are dealing now with would have been done at TCP level.
    > > I am confident that we can reinstate DataSN as a simple means to sequence (not
    > > ack) data packets and considerably simplify recovery.
    > 
    > If there is a data digest failure, the iSCSI PDU is discarded, and the test
    > that Santosh describes will fail.  At that point, the command is "retried",
    > and using my example of the retry implementation in the thread "iSCSI: I/O
    > (command) recovery" error recovery is performed.  No need for DataSN...
    > 
    > > And do not forget that raising the error up to ULP with a service response
    > > will make the recovery far more expensive (as Prasenjit has already stated)
    > > - far more than current wedge drivers do as these rarely consider commands
    > > in flight and the need to keep order in a target that is not yet aware that
    > > something went wrong.
    > 
    > I agree that errors due to the transport should not be propagated to the ULP. 
    > In the case of a digest failure, this means that the TCP checksum indicated
    > the segment was good, meaning that a middle box corrupted the TCP segment and
    > sent it out with a "fixed" TCP checksum.
    > 
    > The simple iSCSI error recovery using the retry should handle this corner case
    > very well.
    > 
    > -Matt
    > 
    > 
    > > 
    > > Julo
    > > 
    > > Santosh Rao <santoshr@cup.hp.com> on 28/01/2001 00:07:08
    > > 
    > > Please respond to Santosh Rao <santoshr@cup.hp.com>
    > > 
    > > To:   Julian Satran/Haifa/IBM@IBMIL
    > > cc:   ips@ece.cmu.edu
    > > Subject:  Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues
    > > 
    > > Julian,
    > > 
    > > The missing Data PDU could be detected if the initiator were to
    > > perform a count check operation upon receiving SCSI Response PDU,
    > > along the lines of :
    > > 
    > > no. of bytes xfer'ed =
    > >      (Expected Data Xfer Length) - (Basic Residual Count)
    > > 
    > > where,
    > > Expected Data Xfer Length -> as specified in SCSI Command PDU
    > > Basic Residual Count -> as specified in SCSI Response PDU
    > > 
    > > However, this is currently not possible due to overlapped data
    > > transfers being allowed by iSCSI. If iSCSI were to dis-allow
    > > overlapping data xfer's and initiators used a count check
    > > [as is done in FC], this would also address the problem.
    > > 
    > > Regards,
    > > Santosh
    > > 
    > > >
    > > >
    > > >
    > > > If the header is a data header we can hardly trust the ULP to recognize the
    > > > error (he might be unaware
    > > > of a missing packet).  With data numbering this situation could have been
    > > > discovered at "status time".
    > > > The only thing we could do is restart all commands but this is equivalent
    > > > to a connection restart for all practical purposes.  Dropping data
    > > > numbering might have some more "side-effects" like this.
    > > > As the combination of values - tag, address, offset may still let some
    > > > implementations assume that they have
    > > > a correct task identifier I don't see a point in mandating a recovery
    > > > behavior and the implementer may choose to:
    > > >
    > > > -retry/restart command
    > > > -logout drop and rebuild connection login and restart/retry
    > > > -abort all task sets (practically reset the target!) and report for all
    > > > commands a "delivery system failure" (kick-in the ULP recovery) and if you
    > > > suspect the link quality rebuild it; this latter behavior means also that
    > > > you have to stop delivering anything on any link  to the target to avoid
    > > > out of order execution until you have finished the cleanup - pretty drastic
    > > >
    > > > With data numbering recovery could have stayed within the confines of a
    > > > command even if a header was bad.
    > > > Perhaps we should leave the DataSN only as a sequencer so that at
    > > > status-time the initiator should be able to find if a data packet was
    > > > dropped (no ExpDataSN on a NOP).
    > > >
    > > > Regards,
    > > > Julo
    > > >
    > > >
    > > >
    > > >
    > > > Michael Krause <krause@cup.hp.com> on 27/01/2001 04:59:12
    > > >
    > > > Please respond to Michael Krause <krause@cup.hp.com>
    > > >
    > > > To:   Julian Satran/Haifa/IBM@IBMIL
    > > > cc:   ips@ece.cmu.edu
    > > > Subject:  Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues
    > > >
    > > >
    > > >
    > > >
    > > > At 07:40 PM 1/25/2001 +0200, julian_satran@il.ibm.com wrote:
    > > >
    > > >
    > > > >1) The initiator task tag cannot be trusted when a header digest error
    > > > >is seen. What does the phrase "provided it can recognize the initiator
    > > > >task tag" mean ?
    > > > >How can an initiator reliably claim that the initiator task tag is
    > > > >trustworthy ?
    > > > >
    > > > ><js> an initiator may choose to provide some redundancy in the tag itself
    > > > ></js>
    > > >
    > > > I'm aware of some techniques for inserting redundant information in tags
    > > > which limits the potential error exposure when a multi-bit error occurs,
    > > > however these are not fail-safe leading to potential incorrect operation -
    > > > perhaps benign in many cases; perhaps not in others. As such, if a header
    > > > digest error occurs, the PDU should be silently discarded and recovery
    > > > should be left to the ULP.  There is little to no value having two
    > > > mechanisms to solve the same problem.
    > > >
    > > > Mike
    > > >
    > > >
    > > >
    > > >
    > > >
    > > 
    > > --
    > > #################################
    > > Santosh Rao
    > > Software Design Engineer,
    > > HP, Cupertino.
    > > email : santoshr@cup.hp.com
    > > Phone : 408-447-3751
    > > #################################
    > 
    
    
    -- 
    #################################
    Santosh Rao
    Software Design Engineer,
    HP, Cupertino.
    email : santoshr@cup.hp.com
    Phone : 408-447-3751
    #################################
    



Last updated: Tue Sep 04 01:05:38 2001