Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues

Julian,

I do not believe that a MUST requirement of a sender "MUST" be verified by
the receiver. For example, if reserved fields MUST be zero, I am certainly
not going to verify that reserved fields are zero - I don't care about
them - they're reserved!

You are trying to detect a misbehaving target. Well, I could build a
misbehaving target that sends your sequence numbers, but not enough of
them, and indicates no residual in the status, and you still wouldn't be
able to detect an error...

-Matt

julian_satran@il.ibm.com wrote:
>
> Santosh,
>
> By enforce I meant - enforce like in the legalese - i.e., police.
> If you have a MUST that you are never going to check, better find a better
> solution.
> Checking it entails scoreboarding that no other SCSI protocol does or
> needs.
>
> Sequencing is simple (and that is what FC does) and lets the target master
> the transfer the way it usually does for all SCSI protocols.
>
> I feel we have spent already too much time on this single issue -:)
>
> Julo
>
> Santosh Rao <santoshr@cup.hp.com> on 30/01/2001 10:25:45
>
> Please respond to Santosh Rao <santoshr@cup.hp.com>
>
> To: ips@ece.cmu.edu (ips)
> cc:
> Subject: Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues
>
> > julian_satran@il.ibm.com wrote:
> > >
> > > Santosh,
> > >
> > > The trouble with forbidding a certain behavior is that you have to enforce
> > > it (i.e., check and signal errors for units that do not behave).
> >
> > No you don't have to "enforce" it at all. Using Santosh's example, if a
> > target broke the rules and performed data overlay, then the initiator will
> > always mark the I/O as bad, and the market forces will take over (customer
> > will buy a compliant target).
>
> Julian,
>
> Removing support for overlapped data xfer's has multiple benefits:
> 1) It provides initiators a reliable way of ensuring
>    that the I/O did complete without any underrun.
>
> 2) It simplifies SCSI Assist implementations that no longer
>    need to deal with overlapped data xfer conditions.
>
> 3) It is simpler than performing book-keeping
>    on DataSN to ensure that all DataSNs have been
>    received. (IOW, score-boarding at a DataSN level,
>    instead of at a byte level.)
>
> I'd say the count based solution is preferable, given the above.
> All protocols inherently enforce certain behaviour by mandating
> features (the use of MUST, shall). I don't think that's a
> strong enough reason to reject this proposal.
>
> To summarize :
> o Dis-allow overlapped data xfer's.
>
> o Initiators to perform a count check as is done in FC.
>
> o On detecting an underrun, the command may be retried
>   BUT WITHOUT SETTING the "retry" bit. This is
>   particularly important because targets that implement
>   status recovery may be ignorant of the fact that the
>   initiator encountered a digest error [which caused
>   the underrun] and so, they just send back a Status
>   PDU under the belief that the command is complete,
>   whereas, the initiator wants all the Data PDUs
>   to be re-sent.
>
> Regards,
> Santosh
>
> > > Besides
> > > - the whole philosophy of the SCSI set of protocols is that the target is
> > > the master and the initiator should let the target decide how to fulfill
> > > the command. That is why we chose not to impose restrictions above those
> > > imposed by SCSI. The whole set of issues is also raised only because we
> > > provide also for storage proxies - otherwise a stronger checksum at TCP
> > > level and recovery at TCP level would have done what we wanted and recovery
> > > of the type we are dealing now with would have been done at TCP level.
> > > I am confident that we can reinstate DataSN as a simple means to sequence
> > > (not ack) data packets and considerably simplify recovery.
> >
> > If there is a data digest failure, the iSCSI PDU is discarded, and the test
> > that Santosh describes will fail. At that point, the command is "retried",
> > and using my example of the retry implementation in the thread "iSCSI: I/O
> > (command) recovery" error recovery is performed. No need for DataSN...
> >
> > > And do not forget that raising the error up to ULP with a service response
> > > will make the recovery far more expensive (as Prasenjit has already stated)
> > > - far more than current wedge drivers do as these rarely consider commands
> > > in flight and the need to keep order in a target that is not yet aware that
> > > something went wrong.
> >
> > I agree that errors due to the transport should not be propagated to the ULP.
> > In the case of a digest failure, this means that the TCP checksum indicated
> > the segment was good, meaning that a middle box corrupted the TCP segment and
> > sent it out with a "fixed" TCP checksum.
> >
> > The simple iSCSI error recovery using the retry should handle this corner case
> > very well.
> >
> > -Matt
> >
> > >
> > > Julo
> > >
> > > Santosh Rao <santoshr@cup.hp.com> on 28/01/2001 00:07:08
> > >
> > > Please respond to Santosh Rao <santoshr@cup.hp.com>
> > >
> > > To: Julian Satran/Haifa/IBM@IBMIL
> > > cc: ips@ece.cmu.edu
> > > Subject: Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues
> > >
> > > Julian,
> > >
> > > The missing Data PDU could be detected if the initiator were to
> > > perform a count check operation upon receiving SCSI Response PDU,
> > > along the lines of :
> > >
> > >     no. of bytes xfer'ed =
> > >         (Expected Data Xfer Length) - (Basic Residual Count)
> > >
> > > where,
> > >     Expected Data Xfer Length -> as specified in SCSI Command PDU
> > >     Basic Residual Count      -> as specified in SCSI Response PDU
> > >
> > > However, this is currently not possible due to overlapped data
> > > transfers being allowed by iSCSI. If iSCSI were to dis-allow
> > > overlapping data xfer's and initiators used a count check
> > > [as is done in FC], this would also address the problem.
> > >
> > > Regards,
> > > Santosh
> > >
> > >
> > > > If the header is a data header we can hardly trust the ULP to recognize the
> > > > error (he might be unaware of a missing packet). With data numbering this
> > > > situation could have been discovered at "status time".
> > > > The only thing we could do is restart all commands but this is equivalent
> > > > to a connection restart for all practical purposes. Dropping data
> > > > numbering might have some more "side-effects" like this.
> > > > As the combination of values - tag, address, offset may still let some
> > > > implementations assume that they have
> > > > a correct task identifier I don't see a point in mandating a recovery
> > > > behavior and the implementer may choose to:
> > > >
> > > > -retry/restart command
> > > > -logout drop and rebuild connection login and restart/retry
> > > > -abort all task sets (practically reset the target!) and report for all
> > > > commands a "delivery system failure" (kick-in the ULP recovery) and if you
> > > > suspect the link quality rebuild it; this latter behavior means also that
> > > > you have to stop delivering anything on any link to the target to avoid
> > > > out of order execution until you have finished the cleanup - pretty drastic
> > > >
> > > > With data numbering recovery could have stayed within the confines of a
> > > > command even if a header was bad.
> > > > Perhaps we should leave the DataSN only as a sequencer so that at
> > > > status-time the initiator should be able to find if a data packet was
> > > > dropped (no ExpDataSN on a NOP).
> > > >
> > > > Regards,
> > > > Julo
> > > >
> > > >
> > > > Michael Krause <krause@cup.hp.com> on 27/01/2001 04:59:12
> > > >
> > > > Please respond to Michael Krause <krause@cup.hp.com>
> > > >
> > > > To: Julian Satran/Haifa/IBM@IBMIL
> > > > cc: ips@ece.cmu.edu
> > > > Subject: Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues
> > > >
> > > >
> > > > At 07:40 PM 1/25/2001 +0200, julian_satran@il.ibm.com wrote:
> > > >
> > > >
> > > > >1) The initiator task tag cannot be trusted when a header digest error
> > > > >is seen. What does the phrase "provided it can recognize the initiator
> > > > >task tag" mean ?
> > > > >How can an initiator reliably claim that the initiator task tag is
> > > > >trustworthy ?
> > > > >
> > > > ><js> an initiator may choose to provide some redundancy in the tag itself
> > > > ></js>
> > > >
> > > > I'm aware of some techniques for inserting redundant information in tags
> > > > which limits the potential error exposure when a multi-bit error occurs,
> > > > however these are not fail-safe leading to potential incorrect operation -
> > > > perhaps benign in many cases; perhaps not in others. As such, if a header
> > > > digest error occurs, the PDU should be silently discarded and recovery
> > > > should be left to the ULP. There is little to no value having two
> > > > mechanisms to solve the same problem.
> > > >
> > > > Mike
> > > >
> > > >
> > > --
> > > #################################
> > > Santosh Rao
> > > Software Design Engineer,
> > > HP, Cupertino.
> > > email : santoshr@cup.hp.com
> > > Phone : 408-447-3751
> > > #################################
>
> --
> #################################
> Santosh Rao
> Software Design Engineer,
> HP, Cupertino.
> email : santoshr@cup.hp.com
> Phone : 408-447-3751
> #################################
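
For readers comparing the two detection schemes argued over in this thread, here is a minimal sketch of both checks as an initiator might run them at status time. It is an illustration only, not code from any participant or from the iSCSI draft. The names ReadCommandState, count_check and missing_data_sns are hypothetical, as is the data_pdus_sent argument: the thread does not say how an initiator would learn how many Data PDUs the target sent. The count check assumes overlapped data transfers are disallowed, as Santosh proposes; with overlap allowed, the byte count would not be meaningful.

```python
from dataclasses import dataclass, field


@dataclass
class ReadCommandState:
    """Hypothetical initiator-side bookkeeping for one SCSI read command."""
    expected_length: int      # Expected Data Xfer Length from the SCSI Command PDU
    bytes_received: int = 0   # payload bytes accepted so far
    seen_data_sns: set = field(default_factory=set)

    def on_data_pdu(self, data_sn: int, payload_len: int, digest_ok: bool = True) -> None:
        # A PDU that fails its digest is discarded, so it is never counted.
        if not digest_ok:
            return
        self.seen_data_sns.add(data_sn)
        self.bytes_received += payload_len


def count_check(cmd: ReadCommandState, residual_count: int) -> bool:
    """Byte-level check: bytes xfer'ed == expected length - residual count.

    Assumes no overlapped data transfers (each byte arrives at most once).
    """
    return cmd.bytes_received == cmd.expected_length - residual_count


def missing_data_sns(cmd: ReadCommandState, data_pdus_sent: int) -> list:
    """Sequence-level check: which DataSNs in 0..data_pdus_sent-1 never arrived.

    data_pdus_sent is an assumption; the thread leaves open how it is conveyed.
    """
    return sorted(set(range(data_pdus_sent)) - cmd.seen_data_sns)


# Example: an 8192-byte read sent as two 4096-byte Data PDUs, the second of
# which is dropped because of a digest error. The target reports no residual.
cmd = ReadCommandState(expected_length=8192)
cmd.on_data_pdu(data_sn=0, payload_len=4096)
cmd.on_data_pdu(data_sn=1, payload_len=4096, digest_ok=False)
underrun_detected = not count_check(cmd, residual_count=0)
lost = missing_data_sns(cmd, data_pdus_sent=2)
```

In this example both checks flag the lost PDU. Matt's objection still applies to the sequence check: a target that sends fewer PDUs than it should and reports no residual slips past it, but is caught by the byte count.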