Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues

Actually, it's not that bad. Parallel SCSI logic already considers a similar case: the case where one of the tagged commands never completes. What happens is that one of the upper layers (sometimes the application and sometimes the operating system) times the I/O. The I/O associated with this tag will time out and the layer that is responsible takes corrective action. With a disk it is very simple ... just abort the I/O and re-issue it. With something like a tape, the application has to determine whether the tape needs to be backspaced and then re-issues the I/O.

In parallel SCSI, the driver or one of the upper layers will usually issue a Target Reset to get rid of all the tags (because it is assumed that the device has malfunctioned). This is not considered drastic in many systems and is very natural when testing RAID systems.

So ... for us, all we need to do is drop the packet. In the iSCSI spec, we are basically saying you can rely on corrupted data and I just don't think that is a good idea.

Eddy

----Original Message Follows----
From: julian_satran@il.ibm.com
To: ips@ece.cmu.edu
Subject: Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues
Date: Sat, 27 Jan 2001 22:24:31 +0200

If the header is a data header we can hardly trust the ULP to recognize the error (it might be unaware of a missing packet). With data numbering this situation could have been discovered at "status time". The only thing we could do is restart all commands but this is equivalent to a connection restart for all practical purposes. Dropping data numbering might have some more "side-effects" like this.
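The "discovered at status time" idea can be sketched roughly as follows. This is only an illustration: the `ReadTask` class, its method names, and the assumption that the initiator learns at status time how many data PDUs the target sent are all hypothetical and not taken from the draft.

```python
from dataclasses import dataclass, field


@dataclass
class ReadTask:
    """Initiator-side state for one read command (hypothetical structure)."""
    task_tag: int
    received: set = field(default_factory=set)  # DataSN values seen so far

    def on_data_pdu(self, data_sn: int) -> None:
        # Record the sequence number of each data PDU that was accepted.
        self.received.add(data_sn)

    def missing_at_status(self, data_pdu_count: int) -> list:
        # At status time, compare what arrived against the number of data
        # PDUs the target reports having sent (an assumed status field).
        return [sn for sn in range(data_pdu_count) if sn not in self.received]


task = ReadTask(task_tag=0x1234)
for sn in (0, 1, 3):  # the PDU carrying DataSN 2 was dropped
    task.on_data_pdu(sn)

missing = task.missing_at_status(data_pdu_count=4)
print(missing)  # [2]
```

With a per-command sequence number like this, a dropped data PDU shows up as a gap for that one command, so recovery would not have to touch other outstanding commands.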
As the combination of values (tag, address, offset) may still lead some implementations to assume that they have a correct task identifier, I don't see a point in mandating a recovery behavior, and the implementer may choose to:

- retry/restart the command
- log out, drop and rebuild the connection, log in and restart/retry
- abort all task sets (practically reset the target!) and report for all commands a "delivery system failure" (kick in the ULP recovery), and if you suspect the link quality, rebuild it; this latter behavior also means that you have to stop delivering anything on any link to the target to avoid out-of-order execution until you have finished the cleanup - pretty drastic

With data numbering, recovery could have stayed within the confines of a command even if a header was bad. Perhaps we should leave the DataSN only as a sequencer so that at status time the initiator is able to find out whether a data packet was dropped (no ExpDataSN on a NOP).

Regards,
Julo

Michael Krause <krause@cup.hp.com> on 27/01/2001 04:59:12
Please respond to Michael Krause <krause@cup.hp.com>
To: Julian Satran/Haifa/IBM@IBMIL
cc: ips@ece.cmu.edu
Subject: Re: iSCSI : Digest Error Problems & CmdSN/ExpCmdSN window issues

At 07:40 PM 1/25/2001 +0200, julian_satran@il.ibm.com wrote:
>1) The initiator task tag cannot be trusted when a header digest error
>is seen. What does the phrase "provided it can recognize the initiator
>task tag" mean ?
>How can an initiator reliably claim that the initiator task tag is
>trustworthy ?
>
><js> an initiator may choose to provide some redundancy in the tag itself
></js>

I'm aware of some techniques for inserting redundant information in tags which limit the potential error exposure when a multi-bit error occurs; however, these are not fail-safe, leading to potential incorrect operation - perhaps benign in many cases; perhaps not in others. As such, if a header digest error occurs, the PDU should be silently discarded and recovery should be left to the ULP.
There is little to no value in having two mechanisms to solve the same problem.

Mike
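A minimal sketch of the "silently discard on header digest error" behavior might look like this. The 48-byte header length, the trailing 4-byte digest layout, and the use of `zlib.crc32` as the checksum are assumptions made for illustration; the thread does not specify the digest algorithm, and `add_digest` and `receive_pdu` are hypothetical helper names.

```python
import struct
import zlib

HEADER_LEN = 48  # assumed fixed header size for this sketch


def add_digest(header: bytes) -> bytes:
    # Append a 4-byte big-endian checksum (stand-in for the header digest).
    return header + struct.pack(">I", zlib.crc32(header))


def receive_pdu(raw: bytes):
    """Return the header if its digest checks out, else None (silent discard)."""
    header = raw[:HEADER_LEN]
    digest = raw[HEADER_LEN:HEADER_LEN + 4]
    if len(header) < HEADER_LEN or len(digest) < 4:
        return None  # truncated PDU: nothing in it can be trusted
    if struct.unpack(">I", digest)[0] != zlib.crc32(header):
        # No field of a corrupted header (including the task tag) is used;
        # the PDU is dropped and the upper layer's I/O timer drives recovery.
        return None
    return header


good = add_digest(bytes(range(HEADER_LEN)))
bad = bytes([good[0] ^ 0x01]) + good[1:]  # flip one bit in the header

print(receive_pdu(good) is not None)  # True
print(receive_pdu(bad) is None)       # True
```

The receiver never inspects the tag of a header that failed its digest, which is the point being argued: the command simply times out at the upper layer, as it would for a lost tagged command in parallel SCSI.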