Re: iSCSI: error recovery

Pierre,

Interesting scenario - but ENTIRELY WRONG. A more careful reading of the draft would have solved your problem.

After a failed connection the two parties (I & T) are supposed to do some cleanup. In the old draft that was accomplished by having the initiator indicate in the new login which old connection it is replacing. In the new draft an explicit logout is required before resending unacked commands.

This mechanism was carefully designed to help avoid ghost commands appearing at the target.

Nevertheless - as David Black has suggested - you are encouraged to look for holes. As to publish or not, that is entirely a question of taste. I would certainly expect the problems to be real, or at least harder to crack than this one (no pun intended).

Regards,
Julo

Pierre Labat <pierre_labat@hp.com> on 07/11/2000 02:25:28

Please respond to Pierre Labat <pierre_labat@hp.com>

To: ips@ece.cmu.edu
cc:
Subject: Re: iSCSI: error recovery

Hello,

Some suggestions to simplify/secure the error recovery.

Regards,

Pierre

Using several TCP connections gives an unreliable medium. Requests, responses and data can be lost, duplicated or turn up as ghosts because TCP connections can drop. Trying to do a recovery can lead to some problems. The following scenarios describe some of the problems we will have. I am sure one can find other ones.

Scenario 1
----------

In this first scenario the recovery is delayed unnecessarily; the retry of a command will fail.

Initiator_ExpCmdRN = 1
Target_ExpCmdRN = 4

1) Cmd 5 and Cmd 6 are sent over NIC1 and are on their way to the target.
2) NIC1 fails.
3) The initiator, detecting that NIC1 failed, retries Cmd 5 and Cmd 6 on another NIC and TCP connection with their CmdRN unchanged (5 and 6), because 5 and 6 are greater than Initiator_ExpCmdRN.
   (This is the algorithm described in the draft.)
4) Cmd 5 and Cmd 6 (sent from the failed NIC1) enter the target. Target_ExpCmdRN is updated to 7. These commands have no chance of completing correctly because their TCP connection has been dropped on the initiator side.
5) The retries of the commands enter the target (through another TCP connection), but their CmdRNs (5 and 6) are less than Target_ExpCmdRN. Hence they are dropped by the target.
6) The retry mechanism fails. The initiator will have to wait for commands 5 and 6 to time out before trying another recovery.

Scenario 2
----------

Initiator_ExpCmdRN = 1
Target_ExpCmdRN = 4

Imagine the session has 4 TCP connections.

1) The initiator sends a command with CmdRN = 7 over TCP connection 1. Commands 5 and 6 are in flight between the initiator and the target (on TCP connection 4, for example).
2) Command 7 is blocked somewhere in the network because of congestion.
3) TCP connection 1 fails unexpectedly on the initiator side (for whatever reason: hardware, software, cable disconnected...) and the target can't be notified.
4) The initiator (as specified in the draft) sends a retry with CmdRN unchanged (CmdRN=7) on TCP connection 2.
5) TCP connection 2 fails unexpectedly on the initiator side (for whatever reason: hardware, software, cable disconnected...) and the target can't be notified.
6) The initiator (as specified in the draft) sends a retry with CmdRN unchanged (CmdRN=7) on TCP connection 3.
7) The target receives the retry from connection 3, then the retry from connection 2, then the original command from connection 1. As bad luck would have it, it receives things in the inverse of the order in which the initiator sent them.

All these retries and the original command have the same CmdRN (=7) and the same initiator task tag, hence the target gets several retries for the same command and has no clue how to order them. When the target receives the second retry (from connection 2) it doesn't know what to do with it.
If it supersedes the first retry, the retry will fail because the completion will be sent on connection 2, which has failed on the initiator side. If it doesn't supersede, then had the retries arrived in order, the retry would have failed too.

Scenario 3
----------

1) Cmd 1 is sent to the target but is blocked in TCP connection 1.
2) The initiator sends plenty of commands on other TCP connection(s) that are OK.
3) TCP connection 1 fails on the initiator side.
4) An Abort of Cmd 1 is sent on TCP connection 2. The Abort is non numbered (CmdRN=0). The abort is received by the target, which returns "function rejected" because there is no matching task tag.
5) At this point the initiator doesn't know what to do, because it doesn't know whether the command has been lost or will reach the target later.
6) Command 1 finally reaches the target (ghost IO), and is not aborted.

Scenario 4
----------

In this scenario, the whole traffic of a session is blocked when one command fails.

1) Cmd 10 is sent to the target but is blocked in TCP connection 1 and will never reach the target.
2) The initiator sends plenty of commands on other TCP connection(s) that are OK.
3) TCP connection 1 fails on the initiator side.
4) An Abort of Cmd 10 is sent on TCP connection 2. The Abort is numbered using a new CmdRN. The abort is received by the target but not processed, because the CmdRN of the abort is greater than Target_ExpCmdRN, which is blocked on 10.
5) The entire command processing (through all TCP connections) is blocked on the target at Target_ExpCmdRN = 10 until SCSI retries command 10 with the same CmdRN (which can take several seconds). And if SCSI doesn't retry with the same CmdRN (10) we have a deadlock.

Scenario 5
----------

Initiator_ExpCmdRN=Target_ExpCmdRN=5
Initiator_MaxCmdRN=Target_MaxCmdRN=100

Two TCP connections are used.
1) The initiator sends the command CmdRN=5 over connection 1, then the commands CmdRN=6 to CmdRN=100 over connection 2.
2) The initiator can send no more commands because the current CmdRN = MaxCmdRN.
3) TCP connection 1 breaks on the initiator side and command 5 will never reach the target.
4) The initiator wants to do a recovery with numbered commands (abort task, for example), but can't send them because CmdRN = MaxCmdRN.
5) The target doesn't want to increment MaxCmdRN because it has already buffered commands up to 100 and has no extra buffer space. It waits to receive command 5. This could be because it allocated a maximum amount of memory for the out-of-order commands it receives.
6) The initiator waits for MaxCmdRN to increase and the target waits for command 5 to come or be aborted. We have a deadlock.

Scenario 6
----------

1) The initiator sends the command CmdRN=1 on a TCP connection.
2) The command is stuck in the network.
3) The command times out on the initiator.
4) The initiator "retries" the command on the same TCP connection and the retry command is in the network.
5) The target receives the original command, executes it, and sends the completion.
6) The initiator receives the completion. It doesn't know whether it comes from the original command or from the "retry" command, because the same initiator task tag is used in both commands.

Solve these problems
====================

To get rid of all these corner cases and have a basic, simple and robust recovery mechanism that avoids or manages lost, duplicated or ghost PDUs, we could:

- keep the rule that every numbered command with a CmdRN out of the window [Target_ExpCmdRN,Target_MaxCmdRN] is discarded silently.
- always recover commands by doing an abort and then sending the command again with a new CmdRN and a new initiator task tag.
- modify the abort slightly: send it non numbered, and change a little bit the way non numbered messages are coded.
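The window rule in the first point is what makes the plain retry of Scenario 1 fail. A minimal sketch in Python, for illustration only: the function names are hypothetical, MaxCmdRN = 100 is an assumed value (Scenario 1 does not give one), and sequence-number wrap-around is ignored.

```python
def in_window(cmd_rn: int, exp_cmd_rn: int, max_cmd_rn: int) -> bool:
    """True if a numbered command falls inside [ExpCmdRN, MaxCmdRN]."""
    return exp_cmd_rn <= cmd_rn <= max_cmd_rn


def receive(cmd_rn: int, exp_cmd_rn: int, max_cmd_rn: int) -> str:
    """Target-side decision for one numbered command."""
    return "accepted" if in_window(cmd_rn, exp_cmd_rn, max_cmd_rn) else "discarded"


# Scenario 1, step 5: the originals have already moved Target_ExpCmdRN to 7,
# so the retries that still carry CmdRN 5 and 6 fall below the window.
retries = [receive(rn, exp_cmd_rn=7, max_cmd_rn=100) for rn in (5, 6)]

# A resend that takes a fresh CmdRN inside the window is accepted.
fresh = receive(7, exp_cmd_rn=7, max_cmd_rn=100)

print(retries, fresh)
```

This is why the basic recovery resends the command under a new CmdRN instead of reusing the old one.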
The modifications are listed below.

Modification of the coding of the headers
-----------------------------------------

for non numbered commands:
--------------------------

Add a bit in the iSCSI header to indicate whether the transaction is numbered or not. When the command is non numbered, this frees the CmdRN field to reference the command the transaction is targeted at. Currently, to indicate that a command is non numbered, CmdRN must be set to 0.

When the non numbered bit is set, the target doesn't discard the request if CmdRN is out of the window [Target_ExpCmdRN,Target_MaxCmdRN]. CmdRN indicates the command the non numbered transaction is targeted at. If the non numbered transaction is not targeted at any specific command, CmdRN is set to 0.

Doing that makes the Abort more robust (see below).

Modification of Abort task:
---------------------------

The abort is sent non numbered (with the non numbered bit set). The CmdRN is set to the value corresponding to the command to abort.

When the target receives an abort:

- If there is no task associated with CmdRN and CmdRN is out of the window [Target_ExpCmdRN,Target_MaxCmdRN], the abort returns immediately with success.
- If there is no task associated with CmdRN but CmdRN is in the window [Target_ExpCmdRN,Target_MaxCmdRN], the target marks CmdRN as "jump". This means that when Target_ExpCmdRN reaches CmdRN, it simply jumps to CmdRN+1. This prevents a deadlock if the command to abort never reaches the target.
- If there is a task associated with CmdRN, the target aborts the task (or cleans up the resources if the task was not yet in a task set), marks CmdRN as "jump", and returns successfully.

The recovery mechanism "retrying" the commands
==============================================

Besides the basic recovery (abort, then new command), the more sophisticated "retry" may be faster.
The initiator (instead of doing an abort and sending the command again with a new initiator task tag and a new CmdRN) can send a "retry" message. To avoid the problems described in the scenarios, the "retry" message must be more sophisticated than simply setting the retry bit as specified in the draft. It must combine part of the job of an "abort task" (filling the holes in the CmdRN sequence so that Target_ExpCmdRN can make progress) with the job of sending the command again.

Modification of the "retry"
---------------------------

This "retry" message has the format of the SCSI command PDU except that:

- a "referenced initiator task tag" field is added. It references the command to "retry".
- a "timestamp" field (integer) is added.

When the initiator sends a "retry" it:

- sets the retry bit and the non numbered bit.
- sets the CmdRN field to the CmdRN of the command to retry.
- generates a new initiator task tag (not the one of the task to retry).
- sets the "referenced initiator task tag" to the one of the command to retry.
- sets the timestamp to 0.

For the following "retries" of the same command (in case the first one failed) the initiator generates a new initiator task tag and increments the timestamp.

The target, when receiving a retry:

- checks whether there is a task already associated with CmdRN.
- if NO (the command has been lost or will come later as a ghost), the target acts as if it were receiving the original command. It records the timestamp.
- if YES, the target checks the timestamp. If the one in the retry is older than the one in the target, the "retry" is discarded silently. If the timestamp in the "retry" is newer than the one the target has associated with the command, the current task is stopped and restarted, and the new timestamp is recorded by the target.

Sending the retry non numbered allows the "retry" to reach the target even if the command window is closed. That can prevent the kind of deadlock described in scenario 5.
That solves scenario 1 too.

In case the first retry doesn't work and the initiator needs to send another one (for the same command), sending the retries with different initiator task tags allows the initiator to match the retry PDUs with their completions. In general, as the main goal of the initiator task tag is to allow the initiator to match requests with responses, it is cleaner to generate a new initiator task tag for each initiator request.

Having a timestamp avoids the problems described in scenario 2. The target knows how to tell new PDUs from ghost ones.

Using the CmdRN to reference the command to retry allows the target to:

- fill the holes in the CmdRN sequence at the target, even if the original command never reached the target. Target_ExpCmdRN can make progress.

An initiator must not send a "retry" if it has acknowledged the Status of the corresponding command. The target can forget the CmdRN of a command as soon as the corresponding status has been acknowledged. If the target receives a retry with a CmdRN that is not in the window [Target_ExpCmdRN,MaxCmdRN] and that doesn't correspond to any task whose status has not yet been acknowledged by the initiator, the target answers with an iSCSI status of the kind "out of range".

It seems to me that these three modifications (non numbered command, abort task, retry) give a robust recovery that eliminates the problems caused by duplicated, ghost and missing iSCSI PDUs. The target always knows exactly what to do, it is specified, and the target is never blocked.

The StatRN is useful only if "retry" is used.
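
The target-side rules proposed above for abort and "retry" can be sketched in Python as follows. This is an illustration under stated assumptions, not text from the draft: the names Task, Target, on_command, on_abort and on_retry are hypothetical; an original command is given timestamp -1 so that the first retry (timestamp 0) supersedes it; a retry with an equal timestamp is treated as a duplicate; "jump" is recorded only while the CmdRN is still in the window; and wrap-around of CmdRN is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_tag: int          # initiator task tag the completion is returned under
    timestamp: int = -1    # assumption: an original command ranks below retry 0


@dataclass
class Target:
    exp_cmd_rn: int
    max_cmd_rn: int
    tasks: dict = field(default_factory=dict)  # CmdRN -> Task (status not acked)
    jump: set = field(default_factory=set)     # CmdRNs that ExpCmdRN must skip

    def in_window(self, cmd_rn: int) -> bool:
        return self.exp_cmd_rn <= cmd_rn <= self.max_cmd_rn

    def _advance(self) -> None:
        # Move ExpCmdRN past every CmdRN that has arrived or is marked "jump".
        while self.exp_cmd_rn in self.tasks or self.exp_cmd_rn in self.jump:
            self.jump.discard(self.exp_cmd_rn)
            self.exp_cmd_rn += 1

    def on_command(self, cmd_rn: int, task_tag: int) -> str:
        # Numbered command: silently discarded outside the window, when its
        # CmdRN was aborted, or (assumption) when that CmdRN already has a task.
        if (not self.in_window(cmd_rn) or cmd_rn in self.jump
                or cmd_rn in self.tasks):
            return "discarded"
        self.tasks[cmd_rn] = Task(task_tag)
        self._advance()
        return "accepted"

    def on_abort(self, cmd_rn: int) -> str:
        # Non numbered abort: CmdRN names the command to abort.
        self.tasks.pop(cmd_rn, None)       # abort the task if there is one
        if self.in_window(cmd_rn):         # fill the hole so ExpCmdRN can move
            self.jump.add(cmd_rn)
            self._advance()
        return "success"

    def on_retry(self, cmd_rn: int, new_tag: int, timestamp: int) -> str:
        # Non numbered retry: CmdRN names the command being retried.
        task = self.tasks.get(cmd_rn)
        if task is None:
            if not self.in_window(cmd_rn):
                return "out of range"
            self.tasks[cmd_rn] = Task(new_tag, timestamp)  # acts as the original
            self._advance()
            return "started"
        if timestamp <= task.timestamp:
            return "discarded"             # older (ghost) or duplicate retry
        task.task_tag, task.timestamp = new_tag, timestamp
        return "restarted"


# Scenario 2: the PDUs for CmdRN 7 arrive in the inverse of the sending order.
t = Target(exp_cmd_rn=7, max_cmd_rn=100)
scenario2 = [t.on_retry(7, 0x33, 1), t.on_retry(7, 0x22, 0), t.on_command(7, 0x11)]

# Scenario 4: command 10 is lost, 11 and 12 arrive, then 10 is aborted.
u = Target(exp_cmd_rn=10, max_cmd_rn=100)
u.on_command(11, 0x41)
u.on_command(12, 0x42)
u.on_abort(10)
```

In the Scenario 2 replay only the newest retry keeps a task, and the completion goes back under its tag. In the Scenario 4 replay the abort of the lost command releases Target_ExpCmdRN instead of leaving it blocked at 10.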