iSCSI: SCSI timeout handling change

All:

Currently, if a command is not acknowledged by the ULP timeout, iSCSI mandates that initiators tear down the session. The rationale is that if the initiator could not get the command through (possibly after multiple retries) even by the ULP timeout, there is a serious problem with the session.

But there are some drawbacks to this approach -

- tearing down a session due to a NIC failure is disruptive to potentially several other active tasks on other NICs.
- this puts initiator implementations that do not want to do within-connection recovery (i.e. no retries) at a disadvantage, since one digest error would cause potentially several active I/Os to be terminated.
- (albeit not very serious) this behavior differs from what today's storage stacks expect: being able to selectively abort one I/O on a timeout (with no command retransmissions).

To address these issues, and also to simplify the current Task Management request PDU, I propose the following changes to the handling of SCSI timeouts -

Following changes to section 3.5:

- Abort Task MUST always be sent as an immediate command.
- The Abort Task task management function request MUST be sent with its CmdSN equal to the CmdSN of the task to be aborted, and with the Referenced Task Tag set to the ITT of the task to be aborted.
- Consequent to the above, drop the RefCmdSN field in the Task Management command payload, which is currently used only by the Abort Task function.

Following changes to section 8.6:

Propose the following text to replace the current -

An iSCSI initiator MAY attempt to plug a command sequence gap on the target end (in the absence of an acknowledgement of the command by way of ExpCmdSN) before the ULP timeout by retrying the unacknowledged command, as described in section 8.1.
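As an illustration only, the section 3.5 changes proposed above could be sketched as follows. The `Task` and `TaskMgmtRequest` types, their field names, and the `"ABORT_TASK"` label are hypothetical stand-ins for this sketch, not the draft's PDU layout; the only rules taken from the proposal are that the request is immediate, reuses the CmdSN of the task being aborted, and carries that task's ITT as the Referenced Task Tag (with no RefCmdSN field).

```python
from dataclasses import dataclass

# Hypothetical label for the function; the real PDU uses a numeric code.
ABORT_TASK = "ABORT_TASK"


@dataclass(frozen=True)
class Task:
    itt: int     # Initiator Task Tag of the outstanding command
    cmd_sn: int  # CmdSN the command was originally sent with


@dataclass(frozen=True)
class TaskMgmtRequest:
    function: str
    immediate: bool
    cmd_sn: int
    referenced_task_tag: int
    # Note: no RefCmdSN field, per the proposed simplification.


def build_abort_task(task: Task) -> TaskMgmtRequest:
    """Build an Abort Task request following the proposed rules."""
    return TaskMgmtRequest(
        function=ABORT_TASK,
        immediate=True,                # MUST always be sent as immediate
        cmd_sn=task.cmd_sn,            # same CmdSN as the task to be aborted
        referenced_task_tag=task.itt,  # ITT of the task to be aborted
    )
```

Because the request carries the CmdSN of the task itself, the target can identify the command being aborted from the CmdSN and the Referenced Task Tag alone, which is what makes the separate RefCmdSN field redundant.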

On a ULP timeout for a command that carried a CmdSN of n, if the ExpCmdSN is still less than (n+1), the iSCSI initiator MUST abort the command using the Abort Task task management function request. In this process, the target may see the abort request before the original command itself, for one of three reasons -

- the original command was dropped due to a digest error, or
- the Abort Task request was shipped out of order on the same connection, or
- the connection the original command was sent on was successfully logged out.

If the abort request is received prior to the original command, targets MUST consider the original command with that CmdSN to be received, and MUST discard the original command if and when it is received - i.e. treat it as a duplicate CmdSN. For this reason, initiators that want to preserve command ordering while maintaining the same session MUST NOT issue Abort Task on an unacknowledged command.

Following changes to section 2.2.2.1:

- The above approach exposes the possibility that some stale (aborted from the target's perspective) commands could be stuck in the TCP connection long enough for the CmdSN to wrap - similar to the issue we dealt with for command retries. So, aborting unacknowledged commands should require the same flushing actions described for command retries.

[ At this point I would almost prefer to require flushing all connections every 2^31 - 1 commands starting from InitCmdSN, rather than enumerating these cases individually... ]

Comments?

--
Mallikarjun

Mallikarjun Chadalapaka
Networked Storage Architecture
Network Storage Solutions Organization
MS 5668
Hewlett-Packard, Roseville.
cbm@rose.hp.com
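
For readers who want to see the proposed section 8.6 rules in executable form, here is a minimal sketch. It assumes CmdSN values are compared with 32-bit serial number arithmetic (so that the comparison survives a wrap), and the `TargetSession` class with its method names is a hypothetical model invented for this illustration; it ignores ExpCmdSN advancement, connections, and the wrap-related flushing discussed for section 2.2.2.1.

```python
MASK32 = 0xFFFFFFFF
HALF = 2**31


def sn_lt(a: int, b: int) -> bool:
    """True if serial number a is 'less than' b, modulo 2^32 (assumed rule)."""
    return a != b and ((b - a) & MASK32) < HALF


def must_abort_on_timeout(n: int, exp_cmd_sn: int) -> bool:
    """Initiator side: on ULP timeout, abort if ExpCmdSN is still < n + 1."""
    return sn_lt(exp_cmd_sn, (n + 1) & MASK32)


class TargetSession:
    """Toy target model: tracks CmdSNs already consumed by an Abort Task."""

    def __init__(self) -> None:
        self.aborted_cmd_sns: set[int] = set()
        self.executed: list[int] = []

    def on_abort_task(self, cmd_sn: int) -> None:
        # The abort may arrive before the command it refers to; the target
        # then considers the command with that CmdSN to have been received.
        self.aborted_cmd_sns.add(cmd_sn)

    def on_command(self, cmd_sn: int) -> bool:
        # A late original command is discarded as a duplicate CmdSN.
        if cmd_sn in self.aborted_cmd_sns:
            return False
        self.executed.append(cmd_sn)
        return True
```

In this model, calling `on_abort_task(5)` and then `on_command(5)` discards the late command, while a subsequent `on_command(6)` is executed normally. That is also why an initiator that cares about strict ordering cannot use this path: CmdSN 5 is consumed without the command ever running.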
Last updated: Thu Nov 15 02:18:05 2001