iSCSI: SCSI timeout handling change
All:
Currently, if a command is not acknowledged by the ULP
timeout, iSCSI requires the initiator to tear down the session.
The rationale is that if the initiator could not
get the command through (possibly after multiple retries) even
by the ULP timeout, there is a serious problem with the session.
But this approach has some drawbacks -
- tearing down a session due to a NIC failure is
disruptive to potentially several other active tasks
on other NICs.
- it puts initiator implementations that do not want
to do within-connection recovery (i.e. no retries) at
a disadvantage, since one digest error would cause
potentially several active I/Os to be terminated.
- (albeit not very serious) this behavior differs
from what today's storage stacks expect, namely being
able to selectively abort one I/O on a timeout (with
no command retransmissions).
To address these issues, and also to simplify the current Task
Management request PDU, I propose the following changes to handling
SCSI timeouts -
Following changes to section 3.5:
- Abort Task MUST always be sent immediate.
- Abort Task task management function request MUST be sent
with its CmdSN equal to the CmdSN of the task to be aborted,
and the Referenced Task Tag initialized to the ITT of the
task to be aborted.
- As a consequence of the above, drop the RefCmdSN field from the
Task Management command payload, since it is currently
used only by the Abort Task function.
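To make the three bullets above concrete, here is a minimal sketch of how an initiator might fill in an Abort Task request under this proposal. The class and function names (Task, AbortTaskRequest, build_abort_task) are my own illustration and not taken from the draft, and the fields are a logical model only, not the PDU byte layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical record of an outstanding SCSI command."""
    itt: int      # Initiator Task Tag of the command
    cmd_sn: int   # CmdSN the command was sent with


@dataclass(frozen=True)
class AbortTaskRequest:
    """Logical view of an Abort Task request under the proposal."""
    itt: int                  # tag of the task management request itself
    immediate: bool           # always True under the proposal
    cmd_sn: int               # copied from the task being aborted
    referenced_task_tag: int  # ITT of the task being aborted
    # Deliberately no RefCmdSN field: the proposal drops it.


def build_abort_task(task: Task, new_itt: int) -> AbortTaskRequest:
    """Build an abort request for `task` following the rules above."""
    return AbortTaskRequest(
        itt=new_itt,
        immediate=True,
        cmd_sn=task.cmd_sn,
        referenced_task_tag=task.itt,
    )
```

Since the request reuses the aborted task's CmdSN, the target can identify the command from CmdSN and Referenced Task Tag alone, which is why RefCmdSN becomes redundant.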
Following changes to section 8.6:
I propose the following text to replace the current text -
An iSCSI initiator MAY attempt to plug a command sequence gap on
the target end (in the absence of an acknowledgement of the command
by way of ExpCmdSN) before the ULP timeout by retrying the
unacknowledged command, as described in section 8.1.
On a ULP timeout for a command that carried a CmdSN of n, if the
ExpCmdSN is still less than (n+1), the iSCSI initiator
MUST abort the command using the Abort Task task management function
request. In this process, the target may see the abort request
before the original command itself for one of three reasons -
- the original command was dropped due to digest error, or
- the Abort Task request was shipped out-of-order
on the same connection, or
- the connection the original command was sent on was
successfully logged out.
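The initiator-side timeout rule above (abort when ExpCmdSN is still less than n+1) could be sketched as follows. I am assuming here that CmdSN is a 32-bit counter compared with wrap-aware serial number arithmetic; the function names are hypothetical.

```python
MOD = 1 << 32    # assumed: CmdSN is a 32-bit counter
HALF = 1 << 31   # half the number space, used for wrap-aware comparison


def serial_lt(a: int, b: int) -> bool:
    """True if a precedes b in 32-bit serial number arithmetic."""
    diff = (b - a) % MOD
    return 0 < diff < HALF


def must_abort_on_ulp_timeout(cmd_sn: int, exp_cmd_sn: int) -> bool:
    """On a ULP timeout for a command with CmdSN n, report whether the
    command is still unacknowledged, i.e. ExpCmdSN < n + 1."""
    return serial_lt(exp_cmd_sn, (cmd_sn + 1) % MOD)
```

For example, with n = 10 the check is true while ExpCmdSN is 10 or lower, and false once the target has advanced ExpCmdSN to 11. The modulo keeps the comparison valid when n + 1 wraps to zero.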
If the abort request is received prior to the original command,
targets MUST consider the original command with that CmdSN to
be received and discard the original command if and when it arrives -
i.e. treat it as a duplicate CmdSN. For this reason, initiators
that want to preserve command ordering while keeping the same session
MUST NOT issue Abort Task on an unacknowledged command.
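A rough sketch of the target-side rule, assuming a simple in-memory record of CmdSNs already considered received. The class name and the returned strings are hypothetical, and the unbounded set ignores CmdSN wrap; a real target would track this within its command window.

```python
class TargetCmdSNTracker:
    """Toy model of the proposed target behavior for early aborts."""

    def __init__(self):
        # CmdSNs the target considers received, whether the command
        # actually arrived or an Abort Task for it arrived first.
        self._received = set()

    def on_abort_task(self, cmd_sn):
        """Handle an Abort Task carrying the aborted command's CmdSN."""
        if cmd_sn in self._received:
            # The command arrived earlier: abort it in the normal way.
            return "abort-existing-task"
        # Abort arrived first: consider the original command received.
        self._received.add(cmd_sn)
        return "marked-received"

    def on_command(self, cmd_sn):
        """Handle an ordinary command arriving with the given CmdSN."""
        if cmd_sn in self._received:
            # Either a true duplicate or a command aborted before arrival.
            return "discard-duplicate"
        self._received.add(cmd_sn)
        return "accept"
```

Note that once the abort is seen first, a late original command is indistinguishable from a duplicate, which is exactly why an initiator that cares about ordering cannot use this path.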
Following changes to section 2.2.2.1:
- The above approach exposes the possibility that some stale
(aborted from the target's perspective) commands could be stuck
in the TCP connection long enough for the CmdSN to wrap - similar
to the issue we dealt with for command retries. So, aborting
unacknowledged commands should require the same flushing
actions described for command retries. [At this point I would
almost prefer to require flushing all connections
every 2^31 - 1 commands starting from InitCmdSN, rather than
enumerating these cases individually...]
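The bracketed alternative (flush all connections every 2^31 - 1 commands) might look like the counter below. This is only a sketch of the idea: FlushScheduler is a hypothetical name, counting is assumed to start with the first command issued from InitCmdSN, and what "flushing" actually involves is left to the caller.

```python
FLUSH_INTERVAL = (1 << 31) - 1  # 2^31 - 1 commands, as suggested above


class FlushScheduler:
    """Counts commands issued and signals when a flush is due."""

    def __init__(self, interval=FLUSH_INTERVAL):
        self.interval = interval
        self.since_flush = 0  # commands issued since the last flush

    def command_issued(self):
        """Count one command; return True when all connections
        should be flushed before issuing further commands."""
        self.since_flush += 1
        if self.since_flush >= self.interval:
            self.since_flush = 0
            return True
        return False
```

The interval is a parameter only so the behavior can be exercised with small numbers; with the default, a flush falls due before a stale CmdSN could come back into the valid half of the number space.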
Comments?
--
Mallikarjun
Mallikarjun Chadalapaka
Networked Storage Architecture
Network Storage Solutions Organization
MS 5668 Hewlett-Packard, Roseville.
cbm@rose.hp.com