iSCSI: SCSI timeout handling change

To: ips <ips@ece.cmu.edu>
Subject: iSCSI: SCSI timeout handling change
From: "Mallikarjun C." <cbm@rose.hp.com>
Date: Tue, 13 Nov 2001 11:21:25 -0800
Organization: Hewlett-Packard, Roseville
Reply-To: cbm@rose.hp.com
Sender: owner-ips@ece.cmu.edu

All:

Currently, if a command is not acknowledged by the time the ULP
timeout expires, iSCSI requires the initiator to tear down the session.
The rationale is that if the initiator could not get the command
through (possibly after multiple retries) before the ULP timeout,
there is a serious problem with the session.
But this approach has some drawbacks:

        - tearing down a session because of a single NIC failure
          disrupts potentially several other active tasks on
          other NICs.
        - it puts initiator implementations that do not want to do
          within-connection recovery (i.e. no retries) at a
          disadvantage, since one digest error would cause
          potentially several active I/Os to be terminated.
        - (albeit not very serious) this behavior differs from what
          today's storage stacks expect: being able to selectively
          abort one I/O on a timeout (with no command
          retransmissions).

To address these issues, and also to simplify the current Task
Management request PDU, I propose the following changes to the
handling of SCSI timeouts.

Proposed changes to section 3.5:

- Abort Task MUST always be sent as immediate.

- Abort Task task management function request MUST be sent 
  with its CmdSN equal to the CmdSN of the task to be aborted, 
  and the Referenced Task Tag initialized to the ITT of the 
  task to be aborted.

- Consequent to the above, drop the RefCmdSN field in the 
  Task Management command payload that is currently only 
  used by the Abort Task function.
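
As a rough illustration of the section 3.5 changes above, the following
Python sketch builds an Abort Task request from the task being aborted.
The Task and TaskMgmtRequest types, their field names, and the
build_abort_task helper are hypothetical and are not taken from the
draft; only the rules they encode (always immediate, CmdSN copied from
the task, Referenced Task Tag set to the task's ITT, no RefCmdSN) come
from the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """An outstanding SCSI command as tracked by the initiator (hypothetical)."""
    itt: int      # Initiator Task Tag of the command
    cmd_sn: int   # CmdSN the command was sent with


@dataclass(frozen=True)
class TaskMgmtRequest:
    """Hypothetical in-memory view of a Task Management request."""
    function: str
    immediate: bool
    itt: int                   # tag of the request itself
    cmd_sn: int
    referenced_task_tag: int
    # Deliberately no RefCmdSN field: the proposal drops it.


def build_abort_task(task: Task, request_itt: int) -> TaskMgmtRequest:
    """Build an Abort Task request for `task` following the proposed rules."""
    return TaskMgmtRequest(
        function="ABORT_TASK",
        immediate=True,                 # Abort Task is always sent immediate
        itt=request_itt,
        cmd_sn=task.cmd_sn,             # same CmdSN as the task being aborted
        referenced_task_tag=task.itt,   # ITT of the task being aborted
    )
```

Because the request reuses the CmdSN of the aborted task, the target can
identify the command from the request's own CmdSN, which is why a
separate RefCmdSN is no longer needed.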

Proposed changes to section 8.6:

I propose the following text to replace the current text:

An iSCSI initiator MAY attempt to plug a command sequence gap on
the target end (in the absence of an acknowledgement of the command
by way of ExpCmdSN) before the ULP timeout by retrying the
unacknowledged command, as described in section 8.1.

On a ULP timeout for a command that carried a CmdSN of n, if the
ExpCmdSN is still less than (n+1), the iSCSI initiator MUST abort
the command using the Abort Task task management function request.
In this process, the target may see the abort request before the
original command itself for one of three reasons:
        - the original command was dropped due to a digest error, or
        - the Abort Task request was shipped out-of-order
          on the same connection, or
        - the connection the original command was sent on was
          successfully logged out.
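
The initiator-side check in the text above can be sketched as follows.
This is a minimal Python sketch, assuming that CmdSN and ExpCmdSN are
32-bit counters compared with wrap-aware serial number arithmetic in
the style of RFC 1982; the function names sn_lt and
must_abort_on_ulp_timeout are hypothetical.

```python
MOD = 1 << 32    # CmdSN is assumed to be a 32-bit counter
HALF = 1 << 31   # half the number space, used for wrap-aware comparison


def sn_lt(a: int, b: int) -> bool:
    """Return True if serial number a precedes b, allowing for wrap."""
    diff = (b - a) % MOD
    return 0 < diff < HALF


def must_abort_on_ulp_timeout(cmd_sn: int, exp_cmd_sn: int) -> bool:
    """True if a command sent with `cmd_sn` is still unacknowledged.

    The command counts as acknowledged once ExpCmdSN has reached
    cmd_sn + 1; until then, a ULP timeout requires an Abort Task.
    """
    return sn_lt(exp_cmd_sn, (cmd_sn + 1) % MOD)
```

For example, with a CmdSN of 10, an ExpCmdSN of 10 means the command has
not been acknowledged and must be aborted on timeout, while an ExpCmdSN
of 11 or later means it has.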

If the abort request is received prior to the original command,
targets MUST consider the original command with that CmdSN to
be received, and MUST discard the original command if and when it
is received, i.e. treat it as a duplicate CmdSN.  For this reason,
initiators that want to preserve command ordering while keeping
the same session MUST NOT issue Abort Task on an unacknowledged
command.
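
The target-side rule above can be sketched roughly as follows. This is
only an illustration: the TargetSession class and its method names are
hypothetical, and a plain set of CmdSNs stands in for whatever command
window tracking a real target would use.

```python
class TargetSession:
    """Hypothetical target-side bookkeeping of CmdSNs considered received."""

    def __init__(self) -> None:
        self._received = set()  # CmdSNs the target treats as received

    def on_abort_task(self, cmd_sn: int) -> None:
        """Handle an Abort Task request carrying `cmd_sn`.

        If the original command has not arrived yet, its CmdSN is still
        marked as received so that a late copy is treated as a duplicate.
        """
        self._received.add(cmd_sn)

    def on_command(self, cmd_sn: int) -> str:
        """Return "discard" for a duplicate CmdSN, otherwise "accept"."""
        if cmd_sn in self._received:
            return "discard"
        self._received.add(cmd_sn)
        return "accept"
```

A short usage example: after on_abort_task(6), a later on_command(6)
returns "discard", while on_command(7) is still accepted. The command
with CmdSN 6 therefore never executes, which is why an initiator that
depends on strict command ordering cannot use this path.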

Proposed changes to section 2.2.2.1:
- The above approach exposes the possibility that some stale
  (aborted from the target's perspective) commands could be stuck
  in the TCP connection long enough for the CmdSN to wrap, similar
  to the issue we dealt with for command retries.  So, aborting
  unacknowledged commands should require the same flushing
  actions described for command retries. [ I would almost
  prefer at this point to require flushing all connections
  every 2^31 - 1 commands starting from InitCmdSN, rather than
  enumerating these cases individually...]

Comments?
-- 
Mallikarjun 


Mallikarjun Chadalapaka
Networked Storage Architecture
Network Storage Solutions Organization
MS 5668	Hewlett-Packard, Roseville.
cbm@rose.hp.com
