RE: iSCSI Reqts: In-Order Delivery

To: "Charles Monia" <cmonia@NishanSystems.com>, "Santosh Rao (E-mail)" <santoshr@cup.hp.com>
Subject: RE: iSCSI Reqts: In-Order Delivery
From: "Douglas Otis" <dotis@sanlight.net>
Date: Mon, 23 Apr 2001 12:09:44 -0700
Cc: "Ips (E-mail)" <ips@ece.cmu.edu>
In-Reply-To: <B300BD9620BCD411A366009027C21D9B17344E@ariel.nishansystems.com>
Sender: owner-ips@ece.cmu.edu
Charles,

Your solution requires a fair amount of command tracking based solely on
Client Tags.  These Tags are randomly generated, yet your scheme needs them
to retain sequential order.  The transport must remember the type of each
command sent, together with its relative placement, based only on the Client
Tag.  In addition, commands will need to be sorted into different
categories: commands executed out of sequence by means of a bypass flag,
Task Management commands, and commands affected by either of these.  It
seems that, in large part, these concerns can be met with proper handling of
the transport, without such laborious sorting of the Client Tags.  The
out-of-sequence or bypass flag also depends on the transport sorting the
Client Tag.  Besides disabling flow control, this technique of not
incrementing the serialization of these commands requires all commands with
the same serialization value to be sent on the same connection without
acknowledgment if they are also to be kept in sequence.  This connection
requirement has yet to be specified.

Ver 6, Pg 12:
   "iSCSI may avoid delivering some command to the
   SCSI layer if so required by some prior SCSI or iSCSI action (e.g.,
   clear task set Task Management request received before all the
   commands it was supposed to act on)."

Here, the iSCSI transport seems to be expected to interpret the content of
the SCSI commands.  How this is done is not obvious.  Is the transport
expected to generate SCSI responses?

In addition, although iSCSI presently relies on ACA, there are few
applications that implement ACA.  It would appear that, for iSCSI to work
with the present protocol, significant application changes are required.
With the proposal I am suggesting, this is not a problem, as all bypassed
commands are rejected back to the Initiator.  The drivers that implement
iSCSI will be required to provide handling for these commands that bypass
other commands.  The amount of information contained in a rejected command
list should be relatively small, and the occasions for such Management
should be rare.  Without proper handling of these events, there will be
2:00 AM alarm pagers going off.

Here in the proposal, sorting CmdSN based on LUN values takes place within a
"Barrier List."  I cannot tell what is implied by these recovery
instructions.  What is meant by Remove, Release, Drop, Cleanup, Placeholder,
and ALL?  What is the intended feedback to the initiator for this cleanup?
It would appear the transport works on behalf of the target.  In the
proposal that I am suggesting, there are no actions within the transport on
behalf of the target.  All decisions are made by either the Target or the
Initiator, none by the transport.

The concept is simple.  Keep the transport simple.  Do not expect the
transport to decipher SCSI commands.  Do not expect the transport to respond
on behalf of the Target.  Do not expect the transport to sort pending
commands based on LUN value.  Do not expect the transport to require SCSI
and iSCSI ACA.

In the case of session-wide serialization, what is good for the goose is
also good for the gander.  To detect errors quickly and to know the server
state, it is important to use session-wide serialization from the server as
well.  The technique of replicating Management commands down each
connection, in addition to changing global commands into specific commands,
already overburdens the set-aside that must be made to handle these
non-serialized management commands.  My proposal eliminates the problem of
set-aside resources and loss of server state.  Rather than silently
rejecting commands out of sequence, these rejections are reported.  Once
done, this feature can be used to extract pending commands in a simple and
direct manner without burdening the transport.
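
As a rough illustration of reporting rejections rather than dropping them
silently, the sketch below splits a set of pending commands around a
management command that bypasses them.  It is a simplification and is not
taken from the draft: the function name reject_bypassed, the use of plain
integers as session-wide sequence numbers, and the rule that every pending
command sequenced before the management command counts as bypassed are all
assumptions, and sequence-number wraparound is ignored.

```python
from typing import Iterable, List, Tuple


def reject_bypassed(pending: Iterable[int], mgmt_sn: int) -> Tuple[List[int], List[int]]:
    """Split pending sequence numbers around a bypassing management command.

    Hypothetical helper: commands sequenced before mgmt_sn are treated as
    bypassed and are returned in a rejected list for the Initiator, instead
    of being discarded without notice.  Wraparound is not handled.
    """
    kept: List[int] = []
    rejected: List[int] = []
    for sn in sorted(pending):
        if sn < mgmt_sn:
            rejected.append(sn)  # reported back to the Initiator
        else:
            kept.append(sn)      # still pending at the Target
    return kept, rejected


kept, rejected = reject_bypassed([12, 10, 14, 11], mgmt_sn=13)
```

The Initiator, not the transport, then decides what to do with the rejected
list, for example reissuing those commands.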

As attempts are made to support the SCSI architecture, efforts should be
made to simplify the transport rather than increase its intelligence.  The
more fields the transport must manipulate, the more complexity and
non-uniform implementation will result.

See:
http://www.ietf.org/internet-drafts/draft-otis-iscsi-fullack-00.txt

Ver 6, Pg 92:
     "N.B. As an alternative to Logout and reissue commands, the
      initiator MAY instead reset the target and terminate all
      outstanding commands with a service response indicating
      Delivery Subsystem Failure. The initiator MUST perform one of
      the two actions."

...

Ver 6, Pg 93:
   "The following general mechanism can be used to achieve the effect of
   ordered delivery for task management commands while enabling the
   "urgent" delivery that some of them imply and immediate execution of
   the task management commands without:

      At Initiator when a relevant task management command is issued:

         a) if ExpCmdSN is equal to CmdSN skip to step c
         b) mark all pending commands with a CmdSN field between
         ExpCmdSN and the current CmdSN and a relevant LUN as
         candidates for cleanup and retain CmdSN in a "barrier list".
         c) send the task management command for immediate delivery
         to the target

      At initiator when updating ExpCmdSN:

         a) if the "barrier list" is empty or ExpCmdSN is less than
         the first entry in the barrier list then skip to step d
         b) remove the barrier list entry and remove and drop all
         entries marked for cleanup having a CmdSN field less than
         ExpCmdSN
         c) go to step a
         d) release all queued entries between the old and new
         ExpCmdSN from the queue

      At target when receiving a relevant task management command for
      immediate delivery:

         a) if ExpCmdSN is equal to CmdSN skip to step c
         b) mark all pending entries (commands received and
         placeholders) with a CmdSN field between ExpCmdSN and the
         current CmdSN as candidates for cleanup and retain CmdSN in
         a "barrier list" including the referenced LUN (or an ALL
         marker)
         c) send the task management command to SCSI for immediate
         execution

      At target when updating ExpCmdSN (releasing ordered commands to
      SCSI):

         a) if the "barrier list" is empty or ExpCmdSN is less than
         the first entry in the barrier list then skip to step d
         b) remove the barrier list entry and remove and drop all
         entries marked for cleanup and having the same LUN as the
         barrier entry (any if the barrier is marked ALL) and a CmdSN
         field less than ExpCmdSN
         c) go to step a
         d) release all queued entries between the old and new
         ExpCmdSN from the queue

   Note that this scheme will withstand connection recovery."
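
To make concrete how much state the quoted target-side steps ask of the
transport, here is one possible reading of them as a sketch.  Several points
are assumptions rather than text from the draft: the class and method names
are hypothetical, "between ExpCmdSN and the current CmdSN" is read as
ExpCmdSN <= CmdSN < current CmdSN, placeholders are modeled by marking
commands that arrive after the barrier was recorded, ALL is represented by
None, and sequence-number wraparound is ignored.

```python
from dataclasses import dataclass, field

ALL = None  # hypothetical marker: the barrier applies to every LUN


@dataclass
class Pending:
    cmd_sn: int
    lun: int
    cleanup: bool = False


@dataclass
class TargetQueue:
    exp_cmd_sn: int
    pending: dict = field(default_factory=dict)    # cmd_sn -> Pending
    barriers: list = field(default_factory=list)   # (cmd_sn, lun), arrival order
    delivered: list = field(default_factory=list)  # stands in for the SCSI layer
    dropped: list = field(default_factory=list)

    def receive(self, cmd_sn, lun):
        """Queue an ordered command; mark it if a recorded barrier covers it."""
        late = any(cmd_sn < barrier_sn for barrier_sn, _ in self.barriers)
        self.pending[cmd_sn] = Pending(cmd_sn, lun, cleanup=late)

    def task_management(self, cmd_sn, lun=ALL):
        """Steps a-c for a task management command sent for immediate delivery."""
        if self.exp_cmd_sn != cmd_sn:
            for p in self.pending.values():
                if self.exp_cmd_sn <= p.cmd_sn < cmd_sn:
                    p.cleanup = True
            self.barriers.append((cmd_sn, lun))
        self.delivered.append(("TM", lun))

    def advance(self, new_exp):
        """Steps a-d when ExpCmdSN is updated."""
        old = self.exp_cmd_sn
        self.exp_cmd_sn = new_exp
        # Steps a-c: consume every barrier that ExpCmdSN has reached.
        while self.barriers and self.exp_cmd_sn >= self.barriers[0][0]:
            _, lun = self.barriers.pop(0)
            for sn in sorted(self.pending):
                p = self.pending[sn]
                if p.cleanup and sn < self.exp_cmd_sn and lun in (ALL, p.lun):
                    self.dropped.append(sn)
                    del self.pending[sn]
        # Step d: release what is left between the old and new ExpCmdSN.
        for sn in range(old, new_exp):
            p = self.pending.pop(sn, None)
            if p is not None:
                self.delivered.append(("CMD", sn))


q = TargetQueue(exp_cmd_sn=5)
q.receive(6, lun=1)
q.receive(7, lun=2)
q.task_management(cmd_sn=8, lun=1)
q.receive(5, lun=1)   # arrives after the barrier was recorded
q.advance(8)
```

In this run the commands for LUN 1 (CmdSN 5 and 6) are dropped without any
indication to the initiator, while the command for LUN 2 (CmdSN 7) is
released to SCSI.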

Doug

> Hi Santosh:
>
> Please see below.
>
> > Charles Monia wrote:
> >
> > > > (1) MUST provide ordered delivery of SCSI commands from
> > > >       the initiator to the target in the absence of transport
> > > >       errors visible to iSCSI (e.g., iSCSI CRC failure,
> > > >       unexpected TCP connection closure).
> > >
> > > Does the term "SCSI commands" include task management functions as
> > > well?  If not, it should.
> >
> >
> > Charles,
> >
> > Could iSCSI use a variant of the approach FCP-2 takes to solve the
> > ordering issue for task mgmt error recovery ?
> >
> > The FCP-2 task management error recovery scheme is :
> > - task mgmt function uses CRN 0
> > - task mgmt function is executed immediately with no ordering
> > latencies
> > - both initiator & target clear all resources that can be cleared
> > un-ambiguously.
> > - any ambiguous exchanges shall be aborted by the port that detects the
> > ambiguous state.
> >
> > In the case of iSCSI, an analogous approach could be :
> > - task mgmt function uses immediate delivery flag for the task mgmt PDU.
> > - task mgmt fn executed immediately avoiding any ordering latencies.
> > - initiator & target clear all resources that can be cleared
> > un-ambiguously.
> > - initiator uses Abort Task to explicitly abort all active outstanding
> > I/Os at the time the task mgmt fn was issued to avoid any ambiguous
> > stale PDUs of an exchange from appearing at the target.
> >
> > Such an approach would avoid latencies on the execution of the task mgmt
> > fn while still flushing out all the stale PDUs upon completion of the
> > initiator actions for that task mgmt fn.
> >
>
> The problem is to avoid scenarios where the initiator and target's view of
> the task set are out of step.  Specifically, we must avoid the case where
> an initiator receives a PDU from a task it believes has been terminated.
>
> In that respect, the technique you describe above should work for an ABORT
> TASK operation.
>
> In the case of ABORT TASK SET, the function could be emulated by issuing a
> series of ABORT TASK requests. For CLEAR TASK SET, an initiator would
> probably want to do the individual ABORT TASK operations, followed by a
> CLEAR TASK SET to terminate tasks from other initiators.  I assume TARGET
> RESET and LUN RESET would be emulated in a manner similar to CLEAR TASK
> SET.  In all of these cases there may be some "atomicity" side effects
> caused by doing things one at a time instead of all at once.
>
> The only sticky problem is ensuring that the CLEAR ACA function works
> right.  By that I mean that you don't want to issue the function until all
> prior SCSI commands that were in flight when the ACA occurred have been
> terminated with the ACA ACTIVE status.  You can't simply replicate the
> command on each connection since you might inadvertently clear a
> subsequent ACA. (Yes -- I know these are all edge cases, but we may as
> well try to get it right.)  Maybe the thing to do is implement the
> function such that the ACA interlock is not cleared until the CLEAR ACA
> function is sent on all the connections comprising the session.
>
> One minor distinction worth noting is that CRN is enforced in the SCSI
> layer, whereas cmdSN is enforced in the iSCSI transport.  So, a CRN of 0
> doesn't take effect until the transport presents the command to the SCSI
> layer for processing.  In that case, leapfrogging of PDU ordering never
> occurs.
>
> Incidentally, I've made the tacit assumption that commands on a given
> connection are presented to the SCSI layer in the order they were sent,
> regardless of whether or not cmdSN was set to 0.  I assume the framing
> mechanisms that have been discussed for buffer offloading do not affect
> this behavior.  I.e., a fully formed PDU slated for immediate delivery
> won't be passed to the SCSI layer before a partially complete PDU that was
> received earlier.
>
> If that's true, immediate delivery seems to have no meaning in a
> single-connection scenario.  What's more, in all cases, the iSCSI layer
> doesn't really have to be aware of task management semantics -- unless
> someone decides to intermix immediate and sequential commands in a
> multi-connection session.  Then all bets are off.
>
> Charles
>