Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"

To: julian_satran@il.ibm.com
Subject: Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"
From: Santosh Rao <santoshr@cup.hp.com>
Date: Thu, 05 Apr 2001 19:22:09 -0700
Cc: ips@ece.cmu.edu
Organization: Hewlett Packard, Cupertino.
References: <C1256A26.0008FB1A.00@d12mta02.de.ibm.com>
Sender: owner-ips@ece.cmu.edu

julian_satran@il.ibm.com wrote:
> 
> Santosh,
> 
> SNACK and SACK are the same thing (I just renamed them to avoid confusion
> with TCP SACK).
> The status is acked by ExpStatSN (and only indirectly by SNACK). SNACK
> enables fast recovery of
> a hole (without having to resort to a timeout).

Julian,

The bottom line is that the current SNACK mechanism as defined in the
spec will NOT work if it is made optional, and at the same time, it is
too expensive to mandate the SNACK mechanism. 

The current SNACK mechanism is really a negative ACK requesting the
target to re-send the status PDU. This mechanism has 2 disadvantages:

a) requires targets to retain I/O state information until StatSN is
acknowledged.
b) Does not allow forward progress with the release of I/O resources in
the event that a target could not retain that state information or for
some other reason could not service the SNACK.
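
The two points above can be illustrated with a minimal Python sketch of
target-side bookkeeping. It is only an illustration under stated
assumptions: the class and method names (Target, send_status,
on_exp_stat_sn, on_snack) are hypothetical and do not come from the
iSCSI draft, and a status PDU is reduced to a plain string.

```python
class Target:
    """Hypothetical target-side bookkeeping for status PDUs (illustration only)."""

    def __init__(self):
        self.next_stat_sn = 1
        self.retained = {}  # StatSN -> status payload kept for a possible re-send

    def send_status(self, payload):
        """Assign the next StatSN and retain the status until it is acknowledged."""
        stat_sn = self.next_stat_sn
        self.next_stat_sn += 1
        self.retained[stat_sn] = payload
        return stat_sn, payload

    def on_exp_stat_sn(self, exp_stat_sn):
        """Bulk ack: everything below ExpStatSN may be released."""
        for sn in [sn for sn in self.retained if sn < exp_stat_sn]:
            del self.retained[sn]

    def on_snack(self, stat_sn):
        """Re-send a status; None models a target that cannot service the SNACK."""
        return self.retained.get(stat_sn)


target = Target()
for payload in ["s1", "s2", "s3", "s4", "s5"]:
    target.send_status(payload)

# Assume the initiator lost StatSN 2, so its ExpStatSN stays at 2.
target.on_exp_stat_sn(2)
print(sorted(target.retained))  # prints [2, 3, 4, 5]
```

In this sketch the statuses for StatSN 3 to 5 were delivered, yet they
stay retained because ExpStatSN cannot move past the hole at 2. If the
target had already discarded StatSN 2, on_snack(2) would return None
and nothing would ever advance ExpStatSN.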

I am suggesting that the alternate model of SACK be used, wherein a
SACK is an individual ACK of a received status PDU. This SACK only kicks
in on detection of a hole. The hole is implicitly plugged by the
initiator on eventual completion of the command
[on timeout followed by abort or retry].

The advantages of this alternate model are:
a) Does not require state information to be stored at targets beyond I/O
completion.
b) Allows a more reliable mechanism of resource release.

The disadvantage of this mechanism is:
a) It results in I/O timeout when Status PDU was dropped due to a digest
error.

Once again, the question boils down to the rate of TCP checksum escapes
and the probability of such escapes affecting status PDUs. If this is
low enough, such a timeout on a digest error of a status PDU should be
acceptable. 

>  We decided long ago
> against individual acks as bulk acking through a window is cheaper and
> safer (repetition).

I am not suggesting removal of bulk ack scheme. My suggestion is that
SACK kick in on a hole and the initiator revert to bulk ACK scheme once
it considers the hole to be plugged (thru the eventual completion of the
I/O on the timeout path followed by abort or retry).
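
A minimal Python sketch of the initiator side of this suggestion
follows. It is an illustration only, under stated assumptions: the
names (Initiator, on_status, plug_hole) are hypothetical, no PDU
format is implied, and "plugging" is reduced to a single call made
once the command behind the missing status has completed on the
timeout path.

```python
class Initiator:
    """Hypothetical initiator-side StatSN tracking (illustration only)."""

    def __init__(self, first_stat_sn=1):
        self.exp_stat_sn = first_stat_sn  # next in-order StatSN (bulk ack value)
        self.beyond_hole = set()          # StatSNs received past a hole

    def on_status(self, stat_sn):
        """Return the StatSNs to SACK individually (empty while in bulk mode)."""
        if stat_sn < self.exp_stat_sn or stat_sn in self.beyond_hole:
            return []                     # duplicate, nothing new to acknowledge
        if stat_sn == self.exp_stat_sn:
            self._advance()               # in order: ExpStatSN acks it in bulk
            return []
        self.beyond_hole.add(stat_sn)     # a hole exists: ack this PDU on its own
        return [stat_sn]

    def plug_hole(self):
        """Command behind the missing status completed (timeout, then abort or retry)."""
        if self.beyond_hole:
            self._advance()

    def _advance(self):
        self.exp_stat_sn += 1
        # Absorb statuses that were already received beyond the hole.
        while self.exp_stat_sn in self.beyond_hole:
            self.beyond_hole.remove(self.exp_stat_sn)
            self.exp_stat_sn += 1


ini = Initiator()
sacks = [ini.on_status(sn) for sn in (1, 3, 4)]  # the status with StatSN 2 is lost
print(sacks)            # prints [[], [3], [4]]
ini.plug_hole()
print(ini.exp_stat_sn)  # prints 5
```

Once the hole is plugged, ExpStatSN jumps past the statuses that were
already acknowledged individually and the initiator is back in the
bulk ACK scheme.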

- Santosh


> 
> Julo
> 
> Santosh Rao <santoshr@cup.hp.com> on 05/04/2001 21:06:18
> 
> Please respond to Santosh Rao <santoshr@cup.hp.com>
> 
> To:   Julian Satran/Haifa/IBM@IBMIL
> cc:   ips@ece.cmu.edu
> Subject:  Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"
> 
> Julian,
> 
> The existing StatSN SNACK mechanism will NOT work if it is made
> optional. The original request that was made in the thread
> http://ips.pdl.cs.cmu.edu/mail/msg03257.html was to allow a SACK
> mechanism that would allow individual Status PDUs to be acknowledged
> [not SNACK which requests re-send of missing Status PDU, and thereby,
> requires the target to retain state information until StatSN is
> acknowledged].
> 
> If StatSN SNACK is made optional, a target that does not support SNACK
> will result in holes never being filled in StatSN sequence, and thereby,
> initiators being unable to acknowledge status PDUs received after the
> hole. This can cause targets to hold onto stale I/O state information
> for very long periods [or forever].
> 
> With the current StatSN SNACK scheme, a target can NEVER discard its old
> I/O state information, since if it does so, it cannot satisfy SNACK
> requests. If SNACK requests are not satisfied, holes remain in the
> StatSN sequence at the initiator and it cannot acknowledge Status PDUs
> received thereafter.
> 
> If we must retain the StatSN mechanism in iSCSI, then, the SACK
> mechanism [as opposed to a SNACK], wherein the initiator acks
> individual status PDUs received when a hole occurs, should be the
> preferred scheme. This allows both sides to continue the handshake of
> resource release even in the presence of holes, without imposing
> requirements on targets to retain I/O state information.
> 
> The holes created in StatSN are implicitly filled by the initiator based
> on the result of its "retry" of the failed command. Alternatively, the
> StatSN hole is considered to be filled if the initiator chooses not to
> retry the command [ex: on ULP timeout].
> 
> - Santosh
> 
> julian_satran@il.ibm.com wrote:
> 
> >
> > Mallikarjun,
> >
> > You are right. Too much travel and jet lag.  The reason we made SACK and
> > status recovery
> > practically a MUST is that without them we are bound to have only session
> > drop as an alternative.
> > If the target does not keep any information after it has sent out status it
> > can't even retry a command.  And if it can retry a command it should be
> > able to do SACK.
> >
> > But perhaps there is a place in the market for the kind of devices Somesh
> > is suggesting that do all recovery at SCSI level (and that can't copy a
> > terabyte of data without a session drop).
> >
> > If that is true (which I doubt) we can make SNACK support optional.
> >
> > Julo
> >
> > "Mallikarjun C." <cbm@rose.hp.com> on 05/04/2001 04:09:46
> >
> > Please respond to cbm@rose.hp.com
> >
> > To:   ips@ece.cmu.edu
> > cc:
> > Subject:  Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"
> >
> > >
> > >Santosh,
> > >
> > >I can't find the place where this is stated. SNACK as a PDU type is
> > >mandated. But it can be rejected outright.
> >
> > Sorry, you agreed that status SACK is mandatory in ERT forum last
> > week in response to my comments.  Has there been a change in your
> > opinion?
> >
> > Attached is the email in a long email thread (issue 3) where you agreed
> > to make this explicit in rev06.
> > --
> > Mallikarjun
> >
> > Mallikarjun Chadalapaka
> > Networked Storage Architecture
> > Network Storage Solutions Organization
> > MS 5668   Hewlett-Packard, Roseville.
> > cbm@rose.hp.com
> >
> > >1.2.2.2 shows explicitly that SACK can be rejected. We can add a protocol
> > >specific parameter in the target Logical Unit Control Page (non-settable) by
> > >which the target will indicate support for SNACK.
> > >
> > >Julo
> >
> > Santosh Rao <santoshr@cup.hp.com> on 04/04/2001 23:53:32
> >
> > Please respond to Santosh Rao <santoshr@cup.hp.com>
> >
> > To:   Julian Satran/Haifa/IBM@IBMIL
> > cc:   Jon Hall <jhall@emc.com>, ips@ece.cmu.edu
> > Subject:  Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"
> >
> > julian_satran@il.ibm.com wrote:
> > >
> > > Jon,
> > >
> > > Inexpensive implementations are always free to do away with recovery. That
> > > is true for targets too.
> >
> > That's not the interpretation one gets from reading the spec and prior
> > discussions on this list. Per the spec, support for Status SACK is
> > mandatory while support for data SACK is optional.
> >
> > IOW, targets MUST retain state information to satisfy a potential
> > status SACK request.
> >
> > - Santosh
> >
> >
> > ----------------------------------------------------------------------------
> >
> > >From julian_satran@il.ibm.com Tue Mar 27 05:16:54 PST 2001
> > From: julian_satran@il.ibm.com
> > To: cbm@rose.hp.com
> > Cc: someshg@yahoo.com, steph@cs.uchicago.edu, hufferd@us.ibm.com,
> >         cbm@rose.hp.com, ldalleore@snapserver.com,
> >         Venkat Rangan <venkat@rhapsodynetworks.com>, Black_David@emc.com
> > Message-ID: <C1256A1C.0048399C.00@d12mta02.de.ibm.com>
> > Date: Tue, 27 Mar 2001 15:12:01 +0200
> > Subject: Re: iSCSI ERT: error recovery comments
> >
> > Comments in text.  Thanks Julo
> >
> > "Mallikarjun C." <cbm@rose.hp.com> on 27/03/2001 01:41:48
> >
> > Please respond to cbm@rose.hp.com
> >
> > To:   cbm@rose.hp.com, someshg@yahoo.com, steph@cs.uchicago.edu, Julian
> >       Satran/Haifa/IBM@IBMIL, John Hufferd/San Jose/IBM@IBMUS
> > cc:   Black_David@emc.com
> > Subject:  iSCSI ERT: error recovery comments
> >
> > Hi Julian and the Team,
> >
> > Here are some comments on error recovery issues.  I hope these will
> > be addressed soon.  Thanks.
> >
> > 1. The draft should clearly state that if a target doesn't support
> >   retry (replay in my previous memo's terminology), it must not silently
> >   accept a command with retry bit and re-do the I/O.
> >
> > 2. Consequent to the above -
> >      - Clarification required on section 6.7.1, page 83, last para.
> >           Please confirm and clarify in the draft: If the target sends
> >           a response with an iSCSI error response of "SACK-rejected" that
> >           implicitly terminates the task - no retries are allowed. If the
> >           target sends a Reject PDU with "Data SACK Reject" code, the task
> >           stays open and the initiator may try to recover using SACK/retry.
> > +++ I will clarify
> > it will read:
> >    An iSCSI target MAY reject a data-SNACK and terminate the command with
> >    an iSCSI command response of SNACK rejected. In this case, the task is
> >    terminated and no future action is expected at target and initiator.
> >
> >    Alternatively, an iSCSI target MAY reject a data-SNACK with a reject
> >    response of data SNACK rejected. In this case the task is still open and
> >    may be recovered using the retry.
> >
> > +++
> >         - On a data digest error on a data PDU without the F-bit, the draft
> >           states that the target must wait for a data PDU with the F-bit
> >           (per section 6.2), then a command termination is signalled with
> >           a Reject PDU!  I like the formulation in 2.4.2 better.  I strongly
> >           recommend that similarly, the target shall send a SCSI Response
> >           with an iSCSI response of "delivery subsystem failure".  In general,
> >           I suggest that anytime a target terminates a task internally, it
> >           must generate a SCSI Response PDU with an appropriate response code.
> > +++ It reads now:
> >
> >    When a target receives an iSCSI PDU with a header digest error or a
> >    payload digest error in an iSCSI PDU, it MUST answer with a Reject iSCSI
> >    PDU with a Reason-code of Header-Digest-error or Data-Digest-Error and
> >    discard the offending PDU.  If the error is a Data-Digest-Error in a
> >    Data-PDU, the target MUST either request retransmission with a R2T or
> >    answer with a command response PDU with a response-code of
> >    delivery-subsystem-failure and abort the task. If the target is
> >    answering with an error in the command response PDU it must wait until
> >    it has received all the data (signaled by a Data PDU with the final
> >    bit set for all outstanding R2Ts) before sending the command response
> >    PDU.
> > ++++
> >
> > 3. While the following is implied in different sections, it is not
> >   obvious.  Please clarify the following in the draft - "Status SACK
> >   support is mandatory, whereas data SACK support is not."
> >
> > +++ will do in 2.16.1 +++
> >
> > 4. The general policy of retry should be that all ordered commands
> >   shall support retry bit, since the loss of an ordered command
> >   creates a hole in target scoreboarding and stalls the target
> >   pipeline.  Retry hopefully can plug the hole quickly to avoid this.
> >
> > 5. As a fallout of the above comment, Retry bit must be supported
> >   for Text Commands.
> >
> > +++ I have added the X-bit.  The reason I did not earlier was that I could
> > not foresee a case in which the command is not idempotent - it can always
> > be resent - but I guess it is cleaner with the X +++
> >
> > 6. Section 2.20, page 71 on Reject must specify if a retry of the operation
> >   is allowed for each Reject PDU reason code.  Lack of specification could
> >   lead to interoperability issues down the road with "retry wars" raging
> >   between heterogeneous implementations (ex., target rejects the retry bit,
> >   initiator retries the "retry" bit,....).
> > +++ the part now reads:
> >
> >    The reject Reason is coded as follows:
> >
> >       1 - Format Error
> >       2 - Header Digest Error
> >       3 - Data (payload) Digest Error
> >       4 - Data-SNACK Reject
> >       5 - Command Retry Reject
> >       15 - Full Feature Phase Command before login
> >
> >       Some of the reject reasons terminate or prevent the creation of a
> >       task at the target and no retry is possible in those cases. Format
> >       error for a command, Command Retry Reject and Full Feature Phase
> >       Command before login are in this category.
> >
> > 7. NOP-OUT does not require CmdSNs.  Why make it an ordered command
> >   and run the risk of a digest error on it leading to a hole in
> >   command ordering?
> >
> > +++ the reason I wanted it ordered is to check the whole command path - but
> > you may try to convince me that it is not a good idea +++
> > --
> > Mallikarjun
> >
> > Mallikarjun Chadalapaka
> > Networked Storage Architecture
> > Network Storage Solutions Organization
> > MS 5668   Hewlett-Packard, Roseville.
> > cbm@rose.hp.com
