
    Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"



    
    Julian,
    
    I don't understand.  Are you saying that an "expensive" target
    will implement specific error recovery mechanisms for very rare
    events?  Or are you saying that this case is not a rare event?
    
    If the former, there is a problem of completeness (e.g., should
    there be recovery procedures for when the sun goes nova :-).
    If the latter, this would be very interesting and useful to
    know about...
    
    -Jon
    
    julian_satran@il.ibm.com writes:
    >
    >Jon,
    >
    >Inexpensive implementations are always free to do away with recovery. That
    >is true for targets too.
    >But by not specifying the mechanism for the more expensive ones, we make
    >them non-interoperable.
    >
    >Julo
    >
    >"Jon Hall" <jhall@emc.com> on 04/04/2001 22:55:40
    >
    >Please respond to "Jon Hall" <jhall@emc.com>
    >
    >To:   ips@ece.cmu.edu
    >cc:
    >Subject:  Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"
    >
    >
    >But CRC errors are not really the issue.  It is the
    >singular case of a TCP cksum failing to detect what a
    >CRC succeeds in detecting, and this occurring to a TCP
    >segment containing an iSCSI hdr with a StatSN.
    >
    >Is there a reason to believe that iSCSI StatSNs will be
    >lost at a higher rate than is currently documented for TCP
    >cksum failure?  Or, is the problem a loss of one TCP segment
    >in tens (possibly hundreds) of millions of segments, where
    >the bad segment may contain a StatSN but probably doesn't
    >because it is a data PDU?  If the latter, why does a SCSI-level
    >timeout and retry (on the initiator) not suffice?  [Note,
    >an initiator timeout/retry does not require a connection
    >to be closed.]
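Jon's alternative, an initiator-side timeout that reissues the command while leaving the connection open, needs little more than a per-command deadline. The following is a minimal sketch. The class name, the reissue-on-expiry policy, and the injectable clock are assumptions for illustration, not a mechanism taken from any draft.

```python
import time
from typing import Callable, Dict, List


class CommandTimer:
    """Hypothetical per-command deadlines for initiator-side timeout and retry."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock
        self._deadlines: Dict[int, float] = {}  # task tag -> deadline

    def issued(self, task_tag: int) -> None:
        self._deadlines[task_tag] = self.clock() + self.timeout

    def completed(self, task_tag: int) -> None:
        # Status arrived for this command; stop tracking it.
        self._deadlines.pop(task_tag, None)

    def expired(self) -> List[int]:
        """Tags whose status never arrived; the caller reissues them.

        Each returned command is re-armed with a fresh deadline. Nothing
        here closes or restarts the connection.
        """
        now = self.clock()
        due = sorted(t for t, d in self._deadlines.items() if d <= now)
        for tag in due:
            self.issued(tag)
        return due
```

The point of the sketch is that a lost status (for whatever reason) surfaces as a timed-out command, with no need to retain sent data or to tear down TCP.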
    >
    >I realize that I am being annoyingly repetitious, but it is
    >not an idle question.  For some targets, retained rsp status
    >is not cheap (and retained rsp data is not tractable at all).
    >
    >IMO there appears to be no real need for SNACK.  And, more
    >radically, there appears to be no need for StatSNs.
    >
    >Maybe, as Somesh said, this is a dead horse but why include
    >something in the spec which suggests a need for target-side
    >complexity, while not solving a clear and compelling
    >requirement?
    >
    >-Jon
    >
    >julian_satran@il.ibm.com writes:
    >>
    >>SNACK is here for two reasons - Status retry (which is cheap) and Data
    >>retry as a side benefit.
    >>CRC errors are not that rare (although we don't have real data, the
    >>simulation with file systems seems to indicate that numbers could be as
    >>high as 0.0002%). A restart of the link is expensive (slow start), and even
    >>if the rates are far lower, for many applications a slow start is a painful
    >>event.
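Reading Julian's upper estimate as a per-PDU probability (an assumption; the message does not state the unit), the figure converts into an interval between digest failures:

```latex
p = 0.0002\,\% = 2 \times 10^{-6}
\qquad\Longrightarrow\qquad
\frac{1}{p} = 5 \times 10^{5}\ \text{PDUs per failure}
```

That is roughly one failure per half a million PDUs, at least twenty times more frequent than the "one in tens (possibly hundreds) of millions" of segments Jon cites in his reply (which he counts in TCP segments rather than iSCSI PDUs).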
    >>
    >>Removing them from the spec is not a path we should take lightly.
    >>
    >>Julo
    >>
    >>"Jon Hall" <jhall@emc.com> on 02/04/2001 16:13:35
    >>
    >>Please respond to "Jon Hall" <jhall@emc.com>
    >>
    >>To:   ips@ece.cmu.edu
    >>cc:
    >>Subject:  Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"
    >>
    >>
    >>
    >>
    >>
    >>I agree with Somesh.  And would go farther -- the complexity
    >>that results from retaining enough target-side state to respond
    >>to a SACK/SNACK request is non-trivial and needs clear justification.
    >>Intuitively, a CRC that discovers an error in an iSCSI pdu header
    >>(that the TCP cksum missed) seems like it should be a rare event.
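The event Jon describes, a corruption that TCP's checksum misses but a CRC catches, can be illustrated concretely. TCP's checksum is a 16-bit ones'-complement sum (RFC 1071), so it cannot see two 16-bit words that trade places, whereas a 32-bit CRC detects every error burst of up to 32 bits. This sketch assumes CRC-32C (Castagnoli) as the digest, which iSCSI eventually adopted; the thread itself does not name a polynomial, and the function names are illustrative.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum as used by TCP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


# Two adjacent 16-bit words swapped in transit: same sum, different CRC.
sent = b"\x12\x34\x56\x78" + b"iSCSI hdr"
received = b"\x56\x78\x12\x34" + b"iSCSI hdr"
print(internet_checksum(sent) == internet_checksum(received))  # True
print(crc32c(sent) == crc32c(received))                         # False
```

The swap is only one example of an error class the ones'-complement sum cannot see; how often such errors actually reach a receiver is exactly the frequency question Jon is asking.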
    >>
    >>What is the frequency of this event?  IMO the answer to this
    >>question should be written into the protocol spec -- assuming
    >>that it substantiates the benefit of SACK/SNACK.  Otherwise, the
    >>SACK/SNACK pdu should be removed.
    >>
    >>-Jon
    >>
    >>julian_satran@il.ibm.com writes:
    >>>
    >>>Somesh,
    >>>
    >>>As I stated earlier - the DataSN was created to detect missing data PDUs.
    >>>SNACK is needed to recover a missing StatusSN; recovering a missing
    >>>DataSN is only a bonus if the target wants to support it.  It is a
    >>>trivial mechanism and I think it should stay.
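The gap detection Julian refers to can be sketched as follows: the receiver tracks the next expected sequence number (StatSN per connection, or DataSN per task) and reports any skipped numbers as candidates for a SNACK. The class and method names are hypothetical, and the sketch ignores 32-bit sequence-number wraparound.

```python
from typing import List, Set


class SequenceTracker:
    """Detects gaps in a stream of sequence numbers (StatSN or DataSN)."""

    def __init__(self, first_expected: int = 0) -> None:
        self.expected = first_expected   # next number we expect to see
        self.missing: Set[int] = set()   # skipped numbers not yet filled

    def receive(self, sn: int) -> List[int]:
        """Record arrival of `sn`; return newly detected gaps to request."""
        if sn < self.expected:
            self.missing.discard(sn)     # a retransmission filled a gap
            return []
        gap = list(range(self.expected, sn))
        self.missing.update(gap)
        self.expected = sn + 1
        return gap


tracker = SequenceTracker()
for sn in (0, 1, 4):
    lost = tracker.receive(sn)
print(lost)  # [2, 3]
```

Detecting the gap is cheap on the receiving side; the cost Jon and Somesh object to lies in the sender having something to resend when the request arrives.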
    >>>
    >>>Julo
    >>>
    >>>"Somesh Gupta" <someshg@yahoo.com> on 31/03/2001 02:25:52
    >>>
    >>>Please respond to someshg@yahoo.com
    >>>
    >>>To:   Julian Satran/Haifa/IBM@IBMIL, ips@ece.cmu.edu
    >>>cc:
    >>>Subject:  RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"
    >>>
    >>>
    >>>
    >>>
    >>>Sorry to have been missing for a while. Hope you will
    >>>appreciate my being back in action :-). It was a fairly
    >>>clear consensus in Orlando that applications broke up
    >>>their transfers into reasonably small chunks i.e. they
    >>>did not have very long running transfers.
    >>>
    >>>Therefore the consensus was that a command level recovery
    >>>mechanism was sufficient instead of an ack/sack for each
    >>>data PDU.
    >>>
    >>>The SACK mechanism was a post Orlando invention. Without
    >>>an ack mechanism (for every data PDU), the SACK mechanism
    >>>just imposes additional burden on either end of the session,
    >>>without really much benefit.
    >>>
    >>>The benefit of having SACK is saving bandwidth in case
    >>>the data part of the data PDU failed an integrity check
    >>>(but passed TCP checksum). This is a rare enough case that
    >>>as a percentage, the bandwidth loss from retransmitting
    >>>all the data associated with a read or write command is
    >>>very very small.
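Somesh's bandwidth argument can be made concrete with a simplified model. Assume each data PDU independently fails its digest with probability p, a command spans n PDUs, and a failed command is retried exactly once in full. Command-level retry then resends all n PDUs with probability 1-(1-p)^n, while a selective request resends only the bad PDUs (np of them on average). The choice n = 64 below is arbitrary, and p reuses Julian's 0.0002% upper estimate, read as a per-PDU rate.

```python
from typing import Tuple


def retransmit_fraction(n_pdus: int, p: float) -> Tuple[float, float]:
    """Expected fraction of a command's PDUs that are sent again.

    Returns (command_retry, selective) under the independence model above.
    """
    p_command_fails = 1.0 - (1.0 - p) ** n_pdus
    command_retry = p_command_fails   # all n PDUs resent when any one is bad
    selective = p                     # n*p bad PDUs expected, out of n
    return command_retry, selective


cmd, sel = retransmit_fraction(n_pdus=64, p=2e-6)
print(f"command retry: {cmd:.4%}  selective: {sel:.4%}")
```

Under these assumptions the whole-command retry costs roughly n times the bandwidth of selective retransmission, yet both remain far below one percent, which is the substance of Somesh's point.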
    >>>
    >>>In addition, it avoids the complexity of restarting
    >>>something from the middle, as compared to from the beginning.
    >>>
    >>>To me it seems that there is significant simplicity (from
    >>>implementation, reliability and recovery process) from
    >>>having smaller data transfer per command.
    >>>
    >>>I would really like to get rid of the SACK command.
    >>>
    >>>Somesh
    >>>
    >>>> -----Original Message-----
    >>>> From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
    >>>> julian_satran@il.ibm.com
    >>>> Sent: Wednesday, March 28, 2001 6:57 AM
    >>>> To: ips@ece.cmu.edu
    >>>> Subject: RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"
    >>>>
    >>>>
    >>>>
    >>>>
    >>>> Mallikarjun,
    >>>>
    >>>> Last summer I thought that recovery within a connection should be left to
    >>>> TCP. It is simple and could be made available through IPsec (if no new
    >>>> option of any form can be added).
    >>>>
    >>>> Two things killed this:
    >>>>
    >>>>    The requirement to have a data encapsulation that can pass through
    >>>>    application proxies (like a storage router)
    >>>>    The "NO WAY" message we got from IESG-Security on a CRC-only IPsec
    >>>>    header
    >>>>
    >>>>
    >>>> As for the ACK - I am very much in favor of it (it is a no brainer) and
    >>>> implementations are in fact allowed to drop even unacked data.
    >>>>
    >>>> I am bound by the Orlando meeting decision to drop it. Apart from the
    >>>> regular "oppose everything" crowd, the two vocal opponents were Somesh
    >>>> Gupta and Matt Wakeley.
    >>>>
    >>>> David may or may not want to re-open the issue; I am not going to ask
    >>>> for it.
    >>>>
    >>>> Regards,
    >>>> Julo
    >>>>
    >>>> "Mallikarjun C." <cbm@rose.hp.com> on 28/03/2001 00:45:02
    >>>>
    >>>> Please respond to cbm@rose.hp.com
    >>>>
    >>>> To:   Black_David@emc.com
    >>>> cc:   Julian Satran/Haifa/IBM@IBMIL, cbm@rose.hp.com, someshg@yahoo.com,
    >>>>       steph@cs.uchicago.edu, John Hufferd/San Jose/IBM@IBMUS,
    >>>>       ldalleore@snapserver.com, venkat@rhapsodynetworks.com
    >>>> Subject:  RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"
    >>>>
    >>>>
    >>>>
    >>>>
    >>>> David and Julian,
    >>>>
    >>>> I appreciate both your views, and should I say that they're
    >>>> along predicted lines :-)
    >>>>
    >>>> - David's right in saying that the situation is akin to FC's.
    >>>>   However, I would like to point out that FC is an unreliable
    >>>>   transport, and hence is forced to pick up a lot of the transport
    >>>>   baggage (at least in FCP-2, as I understand), in addition
    >>>>   to being a SCSI encapsulation layer.  Unfortunately, even with
    >>>>   TCP being the "reliable" transport, iSCSI is going along the
    >>>>   same lines - ie. transport baggage + SCSI encapsulation.  My
    >>>>   point is - if this is indeed a necessary evil, why don't we
    >>>>   complete iSCSI's transport functionality by data-ACKs?
    >>>>
    >>>> - If data SACK is introduced mostly to make up for TCP's shortcomings,
    >>>>   we're making its usage (and implementation) drastically less appealing
    >>>>   since the only way error recovery algorithms can *rely* on data SACK
    >>>>   is when replay is supported (or, "ReplaySupport=yes" in my proposal),
    >>>>   which is extremely expensive.  IOW, we're defining data SACK in the
    >>>>   draft and not providing any incentives to implement and use it!
    >>>>
    >>>> - I submit that since iSCSI is being hailed as the ideal SCSI Transport
    >>>>   protocol in its definition so far (and I believe, rightly so -
    >>>>   mandating command ordering, bi-di support, SCSI CRN support to name
    >>>>   a few examples), the perfectly SCSI-legal R/W interactions that break
    >>>>   in other transports *do not* have to break in iSCSI.
    >>>>
    >>>> - A last idea (may seem radical at this point) in regards to iSCSI
    >>>>   being a "full transport". This provides us an opportunity to "cast
    >>>>   off" the transport baggage in future when we truly move to a "reliable"
    >>>>   transport (perhaps TCP with CRCs/SCTP ?) - if we do a good job of
    >>>>   keeping the encapsulation stuff separate from the transport stuff.
    >>>>   (Julian, I heard from Randy that ideas similar to this were explored
    >>>>   in your Haifa meeting.  And yes, he recalls they were given up since
    >>>>   TCP was supposed to be reliable and granularity of recovery was deemed
    >>>>   one I/O.)
    >>>>
    >>>> With that said, may I request David (with his co-chair hat on, :-))
    >>>> to add some binding comments/observations on this discussion?
    >>>>
    >>>> If we decide to leave data SACKs as unattractive to implement, the draft
    >>>> should at the least add a statement like - "Note that satisfying all
    >>>> possible data SACK requests for a task with an unacknowledged status
    >>>> implies implementing the I/O replay buffer on the part of targets."
    >>>> --
    >>>> Mallikarjun
    >>>>
    >>>>
    >>>> Mallikarjun Chadalapaka
    >>>> Networked Storage Architecture
    >>>> Network Storage Solutions Organization
    >>>> MS 5668   Hewlett-Packard, Roseville.
    >>>> cbm@rose.hp.com
    >>>>
    >>>>
    >>>>
    >>>>
    >>>> >I think Julian's basically right -- I would point
    >>>> >out that any case of write after read that breaks
    >>>> >over iSCSI will also break over Fibre Channel.
    >>>> >On FC, the scenario starts with a frame CRC failure
    >>>> >on read data at the Initiator, so applications
    >>>> >have to cope and typically do so by enforcing
    >>>> >ordering at the app rather than using SCSI task
    >>>> >ordering.
    >>>> >
    >>>> >While SCSI has clever tools like ACA and task
    >>>> >ordering that appear to allow dependent operations
    >>>> >to be sent to the target concurrently, in practice
    >>>> >they don't work and/or aren't used (funny thing,
    >>>> >those two reinforce each other ;-) ).  Hence
    >>>> >a minimal approach to them is in order:
    >>>> >- Make sure the result will interoperate.
    >>>> >- Make sure T10 doesn't ding us for leaving something
    >>>> >    completely out.
    >>>> >- Don't specify anything not needed for the above.
    >>>> >
    >>>> >My 0.02,
    >>>> >--David
    >>>> >
    >>>> >> -----Original Message-----
    >>>> >> From:  julian_satran@il.ibm.com [SMTP:julian_satran@il.ibm.com]
    >>>> >> Sent:  Tuesday, March 27, 2001 9:23 AM
    >>>> >> To:    cbm@rose.hp.com
    >>>> >> Cc:    someshg@yahoo.com; steph@cs.uchicago.edu; hufferd@us.ibm.com;
    >>>> >> cbm@rose.hp.com; ldalleore@snapserver.com; Venkat Rangan;
    >>>> >> Black_David@emc.com
    >>>> >> Subject:    Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"
    >>>> >>
    >>>> >>
    >>>> >>
    >>>> >> Mallikarjun,
    >>>> >>
    >>>> >> I commiserate with you over the lack of ack for data, but the Orlando
    >>>> >> meeting said no.  Recall that I kept the number only as a mechanism
    >>>> >> to detect missing packets.
    >>>> >>
    >>>> >> You can achieve the effect you want by keeping data around for a
    >>>> >> while (you determine how long, and then discard it).
    >>>> >>
    >>>> >> If a SACK comes and you can recover, fine. If not, you either
    >>>> >> re-access the media (if you know how) or reject it and let the
    >>>> >> initiator retry.
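Julian's suggestion, holding sent data for a target-chosen period and then discarding it, amounts to a time-bounded retention cache. The following is a minimal sketch, assuming PDUs are keyed by a task tag and DataSN. The class, its methods, and the injectable clock are illustrative, not part of any draft.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class RetentionCache:
    """Keeps sent Data-In payloads for `hold_seconds`, then forgets them."""

    def __init__(self, hold_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.hold = hold_seconds
        self.clock = clock
        # (task_tag, data_sn) -> (payload, time it was stored)
        self._store: Dict[Tuple[int, int], Tuple[bytes, float]] = {}

    def keep(self, task_tag: int, data_sn: int, payload: bytes) -> None:
        self._store[(task_tag, data_sn)] = (payload, self.clock())

    def expire(self) -> None:
        now = self.clock()
        stale = [k for k, (_, t) in self._store.items() if now - t > self.hold]
        for key in stale:
            del self._store[key]

    def on_sack(self, task_tag: int, data_sn: int) -> Optional[bytes]:
        """Payload to resend, or None if it has already been discarded."""
        self.expire()
        entry = self._store.get((task_tag, data_sn))
        return entry[0] if entry is not None else None
```

Returning None leaves the choice Julian describes to the caller: re-access the media if it can, otherwise reject the request so the initiator retries the command.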
    >>>> >>
    >>>> >> You should not worry about R/W conflicts, as programs bound to have
    >>>> >> such conflicts either:
    >>>> >>
    >>>> >> 1) can live with them, or
    >>>> >> 2) protect themselves through some locks and rely on
    >>>> >>    "operation-end-status" to keep results deterministic.
    >>>> >>
    >>>> >> Regards,
    >>>> >> Julo
    >>>> >>
    >>>> >>
    >>>> >>
    >>>> >> "Mallikarjun C." <cbm@rose.hp.com> on 27/03/2001 03:34:16
    >>>> >>
    >>>> >> Please respond to cbm@rose.hp.com
    >>>> >>
    >>>> >> To:   cbm@rose.hp.com, someshg@yahoo.com, steph@cs.uchicago.edu,
    >>>> >>       Julian Satran/Haifa/IBM@IBMIL, John Hufferd/San Jose/IBM@IBMUS
    >>>> >> cc:   Black_David@emc.com
    >>>> >> Subject:  iSCSI ERT: data SACK/replay buffer/"semi-transport"
    >>>> >>
    >>>> >>
    >>>> >>
    >>>> >>
    >>>> >> Hi Error Recovery Team,
    >>>> >>
    >>>> >> iSCSI can discard PDUs because of digest errors and request
    >>>> >> retransmissions using the iSCSI data SACK.  To deal with such
    >>>> >> an eventuality, targets that want to support data SACK have
    >>>> >> the following options:
    >>>> >>
    >>>> >> (A) maintain a complete "replay" buffer for the entire I/O since
    >>>> >>   a SACK could come anytime before the status is ack'ed by the
    >>>> >>   initiator. [ simple, but extremely expensive in memory resources]
    >>>> >>
    >>>> >> (B) (re-introduce data-ACKs into the draft, and) implement data-ACKs.
    >>>> >>   This enables keeping only those I/O buffers that haven't been ack'ed
    >>>> >>   by the initiator. IOW, become a real full transport! [everyone
    >>>> >>   disliked it earlier...]
    >>>> >>
    >>>> >> (C) re-access the medium for data retransmission requests.  Now there
    >>>> >>   are 3 sub-cases in this to handle the changed data on the medium in
    >>>> >>   a write-after-read scenario.  (SEE NOTE.1 at the bottom on how it is
    >>>> >>   legal.)
    >>>> >>      (1) On seeing any write, stall till status is ack'ed for all the
    >>>> >>          previous reads (basically drain the pipe). [simple, but
    >>>> >>          incurs an additional roundtrip delay for all writes].
    >>>> >>      (2) A variation of the above: keep an eye only on the prior
    >>>> >>          overlapping reads. [more BW efficient, but complicated to
    >>>> >>          resolve the block dependencies in a stream of reads
    >>>> >>          followed by writes]
    >>>> >>      (3) Document the caveat and leave it up to the applications
    >>>> >>          to avoid this case, since this leads to data integrity
    >>>> >>          issues. [pushing to apps since the transport can't get it
    >>>> >>          right!]
    >>>> >>
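The memory difference between options (A) and (B) can be sketched with a per-task buffer of sent data PDUs. Under (A) only the acknowledgement of the task's status releases anything; under (B) a data-ACK, which the draft under discussion did not define, releases PDUs as soon as the initiator confirms them. The class and method names, and the cumulative "acked through DataSN" semantics of the data-ACK, are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Optional


class ReplayBuffer:
    """Target-side copies of sent data PDUs, keyed by task tag and DataSN."""

    def __init__(self) -> None:
        self._pdus: Dict[int, Dict[int, bytes]] = defaultdict(dict)

    def sent(self, task_tag: int, data_sn: int, payload: bytes) -> None:
        self._pdus[task_tag][data_sn] = payload

    def held_bytes(self) -> int:
        return sum(len(p) for pdus in self._pdus.values() for p in pdus.values())

    def replay(self, task_tag: int, data_sn: int) -> Optional[bytes]:
        """Answer a data SACK from memory, if the PDU is still held."""
        return self._pdus.get(task_tag, {}).get(data_sn)

    def status_acked(self, task_tag: int) -> None:
        # Option (A): nothing is released until the task's status is acked.
        self._pdus.pop(task_tag, None)

    def data_acked(self, task_tag: int, acked_through: int) -> None:
        # Option (B): a cumulative data-ACK frees every PDU up to acked_through.
        pdus = self._pdus.get(task_tag, {})
        for sn in [sn for sn in pdus if sn <= acked_through]:
            del pdus[sn]
```

Option (C) corresponds to keeping nothing in such a buffer and regenerating data from the medium on request, which is where the write-after-read hazard comes from.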
    >>>> >> My first preference is (B), followed by (A), and I suggest we not go
    >>>> >> to (C) at all with its inherent dangers.
    >>>> >>
    >>>> >> Doing (B) naturally completes the transport job that iSCSI has taken
    >>>> >> on itself in view of TCP's claimed unreliable checksum.  That is the
    >>>> >> right thing to do architecturally instead of being a "semi-transport"!
    >>>> >>
    >>>> >> Comments?
    >>>> >> --
    >>>> >> Mallikarjun
    >>>> >>
    >>>> >>
    >>>> >> Mallikarjun Chadalapaka
    >>>> >> Networked Storage Architecture
    >>>> >> Network Storage Solutions Organization
    >>>> >> MS 5668   Hewlett-Packard, Roseville.
    >>>> >> cbm@rose.hp.com
    >>>> >>
    >>>> >>
    >>>>
    >>>__________________________________________________________________________
    >
    >>>> >> Note.1: A Read followed by a Write (to the same blocks) is perfectly
    >>>> >>         legal if SCSI sets the ORDERED task attribute on both the
    >>>> >>         commands AND sets the NACA bit to one to indicate that Write
    >>>> >>         shall be executed only if the Read did not fail (result in a
    >>>> >>         Check Condition).
    >>>> >>
    >>>> >>         In the current case, since Read completed just fine from
    >>>> >>         SCSI's point of view, SCSI is moving on to execute Write.
    >>>> >>         Those read buffers had been freed up since iSCSI received an
    >>>> >>         ACK at the TCP level, and since iSCSI has no other way to have
    >>>> >>         the data ack'ed!
    >>
    >
    



Last updated: Tue Sep 04 01:05:10 2001
6315 messages in chronological order