
    Re: F2F Mtg Summary for Framing and Error Recovery



    Jim-
    
    The error recovery slides and spreadsheet are now available
    on Julian's web site at:
    
    http://www.haifa.il.ibm.com/satran/ips/PaloAlto-MarkBakke-crc-recovery.pdf
    
    http://www.haifa.il.ibm.com/satran/ips/PaloAlto-MarkBakke-iSCSI-errors.xls
    
    They make a lot of assumptions; please enjoy them responsibly.
    
    --
    Mark
    
    "WENDT,JIM (HP-Roseville,ex1)" wrote:
    > 
    > Here is a quick summary of the outcome from the June 27-28 Design
    > Teams Face-to-Face meeting in Palo Alto in the two focal areas:
    > Framing and Error Recovery. The bulk of the meeting was spent
    > discussing framing scenarios, requirements, and alternatives.
    > 
    > As soon as the slide decks make it onto Julian's web site,
    > I'll email out that info.
    > 
    > Regards,
    > Jim Wendt
    > Networked Storage Architecture / NSSO
    > Hewlett-Packard Company
    > jim_wendt@hp.com 916-785-5198
    > 
    > ---------------------------------------------------------------
    > F2F Meeting Summary for Framing and Error Recovery
    > Design Teams Face-to-Face Meeting / June 27-28 Palo Alto
    > 
    > ---------------- Framing: The Bottom Line ----------------------
    > 
    > To cut to the chase, the following rough proposal was generated
    > for handling ULP Framing for iSCSI:
    > 
    > A) Proposed changes to "ULP Framing for TCP" I-D are:
    >     1) Modify I-D to include two framing modes:
    >         - "Marker mode" for unmodified TCP stacks
    >         - "PDU-alignment mode" for modified TCP stacks
    > 
    >     2) ULP is responsible for negotiating use of framing protocol
    >        and enabling framing behavior on the TCP connection in an
    >        unambiguous manner
    > 
    >     3) The framing protocol usage, framing mode, and framing
    >        operational parameters are negotiated separately in each
    >        direction on a TCP connection. Thus there are "Senders"
    >        and "Receivers" on a framing TCP connection. An iSCSI
    >        Initiator or Target is both a Sender and a Receiver
    >        with respect to a framing TCP connection.
    > 
    >     4) ULP is responsible for negotiating use of a specific
    >        framing mode over the TCP connection by having the
    >        receiver request highest framing mode desired from sender
    >        (first PDU-alignment, then Marker, then none) and having
    >        the sender comply:
    >          - if receiver requests, and sender supports, PDU-alignment
    >            mode, then sender MUST enable PDU-alignment mode
    >          - else if receiver requests, and sender supports, Marker
    >            mode, then sender MUST enable Marker mode
    >          - else don't use framing protocol
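A minimal sketch of the receiver-driven selection rule in item 4, in Python. The names `FramingMode` and `select_framing_mode` are hypothetical and do not come from the framing I-D. The sketch models only the preference order (PDU-alignment, then Marker, then none) for one direction of a connection, not the actual negotiation exchange.

```python
from enum import IntEnum


class FramingMode(IntEnum):
    """Framing modes, ordered from least to most preferred (hypothetical names)."""
    NONE = 0
    MARKER = 1
    PDU_ALIGNMENT = 2


def select_framing_mode(receiver_requests, sender_supports):
    """Pick the framing mode for one direction of a TCP connection.

    receiver_requests: set of modes the receiver is asking for.
    sender_supports:   set of modes the sender implements.
    The highest mode that is both requested and supported wins;
    otherwise framing is not used.
    """
    for mode in (FramingMode.PDU_ALIGNMENT, FramingMode.MARKER):
        if mode in receiver_requests and mode in sender_supports:
            return mode
    return FramingMode.NONE


# A hardware receiver asks for either mode; a software sender only has Markers.
chosen = select_framing_mode(
    {FramingMode.PDU_ALIGNMENT, FramingMode.MARKER},
    {FramingMode.MARKER},
)
print(chosen.name)
```

Because the modes are negotiated separately in each direction, an endpoint would run this selection twice per connection, once as Sender and once as Receiver.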
    > 
    >     5) ULP is responsible for negotiating framing operational
    >        parameters:
    >          - Marker period (in Marker mode)
    >          - Receiver maximum PDU size (in Marker mode)
    >          - Framing keys (in PDU-alignment mode)
    >          - ULP packing behavior (in PDU-alignment mode)
    > 
    >     6) Change the marker fields to be 16-bits rather than 32-bits
    >        (and refer to as "offsets" rather than "pointers")
    > 
    >     *) An updated version of the "ULP Framing for TCP I-D"
    >        reflecting these changes has been posted (7/9/01) to TSVWG
    >        for comments (draft-ietf-tsvwg-tcp-ulp-frame-00)
    > 
    > B) Proposed changes to iSCSI spec are:
    >     1) Remove Markers appendix from iSCSI spec (Appendix D. Synch
    >        and Steering with Fixed Interval Markers)
    > 
    >     2) iSCSI spec adds wording to the effect of:
    >        * iSCSI initiator and target framing behavior over a TCP
    >          connection is defined in draft-ietf-tsvwg-tcp-ulp-frame-00
    >          (or eventual RFC#)
    >        * an iSCSI initiator or target is both a sender and receiver
    >          with respect to framing behavior
    >        * an iSCSI framing sender MUST implement Marker mode, and
    >          MAY implement PDU-alignment mode, as defined in <I-D>
    >        * an iSCSI framing receiver MAY implement PDU-alignment
    >          mode, or Marker mode, or both, or none as defined in <I-D>
    >        * an iSCSI receiver on a framing TCP connection dictates
    >          use of the highest framing mode desired from sender as
    >          follows:
    >          - if receiver requests, and sender supports, PDU-alignment
    >            mode, then sender MUST enable PDU-alignment mode
    >          - else if receiver requests, and sender supports, Marker
    >            mode, then sender MUST enable Marker mode
    >          - else framing behavior is disabled
    >        * Perhaps there is some description of probable framing
    >          scenarios capturing the most likely combinations of
    >          the following attributes:
    >          - initiator or target
    >          - software implementation or hardware implementation
    >          - unmodified or modified TCP stack
    >          - sender AND receiver framing behaviors (no framing,
    >            or Marker mode, or PDU-Alignment mode)
    >          - values for framing operational parameters
    > 
    >     3) > Still need to determine iSCSI mechanism for turning on
    >          Framing protocol Marker mode operation
    > 
    >     4) > Still need to determine iSCSI mechanism for negotiating
    >          framing operational parameters:
    >             - Framing mode (if both Marker and PDU Alignment mode
    >               are supported)
    >             - Marker period (in Marker mode)
    >             - Receiver maximum PDU size (in Marker mode)
    >             - Framing keys (in PDU-alignment mode, if supported)
    >             - ULP packing behavior (in PDU-alignment mode, if
    >               supported)
    > 
    > -----------
    > The reasoning for these proposed changes is as follows:
    > 
    >     1) Re: Merge Marker mode into "ULP Framing for TCP" I-D
    >          a) The TCP-related framing work already has mindshare
    >             in TSVWG and this work is embodied in the current
    >             framing I-D. Rather than dilute the framing effort
    >             with additional I-Ds, all framing related work
    >             should be collected into a modified version of the
    >             existing framing I-D.
    > 
    >          b) Other ULPs may also find Marker mode useful in
    >             software-only unmodified-TCP client scenarios
    > 
    >          c) The framing I-D appears to be a reasonable literary
    >             vehicle for documenting the collection of framing
    >             schemes. The I-D could be extended in the future to
    >             include a byte or word stuffing frame marker method
    >             such as COBS.
    > 
    >          d) A single framing I-D may help to encourage a single
    >             consistent interface with the ULP regardless of which
    >             framing mode is employed.
    > 
    >          e) The iSCSI spec can simply reference the one framing I-D.
    > 
    >     2) Re: Make Marker mode mandatory for all iSCSI implementations,
    >        and PDU-Alignment mode optional for all iSCSI implementations.
    > 
    >         a) This allows interoperation of software-only,
    >            UNMODIFIED-TCP-stack clients with hardware-accelerated,
    >            small-buffer-memory storage arrays. This applies to both
    >            1Gbps-client/1Gbps-array and 1Gbps-client/10Gbps-array
    >            scenarios.
    > 
    >         b) One potentially compelling application for iSCSI involves
    >            software-only implementations on mainstream desktops and
    >            laptops operating over unmodified TCP stacks to access
    >            centralized storage arrays.
    > 
    >         c) Software implementations are likely to exist far into the
    >            future. Individual software-only clients may not operate
    >            at 10Gbps, but will be combined together with other clients
    >            that aggregate to 10Gbps.
    > 
    >         d) The only framing mechanisms that can operate completely
    >            above a client TCP and not require any modification to
    >            the client's standard TCP stack are the interval-based
    >            (Marker mode, periodic PDU alignment, fixed length PDU)
    >            and byte-stuffing (COBS) framing schemes. All other
    >            framing mechanisms (including PDU-Alignment mode)
    >            require modification to the client's TCP stack.
    > 
    >         e) The processing overhead for a client software
    >            implementation to insert Markers is small compared to
    >            the processing overhead of a byte-stuffing scheme.
    > 
    >         f) Receivers are allowed to dictate the sender's framing
    >            behavior because it is the receiver that is impacted
    >            by the presence or absence of framing behavior on the
    >            connection.
    > 
    >         g) Hardware-accelerated receivers can be implemented with
    >            minimal buffer memory (meaning that they always rely on
    >            framing-based direct data placement processing) only if
    >            it is known in advance that every client the receiver
    >            could potentially interoperate with is capable of
    >            providing the necessary framing-based behavior. These
    >            hardware-accelerated receivers will request, and expect,
    >            that the sender insert markers (or use PDU-Alignment
    >            mode if supported).
    > 
    >         h) Since a software-implemented receiver may incur extra
    >            data movements in processing markers, these receivers
    >            can request, and expect that, a sender NOT insert
    >            markers, if desired.
    > 
    >         i) Marker mode doesn't completely eliminate the need for
    >            buffer memory on the receiver. The receiver still needs
    >            to use "eddy buffers" that temporarily hold incoming data
    >            after a dropped segment containing a ULP header up until
    >            the next ULP header is located in the packet stream, and
    >            which exist for as long as the original ULP header segment
    >            is outstanding. But Marker mode does greatly reduce the
    >            amount of memory needed as compared to a traditional TCP
    >            receiver's reassembly memory requirements (often equal to
    >            number-of-connections X round-trip-pipe-size). The Marker
    >            mode small memory requirements are dependent upon the
    >            period of the marker, and the size of the ULP PDUs being
    >            restricted to a reasonably small value. The larger that
    >            either one is, the larger the eddy buffer memory
    >            requirements. Also, an eddy buffer is required each time
    >            a ULP header is dropped, so that multiple ULP header drops
    >            in close proximity may cause multiple eddy buffers to be
    >            temporarily pending on a connection.
    > 
    >         j) The PDU-alignment framing mode is preferred. However, it
    >            may be several years before all of the different software
    >            TCP/IP implementations will be able to support framing
    >            behavior.
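To put item i) in numbers, here is a rough Python sketch of the traditional reassembly memory figure it quotes (number-of-connections X round-trip-pipe-size). The function name and the example values are illustrative assumptions. The summary gives no formula for eddy buffer size, so this computes only the baseline that Marker mode is being compared against.

```python
def reassembly_memory_bytes(num_connections, bandwidth_bps, rtt_ms):
    """Traditional TCP receive reassembly memory, in bytes.

    Round-trip pipe size per connection = bandwidth * RTT, converted
    from bits to bytes; total = connections * pipe size.
    Integer arithmetic: bits/s * ms / (8 bits/byte * 1000 ms/s).
    """
    pipe_bytes = bandwidth_bps * rtt_ms // 8000
    return num_connections * pipe_bytes


# Illustrative values only: one 1 Gbps connection at 100 ms RTT.
print(reassembly_memory_bytes(1, 1_000_000_000, 100))   # 12500000 bytes
# The same link speed at 10 ms RTT needs a tenth of that per connection.
print(reassembly_memory_bytes(1, 1_000_000_000, 10))    # 1250000 bytes
```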
    > 
    > -----------
    > Open Issues:
    > 
    >     1) Acceptability of the PDU-Alignment framing mode's reliance
    >        on "key+length" matching across resegmenting middleboxes
    >          - In PDU-Alignment mode each TCP segment payload contains
    >            one complete framing PDU (consisting of an 8 byte
    >            framing header followed by one or more complete ULP
    >            PDUs). Thus, every TCP segment has the TCP header
    >            followed immediately by the framing header.
    >          - In certain cases a single framing PDU must be broken
    >            across multiple TCP segments (such as dynamic Path MTU
    >            reductions), resulting in TCP segments where a framing
    >            header doesn't immediately follow the TCP header.
    >          - The framing I-D defines sender behaviors that allow
    >            PDU-alignment mode to function deterministically and
    >            correctly in all cases where the TCP segmentation
    >            flowing from sender to receiver is not altered.
    >          - If the TCP segmentation from sender to receiver is
    >            altered by an intermediary (resegmenting middlebox),
    >            and a framing-header-containing segment drop or
    >            reordering has occurred such that the receiver is
    >            attempting to locate the next framing header in the
    >            segment stream, then the receiver must examine the
    >            first 8 bytes of each incoming TCP segment payload for
    >            a valid framing header containing valid Key(6B) and
    >            Length(2B) fields.
    >          - A false-positive occurs if, upon resegmentation by a
    >            middlebox, the receiver gets a TCP segment in which
    >            the first 8 bytes of the payload indicate a valid
    >            framing header (the first 6 bytes match the
    >            previously exchanged random key value, and the next
    >            2 bytes contain a valid length), yet the TCP segment
    >            payload isn't actually a framing header.
    >          - While it is felt that the probability of a
    >            false-positive in these resegmenting-middlebox scenarios
    >            will be sufficiently low, further analysis work may be
    >            required in this area.
    >          - Note that this mechanism is NOT a scanning technique
    >            for locating start-of-frame across an arbitrary byte
    >            stream. It only provides an indication of PDU
    >            alignment or not. The first 8 bytes of the TCP segment
    >            payload are examined to determine if the segment
    >            contains the start of a ULP PDU.
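The per-segment check described in open issue 1 can be sketched in Python as follows. The 8-byte layout (6-byte Key followed by 2-byte Length) is from the summary. The network byte order, the `max_length` bound used as the "valid length" test, and the function name are assumptions made for illustration.

```python
import struct

FRAMING_HEADER_LEN = 8  # 6-byte key + 2-byte length, per the summary


def looks_like_framing_header(payload, key, max_length):
    """Return True if a TCP segment payload starts with a plausible framing header.

    payload:    bytes of one TCP segment payload.
    key:        the 6-byte random key exchanged earlier.
    max_length: largest framing PDU length accepted (assumed validity rule).
    Only the first 8 bytes of the payload are examined; this is not a scan
    across the byte stream.
    """
    if len(key) != 6:
        raise ValueError("key must be exactly 6 bytes")
    if len(payload) < FRAMING_HEADER_LEN:
        return False
    seg_key, length = struct.unpack("!6sH", payload[:FRAMING_HEADER_LEN])
    return seg_key == key and 0 < length <= max_length


key = bytes.fromhex("a1b2c3d4e5f6")
aligned = key + struct.pack("!H", 1024) + b"\x00" * 1024
print(looks_like_framing_header(aligned, key, 1452))          # True
print(looks_like_framing_header(b"\x00" * 1460, key, 1452))   # False
```

If the first 8 payload bytes of a resegmented segment were uniformly random, a key match alone would occur with probability 2^-48. Real payload data is not random, which is one reason the summary asks for further analysis.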
    > 
    >     2) None of the current framing schemes take TCP data integrity
    >        into account. It needs to be decided either:
    >          a) how to detect when a data integrity problem occurs
    >             within a framing header, and what to do about it
    >             (even if it just kills the TCP connection),
    >          b) or that a sufficient level of data integrity needs
    >             to be provided for all protocols running over TCP
    >             via a more holistic approach.
    > 
    >     3) Do Markers work at 10Gbps?
    >          - The feasibility of markers at 10Gbps has been questioned.
    >            It would be beneficial to hear specifics regarding why
    >            Markers won't work at 10Gbps. Markers don't allow for a
    >            no-memory direct data placement NIC since eddy-buffers
    >            are required. So, support for clients with unmodified
    >            TCP stacks comes at a cost, which is the cost of
    >            supporting eddy buffers on the NIC.
    >          - One question is whether the eddy buffers can be contained
    >            entirely in the ASIC or need to be in off-chip memory.
    > 
    > ---------------- Error Recovery: The Bottom Line ----------------------
    > 
    > 1) Information was presented regarding estimated iSCSI header and
    >    data digest error rates, and possible approaches to iSCSI
    >    error recovery. The error rates info is summarized as follows:
    > 
    >         a) "Good Internet"
    >               - 1500 byte MTU / 8192 byte iSCSI PDU
    >               - TCP checksum mismatch 1 in 90,000
    >               - Checksum escape 1 in 135M to 1 in 10B
    >               - For bandwidth of 30Mbps @ 100msec RTT
    >                   > 8 to 600 digest errors per year
    >                   > 1 header digest error every 2 months to 10 years
    >               - For bandwidth of 300Mbps @ 10msec RTT
    >                   > 80 to 6,000 digest errors per year
    >                   > 1 to 70 header digest errors per year
    > 
    >         b) "Bad Internet"
    >               - 1500 byte MTU / 8192 byte iSCSI PDU
    >               - TCP checksum mismatch 1 in 11,000
    >               - Checksum escape 1 in 16M to 1 in 1B
    >               - For bandwidth of 10Mbps @ 100msec RTT
    >                   > .5 to 33 digest errors per week
    >                   > 26 to 1,650 digest errors per year
    >                   > 0.3 to 20 header digest errors per year
    >               - For bandwidth of 100Mbps @ 10msec RTT
    >                   > 5 to 335 digest errors per week
    >                   > 260 to 16,500 digest errors per year
    >                   > 3 to 200 header digest errors per year
    > 
    >         c) 1Gbps iSCSI connection
    >               - assuming current TCP yields 70% bandwidth utilization
    >               - at 100msec
    >                   > packet loss less than 1 in 50M
    >                   > .5 to 40 digest errors per year
    >                   > 1 header digest error every 2 to 100 years
    >               - at 10msec
    >                   > packet loss less than 1 in 500,000
    >                   > 60 to 4,000 digest errors per year
    >                   > 1 to 40 header digest errors per year
    > 
    >         d) 10Gbps iSCSI connection
    >               - assuming current TCP yields 70% bandwidth utilization
    >               - at 100msec
    >                   > packet loss less than 1 in 5 billion
    >                   > 1 digest error every 3 mo to 10 years
    >                   > 1 header digest error every 20 to 1000 years
    >               - at 10msec
    >                   > packet loss less than 1 in 50M
    >                   > 6 to 400 digest errors per year
    >                   > 1 header digest error every 3 mo to 12 years
    > 
    >         e) The frequency of framing header corruption escaping the
    >            TCP checksum mechanism is on the order of the frequency
    >            of the iSCSI header escaping, but depends on the
    >            mechanism used as well as the MSS and iSCSI PDU sizes:
    >              - Markers (at 2k intervals) - 1/3 as likely as iSCSI
    >                headers.
    >              - Framing (w/o chunking) - 1.5 times as likely
    >              - Framing (w/ chunks) - 1/2 as likely as iSCSI headers.
    >            These assumed an 8k iSCSI PDU size, except for Framing
    >            (w/o chunking), which assumed a 1k iSCSI PDU to fit in a
    >            single segment.  All schemes had a framing header size
    >            of 8 bytes, and assumed an MSS of 1460.
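One simple way to read the "Good Internet" and "Bad Internet" figures is sketched below in Python. It assumes the link runs at the stated bandwidth with full MTU-sized segments and that each segment escaping the TCP checksum produces one digest error. This is a back-of-the-envelope model, not the spreadsheet's own method, and the function name is hypothetical. It does land close to the quoted ranges for cases a) and b).

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def digest_errors_per_year(bandwidth_bps, mtu_bytes, escape_one_in):
    """Estimate digest errors per year.

    bandwidth_bps: sustained throughput in bits per second.
    mtu_bytes:     segment size on the wire (assumed always full-sized).
    escape_one_in: one segment in this many escapes the TCP checksum.
    """
    segments_per_second = bandwidth_bps / (mtu_bytes * 8)
    return segments_per_second * SECONDS_PER_YEAR / escape_one_in


# "Good Internet", 30 Mbps, 1500-byte MTU, escape 1 in 10B .. 1 in 135M
low = digest_errors_per_year(30e6, 1500, 10e9)
high = digest_errors_per_year(30e6, 1500, 135e6)
print(round(low), "to", round(high), "digest errors per year")
```

With these inputs the estimate is roughly 8 to 584 errors per year, in line with the "8 to 600" quoted in item a). The header digest rows and the 1Gbps and 10Gbps cases rest on further assumptions that are not modeled here.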
    > 
    > 2) No definitive conclusions were reached during the F2F in regards
    >    to Error Recovery mechanisms.
    >       - Further work needs to be done in this area.
    >       - Mallikarjun Chadalapaka, Mark Bakke, and others can help
    >         move the work forward in this area.
    > 
    > 3) It would be valuable to collect information regarding TCP
    >    checksum mismatch rates on production systems. If anyone has
    >    access to fairly busy systems and can collect the following
    >    information, you can forward it to Mark Bakke (mbakke@cisco.com).
    >    You'll want to collect three data items:
    >       a) sysUpTime
    >       b) tcpInSegs - total number of inbound TCP segments
    >       c) tcpInErrs - total number of inbound TCP segments with errors
    >          (most likely checksum mismatches, but some implementations
    >          may count other error discards here as well)
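Given two snapshots of those three items, the mismatch rate can be computed as in the Python sketch below. The dictionary layout and function names are assumptions for illustration. The sketch relies on `tcpInSegs` and `tcpInErrs` being 32-bit counters and `sysUpTime` being in hundredths of a second, and it tolerates at most one counter wrap between snapshots.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(earlier, later):
    """Difference between two Counter32 readings, allowing for one wrap."""
    return (later - earlier) % COUNTER32_MODULUS


def checksum_mismatch_rate(first, second):
    """Return (errors per segment, elapsed seconds) between two snapshots.

    Each snapshot is a dict with keys sysUpTime, tcpInSegs and tcpInErrs.
    The rate is None if no segments arrived in the interval.
    """
    segs = counter_delta(first["tcpInSegs"], second["tcpInSegs"])
    errs = counter_delta(first["tcpInErrs"], second["tcpInErrs"])
    elapsed_s = (second["sysUpTime"] - first["sysUpTime"]) / 100
    rate = errs / segs if segs else None
    return rate, elapsed_s


# Made-up readings; tcpInSegs wraps past 2**32 between the two samples.
first = {"sysUpTime": 1000, "tcpInSegs": 4294967000, "tcpInErrs": 10}
second = {"sysUpTime": 361000, "tcpInSegs": 89704, "tcpInErrs": 11}
rate, elapsed_s = checksum_mismatch_rate(first, second)
print(f"1 error in {1 / rate:,.0f} segments over {elapsed_s:.0f} s")
```

On a busy system the 32-bit counters can wrap more than once between widely spaced samples, so snapshots should be taken often enough to keep the interval within a single wrap.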
    > 
    > ---------------- Slide Decks ------------------------------------
    > 
    > Sorry, but these materials aren't on the web yet. Hopefully
    > they will be in the next week or two. I'll email when they are
    > available on a server somewhere.
    > 
    > * "iSCSI Framing Presentation" - slides/spreadsheet - Matt Wakeley
    > * "TCP Framing Discussion" - slides - Jim Wendt
    > * "Recovering From iSCSI Digest Errors" - slides - Mark Bakke
    > * "Expected iSCSI digest error rates on Internet connections"
    >   - spreadsheet - Mark Bakke
    > * CRC and checksum performance - slides - Jonathan Stone
    > 
    > ---------------- Framing Discussion Summary ----------------------
    > 
    > /iSCSI usage scenarios
    > A wide variety of usage scenarios was strongly represented:
    > * High-speed short-distance storage LANs
    > * High-speed long-distance storage WANs
    > * Multitudes of low-end clients using software iSCSI client
    >   implementations and unmodified software TCP stacks
    > * Multiple first-generation 1Gbps clients aggregating to
    >   next-generation 10Gbps storage arrays
    > * A variety of IP networks and paths with potential for both
    >   TCP-level resegmenting middleboxes and dynamic changes in Path MTU.
    > 
    > /Memory-based solutions
    > * It was felt that 1Gbps memory-based solutions are feasible and
    >   may be cost-competitive (e.g. there is no usage of direct data
    >   placement nor framing mechanisms in this case)
    > * There were different opinions regarding whether 10Gbps memory-based
    >   solutions would be feasible or cost-competitive.
    > * There was discussion regarding the comparative cost of memory-based
    >   and no-memory iSCSI HBAs and infrastructure relative to Fibre Channel
    > * There were concerns regarding next-generation 10Gbps storage arrays
    >   that want to support first-generation clients. The next-generation
    >   10Gbps storage arrays can only implement no-memory solutions if the
    >   first-generation clients were mandated to implement support for
    >   framing (thus making direct data placement possible on the storage
    >   array).
    > * Hybrid schemes were discussed where a next-generation 10Gbps
    >   storage array would contain a moderate amount of memory to handle
    >   non-framing first-generation clients while using full framing and
    >   direct data placement with no memory buffers for 10Gbps clients
    > * Matt Wakeley has created a spreadsheet for high-speed memory
    >   subsystem costs
    > 
    > /Direct data placement alternatives
    > * Discussion of various levels at which direct data placement
    >   information can ride:
    >     - Above TCP (iSCSI task tags, RDMA protocol)
    >     - At transport (TCP RDMA option)
    >     - Below transport (TAF)
    > 
    > /iSCSI layering scenarios and evolution
    > * iSCSI can be layered:
    >     - over normal TCP
    >     - over Markers over normal TCP
    >     - over Framing TCP
    >     - over RDMA+chunking over Framing TCP
    > * Layering iSCSI over RDMA+chunking doesn't seem likely for
    >   first-generation iSCSI implementations
    > 
    > /Framing alternatives
    > * Framing mechanism classes:
    >     - Intervalic (Periodic Markers, Periodically aligned headers,
    >       Fixed size ULP PDUs)
    >     - Framing aware TCP (a la ULP Framing over TCP I-D)
    >     - TCP message boundary indications (Reserved bit, TCP option,
    >       URG pointer, PSH bit, etc)
    >     - Byte stuffing (COBS, 7B/8B, etc)
    > * Framing mechanism characteristics:
    >     - sender TCP modifications required?
    >     - receiver memory requirements (full TCP receive window,
    >       eddy buffers, IP reassembly buffers)
    >     - level of TCP changes (none, behavioral, header fields)
    >     - support ULP PDU > TCP MSS
    >     - software processing overhead
    >     - hardware implementation complexity
    >     - handle dynamic Path MTU changes
    >     - handle resegmenting TCP middlebox
    >     - require [dynamic] chunking above TCP
    >     - emit short segments more often than typical
    >     - added protocol bytes overhead
    >     - tied to TCP sequence number processing
    >     - increase probability of segment drops
    >     - TCP aesthetics
    > 
    > /Markers and ULP Framing merge
    > * Proposal to merge Marker mechanism into current "ULP Framing for
    >   TCP I-D" and have iSCSI mandate implementation of Marker mode
    > * See "Framing: The Bottom Line" section above
    > 
    > ---------------- Error Recovery Discussion Summary ---------------
    > 
    > /Mark Bakke slides and spreadsheet
    > * Mark presented slides and a spreadsheet discussing:
    >     - expected iSCSI header and data digest error rates given
    >       link bandwidth, RTT, probability of segment drop, and
    >       probability of TCP checksum escape
    >     - recommended iSCSI error handling approaches for Header
    >       digest and data digest errors
    >     - <slides link>
    >     - <spreadsheet link>
    > 
    > /Discussion regarding iSCSI error recovery complexity
    > * It was felt that 90% of the recovery complexity already exists
    >   for the sake of session recovery (after a TCP connection failure)
    >   and if only "within-command" recovery was eliminated, it wouldn't
    >   substantially simplify the protocol or its specification.
    >   However, this assertion needs to be validated.
    > * It was felt that complete command recovery would probably be
    >   adequate for the expected error incidence (not a noticeable impact),
    >   but it hasn't been shown how adopting this approach would reduce
    >   complexity.
    > 
    > /Discussion re: IPSec SA's
    > * There was some discussion regarding use of IPSec and concerns
    >   that the set of Security Associations would not fit into on-chip
    >   memory, forcing the SAs to be cached in off-chip memory.
    > 
    > /Jonathan Stone slides
    > * Jonathan presented data analysis from his soon-to-be-completed
    >   dissertation regarding the nature of empirically-observed
    >   transport-level errors, and the error detection performance of
    >   CRC and checksum algorithms on such.
    > 
    > ---------------- List of Attendees -----------------------
    > 
    > Mark Bakke, Stephen Bailey, Uri Elzur, Somesh Gupta, Randy Haagens,
    > John Hufferd, Jim Pinkerton, Venkat Rangan, Allyn Romanow,
    > Costa Sapuntzakis, Julian Satran, Jonathan Stone, Matt Wakeley,
    > Jim Wendt, Jim Williams
    > 
    > --------------------------------------------------------------
    
    -- 
    Mark A. Bakke
    Cisco Systems
    mbakke@cisco.com
    763.398.1054
    


Last updated: Tue Sep 04 01:04:16 2001