TCP Framing Support in iSCSI - Proposal

To: ips@ece.cmu.edu
Subject: TCP Framing Support in iSCSI - Proposal
From: "WENDT,JIM (HP-Roseville,ex1)" <jim_wendt@hp.com>
Date: Tue, 31 Jul 2001 16:11:19 -0700
Cc: "WENDT,JIM (HP-Roseville,ex1)" <jim_wendt@hp.com>
Content-Type: text/plain;charset="iso-8859-1"
Sender: owner-ips@ece.cmu.edu
The following rough proposal for iSCSI TCP Framing
Support has been generated by the Framing Team as a result
of the Face-to-Face Design Teams meeting and subsequent 
discussions. It consists of changes to the "ULP Framing
for TCP" and "iSCSI" internet-drafts as summarized below.
An I-D detailing these, and other related, changes is
forthcoming.

Regards,
Jim Wendt
jim_wendt@hp.com


-------- iSCSI TCP Framing Support Proposal - Summary ------

A) Proposed changes to the "ULP Framing for TCP" I-D are:

    1) Modify I-D to include two framing modes:
        - "Marker mode" for unmodified TCP stacks
        - "PDU-alignment mode" for modified TCP stacks

       Note: An updated version of the "ULP Framing for TCP I-D"
       reflecting these changes has been posted (7/9/01) to TSVWG
       for comments (draft-ietf-tsvwg-tcp-ulp-frame-00)

    2) ULP is responsible for negotiating use of a framing mode
       and enabling framing behavior on the TCP connection in an
       unambiguous manner.

    3) There are "Senders" and "Receivers" on each unidirectional
       data flow of a framing TCP connection. The use of framing,
       the framing mode, and framing operational parameter values
       are negotiated separately for each direction of a framing 
       TCP connection. This means that framing behavior may be in
       use on one direction of a TCP connection but not the
       reverse direction, or that different framing modes may
       be in use.

    4) Creation of the following operational parameters and
       semantics:
         - Marker period (in Marker mode)
         - Receiver maximum PDU size (in Marker mode)
         - Framing keys (in PDU-alignment mode)
         - ULP packing behavior (in PDU-alignment mode)
 
    5) ULP is responsible for negotiating use of a specific
       framing mode over the TCP connection, and for negotiating
       values for the framing operational parameters

    6) Change the marker fields to be 16 bits rather than 32 bits
       (and refer to them as "offsets" rather than "pointers")
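
As an illustration of items 3 and 4 above, the per-direction state could be
modeled roughly as follows. This is only a sketch: the class names, field
names, and the sample values (8192, 65536) are hypothetical and are not
taken from the Framing I-D. The only points carried over from the proposal
are that each direction is configured independently and that each
parameter belongs to one framing mode.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FramingMode(Enum):
    NONE = "none"
    MARKER = "marker"
    PDU_ALIGNMENT = "pdu-alignment"


@dataclass(frozen=True)
class DirectionFraming:
    """Framing state for ONE direction of a connection (hypothetical layout)."""
    mode: FramingMode = FramingMode.NONE
    marker_period: Optional[int] = None          # Marker mode only
    receiver_max_pdu_size: Optional[int] = None  # Marker mode only
    framing_key: Optional[bytes] = None          # PDU-alignment mode only
    ulp_packing: Optional[str] = None            # PDU-alignment mode only; opaque here

    def __post_init__(self) -> None:
        # Reject parameters that do not belong to the selected mode.
        marker_set = (self.marker_period is not None
                      or self.receiver_max_pdu_size is not None)
        align_set = self.framing_key is not None or self.ulp_packing is not None
        if marker_set and self.mode is not FramingMode.MARKER:
            raise ValueError("marker parameters require Marker mode")
        if align_set and self.mode is not FramingMode.PDU_ALIGNMENT:
            raise ValueError("key/packing parameters require PDU-alignment mode")


@dataclass(frozen=True)
class ConnectionFraming:
    """Two independently negotiated directions of one TCP connection."""
    a_to_b: DirectionFraming
    b_to_a: DirectionFraming


# Illustrative values only: Marker mode one way, no framing the other way.
conn = ConnectionFraming(
    a_to_b=DirectionFraming(FramingMode.MARKER,
                            marker_period=8192,
                            receiver_max_pdu_size=65536),
    b_to_a=DirectionFraming(),
)
```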


B) Proposed changes to the iSCSI spec are:

    1) Remove Markers appendix from iSCSI spec (Appendix D. Synch
       and Steering with Fixed Interval Markers)

    2) iSCSI spec adds wording to the effect of:
      
       Framing Protocol:
       * iSCSI initiator and target framing behavior over a TCP
         connection is defined in draft-ietf-tsvwg-tcp-ulp-frame-00
         (or eventual RFC#)
       * an iSCSI initiator or target is both a sender and receiver
         with respect to framing behavior over a TCP connection.

       iSCSI Sender:
       * an iSCSI framing sender SHOULD implement PDU-alignment mode,
         as defined in <Framing I-D>
       * if an iSCSI framing sender does not implement PDU-alignment
         mode, it MUST implement Marker mode, as defined in 
         <Framing I-D>

       iSCSI Receiver:
       * an iSCSI receiver may choose to not implement any 
         framing mode
       * an iSCSI framing receiver MAY implement PDU-alignment
         mode, or Marker mode, or both as defined in <Framing I-D>
       * an iSCSI framing receiver on a TCP connection dictates
         use of the highest framing mode desired from the sender
         by progressing through the following sequence:
           - if the receiver requests PDU-alignment mode from the
             sender, and the sender supports PDU-alignment mode,
             then the sender MUST enable use of PDU-alignment mode
             on the TCP connection
           - else if the sender does not support PDU-alignment mode,
             then the receiver MAY request Marker mode from the
             sender. If the sender also supports Marker mode, then
             the sender MUST enable use of Marker mode
           - otherwise, the receiver has not requested use of
             a framing mode and the sender MUST NOT enable use of
             any framing mode on the TCP connection
           - [Note: the above rules may be moved to the Framing I-D]
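
The receiver-driven selection sequence above could be sketched as follows.
This is one possible reading of the rules, not normative text: the function
and type names are hypothetical, and the sketch assumes a receiver falls
back to Marker mode whenever PDU-alignment mode cannot be agreed.

```python
from enum import Enum
from typing import Optional, Set


class Mode(Enum):
    PDU_ALIGNMENT = "pdu-alignment"
    MARKER = "marker"


def select_framing_mode(receiver_wants: Set[Mode],
                        sender_supports: Set[Mode]) -> Optional[Mode]:
    """Return the mode the sender must enable, or None for no framing."""
    # Step 1: PDU-alignment mode is tried first.
    if Mode.PDU_ALIGNMENT in receiver_wants and Mode.PDU_ALIGNMENT in sender_supports:
        return Mode.PDU_ALIGNMENT
    # Step 2: the receiver MAY then ask for Marker mode.
    if Mode.MARKER in receiver_wants and Mode.MARKER in sender_supports:
        return Mode.MARKER
    # Step 3: nothing requested or nothing in common, so no framing.
    return None
```

With this reading, a PDU-alignment-only receiver paired with a Marker-only
sender yields None, which is the no-framing outcome described under Open
Issues.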
 
       Interoperation:
       * the framing mode or modes that a receiver implementation 
         chooses to support will determine which senders it can
         perform direct data placement for (since senders can choose
         to implement either of the framing modes). It is
         anticipated that there will be receiver implementations
         which support the following combinations of framing modes:
               1. PDU-alignment + Markers
               2. PDU-alignment only
               3. no framing

       Chunking:
       * Use of PDU-alignment mode requires that a dynamic
         chunking layer be implemented above the framing TCP
         layer.

    3) > Still need to determine iSCSI mechanism for turning on
         Framing protocol Marker mode operation

    4) > Still need to determine iSCSI mechanism for negotiating
         values for framing operational parameters
    
    5) Perhaps there is some description of probable framing
       scenarios capturing the most likely combinations of
       the following attributes:
         - initiator or target
         - software implementation or hardware implementation
         - unmodified or modified TCP stack
         - sender AND receiver framing behaviors (no framing, 
           or Marker mode, or PDU-Alignment mode)
         - values for framing operational parameters


-------------------------------------------------------
The reasoning for these proposed changes is as follows:

    1) iSCSI use of direct data placement and framing

         a) Direct data placement allows a HW-accelerated
            interface card to place each incoming ULP PDU (and
            ideally each TCP segment payload) directly into its
            final application buffer in end-system memory, even 
            when the underlying TCP segments arrive out of order.
            The ULP PDU carries the buffer location information
            for doing the direct placement. This means that the
            TCP doesn't need to hold application data in an
            internal receive queue while sequence gaps are filled,
            nor subsequently copy that data into final
            application buffers. This allows the interface
            card to minimize or eliminate the very fast and
            large receive queue memory that would normally be
            required when running over networks with large
            bandwidth-delay products (e.g. 10-100 Gbps,
            200 msec RTT).

         b) In order to perform direct data placement, the
            placement function must be able to locate ULP
            headers in the TCP segment stream, and extract
            placement information, even when TCP segments
            arrive out of order. A framing mechanism provides
            the underlying wire protocol and behaviors to
            enable this.

         c) The I-D "The Case for RDMA"
            (draft-csapuntz-caserdma-00.txt) discusses
            the benefits of direct data placement in the
            context of a generalized RDMA facility.
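
To give a sense of scale for the receive queue memory mentioned in 1a, the
bandwidth-delay product for the quoted figures can be worked out as below.
The function name is hypothetical, and the result assumes a single
connection able to fill the whole link for one round trip.

```python
def bdp_bytes(link_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: bits/sec * seconds / 8."""
    # Integer arithmetic: (bits/sec * msec) / (1000 msec/sec * 8 bits/byte)
    return link_bps * rtt_ms // 8000


for gbps in (10, 100):
    size = bdp_bytes(gbps * 1_000_000_000, 200)
    print(f"{gbps} Gbps x 200 msec RTT = {size // 1_000_000} MB in flight")
```

That is 250 MB at 10 Gbps and 2500 MB at 100 Gbps, which a traditional
receiver may have to hold while a sequence gap is filled.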


    2) Merging Marker mode into "ULP Framing for TCP" I-D

         a) The TCP-related framing work already has mindshare
            in TSVWG and this work is embodied in the current
            framing I-D. Rather than dilute the framing effort 
            with additional I-Ds, all framing related work
            should be collected into a modified version of the
            existing framing I-D.

         b) Other ULPs may also find Marker mode useful in
            software-only unmodified-TCP client scenarios

         c) The framing I-D appears to be a reasonable literary
            vehicle for documenting the collection of framing
            schemes. The I-D could be extended in the future to
            include a byte or word stuffing frame marker method
            such as COBS.

         d) A single framing I-D may help to encourage a single
            consistent interface with the ULP regardless of which
            framing mode is employed.

         e) The iSCSI spec can simply reference the one framing I-D.


    3) Specifying iSCSI sender and receiver framing support

        a) Receivers can choose to not implement framing.
           Software implementations of receivers may incur extra 
           data movements in processing framing and generally get
           no benefit from using framing. It is anticipated that
           these receiver software implementations will not 
           support framing.

        b) Hardware-accelerated receivers that want to perform
           direct data placement and eliminate or minimize the
           amount of TCP reassembly memory (for links with large
           bandwidth-delay products) will require senders to
           support framing behavior. These receiver
           implementations are only viable if they can rely
           on the fact that all senders are capable of
           supporting some framing mode.

        c) Framing is done for the receiver's benefit, and is
           mostly a minor inconvenience for the sender. However,
           senders may have limitations regarding which framing 
           mode(s) they can support. So, a sender is allowed to
           implement the framing mode(s) most suited to it, and
           the receiver is allowed to select from these supported
           framing modes, or choose not to utilize framing on
           the connection.

        d) There isn't one framing mode that is best for all
           senders:
              - Marker mode is best suited to software
                implementations that run over unmodified TCP stacks
              - PDU-alignment mode is best suited to hardware
                implementations that want to minimize or eliminate
                buffer memory and reduce per-packet processing
                complexity

        e) Marker mode is the best choice for a framing mechanism
           that can operate completely above a client TCP stack
           and not require any modification to that stack.
               - Other interval-based approaches (periodic PDU
                 alignment, fixed length PDU) require padding
                 and waste bandwidth
               - bit-stuffing and byte-stuffing schemes (COBS,
                 etc) have a much higher processing overhead

        f) Marker mode supports one potentially compelling
           application for iSCSI involving software-only
           implementations on mainstream desktops and laptops
           operating over unmodified TCP stacks to access 
           centralized storage arrays. These software
           implementations are likely to exist far into the 
           future. Individual software-only clients may not operate 
           at 10Gbps, but may be combined together with other
           clients that could aggregate to 10Gbps, thus making
           direct data placement compelling for a 10Gbps receiver
           even if the senders are only operating at 1Gbps.

        g) Marker mode doesn't completely eliminate the need for 
           buffer memory on the receiver. The receiver still needs 
           to use "eddy buffers" that temporarily hold incoming data
           after a dropped segment containing a ULP header up until 
           the next ULP header is located in the packet stream, and 
           which exist for as long as the original ULP header segment 
           is outstanding. But Marker mode does greatly reduce the 
           amount of memory needed as compared to a traditional TCP
           receiver's reassembly memory requirements (often equal to 
           number-of-connections X round-trip-pipe-size). The Marker 
           mode small memory requirements are dependent upon the 
           period of the marker, and the size of the ULP PDUs being 
           restricted to a reasonably small value. The larger that 
           either one is, the larger the eddy buffer memory 
           requirements. Also, an eddy buffer is required each time 
           a ULP header is dropped, so that multiple ULP header drops
           in close proximity may cause multiple eddy buffers to be 
           temporarily pending on a connection.

        h) PDU-alignment framing mode allows each ULP PDU to be
           aligned with, and sent in, a single TCP segment under
           normal conditions (with the added requirement that a
           chunking layer needs to be implemented between iSCSI and
           the framing TCP stack). This behavior allows each TCP
           segment to be fully self-describing with respect to
           direct placement. Thus, each incoming TCP segment payload
           can be processed and directly placed as it arrives with
           no residual state information nor eddy buffer memory 
           required.

        i) PDU-alignment framing mode does require the use of
           a small number of "eddy buffers" when dynamic changes
           in the network path MTU occur and packets arrive out
           of order.

        j) The PDU-alignment framing mode is preferred. However, it 
           may be several years before all of the different software 
           TCP/IP implementations will be able to support framing 
           behavior. To do this, software interfaces will need to 
           change, and something needs to drag them there.  This 
           will take some time, so we have markers to help out in the
           meantime.  If all receivers that can use framing can do
           either one, and senders that can do PDU-alignment should
           do so, we will have a larger set of PDU-alignment
           implementations that may help pull the rest of the
           software interfaces along with the

        k) It is anticipated that a receiver will not
           implement only Markers. The receiver implementations will 
           probably be:
               1. PDU-alignment + Markers
               2. PDU-alignment only
               3. no framing

-----------------------------------------------------------
Open Issues:

    1) Interoperability between sender and receiver
        - Given that both senders and receivers have a choice in
          which framing mode(s) they implement, there is the
          potential for the sender to implement one framing mode
          and the receiver to implement a different framing mode
          (e.g. the sender implements only Marker mode, and the
          receiver implements only PDU-alignment mode).
        - In this situation, the receiver and sender would not
          enable framing on the TCP connection, and the receiver
          would not be able to perform direct data placement.
          Throughput from sender to receiver would likely be
          greatly reduced should any TCP segment drops occur.

    2) None of the current framing schemes take TCP data integrity
       into account. It needs to be decided either:
         a) how to detect when a data integrity problem occurs
            within a framing header, and what to do about it 
            (even if it just kills the TCP connection),
         b) or that a sufficient level of data integrity needs
            to be provided for all protocols running over TCP
            via a more holistic approach.

    3) Acceptability of the PDU-Alignment framing mode's reliance
       on "key+length" matching across resegmenting middleboxes
         - In PDU-Alignment mode each TCP segment payload contains
           one complete framing PDU (consisting of an 8 byte
           framing header followed by one or more complete ULP
           PDUs). Thus, every TCP segment has the TCP header 
           followed immediately by the framing header.
         - In certain cases a single framing PDU must be broken 
           across multiple TCP segments (such as dynamic Path MTU
           reductions), resulting in TCP segments where a framing
           header doesn't immediately follow the TCP header.
         - The framing I-D defines sender behaviors that allow
           PDU-alignment mode to function deterministically and
           correctly in all cases where the TCP segmentation
           flowing from sender to receiver is not altered.
         - If the TCP segmentation from sender to receiver is
           altered by an intermediary (resegmenting middlebox),
           and a framing-header-containing segment drop or 
           reordering has occurred such that the receiver is
           attempting to locate the next framing header in the
           segment stream, then the receiver must examine the 
           first 8 bytes of each incoming TCP segment payload for
           a valid framing header containing valid Key(6B) and 
           Length(2B) fields.
         - A false-positive occurs if, upon resegmentation by a 
           middlebox, the receiver gets a TCP segment in which
           the first 8 bytes of the payload indicate a valid
           framing header (the first 6 bytes match the
           previously exchanged random key value, and the next 
           2 bytes contain a valid length), yet the TCP segment 
           payload isn't actually a framing header.
         - While it is felt that the probability of a
           false-positive in these resegmenting-middlebox scenarios
           will be sufficiently low, further analysis work may be
           required in this area.
         - Note that this mechanism is NOT a scanning technique
           for locating start-of-frame across an arbitrary byte
           stream. It only provides an indication of PDU
           alignment or not. The first 8 bytes of the TCP segment
           payload are examined to determine if the segment
           contains the start of a ULP PDU.
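
The per-segment check described above might look like the following
sketch. The 6-byte Key and 2-byte Length layout comes from the text; the
function name, the network byte order of the Length field, and the rule
for what counts as a "valid" length (non-zero and within a caller-supplied
maximum) are assumptions made for illustration only.

```python
import struct

KEY_LEN = 6      # Key(6B), per the text above
HEADER_LEN = 8   # Key(6B) + Length(2B)


def looks_like_framing_header(payload: bytes, key: bytes,
                              max_len: int = 0xFFFF) -> bool:
    """Check only the first 8 bytes of one TCP segment payload."""
    if len(key) != KEY_LEN:
        raise ValueError("key must be exactly 6 bytes")
    if len(payload) < HEADER_LEN:
        return False
    if payload[:KEY_LEN] != key:
        return False
    # Assumption: Length is an unsigned 16-bit value in network byte order.
    (length,) = struct.unpack("!H", payload[KEY_LEN:HEADER_LEN])
    # Assumption: a "valid" length is non-zero and within max_len.
    return 0 < length <= max_len
```

The function never scans past offset 8, in line with the note that this is
an alignment indication and not a search for start-of-frame. A
false-positive is the case where it returns True for a resegmented payload
whose first 8 bytes happen to be ULP data.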

    4) Do Markers work at 10Gbps?
         - The feasibility of markers at 10Gbps has been questioned.
           It would be beneficial to hear specifics regarding why 
           Markers won't work at 10Gbps. Markers don't allow for a
           no-memory direct data placement NIC since eddy-buffers 
           are required. So, support for clients with unmodified 
           TCP stacks comes at a cost, which is the cost of 
           supporting eddy buffers on the NIC.
         - One question is whether the eddy buffers can be contained 
           entirely in the ASIC or need to be in off-chip memory.
