Re: F2F Mtg Summary for Framing and Error Recovery

Jim-

The error recovery slides and spreadsheet are now available on
Julian's web site at:

http://www.haifa.il.ibm.com/satran/ips/PaloAlto-MarkBakke-crc-recovery.pdf
http://www.haifa.il.ibm.com/satran/ips/PaloAlto-MarkBakke-iSCSI-errors.xls

They make a lot of assumptions; please enjoy them responsibly.

--
Mark

"WENDT,JIM (HP-Roseville,ex1)" wrote:
>
> Here is a quick summary of the outcome from the June 27-28 Design
> Teams Face-to-Face meeting in Palo Alto in the two focal areas:
> Framing and Error Recovery. The bulk of the meeting was spent
> discussing framing scenarios, requirements, and alternatives.
>
> As soon as the slide decks make it onto Julian's web site,
> I'll email out that info.
>
> Regards,
> Jim Wendt
> Networked Storage Architecture / NSSO
> Hewlett-Packard Company
> jim_wendt@hp.com 916-785-5198
>
> ---------------------------------------------------------------
> F2F Meeting Summary for Framing and Error Recovery
> Design Teams Face-to-Face Meeting / June 27-28 Palo Alto
>
> ---------------- Framing: The Bottom Line ----------------------
>
> To cut to the chase, the following rough proposal was generated
> for handling ULP Framing for iSCSI:
>
> A) Proposed changes to "ULP Framing for TCP" I-D are:
>    1) Modify I-D to include two framing modes:
>       - "Marker mode" for unmodified TCP stacks
>       - "PDU-alignment mode" for modified TCP stacks
>
>    2) ULP is responsible for negotiating use of framing protocol
>       and enabling framing behavior on the TCP connection in an
>       unambiguous manner
>
>    3) The framing protocol usage, framing mode, and framing
>       operational parameters are negotiated separately in each
>       direction on a TCP connection. Thus there are "Senders"
>       and "Receivers" on a framing TCP connection. An iSCSI
>       Initiator or Target is both a Sender and a Receiver
>       with respect to a framing TCP connection.
>
>    4) ULP is responsible for negotiating use of a specific
>       framing mode over the TCP connection by having the
>       receiver request highest framing mode desired from sender
>       (first PDU-alignment, then Marker, then none) and having
>       the sender comply:
>       - if receiver requests, and sender supports, PDU-alignment
>         mode, then sender MUST enable PDU-alignment mode
>       - else if receiver requests, and sender supports, Marker
>         mode, then sender MUST enable Marker mode
>       - else don't use framing protocol
>
>    5) ULP is responsible for negotiating framing operational
>       parameters:
>       - Marker period (in Marker mode)
>       - Receiver maximum PDU size (in Marker mode)
>       - Framing keys (in PDU-alignment mode)
>       - ULP packing behavior (in PDU-alignment mode)
>
>    6) Change the marker fields to be 16-bits rather than 32-bits
>       (and refer to as "offsets" rather than "pointers")
>
>    *) An updated version of the "ULP Framing for TCP I-D"
>       reflecting these changes has been posted (7/9/01) to TSVWG
>       for comments (draft-ietf-tsvwg-tcp-ulp-frame-00)
>
> B) Proposed changes to iSCSI spec are:
>    1) Remove Markers appendix from iSCSI spec (Appendix D.
>       Synch and Steering with Fixed Interval Markers)
>
>    2) iSCSI spec adds wording to the effect of:
>       * iSCSI initiator and target framing behavior over a TCP
>         connection is defined in draft-ietf-tsvwg-tcp-ulp-frame-00
>         (or eventual RFC#)
>       * an iSCSI initiator or target is both a sender and receiver
>         with respect to framing behavior
>       * an iSCSI framing sender MUST implement Marker mode, and
>         MAY implement PDU-alignment mode, as defined in <I-D>
>       * an iSCSI framing receiver MAY implement PDU-alignment
>         mode, or Marker mode, or both, or none as defined in <I-D>
>       * an iSCSI receiver on a framing TCP connection dictates
>         use of the highest framing mode desired from sender as
>         follows:
>         - if receiver requests, and sender supports, PDU-alignment
>           mode, then sender MUST enable PDU-alignment mode
>         - else if receiver requests, and sender supports, Marker
>           mode, then sender MUST enable Marker mode
>         - else framing behavior is disabled
>       * Perhaps there is some description of probable framing
>         scenarios capturing the most likely combinations of
>         the following attributes:
>         - initiator or target
>         - software implementation or hardware implementation
>         - unmodified or modified TCP stack
>         - sender AND receiver framing behaviors (no framing,
>           or Marker mode, or PDU-Alignment mode)
>         - values for framing operational parameters
>
>    3) Still need to determine iSCSI mechanism for turning on
>       Framing protocol Marker mode operation
>
>    4) Still need to determine iSCSI mechanism for negotiating
>       framing operational parameters:
>       - Framing mode (if both Marker and PDU Alignment mode
>         are supported)
>       - Marker period (in Marker mode)
>       - Receiver maximum PDU size (in Marker mode)
>       - Framing keys (in PDU-alignment mode, if supported)
>       - ULP packing behavior (in PDU-alignment mode, if
>         supported)
>
> -----------
> The reasoning for these proposed changes is as follows:
>
> 1) Re: Merge Marker mode into "ULP Framing for TCP" I-D
>    a) The TCP-related framing work
>       already has mindshare
>       in TSVWG and this work is embodied in the current
>       framing I-D. Rather than dilute the framing effort
>       with additional I-Ds, all framing related work
>       should be collected into a modified version of the
>       existing framing I-D.
>
>    b) Other ULPs may also find Marker mode useful in
>       software-only unmodified-TCP client scenarios
>
>    c) The framing I-D appears to be a reasonable literary
>       vehicle for documenting the collection of framing
>       schemes. The I-D could be extended in the future to
>       include a byte or word stuffing frame marker method
>       such as COBS.
>
>    d) A single framing I-D may help to encourage a single
>       consistent interface with the ULP regardless of which
>       framing mode is employed.
>
>    e) The iSCSI spec can simply reference the one framing I-D.
>
> 2) Re: Make Marker mode mandatory for all iSCSI implementations,
>    and PDU-Alignment mode optional for all iSCSI implementations.
>
>    a) This allows interoperation of software-only,
>       UNMODIFIED-TCP-stack clients with hardware-accelerated,
>       small-buffer-memory storage arrays. This applies to both
>       1Gbps-client/1Gbps-array and 1Gbps-client/10Gbps-array
>       scenarios.
>
>    b) One potentially compelling application for iSCSI involves
>       software-only implementations on mainstream desktops and
>       laptops operating over unmodified TCP stacks to access
>       centralized storage arrays.
>
>    c) Software implementations are likely to exist far into the
>       future. Individual software-only clients may not operate
>       at 10Gbps, but will be combined together with other clients
>       that aggregate to 10Gbps.
>
>    d) The only framing mechanisms that can operate completely
>       above a client TCP and not require any modification to
>       the client's standard TCP stack are the interval-based
>       (Marker mode, periodic PDU alignment, fixed length PDU)
>       and byte-stuffing (COBS) framing schemes. All other
>       framing mechanisms (including PDU-Alignment mode)
>       require modification to the client's TCP stack.
>
>    e) The processing overhead for a client software
>       implementation to insert Markers is small compared to
>       the processing overhead of a byte-stuffing scheme.
>
>    f) Receivers are allowed to dictate the sender's framing
>       behavior because it is the receiver that is impacted
>       by the presence or absence of framing behavior on the
>       connection.
>
>    g) Hardware-accelerated receivers can be implemented with
>       minimal buffer memory, meaning that they always rely on
>       framing-based direct data placement processing, only if
>       it is known in advance that every client the receiver
>       could potentially interoperate with is capable of
>       providing the necessary framing-based behavior. These
>       hardware-accelerated receivers will request, and expect
>       that, the sender insert markers (or PDU-Alignment if
>       supported).
>
>    h) Since a software-implemented receiver may incur extra
>       data movements in processing markers, these receivers
>       can request, and expect that, a sender NOT insert
>       markers, if desired.
>
>    i) Marker mode doesn't completely eliminate the need for
>       buffer memory on the receiver. The receiver still needs
>       to use "eddy buffers" that temporarily hold incoming data
>       after a dropped segment containing a ULP header up until
>       the next ULP header is located in the packet stream, and
>       which exist for as long as the original ULP header segment
>       is outstanding. But Marker mode does greatly reduce the
>       amount of memory needed as compared to a traditional TCP
>       receiver's reassembly memory requirements (often equal to
>       number-of-connections X round-trip-pipe-size). The Marker
>       mode small memory requirements are dependent upon the
>       period of the marker, and the size of the ULP PDUs being
>       restricted to a reasonably small value. The larger that
>       either one is, the larger the eddy buffer memory
>       requirements.
>       Also, an eddy buffer is required each time
>       a ULP header is dropped, so that multiple ULP header drops
>       in close proximity may cause multiple eddy buffers to be
>       temporarily pending on a connection.
>
>    j) The PDU-alignment framing mode is preferred. However, it
>       may be several years before all of the different software
>       TCP/IP implementations will be able to support framing
>       behavior.
>
> -----------
> Open Issues:
>
> 1) Acceptability of the PDU-Alignment framing mode's reliance
>    on "key+length" matching across resegmenting middleboxes
>    - In PDU-Alignment mode each TCP segment payload contains
>      one complete framing PDU (consisting of an 8 byte
>      framing header followed by one or more complete ULP
>      PDUs). Thus, every TCP segment has the TCP header
>      followed immediately by the framing header.
>    - In certain cases a single framing PDU must be broken
>      across multiple TCP segments (such as dynamic Path MTU
>      reductions), resulting in TCP segments where a framing
>      header doesn't immediately follow the TCP header.
>    - The framing I-D defines sender behaviors that allow
>      PDU-alignment mode to function deterministically and
>      correctly in all cases where the TCP segmentation
>      flowing from sender to receiver is not altered.
>    - If the TCP segmentation from sender to receiver is
>      altered by an intermediary (resegmenting middlebox),
>      and a framing-header-containing segment drop or
>      reordering has occurred such that the receiver is
>      attempting to locate the next framing header in the
>      segment stream, then the receiver must examine the
>      first 8 bytes of each incoming TCP segment payload for
>      a valid framing header containing valid Key(6B) and
>      Length(2B) fields.
>    - A false-positive occurs if, upon resegmentation by a
>      middlebox, the receiver gets a TCP segment in which
>      the first 8 bytes of the payload indicate a valid
>      framing header (the first 6 bytes match the
>      previously exchanged random key value, and the next
>      2 bytes contain a valid length), yet the TCP segment
>      payload isn't actually a framing header.
>    - While it is felt that the probability of a
>      false-positive in these resegmenting-middlebox scenarios
>      will be sufficiently low, further analysis work
>      may be required in this area.
>    - Note that this mechanism is NOT a scanning technique
>      for locating start-of-frame across an arbitrary byte
>      stream. It only provides an indication of PDU
>      alignment or not. The first 8 bytes of the TCP segment
>      payload are examined to determine if the segment
>      contains the start of a ULP PDU.
>
> 2) None of the current framing schemes take TCP data integrity
>    into account. It either needs to be decided:
>    a) how to detect when a data integrity problem occurs
>       within a framing header, and what to do about it
>       (even if it just kills the TCP connection),
>    b) or that a sufficient level of data integrity needs
>       to be provided for all protocols running over TCP
>       via a more holistic approach.
>
> 3) Do Markers work at 10Gbps?
>    - The feasibility of markers at 10Gbps has been questioned.
>      It would be beneficial to hear specifics regarding why
>      Markers won't work at 10Gbps. Markers don't allow for a
>      no-memory direct data placement NIC since eddy-buffers
>      are required. So, support for clients with unmodified
>      TCP stacks comes at a cost, which is the cost of
>      supporting eddy buffers on the NIC.
>    - One question is whether the eddy buffers can be contained
>      entirely in the ASIC or need to be in off-chip memory.
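
[Illustration] The receiver-driven mode selection proposed in item A.4
of "Framing: The Bottom Line" can be sketched in Python as below. This
is only a minimal sketch of the rule as stated in the summary: the
names FramingMode and negotiate_framing are hypothetical, and the
actual iSCSI negotiation mechanism is listed above as still to be
determined.

```python
from enum import IntEnum


class FramingMode(IntEnum):
    """Framing modes named in the summary (hypothetical encoding)."""
    NONE = 0
    MARKER = 1
    PDU_ALIGNMENT = 2


def negotiate_framing(receiver_requests, sender_supports):
    """Pick the framing mode for ONE direction of a TCP connection.

    receiver_requests: set of modes the receiver asks for
    sender_supports:   set of modes the sender implements

    Preference order follows the summary: PDU-alignment first,
    then Marker, otherwise no framing.
    """
    for mode in (FramingMode.PDU_ALIGNMENT, FramingMode.MARKER):
        if mode in receiver_requests and mode in sender_supports:
            return mode
    return FramingMode.NONE


# Each direction is negotiated separately, so a connection has two
# results: one with the target as receiver, one with the initiator.
software_initiator = {FramingMode.MARKER}           # unmodified TCP stack
hardware_target = {FramingMode.MARKER, FramingMode.PDU_ALIGNMENT}

to_target = negotiate_framing(hardware_target, software_initiator)
to_initiator = negotiate_framing(set(), hardware_target)
```

In this made-up scenario the hardware target asks for both modes but the
software initiator can only send Markers, so data flowing to the target
uses Marker mode, while the software initiator requests nothing and the
reverse direction runs without framing.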
>
> ---------------- Error Recovery: The Bottom Line ----------------------
>
> 1) Information was presented regarding estimated iSCSI header and
>    data digest error rates, and possible approaches to iSCSI
>    error recovery. The error rates info is summarized as follows:
>
>    a) "Good Internet"
>       - 1500 byte MTU / 8192 byte iSCSI PDU
>       - TCP checksum mismatch 1 in 90,000
>       - Checksum escape 1 in 135M to 1 in 10B
>       - For bandwidth of 30Mbps @ 100msec RTT
>         > 8 to 600 digest errors per year
>         > 1 header digest error every 2 months to 10 years
>       - For bandwidth of 300Mbps @ 10msec RTT
>         > 80 to 6,000 digest errors per year
>         > 1 to 70 header digest errors per year
>
>    b) "Bad Internet"
>       - 1500 byte MTU / 8192 byte iSCSI PDU
>       - TCP checksum mismatch 1 in 11,000
>       - Checksum escape 1 in 16M to 1 in 1B
>       - For bandwidth of 10Mbps @ 100msec RTT
>         > .5 to 33 digest errors per week
>         > 26 to 1,650 digest errors per year
>         > 0.3 to 20 header digest errors per year
>       - For bandwidth of 100Mbps @ 10msec RTT
>         > 5 to 335 digest errors per week
>         > 260 to 16,500 digest errors per year
>         > 3 to 200 header digest errors per year
>
>    c) 1Gbps iSCSI connection
>       - assuming current TCP yields 70% bandwidth utilization
>       - at 100msec
>         > packet loss less than 1 in 50M
>         > .5 to 40 digest errors per year
>         > 1 header digest error every 2..100 years
>       - at 10msec
>         > packet loss less than 1 in 500,000
>         > 60 to 4,000 digest errors per year
>         > 1 to 40 header digest errors per year
>
>    d) 10Gbps iSCSI connection
>       - assuming current TCP yields 70% bandwidth utilization
>       - at 100msec
>         > packet loss less than 1 in 5 billion
>         > 1 digest error every 3 mo to 10 years
>         > 1 header digest error every 20 to 1000 years
>       - at 10msec
>         > packet loss less than 1 in 50M
>         > 6 to 400 digest errors per year
>         > 1 header digest error every 3 mo to 12 years
>
>    e) The frequency of framing header corruption escaping the
>       TCP checksum mechanism is on the order of the frequency
>       of the iSCSI header
>       escaping, but depends on the
>       mechanism used as well as the MSS and iSCSI PDU sizes:
>       - Markers (at 2k intervals) - 1/3 as likely as iSCSI
>         headers.
>       - Framing (w/o chunking) - 1.5 times as likely
>       - Framing (w/ chunks) - 1/2 as likely as iSCSI headers.
>       These assumed an 8k iSCSI PDU size, except for Framing
>       (w/o chunking), which assumed a 1k iSCSI PDU to fit in a
>       single segment. All schemes had a framing header size
>       of 8 bytes, and assumed an MSS of 1460.
>
> 2) No definitive conclusions were reached during the F2F in regards
>    to Error Recovery mechanisms.
>    - Further work needs to be done in this area.
>    - Mallikarjun Chadalapaka, Mark Bakke, and others can help
>      move the work forward in this area.
>
> 3) It would be valuable to collect information regarding TCP
>    checksum mismatch rates on production systems. If anyone has
>    access to fairly busy systems and can collect the following
>    information, you can forward it to Mark Bakke (mbakke@cisco.com).
>    You'll want to collect three data items:
>    a) sysUpTime
>    b) tcpInSegs - total number of inbound TCP segments
>    c) tcpInErrs - total number of inbound TCP segments with errors
>       (most likely checksum mismatches, but some implementations
>       may count other error discards here as well)
>
> ---------------- Slide Decks ------------------------------------
>
> Sorry, but these materials aren't on the web yet. Hopefully
> they will be in the next week or two. I'll email when they are
> available on a server somewhere.
>
> * "iSCSI Framing Presentation" - slides/spreadsheet - Matt Wakeley
> * "TCP Framing Discussion" - slides - Jim Wendt
> * "Recovering From iSCSI Digest Errors" - slides - Mark Bakke
> * "Expected iSCSI digest error rates on Internet connections"
>   - spreadsheet - Mark Bakke
> * CRC and checksum performance - slides - Jonathan Stone
>
> ---------------- Framing Discussion Summary ----------------------
>
> /iSCSI usage scenarios
> A wide variety of usage scenarios were strongly represented:
> * High-speed short-distance storage LANs
> * High-speed long-distance storage WANs
> * Multitudes of low-end clients using software iSCSI client
>   implementations and unmodified software TCP stacks
> * Multiple first-generation 1Gbps clients aggregating to
>   next-generation 10Gbps storage arrays
> * A variety of IP networks and paths with potential for both
>   TCP-level resegmenting middleboxes and dynamic changes in Path MTU.
>
> /Memory-based solutions
> * It was felt that 1Gbps memory-based solutions are feasible and
>   may be cost-competitive (e.g. there is no usage of direct data
>   placement nor framing mechanisms in this case)
> * There were different opinions regarding whether 10Gbps memory-based
>   solutions would be cost-competitive or feasible.
> * There was discussion regarding the comparative cost of memory-based
>   and no-memory iSCSI HBAs and infrastructure relative to Fibre Channel
> * There were concerns regarding next-generation 10Gbps storage arrays
>   that want to support first-generation clients. The next-generation
>   10Gbps storage arrays can only implement no-memory solutions if the
>   first-generation clients were mandated to implement support for
>   framing (thus making direct data placement possible on the storage
>   array).
> * Hybrid schemes were discussed where a next-generation 10Gbps
>   storage array would contain a moderate amount of memory to handle
>   non-framing first-generation clients while using full framing and
>   direct data placement with no memory buffers for 10Gbps clients
> * Matt Wakeley has created a spreadsheet for high-speed memory
>   subsystem costs
>
> /Direct data placement alternatives
> * Discussion of various levels at which direct data placement
>   information can ride:
>   - Above TCP (iSCSI task tags, RDMA protocol)
>   - At transport (TCP RDMA option)
>   - Below transport (TAF)
>
> /iSCSI layering scenarios and evolution
> * iSCSI can be layered:
>   - over normal TCP
>   - over Markers over normal TCP
>   - over Framing TCP
>   - over RDMA+chunking over Framing TCP
> * Layering iSCSI over RDMA+chunking doesn't seem likely for
>   first-generation iSCSI implementations
>
> /Framing alternatives
> * Framing mechanism classes:
>   - Intervalic (Periodic Markers, Periodically aligned headers,
>     Fixed size ULP PDUs)
>   - Framing aware TCP (ala ULP Framing over TCP I-D)
>   - TCP message boundary indications (Reserved bit, TCP option,
>     URG pointer, PSH bit, etc)
>   - Byte stuffing (COBS, 7B/8B, etc)
> * Framing mechanism characteristics:
>   - sender TCP modifications required?
>   - receiver memory requirements (full TCP receive window,
>     eddy buffers, IP reassembly buffers)
>   - level of TCP changes (none, behavioral, header fields)
>   - support ULP PDU > TCP MSS
>   - software processing overhead
>   - hardware implementation complexity
>   - handle dynamic Path MTU changes
>   - handle resegmenting TCP middlebox
>   - require [dynamic] chunking above TCP
>   - emit short segments more often than typical
>   - added protocol bytes overhead
>   - tied to TCP sequence number processing
>   - increase probability of segment drops
>   - TCP aesthetics
>
> /Markers and ULP Framing merge
> * Proposal to merge Marker mechanism into current "ULP Framing for
>   TCP I-D" and have iSCSI mandate implementation of Marker mode
> * See "Framing: The Bottom Line" section above
>
> ---------------- Error Recovery Discussion Summary ---------------
>
> /Mark Bakke slides and spreadsheet
> * Mark presented slides and a spreadsheet discussing:
>   - expected iSCSI header and data digest error rates given
>     link bandwidth, RTT, probability of segment drop, and
>     probability of TCP checksum escape
>   - recommended iSCSI error handling approaches for Header
>     digest and data digest errors
>   - <slides link>
>   - <spreadsheet link>
>
> /Discussion regarding iSCSI error recovery complexity
> * It was felt that 90% of the recovery complexity already exists
>   for the sake of session recovery (after a TCP connection failure)
>   and if only "within-command" recovery was eliminated, it wouldn't
>   substantially simplify the protocol or its specification.
>   However, this assertion needs to be validated.
> * It was felt that complete command recovery would probably be
>   adequate for the expected error incidence (not a noticeable impact),
>   but it hasn't been shown how adopting this approach would reduce
>   complexity.
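
[Illustration] The data collection requested under "Error Recovery: The
Bottom Line" (sysUpTime, tcpInSegs, tcpInErrs) reduces to a ratio of
counter deltas between two samples. The Python sketch below is one
possible way to compute it, under stated assumptions: the function
names and the dict layout are hypothetical, the counters are treated as
32-bit values that may wrap once between samples, and sysUpTime is
taken to be in hundredths of a second. As the summary notes, tcpInErrs
may include discards other than checksum mismatches on some
implementations, so the result is an upper-bound estimate.

```python
COUNTER32_MOD = 2 ** 32  # assumed width of tcpInSegs / tcpInErrs


def counter_delta(earlier, later):
    """Difference between two counter samples, tolerating one wrap."""
    return (later - earlier) % COUNTER32_MOD


def checksum_mismatch_rate(first, second):
    """Summarize two samples taken on the same host.

    Each sample is a dict with keys 'sysUpTime', 'tcpInSegs' and
    'tcpInErrs' (hypothetical layout). Returns the observation
    interval in seconds, the segment and error deltas, and the
    "1 in N" figure (None when no errors were seen).
    """
    ticks = second["sysUpTime"] - first["sysUpTime"]
    if ticks <= 0:
        raise ValueError("samples out of order or host restarted")
    segs = counter_delta(first["tcpInSegs"], second["tcpInSegs"])
    errs = counter_delta(first["tcpInErrs"], second["tcpInErrs"])
    if segs == 0:
        raise ValueError("no inbound segments between samples")
    return {
        "seconds": ticks / 100,
        "segments": segs,
        "errors": errs,
        "one_in": segs / errs if errs else None,
    }


# Made-up sample values, for illustration only.
sample_a = {"sysUpTime": 0, "tcpInSegs": 1_000, "tcpInErrs": 0}
sample_b = {"sysUpTime": 360_000, "tcpInSegs": 901_000, "tcpInErrs": 10}
summary = checksum_mismatch_rate(sample_a, sample_b)
```

With these invented numbers the interval is one hour and the mismatch
rate comes out at 1 in 90,000, which happens to be the "Good Internet"
figure quoted above; real systems would need much longer intervals to
give a stable estimate.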
>
> /Discussion re: IPSec SA's
> * There was some discussion regarding use of IPSec and concerns
>   that the set of Security Associations would not fit into on-chip
>   memory, forcing the SAs to be cached in off-chip memory.
>
> /Jonathan Stone slides
> * Jonathan presented data analysis from his soon-to-be-completed
>   dissertation regarding the nature of empirically-observed
>   transport-level errors, and the error detection performance of
>   CRC and checksum algorithms on such.
>
> ---------------- List of Attendees -----------------------
>
> Mark Bakke, Stephen Bailey, Uri Elzur, Somesh Gupta, Randy Haagens,
> John Hufferd, Jim Pinkerton, Venkat Rangan, Allyn Romanow,
> Costa Sapuntzakis, Julian Satran, Jonathan Stone, Matt Wakeley,
> Jim Wendt, Jim Williams
>
> --------------------------------------------------------------

--
Mark A. Bakke
Cisco Systems
mbakke@cisco.com
763.398.1054