iSCSI Data Integrity - Digests

To: ips@ece.cmu.edu
Subject: iSCSI Data Integrity - Digests
From: Michael Krause <krause@cup.hp.com>
Date: Fri, 26 Jan 2001 18:31:31 -0800
Sender: owner-ips@ece.cmu.edu

Use existing / proven CRC algorithms and techniques to provide fast market enablement while avoiding a "reinvention of the wheel" exercise.
Provide strong end-to-end data integrity for both the iSCSI PDU header and the data payload.
CRC is required for all implementations, i.e. strong end-to-end data integrity is not optional, as customers will not adopt solutions without such guarantees.

An iSCSI PDU shall be restricted to a single TCP segment. Multiple iSCSI PDUs may be present within the same TCP segment, but none shall span multiple segments.
Each iSCSI PDU is protected by a trailing 32-bit CRC (Ethernet polynomial), i.e. a single CRC covers the entire iSCSI header and data.
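
As a rough illustration of the single trailing CRC, here is a minimal Python sketch. It relies on zlib.crc32, which uses the same generator polynomial as the Ethernet FCS. The function names and the choice to store the trailer in network byte order are assumptions made for the example, not part of the proposal.

```python
import struct
import zlib

CRC_LEN = 4  # one 32-bit trailer per PDU


def add_crc_trailer(header: bytes, data: bytes) -> bytes:
    """Return header + data followed by a CRC-32 covering both."""
    body = header + data
    crc = zlib.crc32(body) & 0xFFFFFFFF
    # Assumption: the trailer is stored big-endian (network byte order).
    return body + struct.pack("!I", crc)


def verify_crc_trailer(pdu: bytes) -> bool:
    """Recompute the CRC over everything before the trailer and compare."""
    if len(pdu) < CRC_LEN:
        return False
    body, trailer = pdu[:-CRC_LEN], pdu[-CRC_LEN:]
    (expected,) = struct.unpack("!I", trailer)
    return (zlib.crc32(body) & 0xFFFFFFFF) == expected
```

Because the whole PDU sits in one segment, a receiver (or a modified NIC) could run the check in a single pass as the bytes arrive.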

The iSCSI PDU header is not modified during transmission. While there has been some discussion of providing such a capability in the future, there is no current requirement that would justify accounting for it in the specification at this time. Should this become a requirement, an intermediate endnode supporting iSCSI header modification would need to guarantee strong data integrity within its implementation using any of the well-known / deployed techniques.

Strong end-to-end data integrity using a well-known, proven technology.
Low-cost, high-speed hardware implementations with readily available hardware cores can be created with minimal design complexity.
Only one CRC is needed, and it can be implemented in software, mitigating the performance impact that iSCSI data integrity would otherwise impose.
Ability to accelerate software iSCSI implementations using a slightly modified NIC to perform the CRC calculation / verification for both inbound and outbound data streams. This modified NIC would only require minor understanding of the iSCSI header, i.e. to identify it and locate the CRC within the data stream. The CRC can be verified while coming in off the "wire" or inserted while being placed on the "wire". This technique is well understood since it is very similar to what is implemented by TCP checksum off-load implementations in use today.
- Note: A NIC implementing this functionality could combine the verification of the TCP checksum into a "one-stop" verification operation and silently drop invalid packets or tag them as "bad" for ULP processing.
Solves the framing problem while eliminating the need for future support of "chunking" / RDMA technology. Each PDU header contains sufficient information required for direct data placement providing the same benefits attributed to chunking / RDMA. This will also allow simplified "bridge" solutions to be constructed, e.g. iSCSI-to-InfiniBand, iSCSI-to-SRP, etc.
Eliminates the need to maintain intermediate CRC results (both inbound and outbound) reducing implementation cost / complexity.
Eliminates bandwidth waste by reducing the number of bytes required to guarantee end-to-end data integrity while supporting multiple small PDUs per segment (compaction).
Provides improved QoS arbitration control / management - if a PDU were allowed to span multiple segments, then an implementation would need to transmit segments back-to-back (or very close) to deliver strong end-to-end performance / transaction throughput. This may be implementation-specific but is still a tangible benefit for customers.
If an intermediate endnode performs re-segmentation, a PDU may span multiple segments. This would be detected as a PDU CRC error, providing a simple detection mechanism that allows implementations to recover at either the connection or session level.

iSCSI implementations must be able to determine each connection's MSS and create iSCSI PDUs that fit within the MSS. Such functionality is available in a variety of TCP implementations today and in hardware implementations.
- For the send-side retransmission problem (i.e. how to delineate packets within a byte stream), a hardware implementation is straightforward to support since it provides the PDU-segment correlation.
- For a software implementation, the mbuf / mblk encompassing the iSCSI PDU would be marked to indicate whether the associated buffer should be sent within a separate segment or not. This is not common in TCP implementations to date but is not difficult to implement. It should also be noted that this is an implementation issue, not a TCP protocol issue.
- If a layer 4 intermediate endnode glues together two TCP streams and is not iSCSI aware, the send-side retransmission is a problem. However, it is unclear whether this usage model must be transparently supported by iSCSI, i.e. such an intermediate endnode should be required to be iSCSI aware. This is not unreasonable, as most layer 4 intermediate endnodes provide some value-add service as a function of layer 4; why wouldn't such an endnode provide iSCSI value-add and thus be layer 5 aware?
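
To make the MSS constraint concrete, the following Python sketch groups already-built PDUs into segment payloads so that no PDU crosses a segment boundary. The function name pack_pdus and the greedy policy are illustrative assumptions. How the MSS is obtained is platform specific (some stacks expose it through the TCP_MAXSEG socket option) and is simply passed in here.

```python
from typing import List


def pack_pdus(pdus: List[bytes], mss: int) -> List[bytes]:
    """Greedily pack whole PDUs into segment payloads of at most mss bytes.

    A PDU is never split: if it does not fit in the current segment,
    the segment is closed and the PDU starts a new one.
    """
    segments: List[bytes] = []
    current = b""
    for pdu in pdus:
        if len(pdu) > mss:
            # Under the single-segment rule such a PDU cannot be sent at all.
            raise ValueError("PDU larger than MSS; it would span segments")
        if len(current) + len(pdu) > mss:
            segments.append(current)
            current = b""
        current += pdu
    if current:
        segments.append(current)
    return segments
```

A real sender would also have to preserve these boundaries across retransmission, which is the marking problem described in the list above; the sketch only covers the initial packing.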

An iSCSI PDU shall be restricted to a single TCP segment. Multiple iSCSI PDUs may be present within the same TCP segment, but none shall span multiple segments.
Each iSCSI PDU is protected by two CRCs - one invariant and one variant. The invariant CRC (ICRC) is a 32-bit CRC covering the PDU data and invariant header fields (e.g. address). The variant CRC (VCRC) is either a 16-bit or 32-bit CRC that covers the entire PDU header, data, and invariant CRC. PDU layout would be: header, data, ICRC, VCRC.
- Note: This scheme is conceptually the same as what is used in InfiniBand providing customers and the industry with a single paradigm and improved technology integration for both compute and storage endnodes.
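
The header, data, ICRC, VCRC layout can be sketched in Python as follows. Everything about the header here is hypothetical: an 8-byte header whose first 4 bytes are treated as variant, variant bytes masked to all ones for the ICRC calculation, a 32-bit VCRC, and big-endian trailers. None of these details come from the proposal; they only serve to show how the two CRCs interact.

```python
import struct
import zlib
from typing import Tuple

HEADER_LEN = 8          # hypothetical fixed header size
VARIANT = slice(0, 4)   # hypothetical: bytes an intermediate node may rewrite


def crc32(buf: bytes) -> int:
    return zlib.crc32(buf) & 0xFFFFFFFF


def icrc(header: bytes, data: bytes) -> int:
    """CRC over data plus the header with variant bytes masked out."""
    masked = bytearray(header)
    masked[VARIANT] = b"\xff" * 4
    return crc32(bytes(masked) + data)


def build_pdu(header: bytes, data: bytes) -> bytes:
    """Lay out header, data, ICRC, VCRC in that order."""
    body = header + data + struct.pack("!I", icrc(header, data))
    return body + struct.pack("!I", crc32(body))


def check_pdu(pdu: bytes) -> Tuple[bool, bool]:
    """Return (vcrc_ok, icrc_ok) for a received PDU."""
    body = pdu[:-4]
    (vcrc,) = struct.unpack("!I", pdu[-4:])
    header, data = body[:HEADER_LEN], body[HEADER_LEN:-4]
    (stored_icrc,) = struct.unpack("!I", body[-4:])
    return crc32(body) == vcrc, icrc(header, data) == stored_icrc


def rewrite_variant(pdu: bytes, new_variant: bytes) -> bytes:
    """What an intermediate node would do: change variant bytes, redo VCRC only."""
    if len(new_variant) != 4:
        raise ValueError("variant field is 4 bytes in this sketch")
    body = bytearray(pdu[:-4])
    body[VARIANT] = new_variant
    return bytes(body) + struct.pack("!I", crc32(bytes(body)))
```

In this sketch an intermediate node never touches the ICRC, so a change to an invariant field still shows up at the final endnode even if the VCRC was recomputed along the way.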

Supports an intermediate endnode updating iSCSI header fields while supporting strong end-to-end data integrity of all invariant header fields and data. It is critical that all invariant header fields such as target address be protected at all times to avoid silent data corruption / illegal memory access since these fields are used to DMA the data into / from target memory.
- Note: This problem does not exist in IP-based applications today since such implementations do not expose addresses across the wire but use look-up techniques as a function of the header. iSCSI implementations may choose to use a similar technique but at the cost of increased resources / complexity.
Limits the complexity / overhead required to support a separate header CRC - e.g. intermediate byte-stream CRC injection / verification. This simplifies the hardware implementation for a full off-load solution and makes it possible to create simplified CRC acceleration as described in alternative 1 for software-based iSCSI implementations.
Use of two trailer CRCs does not impact overall end-to-end performance or endnode hardware resources. Implementations are gated more by memory subsystem / cache coherency overheads than by external wire speed transmission, i.e. the packet will, in general, arrive before one could complete the first few cache line fetch operations. As such, given the single-segment operation, the data can be verified as it comes in off the wire and the memory operations initiated with minimal latency (most operations will be pipelined operations within a few cycles).
An intermediate endnode can provide data integrity checks while data is in-flight and stomp the CRC should it detect an error. This allows packet flow-through to be supported while providing fault isolation and a signal for subsequent endnodes to drop invalid packets if they desire.
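
One way to picture CRC stomping is the small Python sketch below. The convention used here, overwriting the trailer with the bitwise complement of the CRC the node just computed, is an assumption borrowed from common cut-through practice and is not specified in this proposal; the helper names are likewise made up, and a single 32-bit trailer is used to keep the example short.

```python
import struct
import zlib


def crc32(buf: bytes) -> int:
    return zlib.crc32(buf) & 0xFFFFFFFF


def seal(body: bytes) -> bytes:
    """Append a CRC-32 trailer to a PDU body."""
    return body + struct.pack("!I", crc32(body))


def forward(pdu: bytes) -> bytes:
    """Intermediate node: pass good PDUs through, stomp the CRC of bad ones."""
    body, trailer = pdu[:-4], pdu[-4:]
    computed = crc32(body)
    if struct.pack("!I", computed) == trailer:
        return pdu
    # Assumed stomp convention: complement of the CRC computed in flight.
    return body + struct.pack("!I", computed ^ 0xFFFFFFFF)


def classify(pdu: bytes) -> str:
    """Downstream view: 'ok', 'stomped' (flagged upstream), or 'corrupt'."""
    body = pdu[:-4]
    (trailer,) = struct.unpack("!I", pdu[-4:])
    computed = crc32(body)
    if trailer == computed:
        return "ok"
    if trailer == computed ^ 0xFFFFFFFF:
        return "stomped"
    return "corrupt"
```

With such a convention a downstream endnode could tell an error already flagged upstream from one introduced on the last hop, which is the fault isolation benefit mentioned above.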

Invariant header fields must be identified and included within the ICRC calculation adding minor complexity to the overall implementation.

Allow a PDU to span multiple TCP segments.
Implement two CRCs: a header CRC and a data CRC.
Do not allow intermediate endnodes to modify the iSCSI header.

Increased implementation complexity and overhead. The header CRC must immediately follow the header, requiring injection / removal within the endnodes. This complexity is compounded for variable header protocols such as iSCSI and is why such a solution has been rejected in other high-speed technologies.
Requires intermediate CRC state to be maintained for both inbound and outbound requests.
Increased QoS scheduling complexity for strong end-to-end application throughput.
Does not solve the framing problem, perhaps necessitating a chunking / RDMA solution. This increases solution complexity and creates interoperability / support issues for customers, i.e. options are bad for developers and bad for customers.
Severely limits the creation of high-performance software-based iSCSI implementations, perhaps making them impractical as a general-purpose implementation. This will limit the potential market for iSCSI solutions.
Note: If an intermediate endnode is allowed to modify the PDU header, then there exists a possibility of silent data corruption since the invariant portions no longer have end-to-end data integrity. This will be a major issue for customers in terms of their ability to adopt iSCSI across a variety of solution spaces, i.e. if there is the potential for silent data corruption, then customers will not deploy iSCSI and will turn to alternatives that provide stronger end-to-end data integrity.
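
The "intermediate CRC state" item above can be illustrated with a short Python sketch. When a PDU spans segments, the receiver has to carry a running CRC value from one segment to the next for every connection; zlib.crc32 accepts such a running value as its second argument. The class name RunningCrc is made up for the example.

```python
import zlib


class RunningCrc:
    """Per-connection CRC state that must survive between segments."""

    def __init__(self) -> None:
        self.value = 0  # zlib's starting value for a fresh CRC-32

    def update(self, chunk: bytes) -> None:
        # Continue the CRC from where the previous segment left off.
        self.value = zlib.crc32(chunk, self.value)

    def digest(self) -> int:
        return self.value & 0xFFFFFFFF


def crc_of_spanning_pdu(pieces) -> int:
    """CRC of a PDU delivered as several segment-sized pieces."""
    state = RunningCrc()
    for piece in pieces:
        state.update(piece)
    return state.digest()
```

With the single-segment rule of the other alternatives, no such state needs to be kept between segments, since each PDU is checked in one pass.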

Follow-Ups:
- Re: iSCSI Data Integrity - Digests
  - From: Matt Wakeley <matt_wakeley@agilent.com>
- RE: iSCSI Data Integrity - Digests
  - From: "Douglas Otis" <dotis@sanlight.net>
