RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"

Steph,

Not to beat a dead horse, but here is why link-level CRCs may not be of much help.

The paper "When the CRC and TCP Checksum Disagree", section 5.1, describes the data transmission path and the potential for error introduction at various points in the path. At a layer 3 device, the following happens:

1. The existing link-level CRC is verified and stripped.
2. The payload (IP packet) is DMA'ed into some buffers, preserving the original IP header checksum and TCP checksum.
3. A new link-level header is created.
4. A new CRC is computed.
5. The data is sent to the next hop.

If an error is introduced (by software or hardware) in steps 2 and 3, the new CRC computed in step 4 isn't of any help, because it is computed over the already-corrupted data.

The introduced error can be:

1. In the IP header (such as munged IP address bytes).
2. In the TCP header (such as a corrupted port).
3. In the TCP checksum itself.
4. In the payload.

Error categories 1 and 2 may cause the packet not to be delivered at all. It is okay if we do not detect these, because such packets are not delivered to the iSCSI processing layer. Error 3 would cause the packet to be rejected. Error 4 should normally be caught by the TCP checksum, but it escapes detection at a rate of 1 in 10e8. (Actually, given the bias of errors toward the headers, I'm not sure whether this rate is the rate within the payload of a TCP segment.) The iSCSI header and data digests are present to detect that escape.

In the presence of middle boxes that do more than layer 2 forwarding (say, a box that terminates a TCP connection and initiates a new one), if the middle box retains the iSCSI header and data digests and only computes a new checksum, the transmission path exposure is similar to steps 2 and 3 above. The header and data digests will enable detection of that.

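As a rough illustration of the forwarding steps above, here is a minimal Python sketch. It rests on several assumptions: `zlib.crc32` is only a stand-in for both the link-level CRC and the iSCSI data digest, the "packet" is a bare payload with no real IP or TCP headers, and `swap_words` is a hypothetical buffer fault chosen because a ones'-complement sum cannot see reordered 16-bit words. It is not a model of any particular device.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style TCP uses (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def link_frame(packet: bytes) -> bytes:
    """Append a 4-byte link-level CRC to the packet."""
    return packet + zlib.crc32(packet).to_bytes(4, "big")


def link_check(frame: bytes) -> bytes:
    """Verify and strip the link-level CRC; raise if it does not match."""
    packet, crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(packet) != crc:
        raise ValueError("link CRC mismatch")
    return packet


def layer3_hop(frame: bytes, corrupt=None) -> bytes:
    """Strip the old CRC, buffer the packet, then re-frame with a new CRC."""
    packet = link_check(frame)        # step 1: verify and strip
    if corrupt is not None:
        packet = corrupt(packet)      # error introduced while buffered (steps 2-3)
    return link_frame(packet)         # steps 3-5: new header/CRC, send on


def swap_words(packet: bytes) -> bytes:
    """Hypothetical buffer fault: exchange the first two 16-bit words."""
    return packet[2:4] + packet[0:2] + packet[4:]


payload = b"SCSI data block."
tcp_sum = internet_checksum(payload)  # stands in for the TCP checksum
digest = zlib.crc32(payload)          # stands in for the iSCSI data digest

# The receiver's link check succeeds: the hop recomputed the CRC over bad data.
received = link_check(layer3_hop(link_frame(payload), corrupt=swap_words))

print("payload changed:      ", received != payload)
print("TCP-style sum matches:", internet_checksum(received) == tcp_sum)
print("digest matches:       ", zlib.crc32(received) == digest)
```

In this sketch the corrupted payload passes both the regenerated link CRC and the TCP-style checksum, and only the digest computed at the sender flags the change.
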
If the middle box does more than just terminate TCP connections, and changes the iSCSI header and recomputes a new iSCSI header digest but leaves the data digest alone, at least the data part is protected, but not the header. If it changes both header and data, there is no protection. In order to get true end-to-end protection, the application needs to apply a separate digest, such as creating a 516-byte data block for every 512-byte sector of data and storing that on the media.

So the escape rate depends quite a bit on the number of middle boxes and the exposure of data paths. How much do we rely on middle boxes never to introduce an error during the exposure? Since the referenced papers suggest correct end-to-end delivery of TCP segments with checksum errors in them, the presence of exposed paths in the middle boxes has been a factor. Still, the rates quoted (1 in 200 million or 1 in 300 million) suggest that it is necessary to have very strong CRC and detection mechanisms, but that it may not be necessary to optimize the recovery options so that we recover with the smallest amount of retransmitted data.

I haven't studied the two other references on the subject, but again I suspect there is evidence to suggest that errors will creep in at intermediate processing elements.

Venkat Rangan
Rhapsody Networks Inc.
http://www.rhapsodynetworks.com

-----Original Message-----
From: Stephen Bailey [mailto:steph@cs.uchicago.edu]
Sent: Monday, April 09, 2001 11:57 AM
To: ips@ece.cmu.edu
Subject: Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"

> Exactly, I've worked in this context (though it's been some years now).
> It was true (at one time) that tape had a tractability limit, e.g.,
> a tape backup of a terabyte was out of the question. Has that changed?

I think this is precisely the point. Existing, off-the-shelf SCSI solutions DO NOT presently solve this problem. Both ||SCSI and FCP burp the operation on an expectable, O(days) failure rate.
The rate of adoption for the FCP-2 command recovery feature is overwhelming to the point that the tape guys have been talking about end-running the problem with explicitly addressed commands.

What we have running iSCSI on TCP is such a drastic improvement in what you can expect from your SCSI service that we can eventually expect a disruptive change. Trying to engineer it to the point where it's 2^100 times more disruptive, when we don't really know where it's taking us in the first place, is meaningless.

[Warning: repetition ahead]

TCP + link layer error detection is engineered precisely to ensure reliable data delivery. It's clear from an engineering standpoint that it is likely (not guaranteed, what is?) to do this quite well.

In spite of much research, it seems like nobody here has come up with a strong indication that TCP + link layer error detection does NOT do its job well. I do not think this is because nobody has ever looked at the problem.

The lack of concrete information to support the case that TCP + link layer error detection is inadequate has us chasing our tails.

Given the layer iSCSI occupies in the protocol layer cake, if we don't try to solve a problem which is presently assigned to a lower layer, it seems quite comfortable to shim in additional checks or recovery, or even a completely different transport substrate underneath, if we do discover that TCP + link layer error detection is not doing the trick. But it really seems like folly to engineer based upon an assumption that nobody has done a good job documenting.

Steph