Re: iSCSI: Markers

I'm new to this list, so I should introduce myself. My name is Stuart Cheshire; I'm the author of Consistent Overhead Byte Stuffing (COBS), the framing technique from which COWS derives. I'm not working on any iSCSI product, but if COBS can contribute to iSCSI, then I'm happy to offer a little of my time, as much as I can spare, to help clarify what COWS does and does not do, to help people make an informed decision whether or not COWS is the right solution for iSCSI.

----

Assumption: A high-performance receiver is harder than a high-performance sender.

This is because the sender is in control. It knows where the data is coming from in memory, and where it is going to on the network. The sending host knows and can control all aspects of the communication: what order iSCSI messages are delivered onto the wire, how big each one is, and at what time they are sent. If the sender wants to do some kind of housekeeping that prevents it from sending packets for a few milliseconds, then it has the option of doing that without terrible consequences.

The receiver has a much harder time. It never knows what packet is going to arrive next, or how big it will be, or where it will be from, or where it will have to go to in memory. Packet loss/corruption/reordering makes things even more unpredictable. A receiver doesn't have the luxury of being able to not receive packets for a few milliseconds if it is busy with something else.

For this reason, it makes sense to see what the sender can do to make the receiver's life a little easier. If the receiver could receive each TCP segment and process it in isolation, determining where to place it in memory solely from information within that TCP segment, without reference to data from other TCP segments (which may not have arrived yet), then it would be easier to make a high-performance receiver.

What can we do to enable independent segment processing and idempotent direct data placement at the receiver?

My first choice would be to add a couple of extra bits to the TCP header; a "start of message" bit and an "end of message" bit. The "start of message" bit indicates that the first byte of TCP data in the segment is also the first byte of an iSCSI message; the "end of message" bit indicates that the last byte of TCP data in the segment is also the last byte of an iSCSI message. When a receiver receives a TCP segment with both bits set, it knows with certainty that it has one (or more) complete iSCSI messages in the TCP segment and can immediately decode enough of the iSCSI message header(s) to determine where in memory to place the data.

Unfortunately, adding extra bits to the TCP header is not viable. From a political point of view, trying to change the TCP on-the-wire protocol is a non-starter. From a practical point of view, there are too many routers and firewalls and similar devices that will throw away TCP packets with bits they don't understand.

Given that out-of-band framing using header bits is not possible, the alternative is in-band framing using only information in the TCP data stream itself. If we can design our sender to normally send exactly one iSCSI message per TCP segment, and we have a way for our receiver to reliably verify that the received TCP segment contains exactly one iSCSI message, then the receiver can implement idempotent direct data placement for each TCP segment as it is received, without reference to state from previous TCP segments on that connection (which may not have arrived yet).

The problem left to solve is how the receiver can reliably verify that the received TCP segment contains exactly one iSCSI message.
It can do this by checking to see whether the TCP segment data begins with some special marker pattern, as long as it knows that this special marker pattern cannot appear anywhere within the body of valid iSCSI message data. This necessarily entails processing ("stuffing") the body of the iSCSI message to eliminate inadvertent occurrences of the special marker pattern before sending, and then reversing this transformation to restore the original data after reception.

If the receiver finds that the segment does not begin with the special marker pattern, then it knows that the sender segmentation has not been maintained (or it is talking to an old TCP sender that doesn't support sender segmentation) and it has to fall back to treating the TCP data stream as a raw unstructured byte stream, with message boundaries indicated by occurrences of the marker pattern. The important thing is that the receiver still works correctly, even though the performance will be lower.

This prefer-sender-segmentation-but-verify approach is important. If the outgoing data is not processed to guarantee that the special marker pattern cannot occur, then malicious users might be able to subvert the protocol by putting contrived patterns in their data. Remember the days where you could make a user's modem hang up by sending them an email containing the text "+++ATH"? (Apologies to anyone reading this via modem who just had their telephone line hang up.)

Another benefit of using in-band framing like this is that we can deploy it immediately using unmodified TCP stacks. In the future we can use enhanced sender TCP implementations that take steps to maintain segment boundaries, and smart receivers will get a performance boost from that, but it is a compatible upgrade that changes only the implementation, not the on-the-wire protocol.

Of course, we don't get anything for free.
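
To make the stuffing idea above concrete, here is a minimal Python sketch of one way a word-stuffing scheme could work. It is an illustration only, not the COWS specification: the marker value, the 32-bit word size, the two-word header layout, and the names stuff, unstuff and is_whole_message are all assumptions made for this example. It also assumes that messages are a whole number of words and stay word-aligned in the stream, and it does not check that the message ends exactly at the segment boundary.

```python
import struct

MARKER = 0xC0DEFA11  # hypothetical 32-bit marker value, chosen only for this sketch
WORD = 4             # bytes per word (assumption)
END = 0              # link value meaning "no further marker words"


def stuff(payload: bytes) -> bytes:
    """Build [MARKER][link to first hit][payload with each hit replaced by a link]."""
    if len(payload) % WORD:
        raise ValueError("payload must be a whole number of words")
    words = list(struct.unpack(f">{len(payload) // WORD}I", payload))
    hits = [i for i, w in enumerate(words) if w == MARKER]
    # Overwrite each hit with the 1-based index of the next hit (END for the last).
    # Links are small word indexes, so they cannot equal MARKER for realistic sizes.
    for here, nxt in zip(hits, hits[1:] + [None]):
        words[here] = END if nxt is None else nxt + 1
    first = hits[0] + 1 if hits else END
    return struct.pack(f">{len(words) + 2}I", MARKER, first, *words)


def unstuff(frame: bytes) -> bytes:
    """Reverse stuff(): follow the chain of links and put MARKER back at each one."""
    marker, link, *words = struct.unpack(f">{len(frame) // WORD}I", frame)
    if marker != MARKER:
        raise ValueError("frame does not start with the marker")
    while link != END:
        nxt = words[link - 1]
        words[link - 1] = MARKER
        link = nxt
    return struct.pack(f">{len(words)}I", *words)


def is_whole_message(segment: bytes) -> bool:
    """Receiver check: marker in the first word and in no later word."""
    if len(segment) < 2 * WORD or len(segment) % WORD:
        return False
    words = struct.unpack(f">{len(segment) // WORD}I", segment)
    return words[0] == MARKER and MARKER not in words[1:]
```

In this sketch the output of stuff() is always exactly two words longer than its input, however many marker words the payload contains, and a payload with no marker words is copied through unchanged behind the header. A segment for which is_whole_message() returns False would be handed to the slower byte-stream path rather than rejected.
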
If we want the receiver to be able to determine with 100% certainty that it has received a complete iSCSI message in one TCP segment, then the sender will have to do some work to enable that. This is the cost of COWS. It gives 100% framing certainty, but at the cost of checking the outgoing data for inadvertent occurrences of the special marker pattern, and eliminating them. There's no way for a sender to tell whether the outgoing data contains inadvertent occurrences of the special marker pattern if the sender is not willing to look at the data.

On the plus side, the cost of COWS encoding is modest compared to some alternatives. COWS-encoding adds a little header but otherwise doesn't change the size of the outgoing data, ever. No matter how many occurrences of the framing marker pattern are found, the encoded output length is always exactly the same: the length of the input plus the length of the fixed-size framing header (typically two words). If the framing marker pattern is chosen to be something that is rare in normal (non-malicious) data, then in the common case the encoding step will be a read-only operation: scan the data, determine that it contains no framing markers, set the COWS header to indicate that the data contains no framing markers, and send it.

In contrast, when using Fixed Interval Markers, if a marker happens to fall in the middle of the data you are sending, then it creates a 'hole' in the middle of data that used to be contiguous, and the block of outgoing data changes size. On the receiving side, the 'hole' created by the marker has to be repaired in the process of transferring the data into memory.

When using Fixed Interval Markers, when a receiver gets a TCP segment that contains no marker, it cannot reliably determine what it is supposed to do with that segment (where to put it in memory) without referring to the state from the previous TCP segments of that connection.
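
The 'hole' effect described above can be shown with a small Python sketch. This is not the actual Fixed Interval Marker format: the interval, the marker length, the placeholder marker contents, and the function name insert_markers are all hypothetical, chosen only to show how a marker at a fixed stream position splits data and changes its size.

```python
INTERVAL = 16     # hypothetical number of payload bytes between markers
MARKER = b"MARK"  # placeholder contents; a real marker would carry framing information


def insert_markers(payload_offset: int, data: bytes) -> bytes:
    """Copy data, inserting MARKER before every byte whose payload offset
    in the stream is a positive multiple of INTERVAL."""
    out = bytearray()
    for i, byte in enumerate(data):
        pos = payload_offset + i
        if pos and pos % INTERVAL == 0:
            out += MARKER  # the marker lands in the middle of contiguous data
        out.append(byte)
    return bytes(out)
```

With these assumed values, ten bytes of data starting at payload offset 12 come out as four data bytes, the marker, then the remaining six data bytes, so the block grows and is no longer contiguous. The same ten bytes starting at offset 0 come out unchanged and contain no marker at all, which is the case where a receiver looking at that block alone has nothing to anchor on.
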
I don't believe that FIM can provide efficient idempotent direct data placement for inbound TCP segments, because you can't rely on any given received segment containing a marker via which the receiver can verify that the segment contains a complete iSCSI message.

In summary: My first choice would be to modify the TCP protocol to support preservation of upper-level message boundaries. Given that this is not possible, I think COWS provides a good alternative.

Stuart Cheshire <cheshire@apple.com>
 * Wizard Without Portfolio, Apple Computer
 * Chairman, IETF ZEROCONF
 * www.stuartcheshire.org

Last updated: Sat Jan 12 18:17:54 2002