iSCSI: RE: Framing Discussion

I think I now understand the assumptions behind the single-bit TCP option that indicates the presence of a PDU header in the current segment (thanks, Costa). It boils down to a hardware implementation tuned for the best case (i.e., a small amount of reassembly memory on the NIC to park fragmented PDUs, on the assumption that the next aligned PDU is coming shortly) plus a method for dropping into software when we are talking to an ill-behaved NIC. Okay, I'll buy that.

Now, I have another dumb question. For direct data placement, the discussion has centered mostly on the need for alignment when parsing PDUs on the receiving iSCSI TOE (TCP offload engine). One potential issue that has not been discussed is how to handle retransmission on the sending iSCSI TOE.

From previous discussions, I am assuming that our goal is to avoid having a network bandwidth-delay product (BWDP) worth of memory on the NIC. The receiver can avoid this memory by recovering PDU alignment in the TCP stream and using the self-describing headers in the wire protocol (either iSCSI offsets or an RDMA shim layer) to put the data directly into the buffer cache. On the sending side, we can DMA directly from iSCSI descriptor CDBs into the TCP pipe using a hardware path. But unless we keep all of those un-acked TCP segment buffers around in the NIC, it will be difficult to recover the context when we have to retransmit.

Let's suppose that we have an iSCSI TCP connection with multiple outstanding I/Os. The byte stream then has commands and data from different I/Os interleaved within it. When we detect a dropped segment, either through the normal TCP mechanisms (retransmission timeout or duplicate ACKs) or via SACK, how do we map the missing byte range to the appropriate context? If we keep the segments around, we could match the missing segment easily and retransmit.
But that would require the NIC to implement a BWDP's worth of transmit buffer memory. For the iSCSI TOE to retransmit directly from the buffer cache, it seems we would need some sort of context that lets us map a byte window to a specific, meaningful point somewhere in the middle of a CDB context. Essentially, you need enough context to reconstruct the TCP FIFO, since the memory in that FIFO has effectively been reallocated. Maybe this isn't too hard, but it sure sounds like a difficult problem for hardware to solve. Then again, as the software folks around here keep telling me, "it's just gates" ;-)

-Wayland
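A minimal sketch of the kind of per-connection context the question asks about, assuming the NIC keeps only a small descriptor per PDU run (not the payload) for every byte that is not yet acknowledged. The names `TxRecord`, `TxContextMap`, `append`, `ack`, and `lookup` are hypothetical. The same is true of the split into "header" runs (assumed to be regenerated from per-task state) and "data" runs (assumed to be re-fetched by DMA from the host buffer). Sequence numbers are treated as unwrapped stream offsets, ignoring TCP's 32-bit wraparound for simplicity. Given a lost range, `lookup` walks the interleaved records and returns, for each piece, which I/O it belongs to and where in that I/O's source the bytes start.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TxRecord:
    """One contiguous run of bytes handed to TCP (hypothetical layout)."""
    seq: int          # stream offset of the first byte (unwrapped, no mod 2**32)
    length: int       # number of bytes in this run
    task_tag: int     # which outstanding I/O the bytes belong to
    kind: str         # "header" (rebuilt from task state) or "data" (host buffer)
    buf_offset: int   # offset into the I/O's host buffer for "data" runs


class TxContextMap:
    """Maps un-acknowledged stream bytes back to per-I/O context.

    Only descriptors live here; payload stays in host memory and is
    assumed to be re-fetched by DMA when a range must be resent.
    """

    def __init__(self) -> None:
        self._records: list[TxRecord] = []  # sorted by seq, contiguous
        self._next_seq = 0

    def append(self, length: int, task_tag: int, kind: str, buf_offset: int = 0) -> None:
        # Record a run in the order it was placed on the wire.
        self._records.append(TxRecord(self._next_seq, length, task_tag, kind, buf_offset))
        self._next_seq += length

    def ack(self, snd_una: int) -> None:
        # Drop records whose bytes are all below the cumulative ACK point.
        while self._records and self._records[0].seq + self._records[0].length <= snd_una:
            self._records.pop(0)

    def lookup(self, seq: int, length: int) -> list[tuple[int, str, int, int]]:
        """Return (task_tag, kind, source_offset, nbytes) pieces covering [seq, seq+length)."""
        starts = [r.seq for r in self._records]
        i = bisect_right(starts, seq) - 1  # last record starting at or before seq
        if i < 0 or seq + length > self._next_seq:
            raise ValueError("range not covered by un-acked records")
        pieces = []
        end = seq + length
        while seq < end:
            r = self._records[i]
            skip = seq - r.seq
            n = min(r.seq + r.length, end) - seq
            # Data: skip becomes an offset into the host buffer.
            # Header: skip is an offset within the regenerated header bytes.
            offset = r.buf_offset + skip if r.kind == "data" else skip
            pieces.append((r.task_tag, r.kind, offset, n))
            seq += n
            i += 1
        return pieces


# Two I/Os interleaved on one connection (48-byte headers assumed).
m = TxContextMap()
m.append(48, task_tag=1, kind="header")                     # bytes 0..47
m.append(8192, task_tag=1, kind="data", buf_offset=0)       # bytes 48..8239
m.append(48, task_tag=2, kind="header")                     # bytes 8240..8287
m.append(4096, task_tag=2, kind="data", buf_offset=65536)   # bytes 8288..12383
m.ack(1000)                   # first header fully acked, its record dropped
lost = m.lookup(8000, 1460)   # one lost segment spanning both I/Os
print(lost)  # [(1, 'data', 7952, 240), (2, 'header', 0, 48), (2, 'data', 65536, 1172)]
```

In this sketch the descriptor table grows with the number of PDUs in flight rather than the number of bytes in flight, which is the trade the question is after. Under these assumptions, the catch is that host buffers must stay pinned and unmodified until TCP acknowledges the bytes taken from them, and header bytes must be reproducible exactly as first sent.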
Last updated: Tue Sep 04 01:06:01 2001 (6315 messages in chronological order)