Re: iSCSI: RE: Framing Discussion

Wayland,

Let's ignore, as a first approximation, anything that is not data. Our assumption when outlining iSCSI was that all send data (including most of the iSCSI headers) can be rebuilt from context. The only context information a sender has to keep is an association of a sender TCP sequence number with either an iSCSI header or a SCSI buffer address and length.

The details will certainly vary with implementation. In a loosely integrated solution the TCP stack will keep this context for you. In a tightly integrated solution you will have the TCP, on resend, call back the iSCSI layer to rebuild headers and restate data addresses, in which case iSCSI has to keep in its tables some relation between the TCP sequence number and a specific data piece (Task Tag, Data-packet-#, etc.).

Regards,
Julo

Wayland Jeong <wayland@troikanetworks.com> on 26/12/2000 22:08:41

Please respond to Wayland Jeong <wayland@troikanetworks.com>

To: Wayland Jeong <wayland@troikanetworks.com>, ips@ece.cmu.edu
cc:
Subject: iSCSI: RE: Framing Discussion

I think I now understand the assumptions regarding the single-bit TCP option for indicating the presence of a PDU header in the current segment (thanks Costa). It boils down to a hardware implementation that is tuned for the best case (i.e. a small amount of re-assembly memory on the NIC to park fragmented PDUs, with the assumption that the next aligned PDU is coming shortly) and a method for dropping things into software when we are talking to an ill-behaved NIC. Okay, I'll buy that.

Now, I have another dumb question. For direct data placement, the discussions have been centered mostly around the need for alignment when parsing PDUs on the receiving iSCSI TOE. One potential issue that has not been discussed is the problem of how to handle re-transmission on the sending iSCSI TOE.
From previous discussions, I am assuming that our goal is to avoid having a network BWDP worth of memory on the NIC. The receiver can avoid this memory by recovering PDU alignment in the TCP stream and using the self-describing headers in the wire protocol (either iSCSI offsets or an RDMA shim layer) to put the data directly in the buffer cache. On the sending side, we can DMA directly from iSCSI descriptor CDB's into the TCP pipe using a hardware path. But, unless we keep all of those un-acked TCP segment buffers around in the NIC, it will be difficult to recover the context when we have to re-transmit.

Let's suppose that we have an iSCSI TCP connection in which we have multiple outstanding I/O's. Thus, the byte stream has interleaved within it commands and data from different I/O's. When we detect a dropped segment either through normal TCP congestion or via SACK, how do we map the missing byte block to the appropriate context? If we keep the segments around, then we could match the missing segment easily and re-transmit. But that would require the NIC to implement a BWDP's worth of transmit buffer memory.

To have the iSCSI TOE re-transmit directly from the buffer cache, it seems that we would need some sort of context that would allow us to map a byte window to a specific, meaningful point somewhere in the middle of a CDB context. Essentially, you need enough context to be able to re-construct the TCP FIFO, since the memory in this FIFO has since been effectively re-allocated. Maybe this isn't too hard, but it sure sounds like a difficult problem for hardware to solve. But, as the software folks around here keep telling me, "it's just gates" ;-)

-Wayland
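
For illustration only, here is a minimal software sketch of the kind of context discussed in this thread: a table that associates ranges of TCP sequence numbers with either an iSCSI header or a (Task Tag, buffer offset, length) data piece, so that a missing byte window can be mapped back to its source without keeping the segment buffers. Nothing here comes from the iSCSI draft or from any real TOE; the names RetransmitMap, Extent and Piece, the "header"/"data" labels, and the table layout are all hypothetical. It assumes the sender records every extent in the order it enters the stream and that any looked-up sequence number lies within 2**32 bytes ahead of the oldest un-acked byte.

```python
import bisect
from typing import List, NamedTuple

SEQ_MOD = 1 << 32  # TCP sequence numbers wrap modulo 2**32


class Extent(NamedTuple):
    start: int       # unwrapped stream offset of the first byte
    length: int      # bytes this extent occupies in the TCP stream
    kind: str        # "header" (rebuilt on demand) or "data" (read from the SCSI buffer)
    task_tag: int    # hypothetical identifier of the I/O the bytes belong to
    buf_offset: int  # offset of the first byte within that header or buffer


class Piece(NamedTuple):
    kind: str
    task_tag: int
    buf_offset: int
    length: int


class RetransmitMap:
    """Hypothetical sender-side table: TCP byte range -> iSCSI context."""

    def __init__(self, initial_seq: int) -> None:
        self.base_seq = initial_seq % SEQ_MOD  # sequence number of oldest un-acked byte
        self.base_off = 0                      # unwrapped stream offset of that byte
        self.next_off = 0                      # stream offset of the next byte to send
        self.starts: List[int] = []            # extent start offsets, kept sorted
        self.extents: List[Extent] = []

    def _offset(self, seq: int) -> int:
        # Convert a wrapped sequence number to an unwrapped stream offset.
        return self.base_off + ((seq - self.base_seq) % SEQ_MOD)

    def record(self, kind: str, task_tag: int, buf_offset: int, length: int) -> None:
        # Called as each header or data piece is handed to the TCP pipe.
        self.starts.append(self.next_off)
        self.extents.append(Extent(self.next_off, length, kind, task_tag, buf_offset))
        self.next_off += length

    def ack(self, seq: int) -> None:
        # Drop extents that are entirely below the cumulative ACK point.
        off = self._offset(seq)
        k = 0
        while k < len(self.extents) and self.extents[k].start + self.extents[k].length <= off:
            k += 1
        del self.extents[:k]
        del self.starts[:k]
        self.base_seq, self.base_off = seq % SEQ_MOD, off

    def lookup(self, seq: int, length: int) -> List[Piece]:
        # Map the missing window [seq, seq + length) to the pieces to re-send.
        off = self._offset(seq)
        end = off + length
        i = bisect.bisect_right(self.starts, off) - 1
        pieces: List[Piece] = []
        while off < end and 0 <= i < len(self.extents):
            e = self.extents[i]
            skip = off - e.start
            n = min(e.length - skip, end - off)
            if n <= 0:  # window starts beyond everything recorded
                break
            pieces.append(Piece(e.kind, e.task_tag, e.buf_offset + skip, n))
            off += n
            i += 1
        return pieces
```

A lost window that straddles two I/O's comes back as several pieces, for example the tail of one task's data, the next task's header, and the first bytes of its data. The table grows with the number of un-acked headers and data pieces rather than with the number of un-acked bytes, which is the point of not parking a BWDP's worth of segments on the NIC. Whether a lookup like this is cheap enough in gates is a separate question that this sketch does not address.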