RE: iSCSI: RE: Framing Discussion

[ stuff deleted ]

>> Let's suppose that we have an iSCSI TCP connection in which we
>> have multiple outstanding I/O's. Thus, the byte stream has
>> interleaved within it commands and data from different I/O's.
>> When we detect a dropped segment either through normal TCP congestion
>> or via SACK, how do we map the missing byte block to the
>> appropriate context? If we keep the segments around, then we
>> could match the missing segment easily and re-transmit. But that would
>> require the NIC to implement a BWDP's worth of transmit buffer memory.
>>
>> To have the iSCSI TOE re-transmit directly from the buffer cache, it seems
>> that we would need some sort of context that would allow us to map a byte
>> window to a specific, meaningful point somewhere in the middle of a CDB
>> context. Essentially, you need enough context to be able to re-construct
>> the TCP fifo since the memory in this fifo has since been effectively
>> re-allocated. Maybe this isn't too hard, but it sure sounds like
>> a difficult problem for hardware to solve. But, as the software
>> folks around here keep telling me, "it's just gates" ;-)
>
> Yes, multiple I/O's and interleaved data streams require a context manager
> who maps the missing segment back to its large exchange table to determine
> how to retransmit the dropped segment. No, Wayland, I would not do it in
> hardware. It is all in microcode. The microcode size is actually not that
> big. On the contrary, the exchange table can be a few hundred KB's. All you
> need is a very very fast microengine with small number of gates, a true
> RISC. Please keep asking the "dumb" questions. I am mostly impressed by
> your questions.
>

Thanks for the reply. You'll have to let me know which questions don't
impress you ;-)

Yes, I am assuming that the re-transmit process will be handled in
firmware/micro-code.
It's still gates though, they just happen to be in someone's uP core ;-)

It still seems like a tough problem in the general case. Let's assume a
worst-case scenario. The iSCSI PDU size is greater than the TCP MSS and
the network MTU, and you are talking to a firewall that is re-packaging
your TCP stream. No matter how much you try to send nicely aligned PDU's,
the firewall is going to take your smaller-than-MSS TCP segments and
package them up so that you get full-size TCP segments by the time they
hit the target.

The target detects a missing segment and keeps the left edge of the
window constant for three consecutive ACK's. Furthermore, we are using
the SACK option in TCP to optimize our performance over LFN's. Thus, we
are presented with the exact blocks that are missing. Unfortunately,
these missing blocks have fragments of PDU's from different I/O's (could
be command, could be data). Even worse, since we chose a PDU size greater
than MSS, some segments might carry a part of a PDU that does not contain
an iSCSI header but does contain a digest covering the entire PDU.
Yeesh!!

Thank goodness all I have to do is drop an embedded processor into our
chip. I'll let the firmware folks deal with this problem. Certainly, this
path does not have to be high-performance since we are going into
congestion control anyway, but we have to deal with it.

We can keep a context stack for the currently open TCP sessions that
contains mappings of TCP sequence numbers to specific CDB context
locations (either command or offset within the gather list). We can keep
this stack as deep as the maximum number of outstanding (i.e. not ack'd)
TCP segments, which for Randy's example (1.25 Gb/s and RTT of 100 ms) is
not too bad (about 8K entries). We can recover the contexts needed to
re-build the missing TCP segment and re-construct entire PDU(s) so that
any necessary digests can be re-calculated.
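
To make the context stack idea concrete, here is a rough sketch in
Python of how a missing byte block could be mapped back to per-I/O
contexts. Everything in it is an assumption for illustration: the names
ContextEntry, ContextStack, task_tag, kind and offset are made up, the
sequence numbers and lengths in the example are arbitrary, and TCP
sequence-number wraparound is ignored. It only finds the pieces to
re-fetch; rebuilding whole PDU(s) and re-calculating digests is not
shown.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextEntry:
    seq_start: int  # first TCP sequence number covered by this entry
    length: int     # number of bytes covered
    task_tag: int   # hypothetical identifier of the I/O (CDB context)
    kind: str       # "header", "data" or "digest"
    offset: int     # byte offset within the gather list (for data)


class ContextStack:
    """Un-ack'd byte ranges of one TCP connection, in sequence order."""

    def __init__(self):
        self._starts = []   # seq_start of each entry, kept sorted
        self._entries = []

    def __len__(self):
        return len(self._entries)

    def record(self, entry):
        # Entries arrive in transmit order, so both lists stay sorted.
        self._starts.append(entry.seq_start)
        self._entries.append(entry)

    def ack(self, ack_seq):
        # Drop every entry that lies completely below the cumulative ACK.
        done = 0
        while done < len(self._entries):
            e = self._entries[done]
            if e.seq_start + e.length > ack_seq:
                break
            done += 1
        del self._starts[:done]
        del self._entries[:done]

    def lookup(self, left, right):
        """Return (entry, skip, count) pieces covering bytes [left, right)."""
        pieces = []
        # Start at the last entry that begins at or before the left edge.
        i = max(bisect_right(self._starts, left) - 1, 0)
        while i < len(self._entries) and self._entries[i].seq_start < right:
            e = self._entries[i]
            lo = max(left, e.seq_start)
            hi = min(right, e.seq_start + e.length)
            if lo < hi:
                pieces.append((e, lo - e.seq_start, hi - lo))
            i += 1
        return pieces


# Two I/Os interleaved on one connection (illustrative numbers only).
stack = ContextStack()
stack.record(ContextEntry(1000, 48, task_tag=1, kind="header", offset=0))
stack.record(ContextEntry(1048, 2000, task_tag=1, kind="data", offset=0))
stack.record(ContextEntry(3048, 48, task_tag=2, kind="header", offset=0))
stack.record(ContextEntry(3096, 1000, task_tag=2, kind="data", offset=0))

# A missing block whose edges do not line up with any PDU boundary.
missing = stack.lookup(2460, 3920)
for entry, skip, count in missing:
    print(entry.task_tag, entry.kind, "skip", skip, "bytes", count)
```

In this example the one missing block touches the tail of the first
I/O's data, the second I/O's header, and the start of its data, which is
the re-packaging case described above. Once the cumulative ACK moves
past an entry, ack() drops it, so the depth stays bounded by what is
still outstanding.
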
We can then stage this data in memory somewhere and pull out the exact
TCP block that we need to re-send. Lovely.

I'm not saying it's impossible, but I am saying that implementing Fibre
Channel looks like a walk in the park compared to this stuff.

BTW, regarding the current iSCSI draft: I didn't see a Login/Text key
associated with negotiating the iSCSI PDU size. Is it assumed that an
iSCSI implementation should handle any PDU size?

> Y.P. Cheng, Connectom Solutions.
>

-Wayland

Last updated: Tue Sep 04 01:06:01 2001