Re: iSCSI: No Framing

In message <ED8EDD517E0AA84FA2C36C8D6D205C1301CBF2C5@alfred.xiotech.com>, "Peglar, Robert" writes:

>The original thread began with a question (paraphrased) about '...what
>applications could consume a 10G pipe for long periods of time'. I answered
>that question - disk-disk backup and subsystem replication.

Even disk-to-disk or backup applications really want approximately BW*RTT worth of buffering. Hugh Holbrook's recent Stanford PhD thesis traces this conventional wisdom back to an email from Van Jacobson to the e2e list in 1990.

It's reasonably well known in the TCP community that TCP slow-start generates spiky traffic, leading to bursts of high buffer occupancy (e.g., at the point where the exponential ramp-up switches to congestion avoidance). Indeed, that was the motivation behind TCP Vegas, and the recent work on TCP pacing.

The whole debate over framing/marking only makes sense if one views outboard NIC buffering of BW*RTT as very expensive (e.g., forcing a design from on-chip RAM to external SRAM). Adding framing of iSCSI PDUs allows the NIC to continue doing direct data placement into host buffers, accommodating the BW*RTT of TCP buffering in "cheap" host RAM rather than "expensive" NIC RAM. But you can't get away from providing the buffers -- not unless you are also willing to artificially restrict throughput.

If iSCSI doesn't provide some form of framing, then what can a NIC on a MAN with medium BW*RTT do if it sees a drop? It has only a few choices:

1. Start buffering data outboard, hoping that TCP fast retransmit will
   resend the missing segment(s) before the outboard buffers are exhausted;

2. Give up on direct data placement, and start delivering packets to host
   memory any old how -- at the cost of software reassembly and alignment
   problems, and a software CRC, once the missing segment is recovered;

3.
   Start dropping packets, and pay a huge performance cost.

There are some important caveats around the BW*RTT: if we can *guarantee* that the iSCSI NICs are never the bottleneck point, or that TCP never tries to reach the true link BW*RTT (due to undersized windows), then one can get away with less. (See Hugh Holbrook's thesis for more concrete details.) But the lesson to take away is that even in relatively well-behaved LANs, TCP *by design* is always oscillating around overloading the available buffers, causing a drop, then backing off. See, for example, Figure 2 of the paper by Janey Hoe which introduced "New Reno", or Figures 2 and 3 of the paper by Floyd and Fall. New Reno avoids the long timeouts between each drop, but the drops themselves still occur.

Moral: TCP can require significant buffering even on quite modest networks. It __may__ be worth keeping framing, so that host NICs can do more of that buffering in host memory rather than outboard, and so they can continue performing direct data placement rather than software reassembly and software CRC checking. Storage devices are another issue again.

References:

Van Jacobson, modified TCP congestion avoidance algorithm. Email to end2end@isi.edu, April 1990.

L. Brakmo, S. O'Malley, and L. Peterson, TCP Vegas: new techniques for congestion detection and control, SIGCOMM '94.

J. Kulik, R. Coulter, D. Rockwell, and C. Partridge, A simulation study of paced TCP. BBN Technical Memorandum 1218, BBN, August 1999.

J. Hoe, Improving the Startup Behaviour of a Congestion Control Scheme for TCP, ACM SIGCOMM 1996.

S. Floyd and K. Fall, Simulation-based comparisons of Tahoe, Reno, and SACK TCP, Comp. Comm. Review, no 6 v 3, April 1996.

H. Holbrook, A Channel Model for Multicast. PhD dissertation, Department of Computer Science, Stanford University, August 2001. http://dsg.stanford.edu/~holbrook/thesis.ps{,.gz}. (See Chapter 5.)

(Holbrook cites Aggarwal, Savage, and Anderson, INFOCOM 2000, on the downsides of TCP pacing, but I haven't read that.
The PILC draft on link designs touches the same issue, but the throughput equations cited there factor out buffer size.)

>FC is not sufficient. Storage-to-storage needs all the advantages as well
>as that which iSCSI has to offer the host-storage model.

But it will still need approximately BW*RTT of buffering, even for low-delay LANs. Or performance will fall off a cliff under "congestion" -- e.g., each time some other iSCSI flow starts up, begins competing for the same TCP endpoint buffers on the same iSCSI device, and triggers a burst of TCP loss events for the storage-to-storage flow.
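A back-of-envelope sketch of the BW*RTT arithmetic above (not from the original post; the link parameters are assumed example values, not measurements). It shows why the required buffering is trivial on a low-delay LAN but quickly outgrows on-chip NIC RAM as the RTT rises to MAN/WAN scales:

```python
# Illustrative sketch: the bandwidth-delay product BW*RTT approximates
# the buffering a TCP endpoint needs to ride out a loss event without
# stalling or abandoning direct data placement.

def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes (bandwidth given in bits/s)."""
    return bandwidth_bps * rtt_s / 8.0

# Assumed example links, all at 10 Gb/s:
links = [
    ("LAN", 10e9, 100e-6),  # 100 us RTT
    ("MAN", 10e9, 5e-3),    # 5 ms RTT
    ("WAN", 10e9, 50e-3),   # 50 ms RTT
]

for name, bw, rtt in links:
    print(f"{name}: BW*RTT = {bdp_bytes(bw, rtt) / 1e6:.3f} MB")
```

At 10 Gb/s, a 100 us LAN RTT needs only ~0.125 MB, but a 5 ms MAN RTT already needs ~6.25 MB and a 50 ms WAN RTT ~62.5 MB -- far beyond plausible on-chip NIC RAM, which is the case for keeping that buffering in host memory.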
Last updated: Wed Feb 06 01:18:16 2002