Re: TCP (and SCTP) sucks on high speed networks

Matt,

I think you are assuming (maybe rightly) in this example that the advertised window size exactly matches the bandwidth-delay product (BDP). (That is only true if CWND is the limiting factor because of loss due to congestion.)

However, if the advertised window were, say, four times the BDP, and CWND were similar because there had been no recent packet loss due to congestion, then losing one packet would cut the window to one half of what it was and send a new packet for each duplicate ACK received. The window would then be twice the BDP, and throughput would not be reduced, since CWND is twice the BDP and duplicate ACKs are arriving. Losing another packet, however, causes the data rate on the line to drop, hence congestion avoidance.

So I think losing one packet to a CRC-type error can be handled. It is when slow start kicks in that the line gets underutilized, and if it kicks in, it most likely means that you are in a congestion situation.

I have not seen any document mandating that the maximum receiver-side advertised window size must match the RTT, or at least that it should not be greater than it, or that it should track it in some way.

You're right, though, in stating that TCP did assume every duplicate ACK to indicate packet loss due to CONGESTION. My understanding is that, in effect, the fast retransmit, fast recovery, congestion avoidance algorithm kind of assumes that the lost frame is due to reordering (not congestion). Thus we can use this scheme to handle any single packet loss without affecting throughput. This can defeat the congestion avoidance algorithm in the single-packet-loss case, if I understand its workings correctly.

I think that the advertised window, in the world of gigabit speeds and a vast range of RTTs that we are entering, requires an algorithm that links the RTT estimate to the maximum advertised window size.
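As a rough illustration of the headroom argument above, the sketch below models steady-state throughput as one window per RTT, capped by the link rate, and compares the rate before and after a single halving of the window. The link speed, the RTT, and the function name `throughput_bps` are assumptions chosen for the example, and the model ignores queueing and the transient behaviour of fast recovery.

```python
def throughput_bps(window_bytes, rtt_s, link_bps):
    """Steady-state rate: one window per RTT, capped by the link rate."""
    return min(window_bytes * 8 / rtt_s, link_bps)

# Assumed example figures (not taken from the thread).
LINK_BPS = 1e9    # 1 Gbps link
RTT_S = 0.010     # 10 ms round-trip time
bdp_bytes = LINK_BPS * RTT_S / 8  # bandwidth-delay product in bytes

for multiple in (1, 2, 4):
    window = multiple * bdp_bytes
    before = throughput_bps(window, RTT_S, LINK_BPS)
    after = throughput_bps(window / 2, RTT_S, LINK_BPS)  # one halving
    print(f"window = {multiple} x BDP: "
          f"{before / 1e6:.0f} Mbps before loss, {after / 1e6:.0f} Mbps after")
```

Under this simplified model, a window that starts at two or more times the BDP still covers the BDP after one halving, while a window that starts at exactly the BDP drops to half the link rate.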
I have not seen any such algorithm discussed. But then, I may not have read the right RFCs, publications, etc. Part of the problem is that the RTTs in the two directions may be different. This might be detected, though, by looking at the line utilization % on the receive port in some instances. This is an area of TCP that I would like to see examined, researched, and discussed with respect to iSCSI and the requirement for efficient memory usage on TOE-type implementations.

Of course, if CWND is just at the BDP for the line due to congestion, then what you say is true. But that's not what we're talking about here. If I've misunderstood something here, then please correct me.

URG pointer / framing: what's the basis for leaving this in the spec? Surely you would want something better.

Again, I believe that there is not a big memory issue on the LAN, and thus not a big cost issue. If iSCSI is not successful in the LAN, I fail to see how it will be successful at all. The disaster-recovery MAN link does not have a huge memory requirement either. As for the general problem of 10G links halfway around the world: let's solve that when iSCSI is successful in the LAN and customers have a real problem paying $100/$200 for memory for their 10G iSCSI adaptor connecting their clear-channel link between the US and Europe.

Dick Gahan
3Com

Matt Wakeley <matt_wakeley@agilent.com> on 01/12/2000 07:44:09

Please respond to Matt Wakeley <matt_wakeley@agilent.com>
Sent by: Matt Wakeley <matt_wakeley@agilent.com>
To: end2end-interest@ISI.EDU, ips@ece.cmu.edu
cc: (Dick Gahan/IE/3Com)
Subject: TCP (and SCTP) sucks on high speed networks

TCP's "congestion avoidance" algorithms are not compatible with high speed, long distance networks. The "cut transmit rate in half on packet loss and increase the rate additively" algorithm will simply not work.

Consider a 10Gbps link to a destination half way around the world.
A packet drop due to link errors (not congestion or infrastructure products) can be expected about every 20 seconds. However, with an RTT of 100ms (not even across the continent), if a TCP connection is operating at 10Gbps, the packet drop (due to link error) will drop the rate to 5Gbps. It will take 4 *MINUTES* for TCP to ramp back up to 10Gbps.

Therefore, there needs to be a change to TCP's congestion avoidance algorithm for future high speed networks. Since SCTP is based on the same algorithms, it is doomed to the same fate.

-Matt
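The ramp-up time after a halving can be estimated with a back-of-the-envelope sketch, assuming the textbook congestion avoidance rule of one additional segment per RTT. The function name `aimd_recovery_time` and the segment sizes tried are assumptions made for illustration; the result depends strongly on the segment size, so it will not necessarily match the figure quoted in the message, which may rest on different assumptions.

```python
def aimd_recovery_time(link_bps, rtt_s, mss_bytes):
    """Seconds for the window to grow from half the BDP back to the full BDP,
    assuming additive increase of one segment per RTT."""
    bdp_segments = link_bps * rtt_s / (8 * mss_bytes)
    rtts_needed = bdp_segments / 2  # half the window must be regained
    return rtts_needed * rtt_s

# Link rate and RTT from the message; segment sizes are assumed examples.
for mss in (1460, 8960):
    seconds = aimd_recovery_time(10e9, 0.100, mss)
    print(f"MSS {mss} bytes: about {seconds / 60:.1f} minutes to recover")
```

Whatever segment size is assumed, the recovery time in this model grows with the square of the RTT and linearly with the link rate, which is the scaling problem the message points at.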