Re: TCP (and SCTP) sucks on high speed networks

To: Matt Wakeley <matt_wakeley@agilent.com>
Subject: Re: TCP (and SCTP) sucks on high speed networks
From: "Dick Gahan" <Dick_Gahan@eur.3com.com>
Date: Fri, 1 Dec 2000 17:15:54 +0000
cc: end2end-interest@ISI.EDU, ips@ece.cmu.edu
Content-Disposition: inline
Content-type: text/plain; charset=us-ascii
Sender: owner-ips@ece.cmu.edu



Matt
     I think you are assuming (maybe rightly) in this example that the
     advertised window size exactly matches the bandwidth-delay product
     (BDP). That is only true if CWND is the limiting factor because of loss
     due to congestion.

     However, if the advertised window was, say, four times the BDP, and CWND
     was similar because there had been no recent packet loss due to
     congestion, then losing one packet will cut the window to half of what
     it was, and a new packet will be sent for each duplicate ACK received.
     The window is now twice the BDP, so throughput should not be reduced:
     CWND is still twice the BDP and duplicate ACKs keep arriving.

     Losing another packet, however, causes the data rate on the line to
     drop, hence congestion avoidance.
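The single-loss argument above can be checked with a deliberately simplified model. This is an illustrative sketch, not real TCP dynamics: it assumes a window-limited sender delivers at most one window per RTT, capped at the link rate, and that any window beyond the BDP simply sits in queues. The 10 Gb/s rate and 100 ms RTT are hypothetical values, and the function names are made up for the example.

```python
# Simplified model (an assumption, not real TCP dynamics): a window-limited
# sender delivers at most one window per RTT, capped at the link rate.
# Any window beyond the BDP is assumed to wait harmlessly in queues.

LINK_BPS = 10e9   # hypothetical 10 Gb/s link
RTT_S = 0.1       # hypothetical 100 ms round-trip time


def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes."""
    return link_bps * rtt_s / 8


def send_rate_bps(cwnd_bytes: float, link_bps: float, rtt_s: float) -> float:
    """Throughput of a sender limited by cwnd, capped at the link rate."""
    return min(cwnd_bytes * 8 / rtt_s, link_bps)


bdp = bdp_bytes(LINK_BPS, RTT_S)
for multiple in (4, 1):
    cwnd_after_loss = multiple * bdp / 2  # a single loss halves the window
    rate = send_rate_bps(cwnd_after_loss, LINK_BPS, RTT_S)
    print(f"cwnd = {multiple} x BDP before loss -> "
          f"{rate / 1e9:.1f} Gb/s after one loss")
```

In this model the 4 x BDP case stays at 10.0 Gb/s after one loss, while the 1 x BDP case (the one in the original message) falls to 5.0 Gb/s.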

     So I think losing one packet due to a CRC-type error can be handled.
     It's when slow start kicks in that the line gets underutilized, and if
     it kicks in, it most likely means you are in a congestion situation.

     I have not seen any document mandating that the maximum receiver-side
     advertised window size must match the RTT (that is, the bandwidth-delay
     product), or even that it should not exceed it or should track it in
     some way.

     You're right, though, in stating that TCP assumes every duplicate ACK
     indicates packet loss due to CONGESTION. In effect, my understanding is
     that the fast retransmit / fast recovery / congestion avoidance
     algorithm sort of assumes that the lost frame is due to reordering (not
     congestion). Thus we can use this scheme to handle any single packet
     loss without affecting throughput. If I understand its workings
     correctly, this can defeat the congestion avoidance algorithm in the
     single-packet-loss case.
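For reference, the window bookkeeping of TCP Reno's fast retransmit and fast recovery, as described in RFC 2581, can be sketched as follows. This is a minimal sketch: the class and method names are hypothetical, window units are whole segments (SMSS = 1) for readability, and normal window growth on new ACKs outside recovery is deliberately left out.

```python
class RenoSender:
    """Fast retransmit / fast recovery bookkeeping after RFC 2581.

    Units are segments (SMSS = 1). Only the loss-recovery path is modeled;
    slow start and congestion-avoidance growth on new ACKs are omitted.
    """

    DUPACK_THRESHOLD = 3

    def __init__(self, cwnd: float, ssthresh: float) -> None:
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.dupacks = 0
        self.in_recovery = False

    def on_dupack(self, flight_size: float) -> bool:
        """Process one duplicate ACK; return True if it triggers a fast retransmit."""
        self.dupacks += 1
        if self.in_recovery:
            # Each further duplicate ACK means a segment has left the network,
            # so inflate cwnd by one segment, allowing a new segment to be sent.
            self.cwnd += 1
            return False
        if self.dupacks == self.DUPACK_THRESHOLD:
            # Third duplicate ACK: halve, retransmit, and inflate by the
            # three segments known to have left the network.
            self.ssthresh = max(flight_size / 2, 2)
            self.cwnd = self.ssthresh + 3
            self.in_recovery = True
            return True
        return False

    def on_new_ack(self) -> None:
        """ACK for new data: leave fast recovery and deflate the window."""
        if self.in_recovery:
            self.cwnd = self.ssthresh
            self.in_recovery = False
        self.dupacks = 0
```

For example, with 100 segments in flight, the third duplicate ACK sets ssthresh to 50 and cwnd to 53, each further duplicate ACK adds one segment, and the first new ACK leaves cwnd at 50. The sender continues in congestion avoidance from half the old window rather than restarting slow start.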

     I think that the gigabit world we're entering, with its vast range of
     RTTs, requires an algorithm that links the RTT estimate to the maximum
     advertised window size. I have not seen any such algorithm discussed,
     but then I may not have read the right RFCs, publications, etc. Part of
     the problem is that the RTT as seen from each end may be different. In
     some instances this might be detected by looking at the line
     utilization % on the receive port.
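One way such an RTT-linked limit could look is sketched below. The policy itself (advertise a fixed multiple of the estimated BDP), the function name `advertised_window` and the `headroom` parameter are all hypothetical; the only external facts used are the 16-bit TCP window field and the RFC 1323 window-scale limit of a 14-bit shift.

```python
from typing import Tuple

MAX_WINDOW_FIELD = 0xFFFF  # 16-bit window field in the TCP header
MAX_WSCALE_SHIFT = 14      # largest window-scale shift allowed by RFC 1323


def advertised_window(rtt_estimate_s: float, link_rate_bps: float,
                      headroom: float = 2.0) -> Tuple[int, int]:
    """Hypothetical policy: advertise `headroom` times the estimated BDP.

    Returns (window_bytes, scale_shift). The window is rounded down to a
    multiple of 2**shift, because only the shifted value fits in the header,
    and is clamped to the largest window RFC 1323 scaling can express.
    """
    target = int(link_rate_bps * rtt_estimate_s / 8 * headroom)
    # Smallest shift whose scaled 16-bit field can represent the target.
    shift = 0
    while (MAX_WINDOW_FIELD << shift) < target and shift < MAX_WSCALE_SHIFT:
        shift += 1
    field = min(target >> shift, MAX_WINDOW_FIELD)
    return field << shift, shift
```

For example, `advertised_window(0.001, 1e9)` returns `(250000, 2)` for a 1 Gb/s LAN with a 1 ms RTT. One design constraint worth noting: the scale shift is exchanged only in the SYN segments, so a real implementation would have to choose the shift at connection setup and let only the advertised value track later RTT estimates.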

     This is an area of TCP that I would like to see examined, researched and
     discussed with respect to iSCSI and the requirement for efficient memory
     usage in TOE-type implementations.

     Of course, if CWND is just at the BDP for the line due to congestion,
     then what you say is true. But that's not what we're talking about
     here.

     If I've misunderstood something here then please correct me.


     URG POINTER / Framing.

     What's the basis for leaving this in the spec? Surely you would want
     something better. Again, I believe there is not a big memory issue on
     the LAN, and thus not a big cost issue. If iSCSI is not successful in
     the LAN, I fail to see how it will be successful at all. The disaster
     recovery MAN link does not have a huge memory requirement either.

     The general problem of 10G links halfway around the world: let's solve
     that when iSCSI is successful in the LAN and customers have a real
     problem paying $100/$200 for memory for the 10G iSCSI adapters
     connecting their clear-channel links between the US and Europe.
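The memory argument rests on the fact that a receiver needs roughly one bandwidth-delay product of buffering per connection to keep the link full. A quick sketch of that arithmetic follows. The three scenarios, including their rates and RTTs, are illustrative assumptions, not figures from this thread.

```python
# Hypothetical link scenarios; the rates and RTTs are illustrative assumptions.
SCENARIOS = {
    "LAN, 1 Gb/s, 0.5 ms RTT": (1e9, 0.0005),
    "MAN, 1 Gb/s, 5 ms RTT": (1e9, 0.005),
    "US-Europe, 10 Gb/s, 100 ms RTT": (10e9, 0.1),
}


def buffer_bytes(link_bps: float, rtt_s: float) -> float:
    """Receive buffer needed to keep the link full: one bandwidth-delay product."""
    return link_bps * rtt_s / 8


for name, (rate_bps, rtt_s) in SCENARIOS.items():
    print(f"{name}: {buffer_bytes(rate_bps, rtt_s) / 1e3:.1f} kB per connection")
```

Under these assumptions the buffers come to about 62.5 kB, 625 kB and 125 MB respectively, which is the gap between the LAN/MAN cases and the intercontinental 10G case.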


Dick Gahan
3Com






Matt Wakeley <matt_wakeley@agilent.com> on 01/12/2000 07:44:09

Please respond to Matt Wakeley <matt_wakeley@agilent.com>

Sent by:  Matt Wakeley <matt_wakeley@agilent.com>


To:   end2end-interest@ISI.EDU, ips@ece.cmu.edu
cc:    (Dick Gahan/IE/3Com)
Subject:  TCP (and SCTP) sucks on high speed networks




TCP's "congestion avoidance" algorithms are not compatible with high-speed,
long-distance networks.  The "cut transmit rate in half on packet loss and
increase the rate additively" algorithm will simply not work.

Consider a 10 Gb/s link to a destination halfway around the world.  A packet
drop due to link errors (not congestion or infrastructure products) can be
expected about every 20 seconds.  However, with an RTT of 100 ms (not even
across the continent), if a TCP connection is operating at 10 Gb/s, the packet
drop (due to link error) will drop the rate to 5 Gb/s.  It will take 4 *MINUTES*
for TCP to ramp back up to 10 Gb/s.
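The arithmetic behind this kind of estimate can be sketched as follows, assuming an idealized AIMD sender: it grows by one segment per RTT (one ACK per segment, no delayed ACKs), it experiences no further losses, and each bit error drops exactly one packet. The bit error rate of 5e-12 is a hypothetical value, used only because at 10 Gb/s it gives one errored packet every 20 seconds on average. The function names are made up for the example.

```python
def seconds_between_errored_packets(link_rate_bps: float,
                                    bit_error_rate: float) -> float:
    """Mean time between bit errors, assuming each error drops one packet."""
    return 1.0 / (link_rate_bps * bit_error_rate)


def aimd_recovery_seconds(link_rate_bps: float, rtt_s: float,
                          mss_bytes: int) -> float:
    """Time for cwnd to grow from BDP/2 back to BDP at +1 segment per RTT."""
    window_segments = link_rate_bps * rtt_s / 8 / mss_bytes  # BDP in segments
    rtts_needed = window_segments / 2  # half the window must be regained
    return rtts_needed * rtt_s


print(f"{seconds_between_errored_packets(10e9, 5e-12):.1f} s between errored packets")
for mss in (1460, 9000):
    minutes = aimd_recovery_seconds(10e9, 0.1, mss) / 60
    print(f"MSS {mss} bytes: about {minutes:.1f} minutes to recover")
```

The recovery time scales inversely with the assumed segment size: for the 10 Gb/s, 100 ms case it works out to roughly 71 minutes with 1460-byte segments and roughly 12 minutes with 9000-byte jumbo frames. Either way, regaining the lost half of the window takes thousands of round trips.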

Therefore, there needs to be a change to TCP's congestion avoidance algorithm
for future high-speed networks.  Since SCTP is based on the same algorithms,
it is doomed to the same fate.

-Matt





