RE: Consensus Call on Urgent Pointer

Dick,

High rates of TCP checksum failure are not caused by Ethernet equipment. DSLAM equipment is an example of an unprotected, pattern-sensitive network interface that produces such errors. Should SCSI transport be used over DSL, data corruption could result. This is a concern for any pointer option that depends on the 16-bit TCP checksum.

Many have agreed that the TCP urgent pointer is not a viable means of marking data, despite the assurances of Matt Wakeley, John Hufferd, and Julian Satran: TCP maintains only a single urgent pointer per connection, so when segments coalesce, earlier marks are lost. On high-bandwidth connections, TCP transmit queues easily exceed 64 KB, beyond the reach of the 16-bit pointer. The PUSH flag is likewise not usable as a record mark, as indicated in several RFCs.

The TCP RDMA draft (http://www.ietf.org/internet-drafts/draft-csapuntz-tcprdma-00.txt) provides no protection against pointer corruption, and it says nothing about how the TCP API would be modified to support SCSI data placement over a persistent connection. How are RIDs shared with the stack? How are RIDs linked to the encapsulated data?

The desire to place SCSI-encapsulated data arriving in out-of-sequence TCP segments amounts to application-level processing, even for the simple act of placing data into the appropriate application buffers. Such out-of-sequence processing is a major departure from TCP, made to satisfy a desire to minimize memory utilization. TCP does not provide for parsing data into SCSI structures ahead of the application, nor does it provide a means of parsing those structures after missing segments.

Doug
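Doug's point about pattern-sensitive corruption slipping past the TCP checksum can be made concrete. RFC 1071's checksum is a ones-complement sum of 16-bit words, and since that sum is commutative, any corruption that merely reorders 16-bit words leaves the checksum unchanged. A minimal sketch (the payload bytes are illustrative, not taken from this thread):

    def inet_checksum(data: bytes) -> int:
        """RFC 1071 ones-complement checksum over 16-bit words."""
        if len(data) % 2:
            data += b"\x00"  # pad odd-length data
        total = sum(int.from_bytes(data[i:i + 2], "big")
                    for i in range(0, len(data), 2))
        while total > 0xFFFF:        # fold carries back into the low 16 bits
            total = (total & 0xFFFF) + (total >> 16)
        return ~total & 0xFFFF

    original = b"\x12\x34\xab\xcd"   # two 16-bit words
    corrupted = b"\xab\xcd\x12\x34"  # the same words, swapped in transit
    assert original != corrupted
    assert inet_checksum(original) == inet_checksum(corrupted)  # undetected

A 32-bit CRC would almost certainly catch such a reordering, which is why the argument centers on what the 16-bit checksum alone can be trusted to protect.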
> Michael,
>
> I have done the calculations for 100 Mbit/s to 100 Gbps for the 6-switch
> topology you describe and a 50 km distance. The data is below. (I know
> it's a bit of rock fetching, but what the hell!)
>
> A couple of points to clear up from your previous email.
>
> 1/ I had allowed 25 us of end-system processing delay - you suggested
>    4-5 us. Hardware can process a lot faster than even what you suggest,
>    but even when the hardware has finished processing an incoming header
>    and wants to send an ACK in reply, the outbound port can be busy
>    sending a maximum-size frame => a delay of 12 us. Secondly, we only
>    send an ACK for every second frame, hence another possible 12 us.
>
> 2/ Can you give me the source of the error rates you suggest I use?
>
>    >the packet drop rate due to either corruption (very low rates should
>    >be assumed - at least 10E-9 to 10E-12) or congestion which ranges
>    >from 1-3% in the Internet (depending upon which set of measurements
>    >you go by) and will
>
>    The bit error rate on worst-case copper at Gbps is 10E-10 => the
>    PACKET drop rate is 10E-6, i.e. 1 in a million (1500 bytes). We've
>    verified similar (but better) rates in our labs using Broadcom PHYs.
>
> 3/ The packet drop rate due to congestion - in a LAN certainly - can be
>    reduced to 0 by turning on flow control. In general flow control can
>    be bad, as it's easy to think of scenarios where lockup happens, but
>    in certain instances it can be useful to turn it on.
>
> 4/ Vern's numbers w.r.t. corrupt packets on FDDI which were not being
>    caught by the CRC are interesting. We've run our 12-port gigabit
>    switches for days at full line rate with a random scattering of
>    packet sizes and did not see any internal packet loss or corruption.
>    We'd normally run such data-corruption tests in flow-control mode so
>    that we would notice if even one packet went missing over days, and
>    why it went missing. So in a LAN scenario the maximum loss rate can
>    be configured to be related to the bit error rate. This is also
>    possible for a MAN environment, or at least for a disaster-recovery
>    link.
>
> 5/ Vern - the formula you had in your email is interesting - can you
>    give me a pointer to its derivation? I would also like to see
>    formulas that have the line rate and window size as parameters (a
>    sketch of one such formula follows these points). The formula in your
>    email aims to obtain a theoretical max throughput, which is
>    interesting. However, on a gigabit line, if the error rate is 0 the
>    throughput is not infinite; it is 1 Gbps. Also, on a Gbps line, if
>    the packet error rate is 10E-6 and the window is of infinite size,
>    the throughput is not affected much.
>
> 6/ From the data below the astute reader will have noticed:
>
>    a/ Keeping 10 trunked 100 Mbit/s lines saturated requires more
>       memory over the same distances, using store-and-forward switching
>       elements, than one 1 Gbps line.
>    b/ Keeping 10 trunked 1 Gbps lines saturated requires more memory
>       than one 10 Gbps line, but the differences shrink as the delay
>       due to the cable becomes more evident.
>    c/ Between 10 Gbps and 100 Gbps the memory requirements are
>       comparable.
>    d/ Cut-through switching was interesting at 10 Mbps, when the
>       cut-through time was 30-40 us compared to a 1.2 ms
>       store-and-forward delay for a maximum-size packet. At 100 Mbps it
>       is not interesting: 30-40 us (completely empty switch) versus a
>       120 us store-and-forward type delay.
>    e/ I've put the switch delay at 0.5 of the packet store-and-forward
>       delay. For true switches that attempt to handle congestion cases
>       (small bursts of congestion at least), internal switching must be
>       faster than the line rate. However, I've not looked in detail at,
>       or measured, any multiport 10G or 100G switches.
>
> 7/ I still don't see the required memory space cost as a big issue for
>    hardware implementations, since h/w can process ACKs fast and keep
>    the window as small as possible. TOE chips that support iSCSI and
>    OTHER protocols at 1 Gbps over TCP will have to have appropriate
>    memory on board in any case for those OTHER protocols. It's cheaper
>    to have one TOE supporting BOTH than two TOE adapters - one with some
>    fractionally smaller amount of memory and another fully fledged TOE
>    adapter for general-purpose TCP connections.
>
> 8/ My take, on seeing the bunch of emails from the list and Randall
>    Stewart's work, is that the debate has shifted from a MUST - MAY
>    debate to a MAY - DELETE debate. I think that should be the question,
>    i.e. should this be deleted or left in as a MAY?
>
> Dick Gahan
> 3Com
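Vern's formula is not quoted in this thread. Assuming it is the well-known macroscopic bound of Mathis, Semke, Mahdavi, and Ott (1997), rate <= (MSS/RTT) * C/sqrt(p), the sketch below adds the two parameters point 5 asks for, capping the estimate by the line rate and by window/RTT. With those caps, zero loss yields the line rate rather than infinity, and a 10E-6 packet error rate barely binds on a gigabit line, matching Dick's observations. The function and its arguments are mine, not from the thread:

    from math import sqrt

    def throughput_bps(mss_bytes, rtt_s, loss_rate, line_rate_bps,
                       window_bytes):
        """Loss-bounded TCP throughput, capped by line rate and window/RTT."""
        caps = [line_rate_bps, window_bytes * 8 / rtt_s]
        if loss_rate > 0:
            # Mathis et al. bound with C ~ sqrt(3/2) (periodic-loss model)
            caps.append((mss_bytes * 8 / rtt_s) * sqrt(1.5 / loss_rate))
        return min(caps)

    # 1 Gbps, 700 m row from the table below: RTT ~297 us, end-to-end
    # packet error rate 8.4E-6, window sized to the ~37 KB
    # bandwidth-delay product.
    print(throughput_bps(1460, 297e-6, 8.4e-6, 1e9, 37_000))  # ~1 Gbps

Here the window/RTT cap, not the loss term, limits the result, which is exactly the point made in 5/ above.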
> Delay and window data for the 6-switch topology (10G and 100G are fibre
> only; "-" = no value given):
>
> Protocol bandwidth (Gbps)               0.1     0.1      1       1      10      10     100     100
> Max frame size (bytes)                 1500    1500   1500    1500    1500    1500    1500    1500
> Time to transmit frame (us)             120     120     12      12     1.2     1.2    0.12    0.12
> Signal speed on cable (w.r.t. c)        0.5     0.5    0.5     0.5     0.5     0.5     0.5     0.5
> Total cable length (m)                  700   50000    700   50000     700   50000     700   50000
> Total delay due to cable (us)             5     333      5     333       5     333       5     333
> Number of forwarding occurrences          7       7      7       7       7       7       7       7
> Total delay due to forwarding (us)      840     840     84      84     8.4     8.4    0.84    0.84
> Number of switches                        6       6      6       6       6       6       6       6
> Switch latency (us, half pkt time)       60      60      6       6     0.6     0.6    0.06    0.06
> Total switching latency (us)            360     360     36      36     3.6     3.6    0.36    0.36
> End-system TCP delay (us, 2 pkts)       240     240     24      24     2.4     2.4    0.24    0.24
> Total one-way delay (us)               1445    1773    149     477      19     348       6     335
> Round-trip delay (us)                  2889    3547    297     955      38     695      12     670
> Max 1500-byte frames on wire/window      24      30     25      80      32     580     102    5580
> Bandwidth-delay product (KB)             36      44     37     119      48     869     153    8369
> Bit error rate per link                   -       -  1E-10   1E-10   1E-12   1E-12   1E-12   1E-12
> Packet error rate per link (1500 B)       -       - 1.2E-6  1.2E-6  1.2E-8  1.2E-8  1.2E-8  1.2E-8
> End-to-end packet error (line only)       -       - 8.4E-6  8.4E-6  8.4E-8  8.4E-8  8.4E-8  8.4E-8
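Every derived row in the table follows from its stated assumptions: 1500-byte frames, propagation at 0.5c, seven store-and-forward hops, six switches at half a frame time each, and two frame times of end-system TCP delay. A sketch that reproduces the delay, window, and error-rate rows (names are mine; the 100 Mbit/s columns carry no stated bit error rate, so those rows pass ber=None):

    FRAME_BYTES = 1500
    C = 3.0e8  # speed of light, m/s

    def row(rate_gbps, cable_m, ber=None, hops=7, switches=6):
        rate = rate_gbps * 1e9                    # line rate in bits/s
        frame_us = FRAME_BYTES * 8 / rate * 1e6   # time to transmit one frame
        cable_us = cable_m / (0.5 * C) * 1e6      # propagation at 0.5c
        one_way = (cable_us + hops * frame_us     # store-and-forward per hop
                   + switches * 0.5 * frame_us    # half a frame time per switch
                   + 2 * frame_us)                # end-system TCP delay
        rtt = 2 * one_way
        frames = rtt / frame_us                   # frames on wire / window
        bdp_kb = frames * FRAME_BYTES / 1000      # bandwidth-delay product
        per = ber * FRAME_BYTES * 8 if ber else None    # per-link pkt error
        e2e = 1 - (1 - per) ** hops if per else None    # end-to-end, line only
        print(f"{rate_gbps:5} Gbps {cable_m:6} m  RTT {rtt:6.0f} us  "
              f"window {frames:4.0f} frames / {bdp_kb:4.0f} KB  "
              f"per-link {per}  end-to-end {e2e}")

    for gbps, ber in [(0.1, None), (1, 1e-10), (10, 1e-12), (100, 1e-12)]:
        for metres in (700, 50_000):
            row(gbps, metres, ber)

The output matches the table to within its rounding; for example, the 1 Gbps, 700 m row gives an RTT of about 297 us and a bandwidth-delay product of about 37 KB.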