RE: Performance of iSCSI, FCIP and iFCP

To: "Victor Firoiu" <vfiroiu@nortelnetworks.com>, "IPS Reflector" <ips@ece.cmu.edu>
Subject: RE: Performance of iSCSI, FCIP and iFCP
From: "Somesh Gupta" <someshg@yahoo.com>
Date: Sun, 14 Jan 2001 00:06:10 -0800
Cc: "Franco Travostino" <travos@nortelnetworks.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;charset="US-ASCII"
Importance: Normal
In-Reply-To: <3A5BB1BF.27CEED03@nortelnetworks.com>
Reply-To: <someshg@yahoo.com>
Sender: owner-ips@ece.cmu.edu

Franco,

Thanks for getting us to exercise the under-utilized math part of
our brains :-)

I wanted to echo the comments that Stephen made. What I think
you are saying is that

 If single packet drops occur no more often than once per recovery
 time (the time to get back to the rate the connection had before
 the error occurred), then a quarter of the bandwidth during each
 recovery period is lost. In the simplistic case where errors
 occur at exactly that frequency, you lose a quarter of the
 available bandwidth.

 If you have N connections sharing the same bandwidth, then the
 time to recover is reduced to 1/N and since the rate on each
 connection is (1/N), the lost rate is proportional to (1/N)^2.

 A minor quibble with the math. I assume (considering this low
 rate) that the errors are mostly transmission errors, whose
 number is proportional to the number of bytes sent. In that case
 the (1/N)^2 is not quite true: the number of bytes sent will be
 larger, so errors will occur somewhat more frequently as a
 function of time.

The really interesting equation to me is the one for the lost
opportunity, taken independently of the number of connections (i.e.
you can scale the numbers any which way):

D = C/8*I/4 = C^2*RTT^2/(256*M) 

As long as RTTs are small, and/or data rate (C) is not too large,
we don't lose too large a percentage of the available bandwidth.

However, if both become large which is what we face, and keep 
becoming larger, then we keep getting closer and closer to the
1/4 loss rate, unless we keep increasing the number of connections
to compensate.
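
To put rough numbers on this, here is a minimal Python sketch. It
assumes the only losses are bit errors at the 10^-12 BER quoted in the
original message, and it uses the first-order estimate that a quarter
of the bandwidth is lost during each recovery period (so it ignores
the fact that a slower sender also sees errors less often). The
function name and the sample rates and RTTs are illustrative
assumptions, not values from the thread.

```python
def recovery_to_loss_interval_ratio(rate_bps, rtt_s, packet_bytes, ber):
    """Ratio of the recovery time I to the mean time between bit errors."""
    recovery_s = rtt_s ** 2 * rate_bps / (8 * packet_bytes)  # I = RTT^2*C/(8*M)
    loss_interval_s = 1.0 / (ber * rate_bps)                 # assumes a fully loaded link
    return recovery_s / loss_interval_s


M = 1500      # packet size in bytes
BER = 1e-12   # assumed bit error rate

for rate in (1e9, 10e9):                # illustrative link rates
    for rtt in (0.001, 0.010, 0.100):   # illustrative round trip times
        ratio = recovery_to_loss_interval_ratio(rate, rtt, M, BER)
        if ratio <= 1:
            # First-order estimate: 1/4 of the rate is lost for a fraction
            # `ratio` of the time.
            note = f"roughly {0.25 * ratio:.3%} of bandwidth lost"
        else:
            note = "next loss arrives before recovery completes"
        print(f"C={rate / 1e9:g}Gb/s RTT={rtt * 1e3:g}ms ratio={ratio:.4g}: {note}")
```

The ratio grows with C^2*RTT^2, which is the same product that appears
in D, so a single connection drifts toward the 1/4 figure as either
factor grows.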

In a perfect world, you would be able to convince the IETF to
use the product of rate and RTT as a factor to adjust the A and
the M parts of the AIMD algorithm to achieve the same result. It
seems a bit of a waste of energy to get around the math
by using multiple connections (BTW, are there any published
papers on experiences with multiple connections for a single
data stream in a high performance environment? I would love to
learn from someone else's mistakes).

Somesh


-----Original Message-----
From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
Victor Firoiu
Sent: Tuesday, January 09, 2001 4:50 PM
To: IPS Reflector
Cc: Franco Travostino
Subject: Performance of iSCSI, FCIP and iFCP



Hi,

we would like to add to the recent discussion on IP storage protocols a
performance point of view showing significant differences between the
protocols discussed.  Any comment would be highly appreciated.

Briefly, transporting a set of storage connections between two storage
enclaves using multiple TCP sessions (iSCSI and iFCP) provides
significantly higher aggregate average throughput than transporting
the same set of storage connections over a single TCP session (FCIP),
the difference being proportional to the square of the number of TCP
sessions.  This is due to the way TCP congestion control reacts to
packet losses.

Here we have two questions: why are we talking about packet losses, and
why is there this difference in throughput?

There are several reasons for packet losses: network congestion,
link errors and network errors.  

Network congestion is pervasive in current IP networks, where the only
way to control congestion is through dropping packets.  Traffic
engineering, admission control and bandwidth reservation are currently
in early stages of definition.  DiffServ supporting QoS infrastructure
will not be widely available in the near future (especially when we
realize that it is not a simple matter of asserting the EF PHB bit as
if it were the old IP ToS; it rather needs network services and
supporting SLA negotiation).  The DiffServ EF PHB RFC is currently
under redefinition.  On the other hand, if such supporting QoS
infrastructure were indeed available and pervasive, today, why would
we need TCP to begin with?

Even in a perfectly engineered network, link errors occur.  If we take
the Fibre Channel objective of 10^-12 Bit Error Rate, for a 10Gb/s
link, this is one error every 100 seconds.
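
As a quick check of that figure, a minimal Python sketch, assuming
independent bit errors at a constant rate on a fully loaded link (the
function name is ours, and the 1Gb/s row is added only for comparison):

```python
def mean_seconds_between_bit_errors(ber, link_rate_bps):
    """Mean time between bit errors on a link that is always sending."""
    return 1.0 / (ber * link_rate_bps)


for rate_bps in (1e9, 10e9):
    interval_s = mean_seconds_between_bit_errors(1e-12, rate_bps)
    print(f"{rate_bps / 1e9:g}Gb/s at BER 1e-12: one error every {interval_s:g}s")
```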

Network errors also occur with significant frequency in IP networks.
Jonathan Stone and Craig Partridge recently reported at SIGCOMM 2000
that errors caught by the TCP checksum occur with significant
frequency (between one packet in 1100 and one in 32000), without the
link CRC catching them.

For the second question, TCP throughput is impacted by each packet
loss.  Under the congestion control algorithm present in all
major implementations (Tahoe, Reno, New-Reno, SACK) each packet loss
results in the TCP sender's congestion window being reduced to half of
its current value, and therefore (assuming constant Round Trip Time),
TCP's throughput is halved.  After that, the window increases by
roughly one packet every two Round Trip Times (assuming the popular
Delayed-Acknowledgement algorithm).

The temporary decrease in TCP's rate translates into an amount of data
missing transmission opportunity.  As we show later, for N storage
connections sharing an IP "pipe" of rate E, the amount of data missing
the opportunity to be transmitted due to a packet loss is
D(N) = E^2/(N^2)*RTT^2/(256*M)
in the case of iSCSI and iFCP, and
D(1) = E^2*RTT^2/(256*M) = D(N)*N^2
in the case of FCIP
where
RTT = Round Trip Time
M = packet size

For example, for a set of N=100 connections totaling E=10Gb/s,
RTT=10ms, M=1500B, the data not transmitted in time due to a packet
loss for iSCSI or iFCP is D(N)=2.6MB.  For the same set transported
over one TCP session as in FCIP, the data not sent in time is
D(1)= 26GB, a 10,000 fold increase.

The time interval for TCP to recover its sending rate to its initial
value after a packet loss is I(N)= 0.833seconds in the case of iSCSI
and iFCP, and I(1)=83.3seconds in the case of FCIP.
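
The example figures can be reproduced with a short Python sketch of
the two closed-form expressions, I = RTT^2*C/(8*M) and
D = C^2*RTT^2/(256*M), where C is the rate of one TCP session. The
function names are illustrative; the parameter values are the ones
from the example.

```python
def recovery_time_s(rate_bps, rtt_s, packet_bytes):
    """I = RTT^2 * C / (8 * M): seconds to climb back to rate C after one loss."""
    return rtt_s ** 2 * rate_bps / (8 * packet_bytes)


def missed_bytes(rate_bps, rtt_s, packet_bytes):
    """D = C^2 * RTT^2 / (256 * M): bytes missing their transmission opportunity."""
    return rate_bps ** 2 * rtt_s ** 2 / (256 * packet_bytes)


E = 10e9      # aggregate pipe rate, bits/second
RTT = 0.010   # round trip time, seconds
M = 1500      # packet size, bytes
N = 100       # number of storage connections

for label, sessions in (("iSCSI/iFCP", N), ("FCIP", 1)):
    c = E / sessions  # rate carried by the one TCP session that sees the loss
    print(f"{label}: I = {recovery_time_s(c, RTT, M):.3f}s, "
          f"D = {missed_bytes(c, RTT, M) / 1e6:.1f}MB")
```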

Observe that in the case of FCIP, the time to recover its rate,
I(1)=83.3s, is of the same order of magnitude as the time between two
packet losses due exclusively to the link Bit Error Rate.  In other
words, a packet loss occurs almost immediately after TCP has
recovered its rate.  This means that FCIP delivers on average about
3/4 of the required 10Gb/s rate, since 1/4 of rate is lost during the
time TCP rate increases linearly from 1/2 to full rate.  (More
precisely, the effective rate is 8.27Gb/s because 1/4 of rate is lost
during 83.3s, and the time between two errors is now 120.825s due to
decreased sending rate).  By comparison, iSCSI or iFCP deliver
approximately 9.99979Gb/s (i.e., lost 1/4 of one TCP full rate of
100Mb/s during 0.833s out of a 100s interval).
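
The two effective rates can be reproduced with the following Python
sketch. It assumes that losses come only from bit errors, that each
loss hits a single TCP session, and that the session fully recovers
before the next loss. Under those assumptions the pipe must be busy
long enough to send 1/BER bits plus the bits missed during one
recovery. The function name is illustrative.

```python
def effective_rate_bps(pipe_bps, sessions, rtt_s, packet_bytes, ber):
    """Average aggregate rate when one session halves its window per bit error."""
    session_bps = pipe_bps / sessions
    recovery_s = rtt_s ** 2 * session_bps / (8 * packet_bytes)  # I
    bits_between_errors = 1.0 / ber
    # On average 1/4 of the session's rate is lost during the recovery period.
    deficit_bits = session_bps * recovery_s / 4
    # Time for the pipe to carry bits_between_errors despite the deficit.
    cycle_s = (bits_between_errors + deficit_bits) / pipe_bps
    return bits_between_errors / cycle_s


for label, sessions in (("FCIP", 1), ("iSCSI/iFCP", 100)):
    rate = effective_rate_bps(10e9, sessions, 0.010, 1500, 1e-12)
    print(f"{label}: {rate / 1e9:.5f}Gb/s")
```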

In conclusion, from a performance point of view, transporting
storage connections (SCSI or Fibre Channel) on multiple TCP sessions
is much more effective than tunneling through a single TCP
session, and the difference is proportional to the square of the
number of TCP sessions.


The math.

For a TCP session to sustain a rate of C bits/second, the TCP's
congestion window W (measured in number of packets) has to be
W=RTT*C/(8*M)
where 
RTT = Round Trip Time in seconds
M = packet size in Bytes

The time needed by the TCP sender to recover from a single packet
loss and have its sending rate reach the previous C value is
I = 2*RTT*W/2 = RTT*W = RTT^2*C/(8*M)

The total amount of data (in Bytes) missing the opportunity to be
transmitted in this time interval I is  
D = C/8*I/4 = C^2*RTT^2/(256*M) 
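
As a sanity check on these closed forms, here is a small Python sketch
that steps the window one RTT at a time instead of using the formulas.
It assumes the simplified behavior described earlier: the window is
halved on a loss and then grows by one packet every two RTTs until it
is back at W. The function name is ours, and the discrete stepping
means the results only approximate I and D.

```python
def simulate_recovery(rate_bps, rtt_s, packet_bytes):
    """Return (recovery time in seconds, missed bytes) for one packet loss."""
    full_window = rtt_s * rate_bps / (8 * packet_bytes)  # W, in packets
    window = full_window / 2                             # halved on loss
    elapsed_rtts = 0
    missed_packets = 0.0
    while window < full_window:
        # Packets that could have been sent this RTT at the full window.
        missed_packets += full_window - window
        elapsed_rtts += 1
        if elapsed_rtts % 2 == 0:  # delayed ACKs: +1 packet every two RTTs
            window += 1
    return elapsed_rtts * rtt_s, missed_packets * packet_bytes


C, RTT, M = 100e6, 0.010, 1500  # one of 100 sessions on a 10Gb/s pipe
sim_i, sim_d = simulate_recovery(C, RTT, M)
print(f"simulated:   I = {sim_i:.3f}s, D = {sim_d / 1e6:.2f}MB")
print(f"closed form: I = {RTT ** 2 * C / (8 * M):.3f}s, "
      f"D = {C ** 2 * RTT ** 2 / (256 * M) / 1e6:.2f}MB")
```

The simulated values come out slightly above the closed forms because
the window grows in whole-packet steps.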

If we consider a set of N storage connections sharing an IP "pipe" of
rate E, they can be transported in N TCP sessions, as in iSCSI or
iFCP.  Assuming all connections equal, each TCP session sends at a
rate of E/N.  One packet loss impacts only one TCP session, and thus,
the total amount of data missing the opportunity to be transmitted due
to a packet loss is D(N) = E^2/(N^2)*RTT^2/(256*M)

On the other hand, if the same set of N storage connections is
transported in one TCP session, as in FCIP, the total amount of data
losing the opportunity to be transmitted due to a packet loss is 
D(1) = E^2*RTT^2/(256*M) = D(N)*N^2.

For more details on TCP performance see for example:
"Modeling TCP Reno Performance: A Simple Model and its Empirical
Validation." J. Padhye, V. Firoiu, D. Towsley and J. Kurose, IEEE/ACM
Transactions on Networking, April 2000.


Franco Travostino, Victor Firoiu
-----
Content Internetworking Lab, Technology Center
Nortel Networks, Inc.
600 Technology Park
Billerica, MA 01821 USA


