
    RE: iSCSI: Flow Control



    Just to clarify on point [7], I was not referring to TCP flow control,
    but iSCSI command flow control. I thought I saw someone proposing
    a slow start sort of command flow control mechanism.
    
    I am a proponent of using TCP as is without any changes (at least
    in the first phase).
    
    Somesh
    
    > -----Original Message-----
    > From: Douglas Otis [mailto:dotis@sanlight.net]
    > Sent: Monday, October 09, 2000 11:14 AM
    > To: GUPTA,SOMESH (HP-Cupertino,ex1); IPS@ece.cmu.edu
    > Subject: RE: iSCSI: Flow Control
    > 
    > 
    > Somesh,
    > 
    > > Hi all,
    > >
    > > Assuming that we have consensus, especially on [1] below (minimum
    > > connections is 1), I think we should try and resolve the flow
    > > control issue.
    > >
    > > It seems to me that there is sufficient consensus that command
    > > flow control is needed -
    > >
    > > [1]   To enable fastest possible flow of commands given the
    > >       capabilities of the target & initiator, and accommodating
    > >       increased latencies of IP networks
    > 
    > The credit scheme as I have recommended would be carried within each
    > frame and not just within the response PDU to reduce latency of control.
    > 
    > > [2]   To significantly minimize the queue full condition. And to
    > >       provide a recovery mechanism at the iSCSI level when command
    > >       overflow happens at the target.
    > 
    > The initiator should regulate the number of outstanding commands.  This
    > regulation will not impact performance at the device level.
    > 
    > > [3] Some of the debate seems to be around whether the credit mechanism
    > >     should be static or dynamic.
    > 
    > The credit scheme that I have recommended would be dynamic.
    > 
    > > I believe that static is a subset of
    > > dynamic (where you never change the value being advertised). I don't
    > > disagree with Charles when he says that it will take experimentation
    > > to determine how to best adjust the credit dynamically. However,
    > > it is important to provide for it in the protocol so that when a
    > > vendor does figure out how best to adjust the credit, they have a
    > > protocol mechanism to do so. Even though it is an implementation
    > > that provides full rate performance, it is the protocol that
    > > enables it (take the TCP window scaling option, for example).
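
Editorial illustration of point [3]: a minimal sketch of an initiator-side command credit window, assuming the target re-advertises its credit in every response. It is not taken from any iSCSI draft; the class and method names are hypothetical. A static scheme is the special case in which the target always advertises the same value.

```python
class CommandCreditWindow:
    """Initiator-side view of the command credit advertised by a target.

    Hypothetical sketch: the names here are illustrative, not protocol fields.
    """

    def __init__(self, initial_credit):
        self.credit = initial_credit   # most recent value advertised by the target
        self.outstanding = 0           # commands sent but not yet completed

    def can_send(self):
        # A new command may be sent only while outstanding commands
        # stay below the currently advertised credit.
        return self.outstanding < self.credit

    def send_command(self):
        if not self.can_send():
            raise RuntimeError("command credit exhausted")
        self.outstanding += 1

    def on_response(self, advertised_credit):
        # Each response completes one command and carries the target's
        # current credit. A static target repeats the same number forever;
        # a dynamic target may raise or lower it.
        self.outstanding -= 1
        self.credit = advertised_credit
```

Because the window simply adopts whatever value arrives, the protocol mechanism is the same in both cases and only the target's advertising policy differs.
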
    > >
    > > [4] Another question that comes up is - Should the credit be per
    > >     connection or per session (multiple connections)?
    > 
    > As the transport's primary function is to provide aggregation down to
    > the medium, the credit should be neither on the connection nor on the
    > end point, as it is now.  It should be at the medium as recommended.
    > 
    > > The current draft does provide for a session wide "flow control"
    > > through MaxCmdRn. I believe that it is better to have flow
    > > control on a per connection basis. This enables each connection
    > > (which might be different NICs) to operate independently of
    > > each other. Having a session wide flow control would cause
    > > sync points in both the initiator and the target.
    > >
    > > Also a smaller field could be used if it is just to indicate
    > > a credit window.
    > 
    > The credit window should not be carried per connection as you suggest.
    > The medium is what needs to be controlled.
    > 
    > > [5] The credit should be a "pretty good effort" and not a "guarantee".
    > >
    > > This allows smart targets to overcommit as the number of initiators
    > > logged in increases (while reducing the credit available to the
    > > initiators) and increase the credit and reduce overcommitment as
    > > the number of initiators logged in decreases.
    > >
    > > Some mechanism is required to recover from the infrequent case where
    > > command buffers get exhausted and have to be thrown away.
    > 
    > As the credit scheme that I recommended provides the highest resolution
    > of control as well as implements a reduction acknowledgement, there
    > should be little reason to toss commands or frames.
    > 
    > > [6] I would recommend that iSCSI provide a way to recover from
    > > command overflow and also maintain ordering.
    > >
    > > The current proposal does not have a drop notification. It has
    > > an ack mechanism (ExpCmdRn). I think for the purpose of drop
    > > notification, it is better to be able to indicate the range of
    > > commands dropped. TCP acks do tell me which commands reached the
    > > target, and command responses tell me which were processed.
    > >
    > > When a target suffers from command exhaustion, it could behave
    > > in 2 different ways - one is to drop all the commands it receives
    > > till it detects a retransmission. In this case it would send a drop
    > > notification of all commands it receives till it starts receiving
    > > the command from where the drop started.
    > 
    > If the initiator restricts commands, then there would never be a drop
    > requirement.  In addition, such a limit on outstanding commands does not
    > represent a practical constraint on performance.
    > 
    > > The other would be to store all the commands it is able to provide
    > > buffers for and provide NAKs for only those that it has dropped.
    > > This would be more efficient.
    > >
    > > In this case, we should also agree on what the semantics of the
    > > processing of the out-of-order commands are. Should they be
    > > processed only when the gaps are filled? Or can they be processed
    > > in any order?
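
Editorial illustration of point [6]: a rough sketch comparing the two target behaviours described above when command buffers run out, and showing how dropped command numbers could be collapsed into ranges for a drop notification. The function names and the `(cmd_rn, buffer_available)` input shape are assumptions made for this example; they are not part of the draft.

```python
def drop_until_retransmit(arrivals):
    """Policy 1: after the first overflow, drop every later command too.

    `arrivals` is a list of (cmd_rn, buffer_available) pairs. Retransmission
    itself is not modelled; this covers a single burst of arrivals.
    """
    accepted, dropped = [], []
    for cmd_rn, buffer_available in arrivals:
        if not dropped and buffer_available:
            accepted.append(cmd_rn)
        else:
            dropped.append(cmd_rn)
    return accepted, dropped


def selective_nak(arrivals):
    """Policy 2: keep every command a buffer exists for, NAK only the rest."""
    accepted = [rn for rn, ok in arrivals if ok]
    dropped = [rn for rn, ok in arrivals if not ok]
    return accepted, dropped


def to_ranges(numbers):
    """Collapse command numbers into inclusive (first, last) ranges."""
    ranges = []
    for n in sorted(numbers):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], n)   # extend the current range
        else:
            ranges.append((n, n))             # start a new range
    return ranges
```

Under the second policy fewer commands have to be resent, but the accepted commands arrive with gaps, which is exactly where the ordering question raised above comes in.
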
    > 
    > As TCP does not provide for out of sequence processing, there is little
    > concern within this transport.  Only when substantial buffers are
    > remaining, would out of sequence processing become useful.  As these
    > buffers should be at the device, and as such handling is already
    > defined at the device, no further definitions are required.
    > 
    > > [7] There was some discussion of whether we should propose a slow
    > > start algorithm or a fast start algorithm.
    > >
    > > I think we should use a fast start algorithm at this level. At TCP
    > > level, the slow start algorithm is important because the two
    > > ends are unaware of the state of the network and have to probe it.
    > > At the iSCSI level, the target should be reasonably knowledgeable
    > > about its own state and be able to provide a credit or
    > > reduce/increase it per login as the conditions change (hopefully
    > > with some hysteresis built in).
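
Editorial illustration of points [5] and [7]: a minimal sketch of a target that grants each initiator its full share of command buffers at login (fast start) and re-advertises credit to existing logins only when the share has moved by at least a hysteresis threshold. The class, its parameters and the equal-share policy are all assumptions for the sake of the example.

```python
class TargetCreditAdvertiser:
    """Hypothetical target-side credit policy with fast start and hysteresis."""

    def __init__(self, total_buffers, overcommit=1.0, hysteresis=2):
        self.total_buffers = total_buffers
        self.overcommit = overcommit   # values above 1.0 promise more than exists
        self.hysteresis = hysteresis   # minimum change worth re-advertising
        self.advertised = {}           # initiator name -> advertised credit

    def _share(self):
        logins = max(len(self.advertised), 1)
        return max(1, int(self.total_buffers * self.overcommit) // logins)

    def _rebalance(self):
        share = self._share()
        for name, current in self.advertised.items():
            # Leave small differences alone so credit does not flap.
            if abs(share - current) >= self.hysteresis:
                self.advertised[name] = share

    def login(self, name):
        self.advertised[name] = 0              # count the new login first
        self.advertised[name] = self._share()  # fast start: full share at once
        self._rebalance()
        return self.advertised[name]

    def logout(self, name):
        del self.advertised[name]
        self._rebalance()
```

With 16 buffers and a hysteresis of 4, a third login receives a credit of 5 while the two existing logins keep their credit of 8, so the target is briefly overcommitted. That matches the "pretty good effort" reading of credit in point [5], not a guarantee.
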
    > 
    > This is not TCP.  Why use TCP if you wish to modify TCP?  Resist
    > re-engineering TCP. On a LAN, this is not a problem and on a WAN, this
    > is a required feature of TCP.
    > 
    > > [8] On flow control of immediate data, should we first work out
    > > the command flow control and then turn our efforts to the
    > > data flow control?
    > >
    > > Once we can agree on some of the basic issues, then it should be
    > > relatively easy to work out the credit indication/numbering
    > > details etc.
    > 
    > To adapt to different flow control schemes, the encapsulation should be
    > a separate document from flow control and have flow control either as a
    > separate control PDU or as a prefix defined within the flow-control
    > draft.  This would remove the load on having one person define
    > everything and allow the control mechanism to change without damaging
    > encapsulation.  I would add that service management should also have
    > the same split in documents.
    > 
    > Doug
    > 
    > 
    > >
    > > Somesh
    > >
    > > > -----Original Message-----
    > > > From: Black_David@emc.com [mailto:Black_David@emc.com]
    > > > Sent: Wednesday, October 04, 2000 5:13 PM
    > > > To: ips@ece.cmu.edu
    > > > Subject: iSCSI sessions: Step 2
    > > >
    > > >
    > > > With my WG co-chair hat on, it's time to call
    > > > consensus on some of this ...
    > > >
    > > > Late last week, I sent the "Let's try again" message
    > > > on iSCSI sessions, and since then I've only seen
    > > > one thread of comments to it from a combination of
    > > > Matt Wakeley and Doug Otis.  The important content
    > > > of that thread is Matt renewing his position that
    > > > more than one connection ought to be REQUIRED.  Lest
    > > > this seem like annoyance, Matt deserves credit for
    > > > being patient with the WG's indirect progress towards
    > > > consensus that made it necessary for him to renew his
    > > > objection on multiple occasions.  As I read Matt's
    > > > email, it looks like a good flow control solution
    > > > for the single TCP connection iSCSI session case
    > > > might satisfy him, but the flow control discussion
    > > > is still ongoing.
    > > >
    > > > In any case, I am stating the following two items
    > > > as WG rough consensus, over Matt's renewed objection
    > > > in the first case:
    > > >
    > > > [1] Multiple TCP connections per iSCSI session
    > > > 	remain OPTIONAL.
    > > > [2] Multiple TCP connections per iSCSI session
    > > > 	will be specified as part of the base
    > > > 	iSCSI protocol.
    > > >
    > > > Given that it's two months after the Pittsburgh meeting
    > > > I hope the rough consensus will hold on these items;
    > > > anyone other than Matt should object to me directly;
    > > > if necessary, I'll (reluctantly) reopen these issues
    > > > one more time (yes, this is a hint).
    > > >
    > > > Moving on to the topic of models for multiple connection
    > > > sessions, let me start by trying to winnow the approaches
    > > > to Asymmetric sessions before taking up Asymmetric vs.
    > > > Symmetric again.  Four approaches to Asymmetric sessions
    > > > have been discussed.  I have not seen anyone other than
    > > > Pierre Labat support his Balanced model in which a single
    > > > stream of control moves from TCP connection to TCP connection
    > > > within a session. Therefore I believe it is the WG
    > > > rough consensus that:
    > > >
    > > > [3] The Balanced Asymmetric model in which a single
    > > > 	control stream moves from TCP connection to TCP
    > > > 	connection in an iSCSI session will not be pursued.
    > > >
    > > > Similarly, I saw no objections to the note at the end of
    > > > Julian's email, indicating that the Collapsed Asymmetric
    > > > model in which data is allowed on the command connection
    > > > even when there are multiple TCP connections in an iSCSI
    > > > session is technically inferior to both the Pure Asymmetric
    > > > and Symmetric models. Therefore I believe it is the WG
    > > > rough consensus that:
    > > >
    > > > [4] The Collapsed Asymmetric model in which data is allowed
    > > > 	on the command connection in multiple connection
    > > > 	iSCSI sessions will not be pursued.
    > > >
    > > > The Pure Asymmetric model was originally described as
    > > > requiring two TCP connections per session.  Kalman Meth
    > > > proposed a modification to it that allowed it to use a
    > > > single connection for both command and data.  Between
    > > > Kalman being the originator of the Pure Asymmetric model,
    > > > lack of objection to his proposal, and rough consensus [2]
    > > > above, I believe it to be the WG rough consensus that:
    > > >
    > > > [5] The Pure Asymmetric model will only be considered
    > > > 	in the modified form that allows an iSCSI session
    > > > 	to contain a single TCP connection on which both
    > > > 	command and data flow.
    > > >
    > > > If all five of the above consensuses (consensii?) hold,
    > > > that would be serious progress.  Objections to these
    > > > should be sent to the list, except that I would ask
    > > > Pierre Labat not to object to [3] in the absence of
    > > > other objections to it.
    > > >
    > > > Now comes the hard part - Symmetric vs. modified
    > > > Pure Asymmetric (modified by [5] above).  There are
    > > > over 1000 email messages in my mailbox for the ips
    > > > mailing list for the past two months, and I freely
    > > > admit to not having reviewed them in detail.  I suggested
    > > > in the "Let's try again" email that more weight should
    > > > be given to those working on implementations, especially
    > > > hardware, and have not seen any objections to that
    > > > suggestion.  My impression is that the opinion of such
    > > > people has been in favor of the Symmetric model -
    > > > Matt Wakeley (Agilent), and Somesh Gupta (HP) come
    > > > to mind as examples.  I'm not confident that this is
    > > > the WG consensus, but it appears to me that the
    > > > WG is headed in that direction.  Please comment on
    > > > this - the absence of comments/objections will be
    > > > taken as a sign of agreement.
    > > >
    > > > There has been no comment on the error recovery issue
    > > > since my email.  Given this and the prior statements that
    > > > TCP solves many of the tape error scenarios that are motivating
    > > > FCP error recovery, I think the authors of the next version
    > > > of the iSCSI draft are entitled to use their best technical
    > > > judgement in determining how much error recovery to specify
    > > > across multiple TCP connections in an iSCSI session, and
    > > > the WG will review it when the next version of the draft
    > > > appears.
    > > >
    > > > We might be getting close to the end of the session issues.
    > > > Carefully considered comments are encouraged, but I'd ask
    > > > everyone to consider their comments carefully before sending
    > > > them, given our past experiences with this set of issues.
    > > >
    > > > Thanks,
    > > > --David
    > > >
    > > > ---------------------------------------------------
    > > > David L. Black, Senior Technologist
    > > > EMC Corporation, 42 South St., Hopkinton, MA  01748
    > > > +1 (508) 435-1000 x75140     FAX: +1 (508) 497-8500
    > > > black_david@emc.com       Mobile: +1 (978) 394-7754
    > > > ---------------------------------------------------
    > > >
    > >
    > 
    



Last updated: Tue Sep 04 01:06:45 2001