
    RE: iSCSI: Flow Control



    
    
    Somesh,
    
    I agree that memory at NICs could be a problem. As you mention mostly
    targets, I wonder if there are no simple solutions that greatly
    alleviate the need for memory at target adapters. I assume that you
    are talking about simple cheap boxes - for the large boxes several
    10s of MB are just a fraction of their caches!
    
    On the other hand, on host adapters memory comes at more of a premium
    and there are no simple solutions.
    
    I still fail to see what kind of command flow control can alleviate the
    data flow problem.
    
    Regards,
    Julo
    
    
    
    "GUPTA,SOMESH (HP-Cupertino,ex1)" <somesh_gupta@am.exch.hp.com> on
    09/10/2000 17:37:42
    
    Please respond to "GUPTA,SOMESH (HP-Cupertino,ex1)"
          <somesh_gupta@am.exch.hp.com>
    
    To:   Julian Satran/Haifa/IBM@IBMIL, ips@ece.cmu.edu
    cc:
    Subject:  RE: iSCSI: Flow Control
    
    
    
    
    Julian,
    
    comments below.
    
    > -----Original Message-----
    > From: julian_satran@il.ibm.com [mailto:julian_satran@il.ibm.com]
    > Sent: Monday, October 09, 2000 2:38 AM
    > To: ips@ece.cmu.edu
    > Subject: RE: iSCSI: Flow Control
    >
    >
    >
    >
    > Somesh,
    >
    > I have kept quiet on this, but as it risks getting unnecessarily
    > complex IMHO, I can't anymore.
    >
    > I am not altogether convinced that there is a consensus on
    > flow control.
    > Let us reiterate the reasons for wanting command flow control:
    >
    > - for long latency pipes you want to ship commands and data ahead of
    > time to keep the pipes full
    > - but you also want to avoid the command queueing situation in which
    > you can be forced to drop commands and refill the queue.
    > - you want to keep all devices as busy as possible
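
As a rough illustration of the first point, the number of commands that must be in flight to keep a long latency pipe full follows from the bandwidth-delay product. The sketch below is in Python; the function name and the link speed, round-trip time and I/O size in the usage note are assumptions chosen for illustration, not figures from this thread.

```python
def commands_to_fill_pipe(bandwidth_bytes_per_s, rtt_ms, bytes_per_command):
    """Estimate how many commands (with their data) must be outstanding
    to cover one round trip. All inputs are illustrative assumptions."""
    # Bandwidth-delay product: bytes that fit "in the pipe" during one RTT.
    bdp_bytes = bandwidth_bytes_per_s * rtt_ms // 1000
    # Ceiling division: a partly used slot still needs a whole command.
    needed = -(-bdp_bytes // bytes_per_command)
    # At least one command is always outstanding on an active connection.
    return max(1, needed)
```

For example, commands_to_fill_pipe(125_000_000, 10, 65536) evaluates to 20 for an assumed 1 Gbit/s link, a 10 ms round trip and 64 KiB of data per command.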
    >
    > The last item, as well as the whole SCSI queuing issue, is best taken
    > care of at the SCSI layer - as it is the only one that might need to
    > keep per-LU state.
    >
    > For the first two items - except for some artifacts - observe that
    > commands are not a significant consumer of either bandwidth or target
    > resources. A high number of commands in transit will readily keep the
    > pipes full if they are followed by data and pose no strain on a target
    > where they can be queued at the iSCSI layer.
    
    It sort of depends on the implementation model whether this is an
    issue or not. The aspect of the implementation that has the most
    impact in this area is whether the adapter provides data buffering or
    not. If the adapter does provide data buffering to the tune of
    window size (or in the range), then yes, it is not an issue.
    However, this has its own set of problems including cost.
    A solution that depends on NIC memory will be at a disadvantage
    compared to FC and parallel SCSI.
    
    If the adapters are not providing buffering, and assuming that
    commands and buffers use separate memory, the target would have to
    post command buffers and data buffers to the NIC considering somewhat
    the worst case - and on every connection (not accounting for the
    worst case but some fraction - after all every connection cannot run
    at full speed at the same time). And the target may have multiple
    adapters. The targets could ultimately even be disk drives.
    
    What flow control is doing is enabling the target to be in control
    of the flow between the initiator and the target. In a bad way, it
    provides the full benefit of the TCP window only when the target
    is ready and able to source/sink data at that rate - both sides
    knowing where the data is going.
    
    >
    > Data will be flow-controlled by the target limits for immediate data
    > and the TCP windows, and by simple conservative ordering rules we can
    > avoid both deadlock and throwing away data.
    >
    > What you are suggesting we look into - flow controlling per
    > connection - is, I am afraid, not adding too much.
    
    It was never my goal to make a fundamental contribution :-) and
    I won't mind throwing it out if it can be shown that it is not
    needed when iSCSI adapters do not have memory.
    >
    > And last - but not least - if you implement sessions with one
    > connection and use multiple sessions, you can flow control every
    > connection, but then you have to add a wedge driver to do load
    > distribution.
    
    Again this statement perhaps has implementation assumptions built in.
    Consider e.g. multiple "pull iSCSI NICs" on the initiator. If there is flow
    control per connection, the host can distribute SCSI commands across the
    NICs (assuming each handles one connection to the target) as the SCSI
    command layer generates the commands and then have no further interaction
    with the adapters on sending the commands/associated data till the
    command completion is received. If the flow control is per session,
    then what will happen is that a session wide value of MaxCmdRn is received
    on a single NIC (different values will be received on different NICs).
    Ensuring that all NICs follow appropriate behavior based on this value
    will require either communicating this value to all the NICs (those
    blocked will need it), or the host holding back the command posting
    beyond MaxCmdRn and posting them only when the window opens up.
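
The second option above (the host holding commands back until the window opens) can be sketched roughly as follows. This is a minimal Python illustration, not anything taken from the draft: the class and method names are hypothetical, and command-number wraparound is ignored. With per-connection flow control, each NIC could own one such window instead of the host sharing a single one.

```python
from collections import deque


class CommandWindow:
    """Hypothetical host-side gate driven by the latest MaxCmdRn seen."""

    def __init__(self, max_cmd_rn):
        self.max_cmd_rn = max_cmd_rn  # highest command number the target allows
        self.next_cmd_rn = 1          # number the next command will carry
        self.pending = deque()        # commands held back by the host

    def submit(self, command):
        """Queue a command; return the (number, command) pairs sendable now."""
        self.pending.append(command)
        return self._drain()

    def update(self, max_cmd_rn):
        """Record a newer MaxCmdRn and release whatever it now permits."""
        # Assumption: the window never moves backwards, so keep the maximum.
        self.max_cmd_rn = max(self.max_cmd_rn, max_cmd_rn)
        return self._drain()

    def _drain(self):
        sendable = []
        while self.pending and self.next_cmd_rn <= self.max_cmd_rn:
            sendable.append((self.next_cmd_rn, self.pending.popleft()))
            self.next_cmd_rn += 1
        return sendable
```

With a window created as CommandWindow(max_cmd_rn=2), the first two submitted commands are released immediately, a third is held, and a later update(4) releases it as command number 3.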
    
    For the target (assuming command flow control is needed), per-connection
    flow control means that it does not have to coordinate buffer
    availability across multiple NICs, which is a good thing.
    
    >
    > Regards,
    > Julo
    
    Somesh
    >
    >
    >
    > "GUPTA,SOMESH (HP-Cupertino,ex1)" <somesh_gupta@am.exch.hp.com> on
    > 09/10/2000 03:36:46
    >
    > Please respond to "GUPTA,SOMESH (HP-Cupertino,ex1)"
    >       <somesh_gupta@am.exch.hp.com>
    >
    > To:   IPS@ece.cmu.edu
    > cc:    (bcc: Julian Satran/Haifa/IBM)
    > Subject:  RE: iSCSI: Flow Control
    >
    >
    >
    >
    > Hi all,
    >
    > Assuming that we have consensus, especially on [1] below (minimum
    > connections is 1), I think we should try and resolve the flow
    > control issue.
    >
    > It seems to me that there is sufficient consensus that command
    > flow control is needed -
    >
    > [1]   To enable fastest possible flow of commands given the
    >       capabilities of the target & initiator, and accommodating
    >       increased latencies of IP networks
    >
    > [2]   To significantly minimize the queue full condition. And to
    >       provide a recovery mechanism at the iSCSI level when command
    >       overflow happens at the target.
    >
    > [3] Some of the debate seems to be around whether the credit mechanism
    >     should be static or dynamic.
    >
    > I believe that static is a subset of
    > dynamic (where you never change the value being advertised). I don't
    > disagree with Charles when he says that it will take experimentation
    > to determine how to best adjust the credit dynamically. However,
    > it is important to provide for it in the protocol so that when a
    > vendor does figure out how best to adjust the credit, they have a
    > protocol mechanism to do so. Even though it is an implementation
    > that provides full rate performance, it is the protocol that
    > enables it (take TCP window scaling option e.g.).
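
The claim that static credit is a subset of dynamic credit can be made concrete with a small sketch. This is an illustrative Python fragment under assumed names: DynamicCredit, its adjust hook and the free_buffers input are hypothetical, and nothing here reflects a mechanism defined in the draft.

```python
class DynamicCredit:
    """Hypothetical target-side credit source.

    With adjust=None the advertised value never changes, which is the
    static case; supplying an adjust function gives the dynamic case.
    """

    def __init__(self, initial, adjust=None):
        self.value = initial
        self.adjust = adjust  # callable(current, free_buffers) -> new credit

    def advertise(self, free_buffers):
        """Return the credit to advertise given the current free buffers."""
        if self.adjust is not None:
            self.value = self.adjust(self.value, free_buffers)
        return self.value
```

A target built as DynamicCredit(8) advertises 8 regardless of load, while DynamicCredit(8, adjust=lambda current, free: min(free, 16)) follows buffer availability up to an assumed cap of 16. The same protocol field carries both.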
    >
    > [4] Another question that comes up is - Should the credit be per
    >     connection or per session (multiple connections)?
    >
    > The current draft does provide for a session wide "flow control"
    > through MaxCmdRn. I believe that it is better to have flow
    > control on a per connection basis. This enables each connection
    > (which might be different NICs) to operate independently of
    > each other. Having a session wide flow control would cause
    > sync points in both the initiator and the target.
    >
    > Also a smaller field could be used if it is just to indicate
    > a credit window.
    >
    > [5] The credit should be a "pretty good effort" and not a "guarantee".
    >
    > This allows smart targets to overcommit as the number of initiators
    > logged in increases (while reducing the credit available to the
    > initiators) and increase the credit and reduce overcommitment as
    > the number of initiators logged in decreases.
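
One possible shape for such a policy is sketched below in Python. The square-root rule is an arbitrary assumption picked only because it shrinks each initiator's credit while letting total overcommitment grow as more initiators log in; the function name and parameters are hypothetical.

```python
import math


def per_initiator_credit(total_buffers, initiators):
    """Return (credit per initiator, overcommitment ratio).

    Illustrative policy only: credit falls as 1/sqrt(n), so the sum of
    all advertised credits grows as sqrt(n) times the real buffer count.
    """
    if initiators < 1:
        raise ValueError("at least one initiator must be logged in")
    # Never advertise less than one command of credit.
    credit = max(1, int(total_buffers / math.sqrt(initiators)))
    # Ratio above 1.0 means more credit is promised than buffers exist.
    overcommit = credit * initiators / total_buffers
    return credit, overcommit
```

With 64 command buffers this gives a credit of 64 and no overcommitment for one initiator, a credit of 32 at a ratio of 2.0 for four initiators, and a credit of 16 at a ratio of 4.0 for sixteen.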
    >
    > Some mechanism is required to recover from the infrequent case where
    > command buffers get exhausted and have to be thrown away.
    >
    > [6] I would recommend that iSCSI provide a way to recover from
    > command overflow and also maintain ordering.
    >
    > The current proposal does not have a drop notification. It has
    > an ack mechanism (ExpCmdRn). I think for the purpose of drop
    > notification, it is better to be able to indicate the range of
    > commands dropped. TCP acks do tell me which commands
    > reached the target, and command responses tell me which were
    > processed.
    >
    > When a target suffers from command exhaustion, it could behave
    > in 2 different ways - one is to drop all the commands it receives
    > till it detects a retransmission. In this case it would send a drop
    > notification of all commands it receives till it starts receiving
    > the command from where the drop started.
    >
    > The other would be to store all the commands it is able to provide
    > buffers for and provide NAKs for only those that it has dropped.
    > This would be more efficient.
    >
    > In this case, we should also agree on what the semantics of the
    > processing of the out of order commands are. Should they be
    > processed only when the gaps are filled? Or can they be processed
    > in any order?
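
The two target behaviours described above can be compared with a short sketch. This Python fragment is illustrative only: the function names are hypothetical, and each arrival is modelled as a (command number, buffer available) pair, which is an assumption made for the example rather than anything in the current proposal.

```python
def to_ranges(numbers):
    """Collapse ascending command numbers into inclusive (first, last) ranges."""
    ranges = []
    for n in numbers:
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], n)  # extend the current range
        else:
            ranges.append((n, n))            # start a new range
    return ranges


def drop_all_after_first(arrivals):
    """First behaviour: once one command is dropped, drop everything after it
    until the initiator retransmits from the start of the drop."""
    for index, (_, has_buffer) in enumerate(arrivals):
        if not has_buffer:
            return to_ranges([number for number, _ in arrivals[index:]])
    return []


def drop_selectively(arrivals):
    """Second behaviour: keep every command that got a buffer and NAK only
    the ones that did not."""
    return to_ranges([number for number, ok in arrivals if not ok])
```

For arrivals [(5, True), (6, False), (7, True), (8, False), (9, False), (10, True)], the first behaviour reports the single range (6, 10), while the second reports (6, 6) and (8, 9), leaving commands 7 and 10 stored out of order at the target.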
    >
    > [7] There was some discussion of whether we should propose a slow
    > start algorithm or a fast start algorithm.
    >
    > I think we should use a fast start algorithm at this level. At TCP
    > level, the slow start algorithm is important because the two
    > ends are unaware of the state of the network and have to probe it.
    > At the iSCSI level, the target should be reasonably knowledgeable
    > about its own state and be able to provide a credit or
    > reduce/increase it per login as the conditions change (hopefully
    > with some hysteresis built in).
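
A fast start with hysteresis might look roughly like the Python sketch below. The two watermarks, the two credit levels and the class name are all assumptions for illustration; the thread does not specify how a target would measure its own state.

```python
class HysteresisCredit:
    """Hypothetical credit source: starts at full credit (fast start) and
    only changes when free buffers cross one of two watermarks."""

    def __init__(self, full_credit, reduced_credit, low_water, high_water):
        self.full_credit = full_credit
        self.reduced_credit = reduced_credit
        self.low_water = low_water
        self.high_water = high_water
        self.credit = full_credit  # fast start: no probing phase

    def update(self, free_buffers):
        """Return the credit to advertise for the current buffer level."""
        if free_buffers < self.low_water:
            self.credit = self.reduced_credit
        elif free_buffers > self.high_water:
            self.credit = self.full_credit
        # Between the watermarks the previous value is kept (hysteresis),
        # so small fluctuations do not make the credit flap.
        return self.credit
```

With HysteresisCredit(16, 4, low_water=8, high_water=24), the credit starts at 16, drops to 4 when free buffers fall to 6, stays at 4 at 12 free buffers, and returns to 16 only once free buffers rise to 30.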
    >
    > [8] On flow control of immediate data, should we first work out
    > the command flow control and then turn our efforts to the
    > data flow control?
    >
    > Once we can agree on some of the basic issues, then it should be
    > relatively easy to work out the credit indication/numbering
    > details etc.
    >
    > Somesh
    >
    > > -----Original Message-----
    > > From: Black_David@emc.com [mailto:Black_David@emc.com]
    > > Sent: Wednesday, October 04, 2000 5:13 PM
    > > To: ips@ece.cmu.edu
    > > Subject: iSCSI sessions: Step 2
    > >
    > >
    > > With my WG co-chair hat on, it's time to call
    > > consensus on some of this ...
    > >
    > > Late last week, I sent the "Let's try again" message
    > > on iSCSI sessions, and since then I've only seen
    > > one thread of comments to it from a combination of
    > > Matt Wakeley and Doug Otis.  The important content
    > > of that thread is Matt renewing his position that
    > > more than one connection ought to be REQUIRED.  Lest
    > > this seem like annoyance, Matt deserves credit for
    > > being patient with the WG's indirect progress towards
    > > consensus that made it necessary for him to renew his
    > > objection on multiple occasions.  As I read Matt's
    > > email, it looks like a good flow control solution
    > > for the single TCP connection iSCSI session case
    > > might satisfy him, but the flow control discussion
    > > is still ongoing.
    > >
    > > In any case, I am stating the following two items
    > > as WG rough consensus, over Matt's renewed objection
    > > in the first case:
    > >
    > > [1] Multiple TCP connections per iSCSI session
    > >    remain OPTIONAL.
    > > [2] Multiple TCP connections per iSCSI session
    > >    will be specified as part of the base
    > >    iSCSI protocol.
    > >
    > > Given that it's two months after the Pittsburgh meeting
    > > I hope the rough consensus will hold on these items;
    > > anyone other than Matt should object to me directly,
    > > if necessary, I'll (reluctantly) reopen these issues
    > > one more time (yes, this is a hint).
    > >
    > > Moving on to the topic of models for multiple connection
    > > sessions, let me start by trying to winnow the approaches
    > > to Asymmetric sessions before taking up Asymmetric vs.
    > > Symmetric again.  Four approaches to Asymmetric sessions
    > > have been discussed.  I have not seen anyone other than
    > > Pierre Labat support his Balanced model in which a single
    > > stream of control moves from TCP connection to TCP connection
    > > within a session. Therefore I believe it is the WG
    > > rough consensus that:
    > >
    > > [3] The Balanced Asymmetric model in which a single
    > >    control stream moves from TCP connection to TCP
    > >    connection in an iSCSI session will not be pursued.
    > >
    > > Similarly, I saw no objections to the note at the end of
    > > Julian's email, indicating that the Collapsed Asymmetric
    > > model in which data is allowed on the command connection
    > > even when there are multiple TCP connections in an iSCSI
    > > session is technically inferior to both the Pure Asymmetric
    > > and Symmetric models. Therefore I believe it is the WG
    > > rough consensus that:
    > >
    > > [4] The Collapsed Asymmetric model in which data is allowed
    > >    on the command connection in multiple connection
    > >    iSCSI sessions will not be pursued.
    > >
    > > The Pure Asymmetric model was originally described as
    > > requiring two TCP connections per session.  Kalman Meth
    > > proposed a modification to it that allowed it to use a
    > > single connection for both command and data.  Between
    > > Kalman being the originator of the Pure Asymmetric model,
    > > lack of objection to his proposal, and rough consensus [2]
    > > above, I believe it to be the WG rough consensus that:
    > >
    > > [5] The Pure Asymmetric model will only be considered
    > >    in the modified form that allows an iSCSI session
    > >    to contain a single TCP connection on which both
    > >    command and data flow.
    > >
    > > If all five of the above consensuses (consensii?) hold,
    > > that would be serious progress.  Objections to these
    > > should be sent to the list, except that I would ask
    > > Pierre Labat not to object to [3] in the absence of
    > > other objections to it.
    > >
    > > Now comes the hard part - Symmetric vs. modified
    > > Pure Asymmetric (modified by [5] above).  There are
    > > over 1000 email messages in my mailbox for the ips
    > > mailing list for the past two months, and I freely
    > > admit to not having reviewed them in detail.  I suggested
    > > in the "Let's try again" email that more weight should
    > > be given to those working on implementations, especially
    > > hardware, and have not seen any objections to that
    > > suggestion.  My impression is that the opinion of such
    > > people has been in favor of the Symmetric model -
    > > Matt Wakeley (Agilent), and Somesh Gupta (HP) come
    > > to mind as examples.  I'm not confident that this is
    > > the WG consensus, but it appears to me that the
    > > WG is headed in that direction.  Please comment on
    > > this - the absence of comments/objections will be
    > > taken as a sign of agreement.
    > >
    > > There has been no comment on the error recovery issue
    > > since my email.  Given this and the prior statements that
    > > TCP solves many of the tape error scenarios that are motivating
    > > FCP error recovery, I think the authors of the next version
    > > of the iSCSI draft are entitled to use their best technical
    > > judgement in determining how much error recovery to specify
    > > across multiple TCP connections in an iSCSI session, and
    > > the WG will review it when the next version of the draft
    > > appears.
    > >
    > > We might be getting close to the end of the session issues.
    > > Carefully considered comments are encouraged, but I'd ask
    > > everyone to consider their comments carefully before sending
    > > them, given our past experiences with this set of issues.
    > >
    > > Thanks,
    > > --David
    > >
    > > ---------------------------------------------------
    > > David L. Black, Senior Technologist
    > > EMC Corporation, 42 South St., Hopkinton, MA  01748
    > > +1 (508) 435-1000 x75140     FAX: +1 (508) 497-8500
    > > black_david@emc.com       Mobile: +1 (978) 394-7754
    > > ---------------------------------------------------
    > >
    >
    >
    >
    
    
    
    



Last updated: Tue Sep 04 01:06:45 2001
6315 messages in chronological order