
    RE: iSCSI: Flow Control



    John,
    
    I really meant to refer to packet buffer memory. The NICs will
    always need memory for states and also some minimum additional
    memory.
    
    Somesh
    
    > -----Original Message-----
    > From: John Hufferd/San Jose/IBM [mailto:hufferd@us.ibm.com]
    > Sent: Monday, October 09, 2000 9:27 AM
    > To: ips@ece.cmu.edu
    > Subject: RE: iSCSI: Flow Control
    > 
    > 
    > 
    > Somesh Gupta,
    > I have been hearing from a number of high performance NIC vendors that
    > they expect to use some memory on the NIC, of course, but that the
    > major amounts of memory will be located in the system's normal
    > processor memory.  They have told me this is not a real problem,
    > because with a reasonable amount of NIC memory, and by also using the
    > processor memory as needed, they do not think that they have a
    > significant problem.  (Now most of these vendors are trying to do
    > various types of optimizations and accelerations, including DMA
    > directly into the target processor's memory.)
    > 
    > Now, I am not in the NIC business, so what I am doing is 
    > reflecting what I
    > have been told.
    > 
    > .
    > .
    > .
    > John L. Hufferd
    > Senior Technical Staff Member (STSM)
    > IBM/SSG San Jose Ca
    > (408) 256-0403, Tie: 276-0403
    > Internet address: hufferd@us.ibm.com
    > 
    > 
    > "GUPTA,SOMESH (HP-Cupertino,ex1)" 
    > <somesh_gupta@am.exch.hp.com>@ece.cmu.edu
    > on 10/09/2000 07:37:42 AM
    > 
    > Sent by:  owner-ips@ece.cmu.edu
    > 
    > 
    > To:   Julian Satran/Haifa/IBM@IBMIL, ips@ece.cmu.edu
    > cc:
    > Subject:  RE: iSCSI: Flow Control
    > 
    > 
    > 
    > Julian,
    > 
    > comments below.
    > 
    > > -----Original Message-----
    > > From: julian_satran@il.ibm.com [mailto:julian_satran@il.ibm.com]
    > > Sent: Monday, October 09, 2000 2:38 AM
    > > To: ips@ece.cmu.edu
    > > Subject: RE: iSCSI: Flow Control
    > >
    > >
    > >
    > >
    > > Somesh,
    > >
    > > I kept quiet on this - but as it risks getting unnecessarily complex,
    > > IMHO I can't keep quiet any more.
    > >
    > > I am not altogether convinced that there is a consensus on
    > > flow control.
    > > Let us reiterate the reasons for wanting command flow control:
    > >
    > > - for long latency pipes you want to ship commands and data ahead of
    > >   time to keep the pipes full
    > > - but you also want to avoid the command queueing situation in which
    > >   you can be forced to drop commands and refill the queue
    > > - you want to keep all devices as busy as possible
    > >
    > > The last item, as well as the whole SCSI queuing issue, is best taken
    > > care of at the SCSI layer - as it is the only one that might need to
    > > keep per-LU state.
    > >
    > > For the first two items - except for some artifacts - observe that
    > > commands are not a significant consumer of either bandwidth or target
    > > resources. A high number of commands in transit will readily keep the
    > > pipes full if they are followed by data, and they pose no strain on a
    > > target where they can be queued at the iSCSI layer.
    > 
    > It sort of depends on the implementation model whether this is an
    > issue or not. The aspect of the implementation that has the most
    > impact in this area is whether the adapter provides data buffering or
    > not. If the adapter does provide data buffering to the tune of the
    > window size (or in that range), then yes, it is not an issue.
    > However, this has its own set of problems, including cost.
    > A solution that depends on NIC memory will be at a disadvantage
    > compared to FC and parallel SCSI.
    > 
    > If the adapters are not providing buffering, and assuming that
    > command buffers and data buffers use separate memory, the target
    > would have to post command buffers and data buffers to the NIC sized
    > somewhat toward the worst case - and on every connection (not the
    > full worst case but some fraction of it - after all, every connection
    > cannot run at full speed at the same time). And the target may have
    > multiple adapters. The targets could ultimately even be disk drives.
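    >
    > Just to put a rough number on buffering "to the tune of the window
    > size" (the numbers below are purely illustrative - say a 1 Gb/s link
    > and a 50 ms round trip; this is only the bandwidth-delay product, not
    > anything from the draft):
    >
    > #include <stdio.h>
    >
    > int main(void)
    > {
    >     double gbit_per_s = 1.0;    /* assumed link rate           */
    >     double rtt_s      = 0.050;  /* assumed WAN round-trip time */
    >
    >     /* a window's worth of data in flight = bandwidth * RTT */
    >     double bytes = gbit_per_s * 1e9 / 8.0 * rtt_s;
    >     printf("~%.1f MB of buffer per connection\n", bytes / 1e6);
    >     return 0;                   /* roughly 6 MB for these numbers */
    > }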
    > 
    > What flow control is doing is enabling the target to be in control
    > of the flow between the initiator and the target. In a way, it
    > provides the full benefit of the TCP window only when the target
    > is ready and able to source/sink data at that rate - both sides
    > knowing where the data is going.
    > 
    > >
    > > Data will be flow-controlled by the target limits for
    > > immediate data and
    > > the TCP windows
    > > and by simple conservative ordering rules we can avoid both
    > > deadlock and
    > > throwing away data.
    > >
    > > What you are suggesting we look into - flow control per connection -
    > > is, I am afraid, not adding too much.
    > 
    > It was never my goal to make a fundamental contribution :-) and
    > I won't mind throwing it out if it can be shown that it is not
    > needed when iSCSI adapters do not have memory.
    > >
    > > And last - but not least - if you implement sessions with one
    > > connection and use multiple sessions, you can flow control every
    > > connection, but then you have to add a wedge driver to do load
    > > distribution.
    > 
    > Again, this statement perhaps has implementation assumptions built in.
    > Consider e.g. multiple "pull iSCSI NICs" on the initiator. If there is
    > flow control per connection, the host can distribute SCSI commands
    > across the NICs (assuming each handles one connection to the target)
    > as the SCSI command layer generates the commands, and then have no
    > further interaction with the adapters on sending the commands and
    > associated data till the command completion is received. If the flow
    > control is per session, then what will happen is that a session wide
    > value of MaxCmdRn is received on a single NIC (different values will
    > be received on different NICs). To ensure that all NICs follow
    > appropriate behavior based on this value will require either
    > communicating this value to all the NICs (those blocked will need it),
    > or the host holding back the posting of commands beyond MaxCmdRn and
    > posting them only when the window opens up.
    > 
    > For the target, the benefit (assuming command flow control is needed)
    > is that it does not have to coordinate buffer availability across
    > multiple NICs, which is a good thing.
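    >
    > To make the contrast concrete, here is a minimal C sketch of the
    > per-connection case as I picture it (all structure and function names
    > are invented for illustration, nothing here is from the draft): each
    > connection carries its own window, so posting a command touches only
    > that NIC's state, and a credit update arriving on one connection never
    > has to be pushed to the others.
    >
    > #include <stdint.h>
    > #include <stdbool.h>
    >
    > struct conn_credit {
    >     uint32_t next_cmd_rn;  /* next command reference number to assign */
    >     uint32_t max_cmd_rn;   /* highest CmdRn this connection may send  */
    > };
    >
    > /* serial-number style compare, so the window may wrap */
    > bool rn_le(uint32_t a, uint32_t b)
    > {
    >     return (int32_t)(a - b) <= 0;
    > }
    >
    > /* Called as the SCSI layer generates a command.  Returns true if the
    >  * command was handed to this NIC, false if this connection's window
    >  * is closed and the host must hold the command back. */
    > bool post_command(struct conn_credit *c /*, the command */)
    > {
    >     if (!rn_le(c->next_cmd_rn, c->max_cmd_rn))
    >         return false;             /* window closed on this connection */
    >     /* ... hand the command and its CmdRn to the NIC here ... */
    >     c->next_cmd_rn++;
    >     return true;
    > }
    >
    > /* Called when a PDU on this connection carries an updated credit;
    >  * only this connection's state is touched. */
    > void update_credit(struct conn_credit *c, uint32_t advertised)
    > {
    >     if (rn_le(c->max_cmd_rn, advertised))
    >         c->max_cmd_rn = advertised;
    > }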
    > 
    > >
    > > Regards,
    > > Julo
    > 
    > Somesh
    > >
    > >
    > >
    > > "GUPTA,SOMESH (HP-Cupertino,ex1)" <somesh_gupta@am.exch.hp.com> on
    > > 09/10/2000 03:36:46
    > >
    > > Please respond to "GUPTA,SOMESH (HP-Cupertino,ex1)"
    > >       <somesh_gupta@am.exch.hp.com>
    > >
    > > To:   IPS@ece.cmu.edu
    > > cc:    (bcc: Julian Satran/Haifa/IBM)
    > > Subject:  RE: iSCSI: Flow Control
    > >
    > >
    > >
    > >
    > > Hi all,
    > >
    > > Assuming that we have consensus, especially on [1] below (minimum
    > > connections is 1), I think we should try and resolve the flow
    > > control issue.
    > >
    > > It seems to me that there is sufficient consensus that command
    > > flow control is needed -
    > >
    > > [1]   To enable the fastest possible flow of commands given the
    > >       capabilities of the target & initiator, and to accommodate the
    > >       increased latencies of IP networks
    > >
    > > [2]   To significantly minimize the queue full condition. And to
    > >       provide a recovery mechanism at the iSCSI level when command
    > >       overflow happens at the target.
    > >
    > > [3] Some of the debate seems to be around whether the credit
    > >     mechanism should be static or dynamic.
    > >
    > > I believe that static is a subset of
    > > dynamic (where you never change the value being advertised). I don't
    > > disagree with Charles when he says that it will take experimentation
    > > to determine how to best adjust the credit dynamically. However,
    > > it is important to provide for it in the protocol so that when a
    > > vendor does figure out how best to adjust the credit, they have a
    > > protocol mechanism to do so. Even though it is an implementation
    > > that provides full rate performance, it is the protocol that
    > > enables it (take the TCP window scaling option, for example).
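    > >
    > > To make the point concrete, here is a minimal sketch (my own
    > > illustration - the names and the arithmetic are invented, not taken
    > > from the draft) of a target filling in the credit it advertises; the
    > > static scheme is just the degenerate case with a fixed window size:
    > >
    > > #include <stdint.h>
    > >
    > > struct target_state {
    > >     uint32_t exp_cmd_rn;        /* next CmdRn expected in order   */
    > >     uint32_t free_cmd_buffers;  /* command buffers free right now */
    > > };
    > >
    > > /* dynamic: the window tracks what can actually be queued right now */
    > > uint32_t advertise_max_cmd_rn(const struct target_state *t)
    > > {
    > >     return t->exp_cmd_rn + t->free_cmd_buffers - 1;
    > > }
    > >
    > > /* static: same protocol field, but the window size never changes */
    > > uint32_t advertise_max_cmd_rn_static(const struct target_state *t)
    > > {
    > >     enum { FIXED_QUEUE_DEPTH = 32 };
    > >     return t->exp_cmd_rn + FIXED_QUEUE_DEPTH - 1;
    > > }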
    > >
    > > [4] Another question that comes up is - Should the credit be per
    > >     connection or per session (multiple connections)?
    > >
    > > The current draft does provide for a session wide "flow control"
    > > through MaxCmdRn. I believe that it is better to have flow
    > > control on a per connection basis. This enables each connection
    > > (which might be different NICs) to operate independently of
    > > each other. Having a session wide flow control would cause
    > > sync points in both the initiator and the target.
    > >
    > > Also a smaller field could be used if it is just to indicate
    > > a credit window.
    > >
    > > [5] The credit should be a "pretty good effort" and not a "guarantee".
    > >
    > > This allows smart targets to overcommit as the number of initiators
    > > logged in increases (while reducing the credit available to the
    > > initiators), and to increase the credit and reduce the overcommitment
    > > as the number of initiators logged in decreases.
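    > >
    > > As a made-up example of such "pretty good effort" arithmetic (the
    > > names and numbers are mine, purely illustrative):
    > >
    > > #include <stdint.h>
    > >
    > > uint32_t per_initiator_credit(uint32_t total_cmd_buffers,
    > >                               uint32_t logged_in_initiators,
    > >                               uint32_t overcommit_percent)
    > > {
    > >     if (logged_in_initiators == 0)
    > >         return total_cmd_buffers;
    > >     /* e.g. 16 buffers, 8 initiators, 200% overcommit -> 4 each */
    > >     uint64_t pool =
    > >         (uint64_t)total_cmd_buffers * overcommit_percent / 100;
    > >     uint32_t credit = (uint32_t)(pool / logged_in_initiators);
    > >     return credit ? credit : 1;  /* always leave at least one slot */
    > > }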
    > >
    > > Some mechanism is required to recover from the infrequent case where
    > > command buffers get exhausted and have to be thrown away.
    > >
    > > [6] I would recommend that iSCSI provide a way to recover from
    > > command overflow and also maintain ordering.
    > >
    > > The current proposal does not have a drop notification. It has
    > > an ack mechanism (ExpCmdRn). I think for the purpose of drop
    > > notification, it is better to be able to indicate the range of
    > > commands dropped. TCP acks do tell me which commands reached the
    > > target, and command responses tell me which were processed - but
    > > neither tells me which commands the target had to drop.
    > >
    > > When a target suffers from command buffer exhaustion, it could behave
    > > in two different ways. One is to drop all the commands it receives
    > > till it detects a retransmission. In this case it would send a drop
    > > notification for all commands it receives till it starts receiving
    > > again from the command where the drop started.
    > >
    > > The other would be to store all the commands it is able to provide
    > > buffers for and provide NAKs for only those that it has dropped.
    > > This would be more efficient.
    > >
    > > In this case, we should also agree on what the semantics of the
    > > processing of the out-of-order commands are. Should they be
    > > processed only when the gaps are filled? Or can they be processed
    > > in any order?
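    > >
    > > A rough sketch of the second behavior (the structure and all names
    > > below are hypothetical, nothing of the sort exists in the current
    > > draft): the target keeps whatever it has buffers for, and only
    > > widens a NAK range covering what it actually dropped:
    > >
    > > #include <stdint.h>
    > > #include <stdbool.h>
    > >
    > > struct drop_range {            /* hypothetical drop notification  */
    > >     uint32_t first_dropped;    /* first CmdRn thrown away         */
    > >     uint32_t last_dropped;     /* last CmdRn in the dropped range */
    > >     bool     pending;          /* a range is waiting to be sent   */
    > > };
    > >
    > > /* Returns true if the command was queued (possibly out of order,
    > >  * to be executed once the dropped gap is filled by retransmission);
    > >  * false if it had to be dropped and folded into the NAK range. */
    > > bool receive_command(uint32_t cmd_rn, bool have_buffer,
    > >                      struct drop_range *nak)
    > > {
    > >     if (have_buffer)
    > >         return true;              /* keep it, even past a gap */
    > >     if (!nak->pending) {
    > >         nak->first_dropped = cmd_rn;
    > >         nak->pending = true;
    > >     }
    > >     nak->last_dropped = cmd_rn;   /* widen the reported range */
    > >     return false;
    > > }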
    > >
    > > [7] There was some discussion of whether we should propose a slow
    > > start algorithm or a fast start algorithm.
    > >
    > > I think we should use a fast start algorithm at this level. At the
    > > TCP level, the slow start algorithm is important because the two
    > > ends are unaware of the state of the network and have to probe it.
    > > At the iSCSI level, the target should be reasonably knowledgeable
    > > about its own state and be able to provide a credit or
    > > reduce/increase it per login as the conditions change (hopefully
    > > with some hysteresis built in).
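    > >
    > > A purely illustrative sketch of such a fast start (the names and the
    > > 25% hysteresis band are invented): unlike TCP slow start, the target
    > > already knows its own buffer situation, so it can hand out a sizeable
    > > credit at login and move it only when conditions change enough:
    > >
    > > #include <stdint.h>
    > >
    > > uint32_t initial_login_credit(uint32_t free_cmd_buffers,
    > >                               uint32_t active_initiators,
    > >                               uint32_t previous_credit)
    > > {
    > >     uint32_t want = free_cmd_buffers / (active_initiators + 1);
    > >     if (want == 0)
    > >         want = 1;
    > >     /* hysteresis: only move if the change is more than about 25% */
    > >     if (want > previous_credit + previous_credit / 4 ||
    > >         want + want / 4 < previous_credit)
    > >         return want;
    > >     return previous_credit;
    > > }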
    > >
    > > [8] On flow control of immediate data, should we first work out
    > > the command flow control and then turn our efforts to the
    > > data flow control?
    > >
    > > Once we can agree on some of the basic issues, then it should be
    > > relatively easy to work out the credit indication/numbering
    > > details etc.
    > >
    > > Somesh
    > >
    > > > -----Original Message-----
    > > > From: Black_David@emc.com [mailto:Black_David@emc.com]
    > > > Sent: Wednesday, October 04, 2000 5:13 PM
    > > > To: ips@ece.cmu.edu
    > > > Subject: iSCSI sessions: Step 2
    > > >
    > > >
    > > > With my WG co-chair hat on, it's time to call
    > > > consensus on some of this ...
    > > >
    > > > Late last week, I sent the "Let's try again" message
    > > > on iSCSI sessions, and since then I've only seen
    > > > one thread of comments to it from a combination of
    > > > Matt Wakeley and Doug Otis.  The important content
    > > > of that thread is Matt renewing his position that
    > > > more than one connection ought to be REQUIRED.  Lest
    > > > this seem like annoyance, Matt deserves credit for
    > > > being patient with the WG's indirect progress towards
    > > > consensus that made it necessary for him to renew his
    > > > objection on multiple occasions.  As I read Matt's
    > > > email, it looks like a good flow control solution
    > > > for the single TCP connection iSCSI session case
    > > > might satisfy him, but the flow control discussion
    > > > is still ongoing.
    > > >
    > > > In any case, I am stating the following two items
    > > > as WG rough consensus, over Matt's renewed objection
    > > > in the first case:
    > > >
    > > > [1] Multiple TCP connections per iSCSI session
    > > >    remain OPTIONAL.
    > > > [2] Multiple TCP connections per iSCSI session
    > > >    will be specified as part of the base
    > > >    iSCSI protocol.
    > > >
    > > > Given that it's two months after the Pittsburgh meeting
    > > > I hope the rough consensus will hold on these items;
    > > > anyone other than Matt should object to me directly,
    > > > if necessary, I'll (reluctantly) reopen these issues
    > > > one more time (yes, this is a hint).
    > > >
    > > > Moving on to the topic of models for multiple connection
    > > > sessions, let me start by trying to winnow the approaches
    > > > to Asymmetric sessions before taking up Asymmetric vs.
    > > > Symmetric again.  Four approaches to Asymmetric sessions
    > > > have been discussed.  I have not seen anyone other than
    > > > Pierre Labat support his Balanced model in which a single
    > > > stream of control moves from TCP connection to TCP connection
    > > > within a session. Therefore I believe it is the WG
    > > > rough consensus that:
    > > >
    > > > [3] The Balanced Asymmetric model in which a single
    > > >    control stream moves from TCP connection to TCP
    > > >    connection in an iSCSI session will not be pursued.
    > > >
    > > > Similarly, I saw no objections to the note at the end of
    > > > Julian's email, indicating that the Collapsed Asymmetric
    > > > model in which data is allowed on the command connection
    > > > even when there are multiple TCP connections in an iSCSI
    > > > session is technically inferior to both the Pure Asymmetric
    > > > and Symmetric models. Therefore I believe it is the WG
    > > > rough consensus that:
    > > >
    > > > [4] The Collapsed Asymmetric model in which data is allowed
    > > >    on the command connection in multiple connection
    > > >    iSCSI sessions will not be pursued.
    > > >
    > > > The Pure Asymmetric model was originally described as
    > > > requiring two TCP connections per session.  Kalman Meth
    > > > proposed a modification to it that allowed it to use a
    > > > single connection for both command and data.  Between
    > > > Kalman being the originator of the Pure Asymmetric model,
    > > > lack of objection to his proposal, and rough consensus [2]
    > > > above, I believe it to be the WG rough consensus that:
    > > >
    > > > [5] The Pure Asymmetric model will only be considered
    > > >    in the modified form that allows an iSCSI session
    > > >    to contain a single TCP connection on which both
    > > >    command and data flow.
    > > >
    > > > If all five of the above consensuses (consensii?) hold,
    > > > that would be serious progress.  Objections to these
    > > > should be sent to the list, except that I would ask
    > > > Pierre Labat not to object to [3] in the absence of
    > > > other objections to it.
    > > >
    > > > Now comes the hard part - Symmetric vs. modified
    > > > Pure Asymmetric (modified by [5] above).  There are
    > > > over 1000 email messages in my mailbox for the ips
    > > > mailing list for the past two months, and I freely
    > > > admit to not having reviewed them in detail.  I suggested
    > > > in the "Let's try again" email that more weight should
    > > > be given to those working on implementations, especially
    > > > hardware, and have not seen any objections to that
    > > > suggestion.  My impression is that the opinion of such
    > > > people has been in favor of the Symmetric model -
    > > > Matt Wakeley (Agilent), and Somesh Gupta (HP) come
    > > > to mind as examples.  I'm not confident that this is
    > > > the WG consensus, but it appears to me that the
    > > > WG is headed in that direction.  Please comment on
    > > > this - the absence of comments/objections will be
    > > > taken as a sign of agreement.
    > > >
    > > > There has been no comment on the error recovery issue
    > > > since my email.  Given this and the prior statements that
    > > > TCP solves many of the tape error scenarios that are motivating
    > > > FCP error recovery, I think the authors of the next version
    > > > of the iSCSI draft are entitled to use their best technical
    > > > judgement in determining how much error recovery to specify
    > > > across multiple TCP connections in an iSCSI session, and
    > > > the WG will review it when the next version of the draft
    > > > appears.
    > > >
    > > > We might be getting close to the end of the session issues.
    > > > Carefully considered comments are encouraged, but I'd ask
    > > > everyone to consider their comments carefully before sending
    > > > them, given our past experiences with this set of issues.
    > > >
    > > > Thanks,
    > > > --David
    > > >
    > > > ---------------------------------------------------
    > > > David L. Black, Senior Technologist
    > > > EMC Corporation, 42 South St., Hopkinton, MA  01748
    > > > +1 (508) 435-1000 x75140     FAX: +1 (508) 497-8500
    > > > black_david@emc.com       Mobile: +1 (978) 394-7754
    > > > ---------------------------------------------------
    > > >
    > >
    > >
    > >
    > 
    > 
    > 
    

