RE: iSCSI: Flow Control

John,

I really meant to refer to packet buffer memory. The NICs will always
need memory for state and also some minimum additional memory.

Somesh

> -----Original Message-----
> From: John Hufferd/San Jose/IBM [mailto:hufferd@us.ibm.com]
> Sent: Monday, October 09, 2000 9:27 AM
> To: ips@ece.cmu.edu
> Subject: RE: iSCSI: Flow Control
>
> Somesh Gupta,
> I have been hearing from a number of high performance NIC vendors
> that they expect to use, of course, some memory on the NIC, but that
> the major amounts of memory will be located in the system's normal
> processor memory. They have told me this is not a real problem:
> with a reasonable amount of NIC memory, and by using processor memory
> as needed, they do not think that they have a significant problem.
> (Now, most of these vendors are trying to do various types of
> optimizations and accelerations, including DMA directly into the
> target processor's memory.)
>
> Now, I am not in the NIC business, so what I am doing is reflecting
> what I have been told.
>
> John L. Hufferd
> Senior Technical Staff Member (STSM)
> IBM/SSG San Jose Ca
> (408) 256-0403, Tie: 276-0403
> Internet address: hufferd@us.ibm.com
>
>
> "GUPTA,SOMESH (HP-Cupertino,ex1)"
> <somesh_gupta@am.exch.hp.com>@ece.cmu.edu
> on 10/09/2000 07:37:42 AM
>
> Sent by: owner-ips@ece.cmu.edu
>
> To: Julian Satran/Haifa/IBM@IBMIL, ips@ece.cmu.edu
> cc:
> Subject: RE: iSCSI: Flow Control
>
> Julian,
>
> comments below.
>
> > -----Original Message-----
> > From: julian_satran@il.ibm.com [mailto:julian_satran@il.ibm.com]
> > Sent: Monday, October 09, 2000 2:38 AM
> > To: ips@ece.cmu.edu
> > Subject: RE: iSCSI: Flow Control
> >
> > Somesh,
> >
> > I kept quiet on this - but as it risks getting unnecessarily
> > complex, IMHO I can't anymore.
> >
> > I am not altogether convinced that there is a consensus on flow
> > control. Let us reiterate the reasons for wanting command flow
> > control:
> >
> > - for long latency pipes you want to ship commands and data ahead
> >   of time to keep the pipes full
> > - but you also want to avoid the command queueing situation in
> >   which you can be forced to drop commands and refill the queue
> > - you want to keep all devices as busy as possible
> >
> > The last item, as well as the whole SCSI queuing issue, is best
> > taken care of at the SCSI layer, as it is the only one that might
> > need to keep per-LU state.
> >
> > For the first two items - except for some artifacts - observe that
> > commands are not a significant consumer of either bandwidth or
> > target resources. A high number of commands in transit will readily
> > keep the pipes full if they are followed by data, and pose no
> > strain on a target where they can be queued at the iSCSI layer.
>
> It sort of depends on the implementation model whether this is an
> issue or not. The aspect of the implementation that has the most
> impact in this area is whether the adapter provides data buffering
> or not. If the adapter does provide data buffering to the tune of
> the window size (or in that range), then yes, it is not an issue.
> However, this has its own set of problems, including cost.
> A solution that depends on NIC memory will be at a disadvantage
> compared to FC and parallel SCSI.
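To put rough numbers on that buffering trade-off: an adapter that
buffers "to the tune of window size" needs roughly one bandwidth-delay
product of packet memory per connection. A minimal sketch in C; the
link rate, round-trip time, and connection count are illustrative
assumptions, not figures from the thread:

    /* Estimate NIC packet-buffer memory if the adapter must hold about
     * one TCP window (bandwidth-delay product) per connection. */
    #include <stdio.h>

    int main(void)
    {
        double gbps = 1.0;     /* link rate in Gb/s (assumed)            */
        double rtt_ms = 50.0;  /* WAN round-trip time in ms (assumed)    */
        int conns = 8;         /* concurrent iSCSI connections (assumed) */

        double bdp = (gbps * 1e9 / 8.0) * (rtt_ms / 1000.0);
        printf("per-connection window buffer: %.1f MB\n", bdp / 1e6);
        printf("total for %d connections: %.1f MB\n",
               conns, conns * bdp / 1e6);
        return 0;
    }

At 1 Gb/s over a 50 ms round trip this works out to about 6 MB per
connection - the kind of on-NIC memory cost behind the FC and parallel
SCSI comparison above.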
> If the adapters are not providing buffering, and assuming that
> commands and buffers use separate memory, the target would have to
> post command buffers and data buffers to the NIC for something close
> to the worst case - and on every connection (not accounting for the
> full worst case but some fraction of it - after all, every connection
> cannot run at full speed at the same time). And the target may have
> multiple adapters. The targets could ultimately even be disk drives.
>
> What flow control is doing is enabling the target to be in control
> of the flow between the initiator and the target. In a way, it
> provides the full benefit of the TCP window only when the target
> is ready and able to source/sink data at that rate - both sides
> knowing where the data is going.
>
> > Data will be flow-controlled by the target limits for immediate
> > data and the TCP windows, and by simple conservative ordering
> > rules we can avoid both deadlock and throwing away data.
> >
> > What you are suggesting we look into - flow controlling per
> > connection - is - I am afraid - not adding too much.
>
> It was never my goal to make a fundamental contribution :-) and
> I won't mind throwing it out if it can be shown that it is not
> needed when iSCSI adapters do not have memory.
>
> > And last - but not least - if you implement sessions with one
> > connection - and use multiple sessions - you can flow control
> > every connection, but then you have to add a wedge driver to do
> > load distribution.
>
> Again, this statement perhaps has implementation assumptions built
> in. Consider e.g. multiple "pull iSCSI NICs" on the initiator. If
> there is flow control per connection, the host can distribute SCSI
> commands across the NICs (assuming each handles one connection to
> the target) as the SCSI command layer generates the commands, and
> then have no further interaction with the adapters on sending the
> commands/associated data till the command completion is received.
> If the flow control is per session, then what will happen is that a
> session-wide value of MaxCmdRn is received on a single NIC
> (different values will be received on different NICs). To ensure
> that all NICs follow appropriate behavior based on this value will
> require either communicating this value to all the NICs (those
> blocked will need it), or the host holding back the command posting
> beyond MaxCmdRn and posting commands only when the window opens up.
>
> For the target, the benefit (assuming command flow control is
> needed) is that it does not have to coordinate buffer availability
> across multiple NICs, which is a good thing.
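The per-connection alternative described here lets each NIC run its
window on its own. A minimal sketch of the credit check, in the
ExpCmdRn/MaxCmdRn style of the draft; the struct and function names
are illustrative, not from the draft:

    /* Per-connection command window, ExpCmdRn/MaxCmdRn style. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct conn_credit {
        uint32_t exp_cmd_rn;  /* next command number the target expects */
        uint32_t max_cmd_rn;  /* highest command number it will accept  */
    };

    /* Signed serial-number arithmetic so the test survives wraparound. */
    static bool may_send(const struct conn_credit *c, uint32_t rn)
    {
        return (int32_t)(rn - c->exp_cmd_rn) >= 0 &&
               (int32_t)(c->max_cmd_rn - rn) >= 0;
    }

    int main(void)
    {
        /* Each NIC keeps only its own window; no cross-NIC updates. */
        struct conn_credit nic0 = { .exp_cmd_rn = 100, .max_cmd_rn = 131 };
        printf("rn 120 sendable? %d\n", may_send(&nic0, 120)); /* 1 */
        printf("rn 140 sendable? %d\n", may_send(&nic0, 140)); /* 0 */
        return 0;
    }

With a session-wide window, the same test would depend on a MaxCmdRn
value that can arrive on any connection - exactly the cross-NIC
synchronization objected to above.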
> > Regards,
> > Julo
>
> Somesh
>
> > "GUPTA,SOMESH (HP-Cupertino,ex1)" <somesh_gupta@am.exch.hp.com> on
> > 09/10/2000 03:36:46
> >
> > Please respond to "GUPTA,SOMESH (HP-Cupertino,ex1)"
> > <somesh_gupta@am.exch.hp.com>
> >
> > To: IPS@ece.cmu.edu
> > cc: (bcc: Julian Satran/Haifa/IBM)
> > Subject: RE: iSCSI: Flow Control
> >
> > Hi all,
> >
> > Assuming that we have consensus, especially on [1] below (minimum
> > connections is 1), I think we should try and resolve the flow
> > control issue.
> >
> > It seems to me that there is sufficient consensus that command
> > flow control is needed -
> >
> > [1] To enable the fastest possible flow of commands given the
> > capabilities of the target and initiator, and accommodating the
> > increased latencies of IP networks.
> >
> > [2] To significantly minimize the queue full condition, and to
> > provide a recovery mechanism at the iSCSI level when command
> > overflow happens at the target.
> >
> > [3] Some of the debate seems to be around whether the credit
> > mechanism should be static or dynamic.
> >
> > I believe that static is a subset of dynamic (where you never
> > change the value being advertised). I don't disagree with Charles
> > when he says that it will take experimentation to determine how to
> > best adjust the credit dynamically. However, it is important to
> > provide for it in the protocol so that when a vendor does figure
> > out how best to adjust the credit, they have a protocol mechanism
> > to do so. Even though it is an implementation that provides full
> > rate performance, it is the protocol that enables it (take the TCP
> > window scaling option, e.g.).
> >
> > [4] Another question that comes up is - should the credit be per
> > connection or per session (multiple connections)?
> >
> > The current draft does provide for a session-wide "flow control"
> > through MaxCmdRn. I believe that it is better to have flow control
> > on a per-connection basis. This enables the connections (which
> > might be on different NICs) to operate independently of each
> > other. Having a session-wide flow control would cause sync points
> > in both the initiator and the target.
> >
> > Also, a smaller field could be used if it is just to indicate a
> > credit window.
> >
> > [5] The credit should be a "pretty good effort" and not a
> > "guarantee".
> >
> > This allows smart targets to overcommit as the number of
> > initiators logged in increases (while reducing the credit
> > available to the initiators), and to increase the credit and
> > reduce overcommitment as the number of initiators logged in
> > decreases.
> >
> > Some mechanism is required to recover from the infrequent case
> > where command buffers get exhausted and have to be thrown away.
> >
> > [6] I would recommend that iSCSI provide a way to recover from
> > command overflow and also maintain ordering.
> >
> > The current proposal does not have a drop notification. It has an
> > ack mechanism (ExpCmdRn). I think for the purpose of drop
> > notification, it is better to be able to indicate the range of
> > commands dropped. TCP acks do tell me which commands reached the
> > target, and command responses tell me which were processed.
> >
> > When a target suffers from command exhaustion, it could behave in
> > two different ways. One is to drop all the commands it receives
> > till it detects a retransmission. In this case it would send a
> > drop notification for all commands it receives till it starts
> > receiving the commands from where the drop started.
> >
> > The other would be to store all the commands it is able to provide
> > buffers for, and provide NAKs for only those that it has dropped.
> > This would be more efficient.
> >
> > In this case, we should also agree on what the semantics of
> > processing the out-of-order commands are. Should they be processed
> > only when the gaps are filled? Or can they be processed in any
> > order?
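The second behavior in [6] - keep what fits, NAK only what was
dropped - implies per-command-number bookkeeping at the target. A
minimal sketch, taking the "process only when the gaps are filled"
answer to the ordering question; the window size and all names are
illustrative assumptions:

    /* Target-side receive path: park out-of-order commands, NAK drops,
     * and deliver commands to SCSI only once the in-order gap closes. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define WINDOW 32

    struct cmd_reassembly {
        uint32_t exp_cmd_rn;    /* next command number to deliver in order */
        bool     held[WINDOW];  /* out-of-order commands parked here       */
    };

    static void receive_cmd(struct cmd_reassembly *r, uint32_t rn,
                            bool have_buffer)
    {
        uint32_t off = rn - r->exp_cmd_rn;
        if (off >= WINDOW)
            return;                             /* outside window: ignore */
        if (!have_buffer) {
            printf("NAK: dropped rn %u\n", rn); /* drop notification      */
            return;
        }
        r->held[off] = true;
        /* Deliver the in-order prefix; commands past a gap stay parked. */
        while (r->held[0]) {
            printf("deliver rn %u\n", r->exp_cmd_rn++);
            for (int i = 0; i < WINDOW - 1; i++)
                r->held[i] = r->held[i + 1];
            r->held[WINDOW - 1] = false;
        }
    }

    int main(void)
    {
        struct cmd_reassembly r = { .exp_cmd_rn = 10 };
        receive_cmd(&r, 10, true);   /* delivered immediately          */
        receive_cmd(&r, 12, true);   /* parked: rn 11 is missing       */
        receive_cmd(&r, 11, false);  /* no buffer free: NAK for rn 11  */
        receive_cmd(&r, 11, true);   /* retransmit closes the gap;
                                        rn 11 and 12 are delivered     */
        return 0;
    }

The "process in any order" answer would drop the hold queue entirely
and hand commands to SCSI as they arrive, at the cost of ordering.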
> > [7] There was some discussion of whether we should propose a slow
> > start algorithm or a fast start algorithm.
> >
> > I think we should use a fast start algorithm at this level. At the
> > TCP level, the slow start algorithm is important because the two
> > ends are unaware of the state of the network and have to probe it.
> > At the iSCSI level, the target should be reasonably knowledgeable
> > about its own state, and should be able to provide a credit and
> > reduce/increase it per login as the conditions change (hopefully
> > with some hysteresis built in).
> >
> > [8] On flow control of immediate data, should we first work out
> > the command flow control and then turn our efforts to the data
> > flow control?
> >
> > Once we can agree on some of the basic issues, it should be
> > relatively easy to work out the credit indication/numbering
> > details etc.
> >
> > Somesh
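The fast start in [7] combines naturally with the overcommit idea in
[5]: the target can grant a full window at login, scale the grant down
as more initiators attach, and re-advertise only when the value has
moved enough (the hysteresis). A minimal sketch; the pool size,
overcommit factor, floor, and threshold are illustrative assumptions:

    /* Fast-start credit grant with overcommit and hysteresis. */
    #include <stdio.h>

    #define CMD_BUFFERS 1024 /* command buffers the target owns (assumed) */
    #define OVERCOMMIT  2    /* a "pretty good effort", not a guarantee   */
    #define MIN_CREDIT  4    /* never starve an initiator entirely        */

    /* Credit granted to each session at login: full speed immediately,
     * shrinking as the number of logged-in initiators grows. */
    static unsigned credit_at_login(unsigned initiators)
    {
        unsigned c = (CMD_BUFFERS * OVERCOMMIT) / initiators;
        return c < MIN_CREDIT ? MIN_CREDIT : c;
    }

    /* Hysteresis: re-advertise only on a move of more than 25% (assumed). */
    static int should_readvertise(unsigned last, unsigned now)
    {
        unsigned diff = last > now ? last - now : now - last;
        return 4 * diff > last;
    }

    int main(void)
    {
        unsigned last = 0;
        for (unsigned n = 1; n <= 512; n *= 8) {
            unsigned c = credit_at_login(n);
            printf("%3u initiators -> credit %4u per session (advertise: %d)\n",
                   n, c, last == 0 || should_readvertise(last, c));
            last = c;
        }
        return 0;
    }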
> > > -----Original Message-----
> > > From: Black_David@emc.com [mailto:Black_David@emc.com]
> > > Sent: Wednesday, October 04, 2000 5:13 PM
> > > To: ips@ece.cmu.edu
> > > Subject: iSCSI sessions: Step 2
> > >
> > > With my WG co-chair hat on, it's time to call consensus on some
> > > of this ...
> > >
> > > Late last week, I sent the "Let's try again" message on iSCSI
> > > sessions, and since then I've only seen one thread of comments to
> > > it, from a combination of Matt Wakeley and Doug Otis. The
> > > important content of that thread is Matt renewing his position
> > > that more than one connection ought to be REQUIRED. Lest this
> > > seem like an annoyance, Matt deserves credit for being patient
> > > with the WG's indirect progress towards consensus, which made it
> > > necessary for him to renew his objection on multiple occasions.
> > > As I read Matt's email, it looks like a good flow control
> > > solution for the single TCP connection iSCSI session case might
> > > satisfy him, but the flow control discussion is still ongoing.
> > >
> > > In any case, I am stating the following two items as WG rough
> > > consensus, over Matt's renewed objection in the first case:
> > >
> > > [1] Multiple TCP connections per iSCSI session remain OPTIONAL.
> > > [2] Multiple TCP connections per iSCSI session will be specified
> > >     as part of the base iSCSI protocol.
> > >
> > > Given that it's two months after the Pittsburgh meeting, I hope
> > > the rough consensus will hold on these items; anyone other than
> > > Matt should object to me directly. If necessary, I'll
> > > (reluctantly) reopen these issues one more time (yes, this is a
> > > hint).
> > >
> > > Moving on to the topic of models for multiple connection
> > > sessions, let me start by trying to winnow the approaches to
> > > Asymmetric sessions before taking up Asymmetric vs. Symmetric
> > > again. Four approaches to Asymmetric sessions have been
> > > discussed. I have not seen anyone other than Pierre Labat support
> > > his Balanced model, in which a single stream of control moves
> > > from TCP connection to TCP connection within a session. Therefore
> > > I believe it is the WG rough consensus that:
> > >
> > > [3] The Balanced Asymmetric model, in which a single control
> > >     stream moves from TCP connection to TCP connection in an
> > >     iSCSI session, will not be pursued.
> > >
> > > Similarly, I saw no objections to the note at the end of Julian's
> > > email indicating that the Collapsed Asymmetric model, in which
> > > data is allowed on the command connection even when there are
> > > multiple TCP connections in an iSCSI session, is technically
> > > inferior to both the Pure Asymmetric and Symmetric models.
> > > Therefore I believe it is the WG rough consensus that:
> > >
> > > [4] The Collapsed Asymmetric model, in which data is allowed on
> > >     the command connection in multiple connection iSCSI sessions,
> > >     will not be pursued.
> > >
> > > The Pure Asymmetric model was originally described as requiring
> > > two TCP connections per session. Kalman Meth proposed a
> > > modification to it that allowed it to use a single connection for
> > > both command and data. Between Kalman being the originator of the
> > > Pure Asymmetric model, the lack of objection to his proposal, and
> > > rough consensus [2] above, I believe it to be the WG rough
> > > consensus that:
> > >
> > > [5] The Pure Asymmetric model will only be considered in the
> > >     modified form that allows an iSCSI session to contain a
> > >     single TCP connection on which both command and data flow.
> > >
> > > If all five of the above consensuses (consensii?) hold, that
> > > would be serious progress. Objections to these should be sent to
> > > the list, except that I would ask Pierre Labat not to object to
> > > [3] in the absence of other objections to it.
> > >
> > > Now comes the hard part - Symmetric vs. modified Pure Asymmetric
> > > (modified by [5] above). There are over 1000 email messages in my
> > > mailbox for the ips mailing list for the past two months, and I
> > > freely admit to not having reviewed them in detail. I suggested
> > > in the "Let's try again" email that more weight should be given
> > > to those working on implementations, especially hardware, and
> > > have not seen any objections to that suggestion. My impression is
> > > that the opinion of such people has been in favor of the
> > > Symmetric model - Matt Wakeley (Agilent) and Somesh Gupta (HP)
> > > come to mind as examples. I'm not confident that this is the WG
> > > consensus, but it appears to me that the WG is headed in that
> > > direction. Please comment on this - the absence of
> > > comments/objections will be taken as a sign of agreement.
> > >
> > > There has been no comment on the error recovery issue since my
> > > email. Given this and the prior statements that TCP solves many
> > > of the tape error scenarios that are motivating FCP error
> > > recovery, I think the authors of the next version of the iSCSI
> > > draft are entitled to use their best technical judgement in
> > > determining how much error recovery to specify across multiple
> > > TCP connections in an iSCSI session, and the WG will review it
> > > when the next version of the draft appears.
> > >
> > > We might be getting close to the end of the session issues.
> > > Carefully considered comments are encouraged, but I'd ask
> > > everyone to consider their comments carefully before sending
> > > them, given our past experiences with this set of issues.
> > >
> > > Thanks,
> > > --David
> > >
> > > ---------------------------------------------------
> > > David L. Black, Senior Technologist
> > > EMC Corporation, 42 South St., Hopkinton, MA 01748
> > > +1 (508) 435-1000 x75140   FAX: +1 (508) 497-8500
> > > black_david@emc.com        Mobile: +1 (978) 394-7754
> > > ---------------------------------------------------