RE: iSCSI: Flow Control

John,

I really meant to refer to packet buffer memory. The NICs will always
need memory for state and also some minimum additional memory.

Somesh

> -----Original Message-----
> From: John Hufferd/San Jose/IBM [mailto:hufferd@us.ibm.com]
> Sent: Monday, October 09, 2000 9:27 AM
> To: ips@ece.cmu.edu
> Subject: RE: iSCSI: Flow Control
>
> Somesh Gupta,
> I have been hearing from a number of high performance NIC vendors
> that they expect to use, of course, some memory on the NIC, but that
> the major amounts of memory will be located in the system's normal
> processor memory. They have told me this is not a real problem:
> with a reasonable amount of NIC memory, and by using processor memory
> as needed, they do not think that they have a significant problem.
> (Now, most of these vendors are trying to do various types of
> optimizations and accelerations, including DMA directly into the
> target processor's memory.)
>
> Now, I am not in the NIC business, so what I am doing is reflecting
> what I have been told.
>
> John L. Hufferd
> Senior Technical Staff Member (STSM)
> IBM/SSG San Jose Ca
> (408) 256-0403, Tie: 276-0403
> Internet address: hufferd@us.ibm.com
>
>
> "GUPTA,SOMESH (HP-Cupertino,ex1)"
> <somesh_gupta@am.exch.hp.com>@ece.cmu.edu
> on 10/09/2000 07:37:42 AM
>
> Sent by: owner-ips@ece.cmu.edu
>
> To: Julian Satran/Haifa/IBM@IBMIL, ips@ece.cmu.edu
> cc:
> Subject: RE: iSCSI: Flow Control
>
> Julian,
>
> comments below.
>
> > -----Original Message-----
> > From: julian_satran@il.ibm.com [mailto:julian_satran@il.ibm.com]
> > Sent: Monday, October 09, 2000 2:38 AM
> > To: ips@ece.cmu.edu
> > Subject: RE: iSCSI: Flow Control
> >
> > Somesh,
> >
> > I kept quiet on this - but as it risks getting unnecessarily
> > complex, IMHO I can't anymore.
> >
> > I am not altogether convinced that there is a consensus on flow
> > control. Let us reiterate the reasons for wanting command flow
> > control:
> >
> > - for long latency pipes you want to ship commands and data ahead
> >   of time to keep the pipes full
> > - but you also want to avoid the command queueing situation in
> >   which you can be forced to drop commands and refill the queue
> > - you want to keep all devices as busy as possible
> >
> > The last item, as well as the whole SCSI queuing issue, is best
> > taken care of at the SCSI layer, as it is the only one that might
> > need to keep per-LU state.
> >
> > For the first two items - except for some artifacts - observe that
> > commands are not a significant consumer of either bandwidth or
> > target resources. A high number of commands in transit will readily
> > keep the pipes full if they are followed by data, and pose no
> > strain on a target where they can be queued at the iSCSI layer.
>
> It sort of depends on the implementation model whether this is an
> issue or not. The aspect of the implementation that has the most
> impact in this area is whether the adapter provides data buffering
> or not. If the adapter does provide data buffering to the tune of
> the window size (or in that range), then yes, it is not an issue.
> However, this has its own set of problems, including cost.
> A solution that depends on NIC memory will be at a disadvantage
> compared to FC and parallel SCSI.
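To put rough numbers on that buffering trade-off: an adapter that
buffers "to the tune of window size" needs roughly one bandwidth-delay
product of packet memory per connection. A minimal sketch in C; the
link rate, round-trip time, and connection count are illustrative
assumptions, not figures from the thread:

    /* Estimate NIC packet-buffer memory if the adapter must hold about
     * one TCP window (bandwidth-delay product) per connection. */
    #include <stdio.h>

    int main(void)
    {
        double gbps = 1.0;     /* link rate in Gb/s (assumed)            */
        double rtt_ms = 50.0;  /* WAN round-trip time in ms (assumed)    */
        int conns = 8;         /* concurrent iSCSI connections (assumed) */

        double bdp = (gbps * 1e9 / 8.0) * (rtt_ms / 1000.0);
        printf("per-connection window buffer: %.1f MB\n", bdp / 1e6);
        printf("total for %d connections: %.1f MB\n",
               conns, conns * bdp / 1e6);
        return 0;
    }

At 1 Gb/s over a 50 ms round trip this works out to about 6 MB per
connection - the kind of on-NIC memory cost behind the FC and parallel
SCSI comparison above.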
> If the adapters are not providing buffering, and assuming that
> commands and buffers use separate memory, the target would have to
> post command buffers and data buffers to the NIC for something close
> to the worst case - and on every connection (not accounting for the
> full worst case but some fraction of it - after all, every connection
> cannot run at full speed at the same time). And the target may have
> multiple adapters. The targets could ultimately even be disk drives.
>
> What flow control is doing is enabling the target to be in control
> of the flow between the initiator and the target. In a way, it
> provides the full benefit of the TCP window only when the target
> is ready and able to source/sink data at that rate - both sides
> knowing where the data is going.
>
> > Data will be flow-controlled by the target limits for immediate
> > data and the TCP windows, and by simple conservative ordering
> > rules we can avoid both deadlock and throwing away data.
> >
> > What you are suggesting we look into - flow controlling per
> > connection - is - I am afraid - not adding too much.
>
> It was never my goal to make a fundamental contribution :-) and
> I won't mind throwing it out if it can be shown that it is not
> needed when iSCSI adapters do not have memory.
>
> > And last - but not least - if you implement sessions with one
> > connection - and use multiple sessions - you can flow control
> > every connection, but then you have to add a wedge driver to do
> > load distribution.
>
> Again, this statement perhaps has implementation assumptions built
> in. Consider e.g. multiple "pull iSCSI NICs" on the initiator. If
> there is flow control per connection, the host can distribute SCSI
> commands across the NICs (assuming each handles one connection to
> the target) as the SCSI command layer generates the commands, and
> then have no further interaction with the adapters on sending the
> commands/associated data till the command completion is received.
> If the flow control is per session, then what will happen is that a
> session-wide value of MaxCmdRn is received on a single NIC
> (different values will be received on different NICs). To ensure
> that all NICs follow appropriate behavior based on this value will
> require either communicating this value to all the NICs (those
> blocked will need it), or the host holding back the command posting
> beyond MaxCmdRn and posting commands only when the window opens up.
>
> For the target, the benefit (assuming command flow control is
> needed) is that it does not have to coordinate buffer availability
> across multiple NICs, which is a good thing.
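The per-connection alternative described here lets each NIC run its
window on its own. A minimal sketch of the credit check, in the
ExpCmdRn/MaxCmdRn style of the draft; the struct and function names
are illustrative, not from the draft:

    /* Per-connection command window, ExpCmdRn/MaxCmdRn style. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct conn_credit {
        uint32_t exp_cmd_rn;  /* next command number the target expects */
        uint32_t max_cmd_rn;  /* highest command number it will accept  */
    };

    /* Signed serial-number arithmetic so the test survives wraparound. */
    static bool may_send(const struct conn_credit *c, uint32_t rn)
    {
        return (int32_t)(rn - c->exp_cmd_rn) >= 0 &&
               (int32_t)(c->max_cmd_rn - rn) >= 0;
    }

    int main(void)
    {
        /* Each NIC keeps only its own window; no cross-NIC updates. */
        struct conn_credit nic0 = { .exp_cmd_rn = 100, .max_cmd_rn = 131 };
        printf("rn 120 sendable? %d\n", may_send(&nic0, 120)); /* 1 */
        printf("rn 140 sendable? %d\n", may_send(&nic0, 140)); /* 0 */
        return 0;
    }

With a session-wide window, the same test would depend on a MaxCmdRn
value that can arrive on any connection - exactly the cross-NIC
synchronization objected to above.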
> > Regards,
> > Julo
>
> Somesh
>
> > "GUPTA,SOMESH (HP-Cupertino,ex1)" <somesh_gupta@am.exch.hp.com> on
> > 09/10/2000 03:36:46
> >
> > Please respond to "GUPTA,SOMESH (HP-Cupertino,ex1)"
> > <somesh_gupta@am.exch.hp.com>
> >
> > To: IPS@ece.cmu.edu
> > cc: (bcc: Julian Satran/Haifa/IBM)
> > Subject: RE: iSCSI: Flow Control
> >
> > Hi all,
> >
> > Assuming that we have consensus, especially on [1] below (minimum
> > connections is 1), I think we should try and resolve the flow
> > control issue.
> >
> > It seems to me that there is sufficient consensus that command
> > flow control is needed -
> >
> > [1] To enable the fastest possible flow of commands given the
> > capabilities of the target and initiator, and accommodating the
> > increased latencies of IP networks.
> >
> > [2] To significantly minimize the queue full condition, and to
> > provide a recovery mechanism at the iSCSI level when command
> > overflow happens at the target.
> >
> > [3] Some of the debate seems to be around whether the credit
> > mechanism should be static or dynamic.
> >
> > I believe that static is a subset of dynamic (where you never
> > change the value being advertised). I don't disagree with Charles
> > when he says that it will take experimentation to determine how to
> > best adjust the credit dynamically. However, it is important to
> > provide for it in the protocol so that when a vendor does figure
> > out how best to adjust the credit, they have a protocol mechanism
> > to do so. Even though it is an implementation that provides full
> > rate performance, it is the protocol that enables it (take the TCP
> > window scaling option, e.g.).
> >
> > [4] Another question that comes up is - should the credit be per
> > connection or per session (multiple connections)?
> >
> > The current draft does provide for a session-wide "flow control"
> > through MaxCmdRn. I believe that it is better to have flow control
> > on a per-connection basis. This enables the connections (which
> > might be on different NICs) to operate independently of each
> > other. Having a session-wide flow control would cause sync points
> > in both the initiator and the target.
> >
> > Also, a smaller field could be used if it is just to indicate a
> > credit window.
> >
> > [5] The credit should be a "pretty good effort" and not a
> > "guarantee".
> >
> > This allows smart targets to overcommit as the number of
> > initiators logged in increases (while reducing the credit
> > available to the initiators), and to increase the credit and
> > reduce overcommitment as the number of initiators logged in
> > decreases.
> >
> > Some mechanism is required to recover from the infrequent case
> > where command buffers get exhausted and have to be thrown away.
> >
> > [6] I would recommend that iSCSI provide a way to recover from
> > command overflow and also maintain ordering.
> >
> > The current proposal does not have a drop notification. It has an
> > ack mechanism (ExpCmdRn). I think for the purpose of drop
> > notification, it is better to be able to indicate the range of
> > commands dropped. TCP acks do tell me which commands reached the
> > target, and command responses tell me which were processed.
> >
> > When a target suffers from command exhaustion, it could behave in
> > two different ways. One is to drop all the commands it receives
> > till it detects a retransmission. In this case it would send a
> > drop notification for all commands it receives till it starts
> > receiving the commands from where the drop started.
> >
> > The other would be to store all the commands it is able to provide
> > buffers for, and provide NAKs for only those that it has dropped.
> > This would be more efficient.
> >
> > In this case, we should also agree on what the semantics of
> > processing the out-of-order commands are. Should they be processed
> > only when the gaps are filled? Or can they be processed in any
> > order?
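The second behavior in [6] - keep what fits, NAK only what was
dropped - implies per-command-number bookkeeping at the target. A
minimal sketch, taking the "process only when the gaps are filled"
answer to the ordering question; the window size and all names are
illustrative assumptions:

    /* Target-side receive path: park out-of-order commands, NAK drops,
     * and deliver commands to SCSI only once the in-order gap closes. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define WINDOW 32

    struct cmd_reassembly {
        uint32_t exp_cmd_rn;    /* next command number to deliver in order */
        bool     held[WINDOW];  /* out-of-order commands parked here       */
    };

    static void receive_cmd(struct cmd_reassembly *r, uint32_t rn,
                            bool have_buffer)
    {
        uint32_t off = rn - r->exp_cmd_rn;
        if (off >= WINDOW)
            return;                             /* outside window: ignore */
        if (!have_buffer) {
            printf("NAK: dropped rn %u\n", rn); /* drop notification      */
            return;
        }
        r->held[off] = true;
        /* Deliver the in-order prefix; commands past a gap stay parked. */
        while (r->held[0]) {
            printf("deliver rn %u\n", r->exp_cmd_rn++);
            for (int i = 0; i < WINDOW - 1; i++)
                r->held[i] = r->held[i + 1];
            r->held[WINDOW - 1] = false;
        }
    }

    int main(void)
    {
        struct cmd_reassembly r = { .exp_cmd_rn = 10 };
        receive_cmd(&r, 10, true);   /* delivered immediately          */
        receive_cmd(&r, 12, true);   /* parked: rn 11 is missing       */
        receive_cmd(&r, 11, false);  /* no buffer free: NAK for rn 11  */
        receive_cmd(&r, 11, true);   /* retransmit closes the gap;
                                        rn 11 and 12 are delivered     */
        return 0;
    }

The "process in any order" answer would drop the hold queue entirely
and hand commands to SCSI as they arrive, at the cost of ordering.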
> > [7] There was some discussion of whether we should propose a slow
> > start algorithm or a fast start algorithm.
> >
> > I think we should use a fast start algorithm at this level. At the
> > TCP level, the slow start algorithm is important because the two
> > ends are unaware of the state of the network and have to probe it.
> > At the iSCSI level, the target should be reasonably knowledgeable
> > about its own state, and should be able to provide a credit and
> > reduce/increase it per login as the conditions change (hopefully
> > with some hysteresis built in).
> >
> > [8] On flow control of immediate data, should we first work out
> > the command flow control and then turn our efforts to the data
> > flow control?
> >
> > Once we can agree on some of the basic issues, it should be
> > relatively easy to work out the credit indication/numbering
> > details etc.
> >
> > Somesh
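The fast start in [7] combines naturally with the overcommit idea in
[5]: the target can grant a full window at login, scale the grant down
as more initiators attach, and re-advertise only when the value has
moved enough (the hysteresis). A minimal sketch; the pool size,
overcommit factor, floor, and threshold are illustrative assumptions:

    /* Fast-start credit grant with overcommit and hysteresis. */
    #include <stdio.h>

    #define CMD_BUFFERS 1024 /* command buffers the target owns (assumed) */
    #define OVERCOMMIT  2    /* a "pretty good effort", not a guarantee   */
    #define MIN_CREDIT  4    /* never starve an initiator entirely        */

    /* Credit granted to each session at login: full speed immediately,
     * shrinking as the number of logged-in initiators grows. */
    static unsigned credit_at_login(unsigned initiators)
    {
        unsigned c = (CMD_BUFFERS * OVERCOMMIT) / initiators;
        return c < MIN_CREDIT ? MIN_CREDIT : c;
    }

    /* Hysteresis: re-advertise only on a move of more than 25% (assumed). */
    static int should_readvertise(unsigned last, unsigned now)
    {
        unsigned diff = last > now ? last - now : now - last;
        return 4 * diff > last;
    }

    int main(void)
    {
        unsigned last = 0;
        for (unsigned n = 1; n <= 512; n *= 8) {
            unsigned c = credit_at_login(n);
            printf("%3u initiators -> credit %4u per session (advertise: %d)\n",
                   n, c, last == 0 || should_readvertise(last, c));
            last = c;
        }
        return 0;
    }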
> > > -----Original Message-----
> > > From: Black_David@emc.com [mailto:Black_David@emc.com]
> > > Sent: Wednesday, October 04, 2000 5:13 PM
> > > To: ips@ece.cmu.edu
> > > Subject: iSCSI sessions: Step 2
> > >
> > > With my WG co-chair hat on, it's time to call consensus on some
> > > of this ...
> > >
> > > Late last week, I sent the "Let's try again" message on iSCSI
> > > sessions, and since then I've only seen one thread of comments to
> > > it, from a combination of Matt Wakeley and Doug Otis. The
> > > important content of that thread is Matt renewing his position
> > > that more than one connection ought to be REQUIRED. Lest this
> > > seem like an annoyance, Matt deserves credit for being patient
> > > with the WG's indirect progress towards consensus, which made it
> > > necessary for him to renew his objection on multiple occasions.
> > > As I read Matt's email, it looks like a good flow control
> > > solution for the single TCP connection iSCSI session case might
> > > satisfy him, but the flow control discussion is still ongoing.
> > >
> > > In any case, I am stating the following two items as WG rough
> > > consensus, over Matt's renewed objection in the first case:
> > >
> > > [1] Multiple TCP connections per iSCSI session remain OPTIONAL.
> > > [2] Multiple TCP connections per iSCSI session will be specified
> > >     as part of the base iSCSI protocol.
> > >
> > > Given that it's two months after the Pittsburgh meeting, I hope
> > > the rough consensus will hold on these items; anyone other than
> > > Matt should object to me directly. If necessary, I'll
> > > (reluctantly) reopen these issues one more time (yes, this is a
> > > hint).
> > >
> > > Moving on to the topic of models for multiple connection
> > > sessions, let me start by trying to winnow the approaches to
> > > Asymmetric sessions before taking up Asymmetric vs. Symmetric
> > > again. Four approaches to Asymmetric sessions have been
> > > discussed. I have not seen anyone other than Pierre Labat support
> > > his Balanced model, in which a single stream of control moves
> > > from TCP connection to TCP connection within a session. Therefore
> > > I believe it is the WG rough consensus that:
> > >
> > > [3] The Balanced Asymmetric model, in which a single control
> > >     stream moves from TCP connection to TCP connection in an
> > >     iSCSI session, will not be pursued.
> > >
> > > Similarly, I saw no objections to the note at the end of Julian's
> > > email indicating that the Collapsed Asymmetric model, in which
> > > data is allowed on the command connection even when there are
> > > multiple TCP connections in an iSCSI session, is technically
> > > inferior to both the Pure Asymmetric and Symmetric models.
> > > Therefore I believe it is the WG rough consensus that:
> > >
> > > [4] The Collapsed Asymmetric model, in which data is allowed on
> > >     the command connection in multiple connection iSCSI sessions,
> > >     will not be pursued.
> > >
> > > The Pure Asymmetric model was originally described as requiring
> > > two TCP connections per session. Kalman Meth proposed a
> > > modification to it that allowed it to use a single connection for
> > > both command and data. Between Kalman being the originator of the
> > > Pure Asymmetric model, the lack of objection to his proposal, and
> > > rough consensus [2] above, I believe it to be the WG rough
> > > consensus that:
> > >
> > > [5] The Pure Asymmetric model will only be considered in the
> > >     modified form that allows an iSCSI session to contain a
> > >     single TCP connection on which both command and data flow.
> > >
> > > If all five of the above consensuses (consensii?) hold, that
> > > would be serious progress. Objections to these should be sent to
> > > the list, except that I would ask Pierre Labat not to object to
> > > [3] in the absence of other objections to it.
> > >
> > > Now comes the hard part - Symmetric vs. modified Pure Asymmetric
> > > (modified by [5] above). There are over 1000 email messages in my
> > > mailbox for the ips mailing list for the past two months, and I
> > > freely admit to not having reviewed them in detail. I suggested
> > > in the "Let's try again" email that more weight should be given
> > > to those working on implementations, especially hardware, and
> > > have not seen any objections to that suggestion. My impression is
> > > that the opinion of such people has been in favor of the
> > > Symmetric model - Matt Wakeley (Agilent) and Somesh Gupta (HP)
> > > come to mind as examples. I'm not confident that this is the WG
> > > consensus, but it appears to me that the WG is headed in that
> > > direction. Please comment on this - the absence of
> > > comments/objections will be taken as a sign of agreement.
> > >
> > > There has been no comment on the error recovery issue since my
> > > email. Given this and the prior statements that TCP solves many
> > > of the tape error scenarios that are motivating FCP error
> > > recovery, I think the authors of the next version of the iSCSI
> > > draft are entitled to use their best technical judgement in
> > > determining how much error recovery to specify across multiple
> > > TCP connections in an iSCSI session, and the WG will review it
> > > when the next version of the draft appears.
> > >
> > > We might be getting close to the end of the session issues.
> > > Carefully considered comments are encouraged, but I'd ask
> > > everyone to consider their comments carefully before sending
> > > them, given our past experiences with this set of issues.
> > >
> > > Thanks,
> > > --David
> > >
> > > ---------------------------------------------------
> > > David L. Black, Senior Technologist
> > > EMC Corporation, 42 South St., Hopkinton, MA 01748
> > > +1 (508) 435-1000 x75140   FAX: +1 (508) 497-8500
> > > black_david@emc.com        Mobile: +1 (978) 394-7754
> > > ---------------------------------------------------