RE: iSCSI: Flow Control

Somesh,

I kept quiet on this, but as it risks getting unnecessarily complex, IMHO, I can't anymore. I am not altogether convinced that there is a consensus on flow control.

Let us reiterate the reasons for wanting command flow control:

- for long-latency pipes you want to ship commands and data ahead of time to keep the pipes full
- but you also want to avoid the command queueing situation in which you can be forced to drop commands and refill the queue
- you want to keep all devices as busy as possible

The last item, as well as the whole SCSI queuing issue, is best taken care of at the SCSI layer, as it is the only one that might need to keep per-LU state.

For the first two items, except for some artifacts, observe that commands are not a significant consumer of either bandwidth or target resources. A high number of commands in transit will readily keep the pipes full if they are followed by data, and they pose no strain on a target, where they can be queued at the iSCSI layer. Data will be flow-controlled by the target limits for immediate data and by the TCP windows, and with simple conservative ordering rules we can avoid both deadlock and throwing away data.

What you are suggesting we look into, flow control per connection, is, I am afraid, not adding much.

And last but not least: if you implement sessions with one connection and use multiple sessions, you can flow control every connection, but then you have to add a wedge driver to do load distribution.
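Julian's alternative, one connection per session with a wedge driver spreading commands over several sessions, can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not anything from the draft: `Session`, `WedgeDriver`, the per-session `credit` count and the "most unused credit" placement policy are all hypothetical names and choices made for clarity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Session:
    """Hypothetical single-connection session with its own command credit."""
    name: str
    credit: int  # commands the target currently lets this session have in flight
    in_flight: List[int] = field(default_factory=list)

    def can_send(self) -> bool:
        return len(self.in_flight) < self.credit

    def send(self, cmd_id: int) -> None:
        assert self.can_send(), "no credit left on this session"
        self.in_flight.append(cmd_id)

    def complete(self, cmd_id: int) -> None:
        # A command response returns one unit of credit to this session.
        self.in_flight.remove(cmd_id)


class WedgeDriver:
    """Hypothetical layer above the sessions that spreads commands over them."""

    def __init__(self, sessions: List[Session]) -> None:
        self.sessions = sessions
        self.pending: List[int] = []         # commands waiting for credit anywhere
        self.placement: Dict[int, str] = {}  # cmd_id -> name of the session used

    def submit(self, cmd_id: int) -> Optional[str]:
        """Queue a command; return the session it went to, or None if it waits."""
        self.pending.append(cmd_id)
        self._dispatch()
        return self.placement.get(cmd_id)

    def complete(self, cmd_id: int) -> None:
        name = self.placement.pop(cmd_id)
        next(s for s in self.sessions if s.name == name).complete(cmd_id)
        self._dispatch()  # freed credit may let queued commands go out

    def _dispatch(self) -> None:
        while self.pending:
            # Assumed policy: use the session with the most unused credit
            # (max() returns the first such session on ties).
            best = max(self.sessions, key=lambda s: s.credit - len(s.in_flight))
            if not best.can_send():
                return  # every session is at its limit; keep commands queued
            cmd_id = self.pending.pop(0)
            best.send(cmd_id)
            self.placement[cmd_id] = best.name
```

Because consecutive commands may land on different sessions, the sketch gives no ordering guarantee across sessions; any ordering would have to be enforced by the wedge driver itself or left to the SCSI layer.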
Regards,
Julo

"GUPTA,SOMESH (HP-Cupertino,ex1)" <somesh_gupta@am.exch.hp.com> on 09/10/2000 03:36:46
Please respond to "GUPTA,SOMESH (HP-Cupertino,ex1)" <somesh_gupta@am.exch.hp.com>
To: IPS@ece.cmu.edu
cc: (bcc: Julian Satran/Haifa/IBM)
Subject: RE: iSCSI: Flow Control

Hi all,

Assuming that we have consensus, especially on [1] below (the minimum number of connections is 1), I think we should try to resolve the flow control issue. It seems to me that there is sufficient consensus that command flow control is needed:

[1] To enable the fastest possible flow of commands given the capabilities of the target and initiator, while accommodating the increased latencies of IP networks.

[2] To significantly reduce the queue full condition, and to provide a recovery mechanism at the iSCSI level when command overflow happens at the target.

[3] Some of the debate seems to be around whether the credit mechanism should be static or dynamic. I believe that static is a subset of dynamic (where you never change the value being advertised). I don't disagree with Charles when he says that it will take experimentation to determine how best to adjust the credit dynamically. However, it is important to provide for it in the protocol so that when a vendor does figure out how best to adjust the credit, they have a protocol mechanism to do so. Even though it is an implementation that delivers full-rate performance, it is the protocol that enables it (take the TCP window scale option, for example).

[4] Another question that comes up is: should the credit be per connection or per session (multiple connections)? The current draft does provide for session-wide "flow control" through MaxCmdRn. I believe that it is better to have flow control on a per-connection basis. This enables each connection (which might be on different NICs) to operate independently of the others. Having session-wide flow control would cause sync points in both the initiator and the target.
Also, a smaller field could be used if it is just to indicate a credit window.

[5] The credit should be a "pretty good effort" and not a "guarantee". This allows smart targets to overcommit as the number of logged-in initiators increases (while reducing the credit available to the initiators), and to increase the credit and reduce overcommitment as the number of logged-in initiators decreases. Some mechanism is required to recover from the infrequent case where command buffers get exhausted and commands have to be thrown away.

[6] I would recommend that iSCSI provide a way to recover from command overflow while also maintaining ordering. The current proposal does not have a drop notification; it has an ack mechanism (ExpCmdRn). For the purpose of drop notification, I think it is better to be able to indicate the range of commands dropped. TCP acks tell me which commands reached the target, and command responses tell me which were processed. When a target suffers from command exhaustion, it could behave in two different ways. One is to drop all the commands it receives until it detects a retransmission; in this case it would send a drop notification for every command it receives until it starts receiving again from the command where the drop started. The other is to store all the commands it can provide buffers for and send NAKs only for those it has dropped. This would be more efficient. In this case, we should also agree on the semantics of processing out-of-order commands. Should they be processed only when the gaps are filled, or can they be processed in any order?

[7] There was some discussion of whether we should propose a slow-start algorithm or a fast-start algorithm. I think we should use a fast-start algorithm at this level. At the TCP level, the slow-start algorithm is important because the two ends are unaware of the state of the network and have to probe it.
At the iSCSI level, the target should be reasonably knowledgeable about its own state and be able to provide a credit, or reduce/increase it per login as conditions change (hopefully with some hysteresis built in).

[8] On flow control of immediate data: should we first work out the command flow control and then turn our efforts to data flow control?

Once we can agree on some of the basic issues, it should be relatively easy to work out the credit indication/numbering details, etc.

Somesh

> -----Original Message-----
> From: Black_David@emc.com [mailto:Black_David@emc.com]
> Sent: Wednesday, October 04, 2000 5:13 PM
> To: ips@ece.cmu.edu
> Subject: iSCSI sessions: Step 2
>
>
> With my WG co-chair hat on, it's time to call
> consensus on some of this ...
>
> Late last week, I sent the "Let's try again" message
> on iSCSI sessions, and since then I've only seen
> one thread of comments to it from a combination of
> Matt Wakeley and Doug Otis. The important content
> of that thread is Matt renewing his position that
> more than one connection ought to be REQUIRED. Lest
> this seem like annoyance, Matt deserves credit for
> being patient with the WG's indirect progress towards
> consensus that made it necessary for him to renew his
> objection on multiple occasions. As I read Matt's
> email, it looks like a good flow control solution
> for the single TCP connection iSCSI session case
> might satisfy him, but the flow control discussion
> is still ongoing.
>
> In any case, I am stating the following two items
> as WG rough consensus, over Matt's renewed objection
> in the first case:
>
> [1] Multiple TCP connections per iSCSI session
>     remain OPTIONAL.
> [2] Multiple TCP connections per iSCSI session
>     will be specified as part of the base
>     iSCSI protocol.
>
> Given that it's two months after the Pittsburgh meeting,
> I hope the rough consensus will hold on these items;
> anyone other than Matt should object to me directly, and
> if necessary, I'll (reluctantly) reopen these issues
> one more time (yes, this is a hint).
>
> Moving on to the topic of models for multiple connection
> sessions, let me start by trying to winnow the approaches
> to Asymmetric sessions before taking up Asymmetric vs.
> Symmetric again. Four approaches to Asymmetric sessions
> have been discussed. I have not seen anyone other than
> Pierre Labat support his Balanced model, in which a single
> stream of control moves from TCP connection to TCP connection
> within a session. Therefore I believe it is the WG
> rough consensus that:
>
> [3] The Balanced Asymmetric model, in which a single
>     control stream moves from TCP connection to TCP
>     connection in an iSCSI session, will not be pursued.
>
> Similarly, I saw no objections to the note at the end of
> Julian's email indicating that the Collapsed Asymmetric
> model, in which data is allowed on the command connection
> even when there are multiple TCP connections in an iSCSI
> session, is technically inferior to both the Pure Asymmetric
> and Symmetric models. Therefore I believe it is the WG
> rough consensus that:
>
> [4] The Collapsed Asymmetric model, in which data is allowed
>     on the command connection in multiple connection
>     iSCSI sessions, will not be pursued.
>
> The Pure Asymmetric model was originally described as
> requiring two TCP connections per session. Kalman Meth
> proposed a modification to it that allowed it to use a
> single connection for both command and data.
> Between Kalman being the originator of the Pure Asymmetric
> model, lack of objection to his proposal, and rough
> consensus [2] above, I believe it to be the WG rough
> consensus that:
>
> [5] The Pure Asymmetric model will only be considered
>     in the modified form that allows an iSCSI session
>     to contain a single TCP connection on which both
>     command and data flow.
>
> If all five of the above consensuses (consensii?) hold,
> that would be serious progress. Objections to these
> should be sent to the list, except that I would ask
> Pierre Labat not to object to [3] in the absence of
> other objections to it.
>
> Now comes the hard part: Symmetric vs. modified
> Pure Asymmetric (modified by [5] above). There are
> over 1000 email messages in my mailbox for the ips
> mailing list for the past two months, and I freely
> admit to not having reviewed them in detail. I suggested
> in the "Let's try again" email that more weight should
> be given to those working on implementations, especially
> hardware, and have not seen any objections to that
> suggestion. My impression is that the opinion of such
> people has been in favor of the Symmetric model;
> Matt Wakeley (Agilent) and Somesh Gupta (HP) come
> to mind as examples. I'm not confident that this is
> the WG consensus, but it appears to me that the
> WG is headed in that direction. Please comment on
> this; the absence of comments/objections will be
> taken as a sign of agreement.
>
> There has been no comment on the error recovery issue
> since my email. Given this and the prior statements that
> TCP solves many of the tape error scenarios that are motivating
> FCP error recovery, I think the authors of the next version
> of the iSCSI draft are entitled to use their best technical
> judgement in determining how much error recovery to specify
> across multiple TCP connections in an iSCSI session, and
> the WG will review it when the next version of the draft
> appears.
>
> We might be getting close to the end of the session issues.
> Carefully considered comments are encouraged, but I'd ask
> everyone to consider their comments carefully before sending
> them, given our past experiences with this set of issues.
>
> Thanks,
> --David
>
> ---------------------------------------------------
> David L. Black, Senior Technologist
> EMC Corporation, 42 South St., Hopkinton, MA 01748
> +1 (508) 435-1000 x75140     FAX: +1 (508) 497-8500
> black_david@emc.com       Mobile: +1 (978) 394-7754
> ---------------------------------------------------
>
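Somesh's points [4] to [6] above (a per-connection credit that the target may overcommit, plus NAKs naming ranges of dropped commands) can be sketched on the target side as below. This is a hypothetical Python illustration, not a proposal from the thread: the class and field names are assumptions, `exp_cmd` and `max_cmd()` only loosely mirror the draft's ExpCmdRn/MaxCmdRn, and it picks one answer to Somesh's open question by executing commands only once gaps are filled. To avoid the deadlock Julian alludes to, it also assumes one simple conservative rule: the last free buffer is reserved for the next in-order command.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TargetConnection:
    """Hypothetical target side of one connection with its own command credit.

    exp_cmd: next command number to execute in order; the credit window is
             [exp_cmd, max_cmd()] and slides forward as commands execute
             (loosely modelled on ExpCmdRn/MaxCmdRn, not taken from the draft).
    credit:  window size advertised to the initiator. It may exceed `buffers`
             because the credit is a "pretty good effort", not a guarantee.
    """
    buffers: int
    credit: int
    exp_cmd: int = 0
    stored: Dict[int, str] = field(default_factory=dict)
    dropped: List[int] = field(default_factory=list)

    def max_cmd(self) -> int:
        # Highest command number the initiator may send right now.
        return self.exp_cmd + self.credit - 1

    def receive(self, cmd_num: int, cdb: str) -> None:
        if cmd_num > self.max_cmd():
            raise ValueError(f"command {cmd_num} is outside the credit window")
        if cmd_num < self.exp_cmd or cmd_num in self.stored:
            return  # duplicate of a command already executed or buffered
        # Assumed ordering rule: keep the last buffer for the command at
        # exp_cmd, so a retransmitted gap command can always be stored and
        # out-of-order commands can never fill every buffer.
        limit = self.buffers if cmd_num == self.exp_cmd else self.buffers - 1
        if len(self.stored) >= limit:
            # Overcommitted: no buffer available, so drop it and NAK it later.
            if cmd_num not in self.dropped:
                self.dropped.append(cmd_num)
            return
        self.stored[cmd_num] = cdb
        if cmd_num in self.dropped:
            self.dropped.remove(cmd_num)  # a retransmission filled the gap

    def nak_ranges(self) -> List[Tuple[int, int]]:
        """Collapse dropped command numbers into inclusive (first, last) ranges."""
        ranges: List[Tuple[int, int]] = []
        for n in sorted(self.dropped):
            if ranges and n == ranges[-1][1] + 1:
                ranges[-1] = (ranges[-1][0], n)
            else:
                ranges.append((n, n))
        return ranges

    def execute_in_order(self) -> List[str]:
        """Execute buffered commands up to the first gap, freeing their buffers."""
        done: List[str] = []
        while self.exp_cmd in self.stored:
            done.append(self.stored.pop(self.exp_cmd))
            self.exp_cmd += 1
        return done
```

For example, with `buffers=3` and `credit=5`, commands 0 and 1 are buffered while 2 and 3 are dropped, so the target reports the single range `(2, 3)` instead of two separate NAKs. Reporting ranges keeps the notification small when a burst of commands hits an exhausted target.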
Last updated: Tue Sep 04 01:06:46 2001; 6315 messages in chronological order.