Re: iSCSI/iWARP drafts and flow control

On Tuesday, July 29, 2003, at 08:58 PM, Mallikarjun C. wrote:

> As Mike points out, the CmdSN-based flow control
> in iSCSI is relevant here. Let me note that the design
> team behind the current iSER draft considered this topic
> in great detail, but I can now clearly see that the draft
> unfortunately does not capture the design rationale very well.
>

It could be clearer. But even if it were clearer, it would not change
the fact that it fails to provide ULP-level flow control for untagged
messages.

The requirement here is that the ULP provide *flow control* for
untagged messages. Control means that the Data Source either has
permission to send an untagged message, or it does not.

There is an identical flow control issue for RDMA Reads. You either
are allowed to send one or you are not.

If you are allowed to send an untagged message, you have an
expectation that the other side has the resources to handle it.
Bugs, under-provisioning and hardware faults are all facts of life.
So robust applications are prepared to deal with faults. But faults
reflect a *failure*.

If an untagged message is allowed, then the Data Source has every
reason to expect that the Data Sink will handle it properly. Failure
to do so is a fault on the Data Sink.

If an untagged message is not allowed, then the Data Source had no
right to send it. It cannot complain if the Data Sink terminates the
stream. In this case the Data Source is the one committing the fault.

> iSCSI does not provide a PDU-level positive flow control
> but instead relies on the CmdSN feature, from which most
> of the iSCSI (what DA/iSER call as the) "control-type" PDU traffic
> can be precisely estimated (note that only control-type PDUs
> are candidates for Send Messages and thus relevant to this
> discussion).
> However, it turns out that there are certain
> opcode types that are used very rarely that are not governed
> by the CmdSN-based flow control - immediate commands,
> SNACK, unsolicited NOP-In, Reject, and Async Messages.

There is no requirement that there be an explicit wire-level
protocol, merely that the ULP establish a mechanism by which the
sender knows whether it can send a given untagged message.

iSCSI CmdSN flow control already provides this flow control for most
iSER packets. So the only issue is establishing rules for the
remaining packets.

Controlling the flow of *most* packets is *not* flow control. It is
somewhat akin to having a strictly balanced budget except for these
three funds which are unrestricted.

>
> Note that the above does not include the unsolicited Data-out
> PDUs since the worst case number of these is precisely known from
> CmdSN, but the worst case buffer provisioning for these would
> be both unnecessary and extremely expensive in reality.
>

Under-provisioning of buffers is a local issue, with the caveat that
doing it improperly is a fault on the Data Sink's part. There can
also be faults from exhaustion of CPU power, hardware faults and
plain old software errors. The server is obviously expected to keep
these to a minimum.

The key distinction that must be made is between granting credits,
providing buffers and matching buffers.

The classic simple ordered Receive Queue is the one interface that I
believe everyone agrees must be supported. With it the Data Sink ULP
posts a receive buffer, and thereby grants a credit and pre-assigns
a buffer to the QN/MSN.

The Shared Receive Queue (proposed in draft-hilland) shares both
buffers and credits across a pool. Buffers are assigned to the QN/MSN
on an as needed basis. The implementation has an option of filling in
buffers for the gap when a high MSN is received, otherwise the buffer
is allocated when a portion of it is first received. Credits are
consumed when buffers are allocated.
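
To make the simple ordered Receive Queue case concrete, here is a
minimal sketch, assuming a hypothetical in-memory model of one stream.
The class and method names are illustrative only and are not taken
from any iWARP draft or verbs interface. It shows only the point made
above: posting a receive buffer both grants one credit and pre-assigns
that buffer to the next MSN, so a message whose MSN has no posted
buffer was sent without permission.

```python
class OrderedReceiveQueue:
    """Hypothetical model of a simple ordered Receive Queue (one stream)."""

    def __init__(self):
        self.buffers = {}    # MSN -> buffer pre-assigned to that MSN
        self.next_msn = 1    # MSN the next posted buffer will be bound to

    def post_receive(self, size):
        """Post one buffer: bind it to the next MSN, granting one credit."""
        msn = self.next_msn
        self.buffers[msn] = bytearray(size)
        self.next_msn += 1
        return msn

    def credits(self):
        """Credits outstanding = buffers posted but not yet consumed."""
        return len(self.buffers)

    def deliver(self, msn, payload):
        """Place an untagged message into the buffer bound to its MSN.

        Returns False when no buffer is bound to that MSN (the sender
        had no credit) or when the payload does not fit the buffer.
        """
        buf = self.buffers.get(msn)
        if buf is None or len(payload) > len(buf):
            return False
        buf[:len(payload)] = payload
        del self.buffers[msn]    # the credit is consumed
        return True


rq = OrderedReceiveQueue()
rq.post_receive(64)              # grants a credit for MSN 1
rq.post_receive(64)              # grants a credit for MSN 2
accepted = rq.deliver(1, b"nop-in")
rejected = rq.deliver(3, b"async")    # no buffer was posted for MSN 3
```

In this model the Data Sink never has to guess: either a buffer was
posted for the MSN, or the Data Source sent something it had no right
to send.
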

Note that Shared Receive Queues only apply to the RDMA Send queue.
The RDMA Read queue is not documented, but given that a fixed limit
is configured it would presumably be a simple ordered Receive Queue.

Shared Buffer Pools place buffers in a pool, but assign credits on a
per stream basis. If an MSN exceeds the range implied by the credits
it is rejected as invalid whether there is a buffer available or not.

iSER seems to call for the ability to pool credits across all streams
in a session. But it would not necessarily be the same set of streams
that you would want to share buffers over. There could be advantages
of pooling buffers between sessions, while still tracking credits on
a per session basis.

In any event, these are all *local* questions. The only *wire*
question is whether the Data Source can know whether or not it is
legal for it to send a given untagged message.

Stating that for message types "x" it is legal as long as the Data
Source thinks it has a reason to send "x" is NOT flow control. For
each of the "exceptional" types, what is required is that a rule be
derived on how many of them can be outstanding, and how the sender
knows when they are no longer outstanding.

If, as claimed, it is a trivial matter for the Data Sink to make
these calculations, then it should be easy to enumerate these rules.

> The iSER design team thus believed that most storage implementations
> will use buffer pools to deal with this reality (as they have always
> been), and the rare "fringe" opcode types mentioned above could
> easily be dealt with in the statistical provisioning scheme of
> things, being so rare and infrequent.

It is totally incorrect for an Upper Layer Protocol to be designed
with presumptions as to implementation of the lower layers. If you
believe buffer pools are required for the correct functioning of an
application using iWARP then you should be arguing for that change to
iWARP.
Otherwise, the Upper Layer Protocol must be defined so as to rely
upon the published protocol and nothing else.

iWARP requires the ULP to take responsibility for flow control of
untagged messages. Period.

> Despite this belief (in fact, even before we are convinced of this
> approach), we did a diligent analysis of a Send Message flow control
> protocol for iSER - the ultimate conclusion was that it's way too
> much overhead to run this protocol, it's slow-to-respond to changing
> I/O loads, reclaiming of credits is a burdensome process, requires
> RTT delays to announce new credits etc.
>

That is based upon the assumption that iSER flow control requires
iSER flow control messages. This is not a requirement. A requirement
that the Data Source MUST NOT submit more than one connection
termination notice upon any given connection would fully flow control
that type of message, with no wire protocol messages being exchanged.

> I believe the approach adopted in the current iSER draft is
> appropriate, we do however need to polish the flow control
> discussion to include some of the design rationale.

Rationales are not constraints upon the sender of untagged messages.
Flow *control*, by definition, is a constraint on the sender. The
constraint does not have to take the form of dynamically exchanged
messages, or even per-session negotiated limits. But it does require
that a limit be unambiguously identified. Otherwise it is not flow
*control*.

Again, this has nothing to do with how many buffers the Data Sink
must provision and when. Dynamic binding of buffers is a totally
valid strategy, especially if the Data Sink has "low water mark"
warnings and processes responsible for responding to those alarms to
restock the buffer pool.

The point is that failure to provide true flow control *requires*
that *all* implementations build such an infrastructure.
It is taking a feature that is desirable for high volume servers and
making it a de facto requirement for *all* servers, even those that
only intend to support a single client.

Caitlin Bestler - cait@asomi.com - http://asomi.com/