Re: iSCSI/iWARP drafts and flow control

On Thursday, July 31, 2003, at 02:42 PM, <pat_thaler@agilent.com> wrote:

> All, sorry for the empty reply - I'm not sure how that happened.
>
> Caitlin,
>
> You asked some questions about how the other messages are flow
> controlled in iSCSI over TCP. The answer is that they aren't flow
> controlled. If iSCSI gets a PDU it cannot handle, it drops it and
> there are provisions to trigger it to be resent depending on the kind
> of recovery level supported. The only control for PDUs to the target
> is on non-immediate commands (both SCSI Command and Task Management
> Function Request PDUs). Note that when unsolicited non-immediate data
> is permitted, iSCSI allows the command to generate a command PDU plus
> an unknown number of SCSI Data-out PDUs to carry the unsolicited data.
> For iSER, we require that the unsolicited SCSI Data-out PDUs be full
> when there is enough unsolicited data to fill them (and we created a
> key to negotiate that size). Therefore, when operating over iSER the
> target does know the maximum number of PDUs that the initiator might
> send per SCSI command.
>
> There is no deadlock in existing iSCSI because there is no flow
> control on NOP-In and the target can always send a NOP-In to advance
> MaxCmdSN.
>
> To summarize, in current iSCSI, each opening in the CmdSN window
> allows from 1 to an unknown number of PDUs, while in iSCSI over iSER,
> each opening in the CmdSN window allows from 1 to n PDUs, where n is
> the amount of unsolicited data divided by the data per PDU (rounded
> up, of course).

On the contrary, existing iSCSI has buffer flow control. It runs over
TCP. The receiving TCP stack declares a buffer window which the
sending TCP MUST comply with. (And it SHOULD have enough buffers to
match its promises, but that's a separate issue; a TCP stack can
under-provision for the same reasons that a ULP finds
under-provisioning valuable.)

Even if a TCP segment will be recognized by a rototilled receiver, and
its payload placed directly into a user buffer, the sending TCP is
still flow controlled by the buffer window. The TCP window
advertisement is not conditional. It is "I will accept N bytes", not
"N bytes as long as 90% of them can be directly placed."

This results in head-of-line blocking. A limited supply of general
purpose buffering can prevent messages from being sent that would have
bypassed those buffers.

In order to allow DDP to be implemented efficiently, it must be able
to assume that it will be able to place data as soon as it accepts a
segment/chunk from the LLP for placement. The DDP layer does not do
buffering. In order for this to work, the role of SCTP/TCP buffer
windows MUST be replaced by ULP flow control.

SCTP/TCP buffer windows are designed to ensure that there is a place
to accept each received byte (and to slow down the sender so that this
condition can be maintained). Tagged messages have a valid target, or
the stream is terminated. There is no condition where a valid tagged
message will lack a target buffer. Untagged messages, however, consume
resources. Without flow control, the sender can send messages which
will not have a buffer to receive them. A reliable protocol prevents
this with flow control.

The only change from iSCSI directly over TCP to iSER is that this flow
control has been refined to avoid false head-of-line blocking. But
doing that requires shifting the mechanics of the flow control from
the LLP to the ULP.
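To make that concrete, here is a minimal sketch (Python, purely
illustrative; none of these names appear in the drafts) of ULP credits
standing in for the TCP buffer window, together with the per-command
PDU bound from the summary above. The promise is unconditional, and it
is the sender that is gated, never the receiver that drops:

    import math
    from collections import deque

    # Per-command bound from the summary above: one command PDU plus
    # the unsolicited data divided by the data per PDU, rounded up.
    def max_pdus_per_cmdsn_opening(unsolicited_bytes, pdu_data_bytes):
        return 1 + math.ceil(unsolicited_bytes / pdu_data_bytes)

    # ULP credits replacing the TCP buffer window for untagged
    # messages: "I will accept N untagged messages", unconditionally.
    class UntaggedCreditWindow:
        def __init__(self, posted_buffers):
            self.credits = posted_buffers  # one credit per posted receive buffer
            self.stalled = deque()         # sends held back, as a closed TCP window would

        def send_untagged(self, msg, wire_send):
            # Transmit-side gate: enforce the limit before the wire.
            if self.credits > 0:
                self.credits -= 1
                wire_send(msg)
            else:
                self.stalled.append(msg)

        def credit_returned(self, wire_send, n=1):
            # The receiver reposted n buffers (implicitly via a
            # response, or explicitly); drain any stalled sends.
            self.credits += n
            while self.credits > 0 and self.stalled:
                self.credits -= 1
                wire_send(self.stalled.popleft())

For example, max_pdus_per_cmdsn_opening(64 * 1024, 8 * 1024) is 9: one
command PDU plus eight full Data-out PDUs.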
There is no reason for iSER flow control to stall transmission of any
untagged message that would not have been stalled by SCTP/TCP buffer
windows. In fact, it should be able to avoid false blocking.

If iSCSI really required a command to be sent *now*, it would not work
over TCP. Since it does work over TCP, there is obviously a solution
in which the iSER layer would on occasion stall an untagged message on
the transmit side.

Your analysis consistently focuses on the receive side. Flow control
is not about the receive side; it is about limiting the transmit side
based upon feedback from the receive side. What has to be done is to
accept that constraint, and then determine the most efficient form of
feedback available. It works over TCP, which is fairly crude in terms
of feedback. Therefore a solution is possible.

If you do not want to rely upon implicit buffer freeing, a simple flag
could request an explicit ack (a sketch of this follows below). It
would only be required under special circumstances. If it were
required more often, then the whole idea that this could have been
estimated on the receiving side would be suspect.

So far, I haven't questioned that receiver estimation would work most
of the time -- just that doing so is not flow control. It is not a
reliable protocol, which means that in the *long run* it will not be
robust. Unreliable protocols can be made to work quite well, with
amazingly few drops and high performance -- until somebody changes one
end radically and/or the network topology. Reliable protocols are
supposed to prevent that.

> Note also that the CmdSN window is across a session.
> If you have connections in a session that are running
> over separate RNICs and are using CmdSN for flow control,
> each RNIC will have to have access to enough buffers
> for the whole window to land on it.

This is a valid reason why the credits cannot always be enforced by
the DDP layer. I have already agreed that the DDP layer cannot enforce
credit limits if it does not know them, and that there are specialized
cases where the ULP would not find it desirable/convenient to share
this information. But the *existence* of a limit is independent of
whether the receiving DDP is involved in its enforcement. The critical
factor is that the Data Source ULP is aware of the limit.

> Between these two factors, CmdSN flow control will require over
> provisioning buffers much of the time. Perhaps memory is cheap
> enough that for an RNIC with a small number of connections this
> is acceptable in exchange for using an existing mechanism. On the
> other hand, we will have to create a mechanism to handle immediate
> commands and other PDUs that aren't covered by CmdSN so it isn't
> clear to me whether this is the right answer. The downside is
> overprovisioning buffers because of sessions spanning adapters and
> because each command might be a write with unsolicited data but many
> commands are reads. The upside is that CmdSN window can be managed
> to respond to changes in load while one has a less responsive simple
> mechanism to deal with the rest of the traffic.

Just as with TCP/SCTP, actual provisioning of buffers is independent
of the advertised flow control -- with the caveat that the advertised
flow control is expected to be reasonably reliable. Buffers can be
under-provisioned with amazing accuracy at any protocol layer.
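Here is the promised sketch of the "explicit ack only when needed"
idea, assuming a hypothetical request_ack flag in the untagged message
header and an arbitrary low-water threshold (neither comes from the
drafts):

    ACK_THRESHOLD = 2   # made-up low-water mark, illustrative only

    class UntaggedSender:
        def __init__(self, initial_credits):
            self.credits = initial_credits

        def build_header(self):
            # Normally rely on implicit buffer freeing: a response
            # implies the request buffer was consumed and reposted.
            # Only when credits run low does the sender ask the
            # receiver to confirm freed buffers explicitly.
            return {"request_ack": self.credits <= ACK_THRESHOLD}

        def on_response(self):
            # Implicit freeing: the common case returns one credit.
            self.credits += 1

        def on_explicit_ack(self, freed):
            # Rare path; if it fired often, the receiver-side
            # estimate would be suspect, as argued above.
            self.credits += freed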
> What isn't flow controlled by iSCSI:
> initiator to target:
> immediate command PDUs - existing iSCSI allows for the target to
> drop these if it gets more than it can handle and the initiator
> can only count on buffering for two, but the initiator can send
> more than that and hope the target has buffering. One can't count
> on how many of these there might be.

An iSCSI target can drop these under a properly flow controlled iSER
as well. But it has to receive the requests first. Are they being
delivered over a reliable protocol or not? Deciding to "drop" a
command at the ULP layer just means that the buffer is returned to the
pool quickly. It does not mean that there didn't need to be a buffer
to receive the command.

> Is there a mechanism to disable flow control when the
> receiver doesn't require it, e.g. large shared buffer
> pool with statistical provisioning?

That would be an argument for allowing a session to explicitly
negotiate these "extraneous" credits. If you have a large shared
buffer, simply grant more credits.
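A sketch of that last point, assuming a hypothetical oversubscription
factor for statistical provisioning (the factor and the names are
illustrative, not from iSCSI or the drafts):

    class SharedBufferPool:
        def __init__(self, total_buffers, expected_sessions,
                     oversubscribe=4):
            # Each session is granted more than its strict share of
            # the pool, betting that not all sessions burst at once.
            # The sum of the grants exceeds the pool: deliberate
            # under-provisioning behind a reliable-looking credit
            # advertisement, as discussed above.
            self.per_session_credits = (
                total_buffers * oversubscribe // expected_sessions)

        def credits_for_new_session(self):
            return self.per_session_credits

With 1024 buffers, 64 expected sessions, and a factor of 4, each
session is granted 64 credits, so 4096 are promised against 1024 real
buffers.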