Re: iSCSI/iWARP drafts and flow control

The proposed mapping of iSCSI onto iWARP offers an inadequate solution to
the problem of flow control. iWARP shifts responsibility for flow control
to the ULP. In doing so, it allows ULP-specific pacing based upon the
number of requests in flight rather than relying on the bottleneck of
transport buffering to flow-control the application. The session is no
longer throttled by the availability of buffers suitable for any message.
This topic is covered in section 4.5 of the RDMAP/DDP Applicability
Statement (http://www.ietf.org/draft-ietf-rddp-applicability-00.txt).

There are two excellent examples of ULP solutions to pacing untagged
messages: DAFS and the mapping of RPC over iWARP for NFS. The latter
offers the following section on flow control:

   3.3. Flow Control

   It is critical to provide flow control for an RDMA connection. RDMA
   receive operations will fail if a pre-posted receive buffer is not
   available to accept an incoming RDMA Send. Such errors are fatal to
   the connection. This is a departure from conventional TCP/IP
   networking, where buffers are allocated dynamically on an as-needed
   basis and pre-posting is not required.

   It is not practical to provide for fixed credit limits at the RPC
   server. Fixed limits scale poorly, since posted buffers are dedicated
   to the associated connection until consumed by receive operations.
   Additionally, for protocol correctness the server must be able to
   reply whether or not a new buffer can be posted to accept future
   receives.

   Flow control is implemented as a simple request/grant protocol in the
   transport header associated with each RPC message. The transport
   header for RPC CALL messages contains a requested credit value for
   the server, which may be dynamically adjusted by the caller to match
   its expected needs.
   The transport header for RPC REPLY messages provides the granted
   result, which may have any value except that it may not be zero when
   no in-progress operations are present at the server, since such a
   value would result in deadlock. The value may be adjusted up or down
   at each opportunity to match the server's needs or policies.

   While RPC CALLs may complete in any order, the current flow control
   limit at the RPC server is known to the RPC client from the Send
   ordering properties. It is always the most recent server-granted
   credits minus the number of requests in flight.

I believe this is quite a contrast with the iSCSI/iWARP proposal:

   10.1 Flow Control for RDMA Send Message Types

   RDMAP Send Message Types are used by the iSER Layer to transfer iSCSI
   control-type PDUs. Each RDMAP Send Message Type consumes an Untagged
   Buffer at the Data Sink. However, neither the RDMAP layer nor the
   iSER Layer provides an explicit flow control mechanism for the RDMAP
   Send Message Types. Therefore, the iSER Layer SHOULD provision enough
   Untagged buffers for handling incoming RDMAP Send Message Types to
   prevent a buffer underrun condition at the RDMAP layer. If a buffer
   underrun happens, it may result in the termination of the connection.

   An implementation may choose to satisfy this requirement by using a
   common buffer pool shared across multiple connections, with usage
   limits on a per-connection basis and usage limits on the buffer pool
   itself. In such an implementation, exceeding the buffer usage limit
   for a connection or for the buffer pool itself may trigger
   interventions from the iSER Layer to replenish the buffer pool and/or
   to isolate the connection causing the problem.

Stating that the iSER Layer "SHOULD" provision enough Untagged buffers is
an interesting use of the IETF "SHOULD". Implementations are *guaranteed*
to have a valid reason to break the "SHOULD": they do not have enough
information to comply. The Upper Layer Protocol has failed to provide it.
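To make the quoted request/grant scheme concrete, here is a minimal sketch
of the client-side accounting it describes: each CALL carries a requested
credit value, each REPLY carries the server's grant, and the usable limit
is always the most recent grant minus the requests still in flight. The
class and field names are invented for illustration; they do not come from
any draft or real RDMA API.

```python
class CreditClient:
    """Client-side accounting for a request/grant credit protocol.

    Mirrors the RPC-over-RDMA scheme quoted above: the usable send
    limit is the most recent server grant minus requests in flight.
    (Illustrative names only.)
    """

    def __init__(self, initial_grant):
        self.granted = initial_grant   # most recent grant from the server
        self.in_flight = 0             # CALLs sent but not yet replied to

    def available(self):
        # Credits the client may still consume without risking an
        # un-posted receive buffer at the server.
        return self.granted - self.in_flight

    def send_call(self, requested_credits):
        if self.available() <= 0:
            raise RuntimeError("no send credit; would overrun server receives")
        self.in_flight += 1
        # The requested value travels in the transport header of the CALL.
        return {"requested_credits": requested_credits}

    def on_reply(self, granted_credits):
        # The spec quoted above forbids a zero grant when the server has
        # no in-progress operations, since that would deadlock the client.
        self.in_flight -= 1
        self.granted = granted_credits
```

Because Sends are ordered, the client never needs a separate credit
message: every REPLY it has already received carries the freshest grant.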
How is the target supposed to estimate how many untagged messages the
initiator will presume it is capable of handling? Or vice versa? How?
Provision enough buffers to match your physical line rate under the
worst-case scenarios, even if you're an economy model? Guess? Keep a
table by model number? Limit yourself to one untagged message in flight,
even if you are supposed to be a high-performance model? Keep trying
until you crash the connection?

True interoperability is not based upon tweaking or fine-tuning to match
the peers. Peers work together because the protocol has enabled any peer
to work with any other compliant peer. Period. Guesstimating has nothing
to do with it.

Fortunately, establishing a credit protocol that is compatible with
normal iSCSI interactions is easily done. Generically, an RDMA-capable
ULP flow control strategy requires three things:

1) An initial credit level. This can be established during
   connection/stream establishment, just as is proposed for RDMA Read
   credits.

2) A credit is consumed for each untagged message sent, exactly as
   sending each RDMA Read Request consumes an RDMA Read credit.

3) The ULP reply restores credits. With RDMA Reads this is a simple
   one-to-one process. DAFS also has each reply replenish the credit
   that the request it is responding to drained. The NFS/RPC protocol
   allows the RPC layer to explicitly vary the number of credits
   restored in each untagged message.

The only special requirement that I can see is that there may be a
sequence of untagged messages that are not individually acknowledged.
That can be taken care of by the following rules:

-- A ULP response to a ULP request implies that all prior ULP requests
   have been processed, even if they did not warrant an explicit
   response.

-- A ULP response restores credits for itself and for any other
   "phantom" responses that it implies.
-- If a ULP needs to send a sequence of untagged messages that will not
   be acknowledged and that will drain the credits, it needs to insert
   an untagged message that will be acknowledged. Any form of echoed NOP
   or ping could be used.

Caitlin Bestler - cait@asomi.com - http://asomi.com/
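For illustration, the three credit rules and the phantom-response
accounting proposed in this message might be sketched as follows. All
names are invented; this is a sketch of the idea, not an implementation
from any draft.

```python
class UntaggedCreditor:
    """Sender-side credits for untagged messages (illustrative).

    Rules sketched: (1) an initial credit level from connection
    establishment; (2) each untagged send consumes one credit; (3) a
    response restores the credit for itself and for every earlier
    unacknowledged request it implicitly completes ("phantom" responses).
    """

    def __init__(self, initial_credits):
        self.credits = initial_credits
        self.unacked = 0   # untagged messages sent with no response yet

    def send_untagged(self):
        if self.credits == 0:
            # Out of credits with only unacknowledged traffic pending:
            # the sender must first insert an echoed NOP/ping, whose
            # response will restore the drained credits.
            raise RuntimeError("insert an echoed NOP before sending more")
        self.credits -= 1
        self.unacked += 1

    def on_response(self):
        # A response implies all prior requests were processed, so it
        # restores its own credit plus one per implied phantom response.
        self.credits += self.unacked
        self.unacked = 0
```

A NOP echo fits this scheme naturally: it is just one more untagged
message whose guaranteed response triggers `on_response`.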
Last updated: Tue Aug 05 12:46:10 2003