RE: ISCSI: flow control

At 11:05 AM 9/25/00 -0400, Black_David@emc.com wrote:
>Mike,
>
> > In essence, this is what InfiniBand does and others have been
> > advocating. When the ACK (SCSI response) is returned it encodes a credit
> > to inform the sender of how many receive buffers (available command queue
> > slots) have been posted.
>
>Could you post, or provide a pointer to a self-contained specification of
>that mechanism? If this is a pointer to InfiniBand specs, a heads-up on any
>intellectual property issues is in order.

The V1.0 InfiniBand spec is about to be made public, so I would refer people to it for the specifics of that architecture. The problem being addressed in InfiniBand and here is rather generic in nature: how to avoid overflowing a receive queue using a credit scheme. InfiniBand's scheme is unique in its specifics (encoding, ACK message formats, etc.), but the essence is the same. I'll try to paraphrase the scheme here in more general-purpose terms; if an RFC draft is required, let me know.

This credit scheme is implemented as follows:

(1) The responder encodes an N-bit credit within the ACK (iSCSI response) message. Credits are absolute values, i.e. the responder "snapshots" its current credit value and encodes it in the ACK message. If the endnode does not support credits, the requester shall assume an infinite value.

(2) Credits are on a per-connection or per-session basis. Simplicity favors the per-connection basis, but if the session layer is load balancing commands across multiple connections, and given that completion processing and resource management for commands are at the session layer, it may not be a performance or implementation inhibitor to implement this within the session layer itself.
In general, this can be implemented across multiple ports or multiple NICs, entirely in software, entirely in hardware, or as a mix, with minimal overhead.

(3) The requester maintains a current credit count and decrements this value for each outstanding request. When new credit is received, the requester updates its credit window and determines whether new requests may be injected into the network.

(4) If a requester does not receive any credits for a period of time and there are no outstanding requests, it may probe the responder by issuing a single request. The responder may respond with an RNR (receiver not ready) NAK or an ACK with a credit update. This prevents deadlock. Ideally, one would allow the responder to send an unsolicited ACK when new credit arrives and there are no outstanding requests being processed. The advantage of unsolicited ACKs is simplicity: the requester never generates an operation without credit and the responder only returns credit, which makes the implementation simpler for both sides.

(5) The responder increments its credit value each time a receive descriptor / command queue element is posted / available. Again, this value may be per connection or per session depending upon the resource / coherency strategy pursued.

(6) To support long-distance implementations, one would like to stretch the number of credits under the assumption that a number of responses are also in flight at a given time. If this is implemented, then an RNR NAK / QUEUE FULL algorithm is needed, as is an unsolicited ACK / grant credit message. An implementation would need to understand the dynamic rate of command completions and perform optimistic calculations of what this stretched "credit" window is. When the requester receives an RNR NAK / QUEUE FULL message, it would reduce the injection rate by a moderate amount (to avoid large oscillations); some modeling would be needed to understand what this reduction should be.

(7) Requesters can transmit requests that do not consume responder resources, e.g.
RDMA READ, RDMA WRITE without immediate data, etc.

>A concern that has been raised in this discussion is how credit
>information relates to the concurrency and ordering (esp. lack thereof) of
>transmission and processing of SCSI commands and the transmission of
>responses. My understanding of the FCP approach to buffer management (and
>I assume InfiniBand is similar) is that traffic cannot be sent unless the
>sender knows that there is space in the receiver's buffer to accommodate
>it (i.e., the sender has a credit or credits indicating space in the
>receiver's buffer).

In general, this is correct for InfiniBand: one cannot initiate a SEND operation unless credit is available. It should be kept in mind that InfiniBand was designed for the data center, i.e. 300 meters for a given link instance. As such, some optimizations were made that may not be acceptable w.r.t. this workgroup's focus.

>This implies that if for some reason the receiver stalled, all the
>in-flight commands and data could be successfully received. In contrast,
>I've seen discussion on this list of long distance connections in which
>there is potentially more traffic in flight than the receiver could
>accommodate if the receiver stopped. I believe that whether to allow this
>is an open issue, but the underlying cause is valid - there is a desire to
>use iSCSI in situations where the initiator to target coupling is looser
>(in this case, due to distance) than is typical for SCSI and Fibre Channel.

Mike
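
For readers who want something concrete, steps (1), (3), (4), and (5) of the credit scheme in this message can be sketched roughly as below. This is only an illustration, not InfiniBand's or iSCSI's actual mechanism: the class and method names (Responder, Requester, submit, complete_one, try_send, on_ack, may_probe) are hypothetical, delivery is modeled as a synchronous call, and the sketch assumes the snapshot in each ACK already reflects every request the requester has sent. A real implementation would also have to account for requests still in flight when the snapshot is taken.

```python
import math
from collections import deque
from typing import Optional


class Responder:
    """Owns the command queue and advertises its free slots as credit."""

    def __init__(self, queue_slots: int) -> None:
        self.free_slots = queue_slots
        self.pending = deque()

    def submit(self, command: str) -> bool:
        """Queue a command; False stands in for RNR NAK / QUEUE FULL."""
        if self.free_slots == 0:
            return False
        self.free_slots -= 1
        self.pending.append(command)
        return True

    def complete_one(self) -> int:
        """Finish the oldest queued command and return the ACK's credit field."""
        self.pending.popleft()
        self.free_slots += 1      # step (5): a queue slot is available again
        return self.free_slots    # step (1): absolute snapshot, not a delta


class Requester:
    """Tracks credit and refuses to send when none is left."""

    def __init__(self, initial_credit: int) -> None:
        self.credit: float = initial_credit
        self.outstanding = 0

    def try_send(self, responder: Responder, command: str) -> bool:
        if self.credit <= 0:
            return False          # no credit: hold the request locally
        if not responder.submit(command):
            self.credit = 0       # responder reported full: stop sending
            return False
        self.credit -= 1          # step (3): one credit per outstanding request
        self.outstanding += 1
        return True

    def on_ack(self, credit: Optional[int]) -> None:
        """Step (3): replace the credit window with the value in the ACK."""
        self.outstanding -= 1
        # Step (1): None models a responder without credit support,
        # which the requester treats as infinite credit.
        self.credit = math.inf if credit is None else credit

    def may_probe(self) -> bool:
        """Step (4): with no credit and nothing in flight, one probe is allowed."""
        return self.credit == 0 and self.outstanding == 0


responder = Responder(queue_slots=2)
requester = Requester(initial_credit=2)
sent = [requester.try_send(responder, c) for c in ("CMD-1", "CMD-2", "CMD-3")]
print(sent)                                      # [True, True, False]
requester.on_ack(responder.complete_one())       # ACK carries credit = 1
print(requester.try_send(responder, "CMD-3"))    # True
```

The third command is held back because the requester has used both credits; once the first ACK returns a snapshot of one free slot, the held command can go out. The stretched window and rate reduction of step (6) are deliberately left out, since the message itself notes that they need modeling.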