RE: ISCSI: flow control

To: Black_David@emc.com
Subject: RE: ISCSI: flow control
From: Michael Krause <krause@cup.hp.com>
Date: Tue, 26 Sep 2000 15:53:53 -0700
Cc: ips@ece.cmu.edu
Content-Type: text/plain; charset="us-ascii"; format=flowed
In-Reply-To: <0F31E5C394DAD311B60C00E029101A0704100FD4@corpmx9.isus.emc.com>
Sender: owner-ips@ece.cmu.edu

At 11:05 AM 9/25/00 -0400, Black_David@emc.com wrote:
>Mike,
>
> > In essence, this is what InfiniBand does and others have been
> > advocating.  When the ACK (SCSI response) is returned it encodes a credit
> > to inform the sender of how many receive buffers (available command queue
> > slots) have been posted.
>
>Could you post, or provide a pointer to a self-contained specification of
>that mechanism?  If this is a pointer to InfiniBand specs, a heads-up on any
>intellectual property issues is in order.

The V1.0 InfiniBand spec is about to be made public, so I would refer
people to it for the specifics of that architecture.  The problem being
addressed in InfiniBand and here is generic: how to avoid overflowing a
receive queue by using a credit scheme.  InfiniBand's scheme has its own
specifics (encoding, ACK message formats, etc.), but the essence is the
same.  I'll try to paraphrase the scheme here in more general-purpose
terms; if an RFC draft is required, let me know.

This credit scheme is implemented as follows:

(1) Responder encodes an N-bit credit within the ACK (iSCSI response)
message.  Credits are absolute values, i.e. the responder "snapshots" its
current credit value and encodes it in the ACK message.  If the endnode
does not support credits, the requester shall assume an infinite value.

(2) Credits are on a per connection or per session basis.  Simplicity
favors the per connection basis.  However, if the session layer is load
balancing commands across multiple connections, and given that completion
processing and resource management for commands happen at the session
layer, implementing credits within the session layer itself may not hurt
performance or complicate the implementation.  In general, this can be
implemented across multiple ports or multiple NICs, entirely in software
or hardware or a mix, with minimal overhead.

(3) Requester maintains a current credit count and decrements this value 
for each outstanding request.  When new credit is received, the requester 
updates its credit window and determines whether new requests may be 
injected into the network.

(4) If a requester does not receive any credits for a period of time and
there are no outstanding requests, it may probe the responder by issuing a
single request.  The responder may respond with an RNR NAK or an ACK with a
credit update.  This prevents deadlock.  Ideally, one would allow an
unsolicited ACK to be sent by the responder when new credit arrives and
there are no outstanding requests being processed.  The advantage of
unsolicited ACKs is simplicity on both sides: the requester never
generates an operation without credit and the responder only returns
credit.

(5) Responder increments its credit value each time a receive descriptor / 
command queue element is posted / available.  Again this value may be per 
connection or per session depending upon the resource / coherency strategy 
pursued.

(6) To support long-distance implementations, one would like to stretch
the number of credits under the assumption that a number of responses are
also in flight at a given time.  If this is implemented, then an RNR NAK /
QUEUE FULL algorithm is needed, as is an unsolicited ACK / grant credit
message.  An implementation would need to understand the dynamic rate of
command completions and perform optimistic calculations for what this
stretched "credit" window is.  When it receives an RNR NAK / QUEUE FULL
message, it would reduce the injection rate by a moderate amount (to avoid
large oscillations); some modeling would be needed to understand what this
reduction should be.

(7) Requesters can transmit requests that do not consume responder
resources, e.g. RDMA READ, RDMA WRITE without immediate data, etc.
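
As a rough illustration of items (1), (3), (4) and (5), here is a minimal
Python sketch.  It is not InfiniBand's or iSCSI's actual encoding.  The
class and method names (Responder, Requester, on_request, may_probe, ...)
and the use of float("inf") for "no credit support" are assumptions made
for the example, and delivery is modeled as synchronous, so the snapshot in
each reply already accounts for every request sent so far.

```python
INFINITE = float("inf")  # assumed stand-in for "peer does not support credits"


class Responder:
    """Hypothetical receiver: one credit per free command queue slot."""

    def __init__(self, slots):
        self.free_slots = slots

    def post_buffer(self):
        # Item (5): each posted receive descriptor adds one credit.
        self.free_slots += 1

    def on_request(self):
        # Refuse with RNR NAK when no slot is free, otherwise consume one.
        if self.free_slots == 0:
            return ("RNR_NAK", 0)
        self.free_slots -= 1
        # Item (1): the ACK carries a snapshot of the absolute credit value.
        return ("ACK", self.free_slots)


class Requester:
    """Hypothetical sender: tracks credit and outstanding requests."""

    def __init__(self, initial_credit=INFINITE):
        self.credit = initial_credit
        self.outstanding = 0

    def can_send(self):
        return self.credit > 0

    def may_probe(self):
        # Item (4): no credit and nothing outstanding, so one probe is allowed.
        return self.credit == 0 and self.outstanding == 0

    def on_send(self):
        # Item (3): decrement per outstanding request (a probe stays at zero).
        self.credit = max(self.credit - 1, 0)
        self.outstanding += 1

    def on_reply(self, kind, credit):
        # Absolute credits: overwrite the local count with the snapshot.
        self.outstanding -= 1
        self.credit = credit


resp = Responder(slots=2)
req = Requester(initial_credit=2)
sent = 0
for _ in range(3):
    if req.can_send():
        req.on_send()
        req.on_reply(*resp.on_request())
        sent += 1
# Two requests go out; the third is held back for lack of credit.
```

A real implementation would also have to account for requests still in
transit when the snapshot is taken, which this sketch deliberately ignores.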

>A concern that has been raised in this discussion is how credit 
>information relates to the concurrency and ordering (esp. lack thereof) of 
>transmission and processing of SCSI commands and the transmission of 
>responses.  My understanding of the FCP approach to buffer management (and 
>I assume InfiniBand is similar) is that traffic cannot be sent unless the 
>sender knows that there is space in the receiver's buffer to accommodate 
>it (i.e., the sender has a credit or credits indicating space in the 
>receiver's buffer).

In general, this is correct for InfiniBand - one cannot initiate a SEND 
operation unless credit is available.  It should be kept in mind that 
InfiniBand was designed for the data center, i.e. 300 meters for a given 
link instance.  As such, some optimizations were made that may not be 
acceptable w.r.t. this workgroup's focus.
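
For longer links, the stretched window from item (6) might look roughly
like the sketch below.  Everything specific in it is an assumption for
illustration only: the StretchedWindow name, the idea of scaling a known
queue depth by a stretch factor, and the 1.5 and 0.875 constants.  As noted
above, the right reduction would need modeling.

```python
class StretchedWindow:
    """Hypothetical requester-side window that exceeds the queue depth."""

    def __init__(self, queue_depth, stretch=1.5, backoff=0.875):
        self.queue_depth = queue_depth  # responder slots, assumed known
        self.stretch = stretch          # optimistic multiplier (assumed)
        self.backoff = backoff          # moderate reduction factor (assumed)

    def limit(self):
        # Maximum number of requests allowed in flight at once.
        return int(self.queue_depth * self.stretch)

    def on_queue_full(self):
        # RNR NAK / QUEUE FULL: shrink moderately, never below the
        # responder's real queue depth (stretch of 1.0).
        self.stretch = max(1.0, self.stretch * self.backoff)


window = StretchedWindow(queue_depth=8)
before = window.limit()   # 12 with the assumed defaults
window.on_queue_full()
after = window.limit()    # 10 after one QUEUE FULL
```

The multiplicative backoff is just one way to get a "moderate" reduction;
it is a design choice of the sketch, not something the scheme prescribes.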

>This implies that if for some reason the receiver stalled, all the
>in-flight commands and data could be successfully received.  In contrast, 
>I've seen discussion on this list of long distance connections in which 
>there is potentially more traffic in flight than the receiver could 
>accommodate if the receiver stopped.  I believe that whether to allow this 
>is an open issue, but the underlying cause is valid - there is a desire to 
>use iSCSI in situations where the initiator to target coupling is looser 
>(in this case, due to distance) than is typical for SCSI and Fibre Channel.

Mike
