
    RE: ISCSI: flow control



    At 11:05 AM 9/25/00 -0400, Black_David@emc.com wrote:
    >Mike,
    >
    > > In essence, this is what InfiniBand does and others have been
    > > advocating.  When the ACK (SCSI response) is returned it encodes a credit
    > > to inform the sender of how many receive buffers (available command queue
    > > slots) have been posted.
    >
    >Could you post, or provide a pointer to a self-contained specification of
    >that mechanism?  If this is a pointer to InfiniBand specs, a heads-up on any
    >intellectual property issues is in order.
    
    The V1.0 InfiniBand spec is about to be made public, so I would refer 
    people to it for the specifics of that architecture.  The problem being 
    addressed in InfiniBand and here is rather generic in nature - how to 
    avoid overflowing a receive queue using a credit scheme.  InfiniBand's 
    scheme is unique in its specifics (encoding, ACK message formats, etc.), 
    but the essence is the same.  I'll try to paraphrase the scheme here in 
    more general-purpose terms - if an RFC draft is required, let me know.
    
    This credit scheme is implemented as follows:
    
    (1) Responder encodes an N-bit credit within the ACK (iSCSI response) 
    message.  Credits are absolute values, i.e. the responder "snapshots" its 
    current credit value and encodes it in the ACK message.  If the endnode 
    does not support credits, the requester shall assume an infinite value.
    
    (2) Credits are on a per connection or per session basis.  Simplicity 
    favors the per connection basis.  However, if the session layer is load 
    balancing commands across multiple connections, and given that completion 
    processing and resource management for commands are done at the session 
    layer, implementing credits within the session layer itself may not hurt 
    performance or complicate the implementation.  In general, this can be 
    implemented across multiple ports or multiple NICs, entirely in software 
    or hardware or a mix, with minimal overhead.
    
    (3) Requester maintains a current credit count and decrements this value 
    for each outstanding request.  When new credit is received, the requester 
    updates its credit window and determines whether new requests may be 
    injected into the network.
    
    (4) If a requester does not receive any credits for a period of time and 
    there are no outstanding requests, it may probe the responder by issuing a 
    single request.  The responder may respond with an RNR NAK (receiver not 
    ready) or an ACK with a credit update.  This prevents deadlock.  Ideally, 
    one would allow an unsolicited ACK to be sent by the responder when new 
    credit arrives and there are no outstanding requests being processed.  The 
    advantage of unsolicited ACKs is simplicity - the requester never 
    generates an operation without credit and the responder only returns 
    credit, which makes the implementation simpler for both sides.
    
    (5) Responder increments its credit value each time a receive descriptor / 
    command queue element is posted / available.  Again this value may be per 
    connection or per session depending upon the resource / coherency strategy 
    pursued.
    
    (6) To support long-distance implementations, one would like to stretch 
    the number of credits under the assumption that a number of responses are 
    also in flight at a given time.  If this is implemented, then an RNR NAK / 
    QUEUE FULL algorithm is needed, as is an unsolicited ACK / grant credit 
    message.  An implementation would need to understand the dynamic rate of 
    command completions and perform optimistic calculations for what this 
    stretched "credit" window is.  When the requester receives an RNR NAK / 
    QUEUE FULL message, it would reduce the injection rate by a moderate 
    amount (to avoid large oscillations) - some modeling would be needed to 
    understand what this reduction would be.
    
    (7) Requesters can transmit requests that do not consume responder 
    resources, e.g. RDMA READ, RDMA WRITE without immediate data, etc.
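
    To make the bookkeeping in (1), (3), (4), (5) and (7) concrete, here is a 
    minimal sketch in Python.  It is not InfiniBand's encoding or any iSCSI 
    wire format; the class names, method names and the "ACK" / "RNR_NAK" 
    strings are made up for illustration.  One assumption beyond the text: 
    when an absolute credit snapshot arrives, the requester subtracts the 
    requests it still has in flight, which is a conservative way to avoid 
    counting the same slot twice.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Responder:
    """Receive side: owns the receive buffers / command queue slots."""
    free_slots: int = 0

    def post_receive(self, n: int = 1) -> None:
        # (5) Credit grows each time a receive descriptor is posted.
        self.free_slots += n

    def handle_request(self, consumes_slot: bool = True) -> Tuple[str, int]:
        # (7) Requests that need no receive buffer never consume credit.
        if consumes_slot:
            if self.free_slots == 0:
                return ("RNR_NAK", 0)
            self.free_slots -= 1
        # (1) The ACK carries an absolute snapshot of the current credit.
        return ("ACK", self.free_slots)


@dataclass
class Requester:
    """Send side: injects a request only while it holds credit."""
    credits: int = 0
    outstanding: int = 0

    def can_send(self) -> bool:
        return self.credits > 0

    def can_probe(self) -> bool:
        # (4) No credit and nothing in flight: a single probe is allowed.
        return self.credits == 0 and self.outstanding == 0

    def on_send(self) -> None:
        # (3) Each outstanding request uses up one credit.
        self.credits = max(0, self.credits - 1)
        self.outstanding += 1

    def on_reply(self, kind: str, credit_snapshot: int) -> None:
        self.outstanding -= 1
        if kind == "ACK":
            # Absolute value: overwrite, do not add.  Assumption (not in
            # the text): discount requests that are still in flight.
            self.credits = max(0, credit_snapshot - self.outstanding)
        else:
            # RNR NAK: the responder had no slot, so hold no credit.
            self.credits = 0
```

    A requester starting with zero credit would call can_probe(), send one 
    request, and feed the reply to on_reply(); after that, can_send() gates 
    every further request that consumes a responder slot.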
    
    >A concern that has been raised in this discussion is how credit 
    >information relates to the concurrency and ordering (esp. lack thereof) of 
    >transmission and processing of SCSI commands and the transmission of 
    >responses.  My understanding of the FCP approach to buffer management (and 
    >I assume InfiniBand is similar) is that traffic cannot be sent unless the 
    >sender knows that there is space in the receiver's buffer to accommodate 
    >it (i.e., the sender has a credit or credits indicating space in the 
    >receiver's buffer).
    
    In general, this is correct for InfiniBand - one cannot initiate a SEND 
    operation unless credit is available.  It should be kept in mind that 
    InfiniBand was designed for the data center, i.e. 300 meters for a given 
    link instance.  As such, some optimizations were made that may not be 
    acceptable w.r.t. this workgroup's focus.
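
    As a rough illustration of why distance matters, the sketch below 
    estimates how many requests must be in flight to keep a link busy when 
    every request needs a credit before it is sent.  The function name and 
    parameters are hypothetical, the propagation speed is the usual 
    approximation for optical fiber, and the link rate and request size in 
    the usage note are example numbers, not figures from any spec.

```python
import math

# Approximate signal propagation speed in optical fiber (about 2/3 of c).
FIBER_M_PER_S = 2.0e8


def requests_to_fill_link(distance_m: float,
                          link_bits_per_s: float,
                          request_bytes: int) -> int:
    """Estimate the window of outstanding requests needed to avoid idling.

    The sender must cover one serialization time plus one round trip
    before the first credit update can come back.
    """
    round_trip_s = 2.0 * distance_m / FIBER_M_PER_S
    serialize_s = request_bytes * 8 / link_bits_per_s
    return 1 + math.ceil(round_trip_s / serialize_s)
```

    With, say, 64-byte requests on a 2.5 Gbit/s link, 
    requests_to_fill_link(300, 2.5e9, 64) comes out in the tens, while 
    requests_to_fill_link(1_000_000, 2.5e9, 64) comes out in the tens of 
    thousands.  That gap is what the stretched window in (6) is trying to 
    cover without dedicating a receive buffer to every in-flight request.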
    
    >This implies that if for some reason the receiver stalled, all the 
    >in-flight commands and data could be successfully received.  In contrast, 
    >I've seen discussion on this list of long distance connections in which 
    >there is potentially more traffic in flight than the receiver could 
    >accommodate if the receiver stopped.  I believe that whether to allow this 
    >is an open issue, but the underlying cause is valid - there is a desire to 
    >use iSCSI in situations where the initiator to target coupling is looser 
    >(in this case, due to distance) than is typical for SCSI and Fibre Channel.
    
    
    Mike
    
    



Last updated: Tue Sep 04 01:07:03 2001