    iSCSI: RE: Framing Discussion



    I think I now understand the assumptions regarding the single-bit TCP option
    for indicating the presence of a PDU header in the current segment (thanks
    Costa). It boils down to a hardware implementation that is tuned for the
    best case (i.e. a small amount of re-assembly memory on the NIC to park
    fragmented PDUs, on the assumption that the next aligned PDU is coming
    shortly), plus a method for dropping things into software when we are
    talking to an ill-behaved NIC. Okay, I'll buy that.
    
    Now, I have another dumb question. For direct data placement, the
    discussions have been centered mostly around the need for alignment when
    parsing PDUs on the receiving iSCSI TOE. One potential issue that has not
    been discussed is the problem of how to handle re-transmission on the
    sending iSCSI TOE. 
    
    From previous discussions, I am assuming that our goal is to avoid having a
    network bandwidth-delay product (BWDP) worth of memory on the NIC. The
    receiver can avoid this memory by recovering PDU alignment in the TCP
    stream and using the self-describing headers in the wire protocol (either
    iSCSI offsets or an RDMA shim layer) to put the data directly in the buffer
    cache. On the sending side, we can DMA directly from iSCSI descriptor CDBs
    into the TCP pipe using a hardware path. But unless we keep all of those
    un-acked TCP segment buffers around in the NIC, it will be difficult to
    recover the context when we have to re-transmit.
    
    Let's suppose that we have an iSCSI TCP connection in which we have multiple
    outstanding I/Os. Thus, the byte stream has interleaved within it commands
    and data from different I/Os. When we detect a dropped segment, either
    through normal TCP loss detection (timeout or duplicate ACKs) or via SACK,
    how do we map the missing byte block to the appropriate context? If we keep
    the segments around, then we could match the missing segment easily and
    re-transmit. But that would require the NIC to implement a BWDP's worth of
    transmit buffer memory.
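
    To put a rough number on "a BWDP's worth" of memory, here is a small
    back-of-the-envelope sketch. The link speed and round-trip time below are
    illustrative assumptions only, not figures from this discussion, and
    bdp_bytes is a made-up helper name.

```python
def bdp_bytes(link_bits_per_sec: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: data in flight (un-acked) on a full pipe."""
    # bits/s * ms / 1000 gives bits in flight; / 8 converts to bytes.
    return link_bits_per_sec * rtt_ms // 8000


# Assumed example: a 1 Gb/s link with a 100 ms round-trip time.
example = bdp_bytes(1_000_000_000, 100)
print(f"{example} bytes of un-acked data per connection")
```

    That is the amount a sender would have to hold per connection if it kept
    every un-acked segment on the NIC.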
    
    To have the iSCSI TOE re-transmit directly from the buffer cache, it seems
    that we would need some sort of context that would allow us to map a byte
    window to a specific, meaningful point somewhere in the middle of a CDB
    context. Essentially, you need enough context to be able to re-construct the
    TCP FIFO, since the memory in this FIFO has effectively been re-allocated
    by then. Maybe this isn't too hard, but it sure sounds like a difficult
    problem for hardware to solve. But, as the software folks around here keep
    telling me, "it's just gates" ;-)
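
    A minimal software sketch of the kind of context I mean, under some loud
    assumptions: every name here (Extent, RetransmitMap, task_tag, buf_offset)
    is hypothetical, sequence-number wraparound is ignored, and only payload
    bytes are tracked (PDU header bytes would have to be regenerated some other
    way). The idea is to keep one small record per transmitted extent instead
    of the bytes themselves, and to look up a missing byte range on loss.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    seq_start: int   # first TCP stream byte covered by this extent
    length: int      # number of bytes in the extent
    task_tag: int    # hypothetical ID of the outstanding I/O
    buf_offset: int  # offset into that I/O's host buffer


class RetransmitMap:
    """Maps un-acked TCP byte ranges back to (I/O, buffer offset)."""

    def __init__(self):
        self._starts = []   # seq_start values, kept in stream order
        self._extents = []  # parallel list of Extent records

    def record(self, seq_start, length, task_tag, buf_offset):
        # Called at transmit time; extents must arrive in stream order.
        self._starts.append(seq_start)
        self._extents.append(Extent(seq_start, length, task_tag, buf_offset))

    def ack(self, ack_seq):
        # Forget extents that lie entirely below the cumulative ACK.
        while self._extents and (
            self._extents[0].seq_start + self._extents[0].length <= ack_seq
        ):
            self._starts.pop(0)
            self._extents.pop(0)

    def lookup(self, seq, length):
        # Return (task_tag, buf_offset, nbytes) pieces covering [seq, seq+length).
        end = seq + length
        i = max(bisect.bisect_right(self._starts, seq) - 1, 0)
        pieces = []
        while i < len(self._extents) and seq < end:
            e = self._extents[i]
            e_end = e.seq_start + e.length
            if not (e.seq_start <= seq < e_end):
                raise KeyError(f"no context for stream byte {seq}")
            n = min(end, e_end) - seq
            pieces.append((e.task_tag, e.buf_offset + (seq - e.seq_start), n))
            seq += n
            i += 1
        if seq < end:
            raise KeyError(f"no context for stream byte {seq}")
        return pieces


# Two interleaved I/Os (tags 7 and 9) on one connection.
m = RetransmitMap()
m.record(0, 100, task_tag=7, buf_offset=0)
m.record(100, 50, task_tag=9, buf_offset=0)
m.record(150, 100, task_tag=7, buf_offset=100)
lost = m.lookup(80, 40)  # a lost segment that straddles both I/Os
```

    The state here grows with the number of un-acked extents rather than with
    the number of un-acked bytes, which is the trade being asked about. Whether
    that lookup is cheap enough to do in gates is exactly the open question.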
    
    -Wayland
    

