
    RE: Avoiding deadlock in iSCSI



    
    
    Somesh,
    
    There is a misunderstanding. You are required to keep order only on a given
    connection.
    SMP or no SMP, a connection is represented by ONE data structure (I hope). You are
    not implying order across connections, as that has no meaning to the receiver.
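    A minimal sketch of the per-connection ordering Julian describes, assuming each connection is one object that owns its own lock and sequence counter (the names `Connection`, `send`, and `send_many` are hypothetical). Several threads share each connection, yet no lock ever spans two connections:

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Hypothetical per-connection state: one lock, one sequence counter.

    Ordering is promised only within this object. Two Connection instances
    share nothing, so no lock spans connections.
    """
    name: str
    lock: threading.Lock = field(default_factory=threading.Lock)
    wire: list = field(default_factory=list)  # stands in for the TCP byte stream
    next_seq: int = 0

    def send(self, payload: str) -> int:
        # Taking the sequence number and writing happen under the same
        # per-connection lock, so wire order always matches sequence order.
        with self.lock:
            seq = self.next_seq
            self.next_seq += 1
            self.wire.append((seq, payload))
            return seq


def send_many(conn: Connection, items: list) -> None:
    for item in items:
        conn.send(item)


# Two threads per connection; threads on different connections never contend.
cmd_conn, data_conn = Connection("cmd"), Connection("data")
workers = [
    threading.Thread(target=send_many,
                     args=(conn, [f"{conn.name}-{t}-{i}" for i in range(50)]))
    for conn in (cmd_conn, data_conn)
    for t in range(2)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

    Each wire carries sequence numbers 0..99 in order; nothing constrains how the two wires interleave with each other, which is exactly the "no order across connections" point.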
    
    Julo
    
    somesh_gupta@hp.com on 12/09/2000 21:33:31
    
    Please respond to somesh_gupta@hp.com
    
    To:   David.Robinson@EBay.Sun.COM, ips@ece.cmu.edu
    cc:    (bcc: Julian Satran/Haifa/IBM)
    Subject:  RE: Avoiding deadlock in iSCSI
    
    
    
    
    
    
    > -----Original Message-----
    > From: David.Robinson@EBay.Sun.COM [mailto:David.Robinson@EBay.Sun.COM]
    > Sent: Monday, September 11, 2000 6:36 PM
    > To: ips@ece.cmu.edu
    > Subject: RE: Avoiding deadlock in iSCSI
    >
    >
    > Thanks for the information. I think part of my confusion is the
    > difference in mapping SCSI from a datagram protocol onto a reliable
    > stream protocol. In a datagram protocol, if the data is sent without
    > the receiver's cooperation, the receiver's buffers may not be adequate
    > and the data must be discarded. Credits and RTT can be used to handle
    > this case.
    >
    > With a reliable stream transport like TCP, you don't get into
    > this situation because the receiver will never open the TCP
    > window beyond its buffer capacity. For low amounts of buffering
    > it might not be as efficient as using RTT, but there are no
    > correctness or deadlock issues. Because each sender has its own
    > connection and its own flow control, they are handled independently.
    > Likewise with separate data connections: each is also flow controlled,
    > so "unsolicited" data is not an issue. "Overflow" conditions simply
    > never occur.
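    A toy model of the invariant David relies on: the receiver advertises a window equal to its free buffer space, so a sender that honors the window can be slowed down but never overflows the receiver. This is a deliberate simplification of TCP (no sequence numbers, no window scaling, no delayed ACKs), and `WindowedReceiver` and `send_stream` are hypothetical names:

```python
class WindowedReceiver:
    """Toy receiver: the advertised window is exactly its free buffer space."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buffered = 0

    def window(self) -> int:
        # Never advertise more than the space that is actually free.
        return self.capacity - self.buffered

    def accept(self, nbytes: int) -> None:
        if nbytes > self.window():
            raise OverflowError("sender exceeded the advertised window")
        self.buffered += nbytes

    def consume(self, nbytes: int) -> None:
        # The application reads data, freeing space and reopening the window.
        self.buffered -= min(nbytes, self.buffered)


def send_stream(rx: WindowedReceiver, total: int, app_read: int) -> int:
    """Send `total` bytes while honoring the window; return rounds taken."""
    if app_read <= 0:
        raise ValueError("a receiver that never reads stalls the sender forever")
    rounds, remaining = 0, total
    while remaining > 0:
        chunk = min(remaining, rx.window())  # never more than the window
        rx.accept(chunk)                     # so this can never overflow
        remaining -= chunk
        rx.consume(app_read)
        rounds += 1
    return rounds
```

    With a 100-byte buffer and an application that reads 50 bytes per round, sending 1000 bytes takes 19 rounds instead of 10: less efficient, but never incorrect.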
    >
    > The only major design criterion is that the sender MUST maintain
    > the ordering of data sent on any connection. Data Dn MUST always be
    > sent before data Dm where n < m. In particular, if unsolicited
    > data and RTT are mixed, the sender cannot send data Dm before it
    > has received an RTT for data Dn if both are to use the same
    > connection.
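    A minimal sketch of that sender-side rule for a single connection, assuming each data item is tagged with its index n and marked as either unsolicited or RTT-solicited (`OrderedDataSender`, `queue`, and `on_rtt` are hypothetical names). Only the head of the queue may leave, so a solicited item still waiting for its RTT holds back everything behind it:

```python
from collections import deque


class OrderedDataSender:
    """Hypothetical per-connection sender enforcing "Dn before Dm for n < m"."""

    def __init__(self) -> None:
        self.pending = deque()      # (n, needs_rtt), in ascending n
        self.rtt_received = set()
        self.sent = []              # order in which data reached the wire
        self.last_queued = None

    def queue(self, n: int, needs_rtt: bool) -> None:
        if self.last_queued is not None and n <= self.last_queued:
            raise ValueError("data must be queued in ascending order")
        self.last_queued = n
        self.pending.append((n, needs_rtt))
        self._drain()

    def on_rtt(self, n: int) -> None:
        self.rtt_received.add(n)
        self._drain()

    def _drain(self) -> None:
        # Only the head may be sent. A solicited head still waiting for its
        # RTT blocks everything behind it, including unsolicited data.
        while self.pending:
            n, needs_rtt = self.pending[0]
            if needs_rtt and n not in self.rtt_received:
                break
            self.pending.popleft()
            self.sent.append(n)
```

    For example, queuing D1 (solicited) and then D2 (unsolicited) sends nothing until the RTT for D1 arrives, after which both go out as [1, 2].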
    
    If we are using multiple connections, this requirement has
    ramifications for the implementation on SMP systems. The initiator
    will be sending commands on one connection and data on another
    connection, which are two different actions. Ensuring ordering
    would require taking a lock across the two actions (or otherwise
    ensuring that they always happen on the same CPU). This would
    actually be quite expensive.
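    A sketch of the locking Somesh is worried about, assuming commands go on one connection and data on another and a single global order across both is wanted (`TwoConnectionInitiator`, `write`, and `issue` are hypothetical). The one lock is held across both sends, so every CPU issuing I/O serializes on it:

```python
import threading


class TwoConnectionInitiator:
    """Hypothetical initiator: commands on one connection, data on another."""

    def __init__(self) -> None:
        self.order_lock = threading.Lock()
        self.cmd_wire = []    # stands in for the command connection
        self.data_wire = []   # stands in for the data connection
        self.trace = []       # combined order, as observed under the lock

    def write(self, tag: int, data: bytes) -> None:
        # Held across BOTH sends: no other CPU can slip a command or a data
        # PDU in between. This is the expensive part on an SMP system.
        with self.order_lock:
            self.cmd_wire.append(("CMD", tag))
            self.trace.append(("CMD", tag))
            self.data_wire.append(("DATA", tag, data))
            self.trace.append(("DATA", tag))


def issue(init: TwoConnectionInitiator, tags: range) -> None:
    for tag in tags:
        init.write(tag, b"x")


# Four "CPUs" issuing writes concurrently.
init = TwoConnectionInitiator()
cpus = [threading.Thread(target=issue, args=(init, range(c * 100, c * 100 + 100)))
        for c in range(4)]
for t in cpus:
    t.start()
for t in cpus:
    t.join()
```

    The payoff is that command order and data order match across the two connections; the price is that the lock covers two sends instead of one. Julian's reply above avoids this by not requiring any order across connections in the first place.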
    
    >
    >    -David
    >
    > > I think people have been using "unsolicited data" to really mean
    > > data sent to a receiver without that receiver having first indicated
    > > that there is enough buffering to hold the data. Initiators acting
    > > as receivers have to verify this before they issue the command (not
    > > enough space for the whole command? Then break up the command.) For
    > > targets, this requires something like a credit mechanism using RTTs.
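    A toy version of the credit mechanism Jim mentions, assuming the target reserves buffer space before issuing each RTT and the initiator sends exactly the granted amount (`CreditTarget` and `write_command` are hypothetical; real iSCSI R2Ts carry more fields than a byte count):

```python
class CreditTarget:
    """Toy target that issues an RTT only for buffer space it has reserved."""

    def __init__(self, buffer_bytes: int) -> None:
        self.free = buffer_bytes

    def issue_rtt(self, wanted: int) -> int:
        grant = min(wanted, self.free)
        self.free -= grant            # reserve before any data arrives
        return grant

    def receive(self, nbytes: int, granted: int) -> None:
        # Data beyond the grant would be exactly the "unsolicited" overflow.
        if nbytes > granted:
            raise ValueError("initiator sent more than the RTT allowed")

    def release(self, nbytes: int) -> None:
        self.free += nbytes           # e.g. once the data is written to media


def write_command(target: CreditTarget, length: int) -> list:
    """Move `length` bytes in RTT-sized bursts; return the burst sizes."""
    bursts, remaining = [], length
    while remaining:
        grant = target.issue_rtt(remaining)
        if grant == 0:
            # A real target would simply delay the next RTT.
            raise RuntimeError("no free buffer")
        target.receive(grant, grant)
        bursts.append(grant)
        target.release(grant)
        remaining -= grant
    return bursts
```

    A 150-byte write against a 64-byte buffer moves in bursts of 64, 64, and 22 bytes; the extra RTT round trips are the latency cost of credits.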
    > >
    > > So there is an "unsolicited command" problem and an "unsolicited
    > > data" problem. In both cases the sender creates the problem by not
    > > first reserving enough resources with the receiver for the
    > > commands/data.
    > >
    > > In the command case there is no SCSI mechanism to reserve resources
    > > (QUEUE FULL is used to indicate overflows). Historically it has been
    > > assumed that queues of commands do not overflow often in practice. In
    > > reality, initiators have often artificially limited the number of
    > > commands they are willing to try to queue at the target in order to
    > > avoid this rejection (a lost opportunity in my mind).
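    A toy task set illustrating the "reject cleanly" approach for commands, assuming a fixed queue depth and no up-front reservation (`CommandQueue` and its methods are hypothetical; the status string mirrors QUEUE FULL, which SAM names TASK SET FULL):

```python
from collections import deque

ACCEPTED = "ACCEPTED"
QUEUE_FULL = "QUEUE FULL"   # SCSI status; SAM calls it TASK SET FULL


class CommandQueue:
    """Toy target task set: no up-front reservation, clean rejection on overflow."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.tasks = deque()

    def submit(self, tag: int) -> str:
        if len(self.tasks) >= self.depth:
            return QUEUE_FULL           # reject; the initiator retries later
        self.tasks.append(tag)
        return ACCEPTED

    def complete_oldest(self) -> int:
        return self.tasks.popleft()
```

    An initiator that caps itself at `depth` outstanding commands never sees QUEUE FULL, which is the artificial limit Jim considers a lost opportunity.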
    > >
    > > In the data case there is no "DATA QUEUE FULL"; instead, an explicit
    > > credit model of some sort is used to indicate that the receiver has
    > > reserved space for the data (REQs in parallel SCSI, BB credits in
    > > Fibre Channel). In this case the assumption was that data overflows
    > > would otherwise occur a lot.
    > >
    > > You can solve these problems by rejecting the overflow cleanly (as
    > > SCSI does with commands), which is low latency and works well under
    > > light loads. Or you can use credits. Credits add latency, or get you
    > > into the problem of credit allocation, which can be optimized for
    > > light loads (over-allocate credits) or heavy loads (allocate only what
    > > you have), but not both at once.
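    The allocation trade-off can be put as simple arithmetic. A sketch, assuming a target with a fixed number of buffer slots shared evenly among initiators and an `overcommit` factor (all names and numbers here are illustrative, not from any specification):

```python
def credits_per_initiator(buffer_slots: int, initiators: int,
                          overcommit: float = 1.0) -> int:
    """Split a target's buffer among initiators.

    overcommit == 1.0: allocate only what you have (safe under heavy load).
    overcommit  > 1.0: over-allocate (fewer credit stalls under light load).
    """
    return int(buffer_slots * overcommit) // initiators


def overflow_possible(buffer_slots: int, initiators: int,
                      per_initiator: int) -> bool:
    # Worst case: every initiator spends every credit at the same moment.
    return initiators * per_initiator > buffer_slots
```

    With 64 slots and 8 initiators, the conservative split gives 8 credits each and can never overflow; a 4x overcommit gives 32 each, so a lightly loaded initiator rarely waits, but simultaneous heavy load can overflow.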
    > >
    > > Historically, SCSI has used rejection for commands and credits for
    > > data, optimized for heavy loads. But this is only a T10-given rule,
    > > not a God-given rule (although some of us who have served on T10 can
    > > get that confused at times :-)).
    > >
    > > Hope this helps.
    > >
    > > Jim
    > >
    > >
    > >
    > >
    > >
    > >
    > > -----Original Message-----
    > > From: David Robinson [mailto:David.Robinson@EBay.Sun.COM]
    > > Sent: Monday, September 11, 2000 3:35 PM
    > > To: ips@ece.cmu.edu
    > > Subject: Re: Avoiding deadlock in iSCSI
    > >
    > >
    > > I think in following this discussion the terminology has been
    > > confusing me.  When I read "unsolicited data" I interpreted that
    > > to mean data for which no command has yet been sent. In general
    > > I consider that to be a bug, and the receiver should just drop the
    > > data on the floor.  The only possible scenario where it might
    > > not be a bug is if a command was sent on one connection and the
    > > data on the data connection arrived first, so it is unsolicited.
    > > My first assumption is that the sender would not send commands
    > > C1 and C2 and then data D2 and D1 on the same connection. Doing that
    > > creates nasty ordering problems we want to avoid.  So if the
    > > receiver simply allows the data connection's TCP window to shrink,
    > > the unsolicited data will be flow controlled to a stop until the
    > > command queue catches up.  With multiple data connections, some may
    > > be flow controlled, but the active command will be able to make
    > > progress on one connection. This may not be the most efficient
    > > mechanism, but it is "safe".  Preferably the data will either follow
    > > the command on the same data/command connection or the sender will
    > > request an RTT (aka R2T). It is also a sender bug to request a
    > > connection for a data transfer for which it has already sent
    > > "unsolicited" data.
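    A toy model of David's "let the window shrink" approach on a separate data connection, assuming data PDUs are (tag, payload) pairs and the target reads strictly in stream order (`DataConnectionReader`, `on_command`, and `poll` are hypothetical). A PDU whose command has not arrived is simply left unread; in real TCP the unread bytes fill the socket buffer and the advertised window closes:

```python
from collections import deque


class DataConnectionReader:
    """Toy target-side reader for a separate data connection."""

    def __init__(self, incoming) -> None:
        self.incoming = deque(incoming)   # PDUs still sitting in the stream
        self.known_commands = set()
        self.delivered = []

    def on_command(self, tag: int) -> None:
        self.known_commands.add(tag)
        self.poll()

    def poll(self) -> None:
        # Read in stream order; stop at the first PDU we cannot place yet.
        # Nothing is dropped: unread data just waits, flow-controlling the sender.
        while self.incoming and self.incoming[0][0] in self.known_commands:
            self.delivered.append(self.incoming.popleft())
```

    This also shows why "C1, C2, then D2, D1" is nasty: once C1 arrives, D1 is still stuck behind D2 until C2 arrives as well.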
    > >
    > > Unless my assumptions and definitions are wrong, I don't see the issue.
    > >
    > >  -David
    > >
    > > > The problem:
    > > >
    > > > iSCSI, as currently spec'ed, allows SCSI commands and data to be
    > > > interleaved fairly freely on a TCP connection. A target that stops
    > > > reading from a TCP connection to avoid reading more command packets
    > > > also prevents itself from reading data packets. Those data packets
    > > > may be critical to making progress on the currently executing
    > > > command.
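    A small simulation of the deadlock Costa describes, assuming one TCP connection carrying ("CMD", tag) and ("DATA", tag) PDUs, a target that can peek at the next PDU header, and (as a simplification) one DATA PDU per command (`run_target` is a hypothetical name). The only way to free a queue slot is to read data, so once the target stops reading, it is stuck for good:

```python
from collections import deque


def run_target(stream, queue_depth: int) -> bool:
    """Return True if every command completes, False if the target deadlocks."""
    stream = deque(stream)
    queue = []                      # accepted, not yet completed commands
    while stream:
        kind, tag = stream[0]       # peek at the next PDU header
        if kind == "CMD" and len(queue) >= queue_depth:
            # The target stops reading to avoid overflowing its command
            # queue, which also stops it from reading the DATA behind it.
            return False
        stream.popleft()
        if kind == "CMD":
            queue.append(tag)
        else:
            queue.remove(tag)       # the data completes its command
    return not queue
```

    With a queue depth of 1, the order C1, D1, C2, D2 completes, while C1, C2, D1, D2 deadlocks because D1 sits behind C2.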
    > > >
    > > > Note that the issue appears with one TCP connection for control and
    > > > data, and even appears in many of the multiple connection schemes.
    > > >
    > > > Data in iSCSI comes in two forms:
    > > >
    > > >     1) solicited   - data requested by target via RTT
    > > >                    - data requested by initiator via a SCSI command
    > > >     2) unsolicited - data sent by initiator without having received
    > > >                      an RTT
    > > > The analysis below assumes that unsolicited data travels over the
    > > > same TCP connection as SCSI commands. Otherwise, you run the risk of
    > > > receiving unsolicited data before the relevant SCSI command (thus
    > > > making implementations more complex).
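    The strongest form of "unsolicited data travels with the command" is to carry it in the same PDU, which option 3 in the list that follows also suggests. A sketch with an invented header layout (opcode, task tag, data length); this is NOT the real iSCSI Basic Header Segment, and `pack_write`, `unpack`, and `OP_WRITE` are hypothetical:

```python
import struct

# Hypothetical header: opcode (1 byte), task tag (4), data length (4),
# network byte order, no padding (9 bytes total).
HEADER = struct.Struct("!BII")
OP_WRITE = 0x01


def pack_write(tag: int, data: bytes) -> bytes:
    # The unsolicited data rides in the same PDU as the command, so the
    # target can never see the data before the command it belongs to.
    return HEADER.pack(OP_WRITE, tag, len(data)) + data


def unpack(pdu: bytes):
    op, tag, length = HEADER.unpack_from(pdu)
    data = pdu[HEADER.size:HEADER.size + length]
    if len(data) != length:
        raise ValueError("truncated PDU")
    return op, tag, data
```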
    > > >
    > > > Four solutions:
    > > >
    > > > 1) Don't overflow the command queue (i.e. use credits)
    > > >     - and what do you do if a misbehaving initiator overflows
    > > >         your command queue anyway? Drop the connection?
    > > >
    > > >     - requires you to reserve resources per initiator. Some people
    > > >         may want to overcommit.
    > > >
    > > > 2) Allow dropping of SCSI commands when queue fills
    > > >     - how do you clean up after a dropped SCSI command?
    > > >         - there may be other commands in the pipeline
    > > >
    > > >     One approach: On command drop, the target enters an error
    > > >     state. While in the error state, all newly received commands
    > > >     terminate with an error until the initiator explicitly clears
    > > >     the error state using a "clear error state" message.
    > > >
    > > >     You might think that the TASK SET FULL and ACA mechanisms from
    > > >     SCSI could be used to attack this problem. However, TASK SET FULL
    > > >     errors don't trigger ACA (in my reading of the SAM). Also, ACA is
    > > >     only triggered by the currently enabled command, not by random
    > > >     commands entered into the task set.
    > > >
    > > > 3) Put solicited data on a dedicated TCP connection. Require that
    > > > unsolicited data MUST follow the command, ideally in the same iSCSI
    > > > PDU.
    > > > 4) (Do it like NFS) Make all transfers from initiator to target
    > > > unsolicited. Make sure unsolicited data follows the command
    > > > immediately.
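    A toy state machine for option 2 in the list above, assuming a fixed queue depth and the proposed "clear error state" message (`ErrorStateTarget` and its methods are hypothetical). Commands already in the pipeline when the drop happens terminate with an error, which tells the initiator exactly what to reissue:

```python
class ErrorStateTarget:
    """Toy model of option 2: drop on overflow, then fail new commands until
    the initiator sends the (hypothetical) "clear error state" message."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.queue = []
        self.error_state = False

    def on_command(self, tag: int) -> str:
        if self.error_state:
            return "ERROR"            # terminate with an error, do not queue
        if len(self.queue) >= self.depth:
            self.error_state = True   # the command is dropped
            return "DROPPED"
        self.queue.append(tag)
        return "QUEUED"

    def on_complete(self, tag: int) -> None:
        self.queue.remove(tag)

    def on_clear_error_state(self) -> None:
        self.error_state = False
```

    Note that a command arriving after space frees up still fails until the explicit clear; that is what keeps the initiator's view of which commands were lost unambiguous.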
    > > >
    > > >
    > > > Of all the options, #1 and #4 sound the easiest to implement. #2 is
    > > > more sophisticated than #1. #3 is just plain clever, but that's
    > > > rarely a good thing. :)  #4 has large ramifications for current SCSI
    > > > target designs.
    > > >
    > > > -Costa
    >
    
    
    
    
    



Last updated: Tue Sep 04 01:07:18 2001
6315 messages in chronological order