RE: Avoiding deadlock in iSCSI

There appear to be some problems with our understanding of the ordering of SCSI commands.

The architected ordering is ONLY with respect to the command stream from a single initiator to a single logical unit, and then only when specified by an ordering attribute. Streams among different logical units from the same initiator are never required to provide ordering. Streams to the same logical unit from different initiators are never required to be ordered. Streams of task management activities are never required to be ordered with respect to commands.

The SCSI wedge drivers, OS file systems, RAIDs, and raw access drivers such as database file systems all deal very well with that architecture and exploit the throughput gains that are associated with selectively restricted ordering.

Certainly, if all aspects of all streams are explicitly ordered, the architectural requirement will be forced to be met. However, such restrictions are a significant limitation on the flexibility of the implementation and are certainly not required by the SCSI architecture. Requiring them in iSCSI is probably equally limiting.

Bob

> -----Original Message-----
> From: julian_satran@il.ibm.com [mailto:julian_satran@il.ibm.com]
> Sent: Tuesday, September 12, 2000 11:11 PM
> To: ips@ece.cmu.edu
> Subject: RE: Avoiding deadlock in iSCSI
>
> Somesh,
>
> There is a misunderstanding. You are required to keep order only on a given connection.
> SMP or no SMP they are represented by ONE data structure (I hope). You are not implying order across connection as it has no meaning to the receiver.
>
> Julo
>
> somesh_gupta@hp.com on 12/09/2000 21:33:31
>
> Please respond to somesh_gupta@hp.com
>
> To: David.Robinson@EBay.Sun.COM, ips@ece.cmu.edu
> cc: (bcc: Julian Satran/Haifa/IBM)
> Subject: RE: Avoiding deadlock in iSCSI
>
> > -----Original Message-----
> > From: David.Robinson@EBay.Sun.COM [mailto:David.Robinson@EBay.Sun.COM]
> > Sent: Monday, September 11, 2000 6:36 PM
> > To: ips@ece.cmu.edu
> > Subject: RE: Avoiding deadlock in iSCSI
> >
> > Thanks for the information, I think part of my confusion is the difference in mapping SCSI from a datagram protocol onto a reliable stream protocol. In a datagram protocol if the data is sent without the receiver's cooperation, the receiver's buffers may not be adequate and the data must get discarded. Credits and RTT can be used to handle this case.
> >
> > With a reliable stream transport like TCP, you don't get into this situation because the receiver will never open the TCP window beyond its buffer capacity. For low amounts of buffering it might not be as efficient as using RTT, but there are no correctness or deadlock issues. Because each sender has its own connection and own flow control they are independently handled. Likewise with separate data connections, each is also flow controlled so "unsolicited" data is not an issue. "Overflow" conditions simply never occur.
> >
> > The only major design criterion is that the sender MUST maintain ordering of data sent on any connection. Data Dn MUST always be sent before data Dm where n < m. In particular, if unsolicited data and RTT are mixed, the sender cannot send data Dm before it has received an RTT for data Dn if both are to use the same connection.
>
> If we are using multiple connections, this requirement has ramifications on the implementation on SMP systems.
> The initiator will be sending a command on one connection, and data on another connection, which are two different actions. To ensure ordering would require taking a lock across the two actions (or otherwise ensuring that they always happen on the same CPU). This would be quite expensive actually.
>
> > -David
> >
> > > I think people have been meaning "unsolicited data" to really mean data sent to a receiver without that receiver having first indicated that there is enough buffering to hold the data. For initiators acting as receivers they have to verify this before they initiate the command (not enough space for the whole command? Then break up the command.) For Targets this requires something like a credit mechanism with RTTs being used.
> > >
> > > So there is an "unsolicited command" problem and an "unsolicited data" problem. In both cases the sender creates the problem by not first reserving with the receiver enough resources for the commands/data.
> > >
> > > In the command case there is no SCSI mechanism to reserve resources (QUEUE FULL is used to indicate overflows). Historically it has been assumed that queues of commands do not overflow often in practice. In reality initiators have often artificially limited the number of commands they are willing to try and queue at the target in order to avoid this rejection (a lost opportunity in my mind).
> > >
> > > In the data case there is no "DATA QUEUE FULL" - instead, an explicit credit model of some sort is used to indicate the receiver has reserved space for the data (REQs in parallel SCSI, BB credits in Fibre Channel). In this case the assumption was that data overflows would occur a lot otherwise.
> > >
> > > You can solve these problems by rejecting the overflow cleanly (as SCSI does with commands), which is low latency and works well under light loads. Or you can do credits. Credits add latency, or get you into the problem of credit allocation, which can be optimized for light load (over allocate credits) or heavy loads (allocate only what you have), but not both at once.
> > >
> > > Historically, SCSI has used rejection for commands and credits for data, optimized for heavy loads. But this is only a T10 given rule, not a God given rule (although some of us who have served on T10 can get that confused at times :-)).
> > >
> > > Hope this helps.
> > >
> > > Jim
> > >
> > > cases there are well known mechanisms to reserve the
> > >
> > > -----Original Message-----
> > > From: David Robinson [mailto:David.Robinson@EBay.Sun.COM]
> > > Sent: Monday, September 11, 2000 3:35 PM
> > > To: ips@ece.cmu.edu
> > > Subject: Re: Avoiding deadlock in iSCSI
> > >
> > > I think in following this discussion the terminology has been confusing me. When I read "unsolicited data" I interpreted that to mean data for which no command has yet been sent. In general I consider that to be a bug and the receiver should just drop the data on the floor. The only possible scenario where it might not be a bug is if a command was sent on one connection and the data on the data connection arrived first, thus it is unsolicited. My first assumption is that the sender would not send commands C1 and C2 and data D2 and D1 on the same connection. Doing that creates nasty ordering problems we want to avoid. So if the receiver simply allows the data connection TCP window to shrink the unsolicited data will flow control to a stop until the command queue catches up.
> > > With multiple data connections, some may flow control but the active command will be able to make progress on one connection. This may not be the most efficient mechanism but it is "safe". Preferably the data will either follow the command on the same data/command connection or the sender will request a RTT (aka R2T). It is also a sender bug to request a connection for data transfer that it has already sent "unsolicited" data.
> > >
> > > Unless my assumptions and definitions are wrong, I don't see the issue.
> > >
> > > -David
> > >
> > > > The problem:
> > > >
> > > > iSCSI, as currently spec'ed, allows SCSI commands and data to be interleaved fairly freely on a TCP connection. A target that stops reading from a TCP connection to avoid reading more command packets also prevents itself from reading data packets. Those data packets may be critical to making progress on the currently executing command.
> > > >
> > > > Note the issue appears with one TCP connection for control and data and even appears in many of the multiple connection schemes.
> > > >
> > > > Data in iSCSI comes in two forms:
> > > >
> > > > 1) solicited   - data requested by target via RTT
> > > >                - data requested by initiator via a SCSI command
> > > > 2) unsolicited - data sent by initiator without having received an RTT
> > > >
> > > > The analysis below assumes that unsolicited data travels over the same TCP connection as SCSI commands. Otherwise, you run the risk of receiving unsolicited data before the relevant SCSI command (thus making implementations more complex).
> > > >
> > > > Four solutions:
> > > >
> > > > 1) Don't overflow the command queue (i.e. use credits)
> > > >    - and what do you do if a misbehaving initiator overflows your command queue anyway? Drop the connection?
> > > >    - requires you to reserve resources per initiator. Some people may want to overcommit
> > > >
> > > > 2) Allow dropping of SCSI commands when queue fills
> > > >    - how do you clean up after a dropped SCSI command?
> > > >    - there may be other commands in the pipeline
> > > >
> > > >    One approach: On command drop, the target enters an error state. While in the error state, all newly received commands terminate with an error until the initiator explicitly clears the error state using a "clear error state" message.
> > > >
> > > >    You might think that TASK SET FULL and ACA mechanisms from SCSI could be used to attack this problem. However, TASK SET FULL errors don't trigger ACA (in my reading of the SAM). Also, ACA is only triggered by the current enabled command, not by random commands entered into the task set.
> > > >
> > > > 3) Put solicited data on a dedicated TCP connection. Require that unsolicited data MUST follow the command, ideally in the same iSCSI PDU
> > > >
> > > > 4) (Do it like NFS) Make all transfers from initiator to target unsolicited. Make sure unsolicited data follows the command immediately.
> > > >
> > > > Of all the options, #1 and #4 sound the easiest to implement. #2 is more sophisticated than #1. #3 is just plain clever but that's rarely a good thing. :) #4 has large ramifications on current SCSI target designs.
> > > >
> > > > -Costa
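
The "error state" approach described under solution 2 in the oldest message of this thread can be sketched roughly as follows. This is only a toy model in Python, not anything from the iSCSI draft: the class name TargetCommandQueue, the capacity parameter, and the method names are all hypothetical, and the "clear error state" message is modelled as a plain method call. It assumes a fixed-depth queue and that the target rejects every command that arrives after a drop until the initiator clears the state.

```python
from collections import deque


class TargetCommandQueue:
    """Toy model of a target command queue with a sticky error state.

    All names here are hypothetical; they only illustrate the idea.
    """

    def __init__(self, capacity):
        self.capacity = capacity   # assumed fixed queue depth
        self.queue = deque()
        self.in_error = False

    def receive_command(self, cmd):
        """Return "queued", or "error" if the command is terminated."""
        if self.in_error:
            # Commands already in the pipeline behind the dropped one are
            # rejected too, so none of them runs ahead of the drop.
            return "error"
        if len(self.queue) >= self.capacity:
            # Queue overflow: drop this command and enter the error state.
            self.in_error = True
            return "error"
        self.queue.append(cmd)
        return "queued"

    def clear_error_state(self):
        """Stand-in for the initiator's explicit "clear error state" message."""
        self.in_error = False

    def complete_one(self):
        """Finish the oldest queued command, freeing one slot."""
        return self.queue.popleft() if self.queue else None


target = TargetCommandQueue(capacity=2)
results = [target.receive_command(c) for c in ["C1", "C2", "C3", "C4"]]

target.complete_one()                      # a slot frees up...
after_free = target.receive_command("C5")  # ...but the error state persists
target.clear_error_state()
after_clear = target.receive_command("C5")
```

In this sketch C3 overflows the queue and C4 is rejected only because the target is already in the error state. Freeing a slot is not enough to resume; the initiator has to clear the state explicitly, which gives it a defined point from which to resend the rejected commands in order.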