RE: Avoiding deadlock in iSCSI

That is a big issue, if what I hear is true. It isn't even related to latency, but to having a queue. If you have a long-running command and a queue of 1 (awaiting execution), you can have this error.

I was (mistakenly) confident that it is handled by ACA. Is it safe to assume that T10 will handle it?

Julo

Jim McGrath <Jim.McGrath@quantum.com> on 12/09/2000 23:09:00

Please respond to Jim McGrath <Jim.McGrath@quantum.com>

To: "'csapuntz@cisco.com'" <csapuntz@cisco.com>, Jim McGrath <Jim.McGrath@quantum.com>
cc: ips@ece.cmu.edu (bcc: Julian Satran/Haifa/IBM)
Subject: RE: Avoiding deadlock in iSCSI

I agree with Stephen that this is a T10 issue. BTW, it has been discussed before in T10.

The issue is that for commands ending with a CHECK CONDITION (error) status, the ACA mechanism provides a very targeted means of solving this problem (use ACA for your ORDERED commands and you are fine). The QErr bit in the Control Mode page provides a similar global capability for the entire LUN.

The problem is that there are other status conditions that may want to be treated in a similar manner, but which (to my knowledge) are not. The first one we looked at was RESERVATION CONFLICT. The second is QUEUE FULL. Both may create a desire to pause execution from the command queue.

The current answer is simply not to get into this problem to begin with. That is, if you are sending an ORDERED command, make sure you do not send out the next command (ORDERED or UNORDERED) until you (the initiator) know that the first command has been received by the target. You know that a command has been received either by an explicit acknowledgement (as in parallel SCSI) or by a subsequent action (a data transfer or status transfer) that only makes sense if the command was received.

You can argue that this degrades performance.
For most of the interconnects considered up to now, with relatively low latency, this has not been as much of a concern. In addition, there has been a feeling that the cases where ORDERED commands are used are so few that you either accept the degradation (such as it is) when you use them, or you take a vendor-unique approach around the problem.

If this is considered to be a big issue, then I'd raise it to T10. I suggest you propose a solution that mimics the existing ACA and QErr solutions, since that will be easiest and quickest to adopt. One possibility is adding the status QUEUE FULL to the things that generate ACAs. Another is to propose a new QACA bit that can be used like ACA but for Queue Full (so users can continue to use the old ACA bit for backward compatibility). Since you either introduce a compatibility issue (the first approach) or need a bit (the second approach), the answer is not cost free, so T10 will want some justification for creating a compatibility issue or (more likely) for spending the bit.

Jim

-----Original Message-----
From: csapuntz@cisco.com [mailto:csapuntz@cisco.com]
Sent: Monday, September 11, 2000 10:33 PM
To: Jim McGrath
Cc: ips@ece.cmu.edu; csapuntz@cisco.com
Subject: Re: Avoiding deadlock in iSCSI

> Note that SCSI targets, when faced with getting a command queue full, do not
> stop reading from the interconnect. If more commands are received then they
> respond with a QUEUE FULL status. If data is received then they receive
> that data without regard for the status of the command queue (as long as it
> is data for an already queued command). This eliminates the potential for
> deadlock between command and data queues.

Jim,

I believe there is a problem with the current SCSI behavior.
Consider the following scenario for a host with a pathological queue of 1:

1) Initiator sends command 1 (ORDERED attribute)
2) Initiator sends command 2 (ORDERED attribute)
3) Initiator sends command 3 (ORDERED attribute)
4) Target reads command 1
5) Target reads command 2
6) Target returns queue full for command 2
7) Command 1 completes
8) Target reads command 3
9) Target executes command 3

We have just violated the ordering constraints of the application by doing command 3 before command 2.

> Potentially you can have a situation where multiple commands already at the
> target have only part of their data transmitted, with the remainder still at
> the initiator(s), and then run out of buffer space for data. If the target
> uses a credit model to pace the reception of data, it can also make sure
> this never happens. Unsolicited data, even for commands already queued, can
> end up creating this deadlock - which is why unsolicited data systems either
> have to have a tight limit on the resources it can use (e.g. low login BB
> credit in Fibre Channel terms) or some sort of clean (i.e. not IO
> terminating) rejection mechanism from target to initiator (like in USB).

Unsolicited data is NOT a problem with the current iSCSI spec. We allow the target to always drop data and request data transfers with an RTT.

-Costa
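
The queue-of-1 scenario above can be replayed with a small toy model. This is only a sketch under stated assumptions, not real SCSI or iSCSI code: the `Target` class, the `pipelined` and `stop_and_wait` functions, and the `complete_after` parameter are all hypothetical names invented for illustration. The first function sends every command without waiting, as in steps 1-9; the second follows the rule of not sending the next command until the previous one is known to have been received.

```python
from collections import deque

QUEUE_FULL = "QUEUE FULL"
ACCEPTED = "ACCEPTED"


class Target:
    """Toy target that can hold at most `depth` commands (hypothetical model)."""

    def __init__(self, depth=1):
        self.depth = depth
        self.queue = deque()
        self.executed = []

    def receive(self, cmd):
        # The target never stops reading from the interconnect; a command
        # that does not fit is rejected with QUEUE FULL instead of blocking.
        if len(self.queue) >= self.depth:
            return QUEUE_FULL
        self.queue.append(cmd)
        return ACCEPTED

    def complete_one(self):
        # Finish the oldest queued command, if there is one.
        if self.queue:
            self.executed.append(self.queue.popleft())


def pipelined(commands, complete_after=2):
    """Send all commands back to back. The oldest queued command completes
    right after the target has read `complete_after` commands (assumption
    chosen to reproduce the timing of the scenario)."""
    target = Target(depth=1)
    rejected = []
    for n, cmd in enumerate(commands, start=1):
        if target.receive(cmd) == QUEUE_FULL:
            rejected.append(cmd)
        if n == complete_after:
            target.complete_one()
    while target.queue:
        target.complete_one()
    return target.executed, rejected


def stop_and_wait(commands):
    """Send the next command only once the previous one was accepted,
    retrying the same command after a QUEUE FULL."""
    target = Target(depth=1)
    for cmd in commands:
        while target.receive(cmd) == QUEUE_FULL:
            target.complete_one()  # time passes, an earlier command finishes
    while target.queue:
        target.complete_one()
    return target.executed
```

In this model, `pipelined([1, 2, 3])` executes commands 1 and 3 and rejects command 2, which is the ordering violation described above, while `stop_and_wait([1, 2, 3])` executes 1, 2, 3 in order at the cost of serializing the sends.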