RE: iSCSI Reqts: In-Order Delivery

Charles,

Your solution requires a fair amount of tracking of commands based solely on their Client Tags. These tags are randomly generated, yet your scheme needs them to retain sequential order. The transport must remember the type of each command sent, together with its relative placement, based only on the Client Tag. In addition, these commands will need to be placed into different categories: commands executed out of sequence by means of a bypass flag, commands that are Task Management commands, and commands affected by these other types of commands. It seems that, in large part, these concerns can be met by proper handling in the transport without such laborious sorting of the Client Tags.

The out-of-sequence or bypass flag also depends on the transport sorting the Client Tag. In addition to disabling flow control, this technique of not incrementing the serialization of these commands requires all commands with the same serialization value to be sent on the same connection without acknowledgment, if these commands are also to be kept in sequence. This connection requirement has yet to be specified.

Ver 6, Pg 12: "iSCSI may avoid delivering some command to the SCSI layer if so required by some prior SCSI or iSCSI action (e.g., clear task set Task Management request received before all the commands it was supposed to act on)."

Here, there seems to be an expectation that the iSCSI transport interprets the content of the SCSI commands. How this is done is not obvious. Is the transport expected to generate SCSI responses?

In addition, although iSCSI presently relies on ACA, few applications implement ACA. It would appear that for iSCSI to work with the present protocol, significant application changes are required. With the proposal I am suggesting, this is not a problem, as all bypassed commands are rejected back to the Initiator.
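The connection requirement for commands that share a serialization value can be illustrated with a small model (not an implementation). It assumes the target releases commands strictly by CmdSN and breaks ties by arrival order, and that TCP preserves order only within a single connection. The function names and the `(cmd_sn, name)` tuples are hypothetical. The model enumerates every way two connections' streams can interleave at the target and collects the delivery orders that could result.

```python
from itertools import combinations

# Hypothetical model: each command is a (cmd_sn, name) tuple. A bypass command
# does not increment CmdSN, so it shares its value with the next ordinary command.

def delivery_order(arrivals):
    """Order in which the target releases commands, assuming it sorts by CmdSN
    and (because Python's sort is stable) keeps arrival order among ties."""
    return tuple(name for _, name in sorted(arrivals, key=lambda cmd: cmd[0]))

def interleavings(conn_a, conn_b):
    """Every merge of two per-connection streams that preserves the order
    within each stream (TCP orders data per connection, not across them)."""
    total = len(conn_a) + len(conn_b)
    for slots in combinations(range(total), len(conn_a)):
        slots = set(slots)
        a, b = iter(conn_a), iter(conn_b)
        yield [next(a) if i in slots else next(b) for i in range(total)]

def possible_orders(conn_a, conn_b):
    """All delivery orders the target could produce for the two streams."""
    return {delivery_order(merged) for merged in interleavings(conn_a, conn_b)}

# Bypass command B shares CmdSN 6 with the ordinary command C.
same_connection = possible_orders([(5, "A"), (6, "B"), (6, "C")], [(7, "D")])
split_connections = possible_orders([(5, "A"), (6, "B")], [(6, "C"), (7, "D")])

print(sorted(same_connection))    # [('A', 'B', 'C', 'D')]
print(sorted(split_connections))  # [('A', 'B', 'C', 'D'), ('A', 'C', 'B', 'D')]
```

When B and C travel on the same connection, only one order is possible. When they are split across connections, the target sees two equally valid orders and cannot tell which one the initiator intended, so commands sharing a serialization value would have to be confined to one connection.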
The drivers that implement iSCSI will be required to handle commands that bypass other commands. The information contained in a rejected-command list should be relatively small, and occasions for such management should be rare. Without proper handling of these events, there will be 2:00 AM alarm pagers going off.

Here in the proposal, sorting CmdSN based on LUN values takes place within a "barrier list." I cannot tell what is implied by these recovery instructions. What is meant by Remove, Release, Drop, Cleanup, Placeholder, and ALL? What is the intended feedback to the initiator for this cleanup? It would appear that the transport works on behalf of the target.

In the proposal that I am suggesting, there are no actions within the transport on behalf of the target. All decisions are made either by the Target or by the Initiator, none by the transport. The concept is simple: keep the transport simple.
- Do not expect the transport to decipher SCSI commands.
- Do not expect the transport to respond on behalf of the Target.
- Do not expect the transport to sort pending commands based on LUN value.
- Do not expect the transport to require SCSI and iSCSI ACA.

In the case of session-wide serialization, what is good for the goose is also good for the gander. To detect errors quickly and to know the server state, it is important that the server also use session-wide serialization. The technique of replicating management commands down each connection, in addition to converting global commands into specific commands, already overburdens the resources that must be set aside to handle these non-serialized management commands. My proposal eliminates the problem of set-aside resources and loss of server state. Rather than out-of-sequence commands being silently rejected, these rejections are reported. This feature can then be used to extract pending commands simply and directly, without burdening the transport.
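The reject-and-report behavior described above might look roughly like the following. This is one possible reading, not the mechanism defined in draft-otis-iscsi-fullack-00. It assumes that a bypass command carries the next CmdSN the initiator will assign (it does not increment CmdSN), that every not-yet-delivered command with a lower CmdSN counts as bypassed, and that the transport looks only at CmdSN, never at the SCSI CDB or LUN. The classes `Command`, `Report`, and `BypassQueue` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Command:
    cmd_sn: int
    tag: int  # initiator task tag; opaque to the transport

@dataclass
class Report:
    delivered: list = field(default_factory=list)  # handed to the target, in order
    rejected: list = field(default_factory=list)   # reported back to the initiator

class BypassQueue:
    """Hypothetical target-side queue: in-order delivery by CmdSN only, with
    bypassed commands reported rather than silently discarded."""

    def __init__(self, exp_cmd_sn):
        self.exp_cmd_sn = exp_cmd_sn
        self.pending = {}  # CmdSN -> Command received ahead of a gap

    def _drain(self, report):
        # Release every command that is now contiguous with ExpCmdSN.
        while self.exp_cmd_sn in self.pending:
            report.delivered.append(self.pending.pop(self.exp_cmd_sn))
            self.exp_cmd_sn += 1

    def receive(self, cmd):
        report = Report()
        if cmd.cmd_sn < self.exp_cmd_sn:
            # A bypass command already overtook this one: report it.
            report.rejected.append(cmd)
        else:
            self.pending[cmd.cmd_sn] = cmd
            self._drain(report)
        return report

    def receive_bypass(self, cmd):
        # Delivered at once. CmdSN was not incremented, so cmd.cmd_sn is the
        # next number the initiator will assign to an ordinary command.
        report = Report(delivered=[cmd])
        for sn in sorted(sn for sn in self.pending if sn < cmd.cmd_sn):
            report.rejected.append(self.pending.pop(sn))
        self.exp_cmd_sn = max(self.exp_cmd_sn, cmd.cmd_sn)
        self._drain(report)
        return report

q = BypassQueue(exp_cmd_sn=1)
q.receive(Command(1, tag=10))            # delivered immediately
q.receive(Command(3, tag=30))            # held: CmdSN 2 is still missing
r = q.receive_bypass(Command(4, tag=99))
print([c.tag for c in r.delivered], [c.tag for c in r.rejected])  # [99] [30]
late = q.receive(Command(2, tag=20))     # overtaken while in flight
print([c.tag for c in late.rejected])    # [20]
```

In this sketch, the queue never decides what a rejected command meant. It only returns the list of rejected commands, leaving reissue or abandonment to the initiator.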
As attempts are made to support the SCSI architecture, efforts should be made to simplify the transport rather than to increase its intelligence. Every additional field the transport must manipulate invites complexity and non-uniform implementations. See: http://www.ietf.org/internet-drafts/draft-otis-iscsi-fullack-00.txt

Ver 6, Pg 92: "N.B. As an alternative to Logout and reissue commands, the initiator MAY instead reset the target and terminate all outstanding commands with a service response indicating Delivery Subsystem Failure. The initiator MUST perform one of the two actions."

...

Ver 6, Pg 93: "The following general mechanism can be used to achieve the effect of ordered delivery for task management commands while enabling the "urgent" delivery that some of them imply and immediate execution of the task management commands without:

At Initiator when a relevant task management command is issued:
a) if ExpCmdSN is equal to CmdSN skip to step c
b) mark all pending commands with a CmdSN field between ExpCmdSN and the current CmdSN and a relevant LUN as candidates for cleanup and retain CmdSN in a "barrier list".
c) send the task management command for immediate delivery to the target

At initiator when updating ExpCmdSN:
a) if the "barrier list" is empty or ExpCmdSN is less than the first entry in the barrier list then skip to step d
b) remove the barrier list entry and remove and drop all entries marked for cleanup having a CmdSN field less than ExpCmdSN
c) go to step a
d) release all queued entries between the old and new ExpCmdSN from the queue

At target when receiving a relevant task management command for immediate delivery:
a) if ExpCmdSN is equal to CmdSN skip to step c
b) mark all pending entries (commands received and placeholders) with a CmdSN field between ExpCmdSN and the current CmdSN as candidates for cleanup and retain CmdSN in a "barrier list" including the referenced LUN (or an ALL marker)
c) send the task management command to SCSI for immediate execution

At target when updating ExpCmdSN (releasing ordered commands to SCSI):
a) if the "barrier list" is empty or ExpCmdSN is less than the first entry in the barrier list then skip to step d
b) remove the barrier list entry and remove and drop all entries marked for cleanup and having the same LUN as the barrier entry (any if the barrier is marked ALL) and a CmdSN field less than ExpCmdSN
c) go to step a
d) release all queued entries between the old and new ExpCmdSN from the queue

Note that this scheme will withstand connection recovery."

Doug

> Hi Santosh:
>
> Please see below.
>
> > Charles Monia wrote:
> >
> > > > (1) MUST provide ordered delivery of SCSI commands from
> > > > the initiator to the target in the absence of transport
> > > > errors visible to iSCSI (e.g., iSCSI CRC failure,
> > > > unexpected TCP connection closure).
> > >
> > > Does the term "SCSI commands" include task management
> > > functions as well? If not, it should.
> >
> > Charles,
> >
> > Could iSCSI use a variant of the approach FCP-2 takes to solve the
> > ordering issue for task mgmt error recovery?
> >
> > The FCP-2 task management error recovery scheme is:
> > - task mgmt function uses CRN 0
> > - task mgmt function is executed immediately with no ordering latencies
> > - both initiator & target clear all resources that can be cleared unambiguously.
> > - any ambiguous exchanges shall be aborted by the port that detects the ambiguous state.
> >
> > In the case of iSCSI, an analogous approach could be:
> > - task mgmt function uses immediate delivery flag for the task mgmt PDU.
> > - task mgmt fn executed immediately, avoiding any ordering latencies.
> > - initiator & target clear all resources that can be cleared unambiguously.
> > - initiator uses Abort Task to explicitly abort all active outstanding I/Os at the time the task mgmt fn was issued, to prevent any ambiguous stale PDUs of an exchange from appearing at the target.
> >
> > Such an approach would avoid latencies on the execution of the task mgmt fn while still flushing out all the stale PDUs upon completion of the initiator actions for that task mgmt fn.
>
> The problem is to avoid scenarios where the initiator's and target's views of the task set are out of step. Specifically, we must avoid the case where an initiator receives a PDU from a task it believes has been terminated.
>
> In that respect, the technique you describe above should work for an ABORT TASK operation.
>
> In the case of ABORT TASK SET, the function could be emulated by issuing a series of ABORT TASK requests. For CLEAR TASK SET, an initiator would probably want to do the individual ABORT TASK operations, followed by a CLEAR TASK SET to terminate tasks from other initiators. I assume TARGET RESET and LUN RESET would be emulated in a manner similar to CLEAR TASK SET. In all of these cases there may be some "atomicity" side effects caused by doing things one at a time instead of all at once.
>
> The only sticky problem is ensuring that the CLEAR ACA function works right. By that I mean that you don't want to issue the function until all prior SCSI commands that were in flight when the ACA occurred have been terminated with the ACA ACTIVE status. You can't simply replicate the command on each connection, since you might inadvertently clear a subsequent ACA. (Yes -- I know these are all edge cases, but we may as well try to get it right.) Maybe the thing to do is implement the function such that the ACA interlock is not cleared until the CLEAR ACA function is sent on all the connections comprising the session.
>
> One minor distinction worth noting is that CRN is enforced in the SCSI layer, whereas cmdSN is enforced in the iSCSI transport. So, a CRN of 0 doesn't take effect until the transport presents the command to the SCSI layer for processing. In that case, leapfrogging of PDU ordering never occurs.
>
> Incidentally, I've made the tacit assumption that commands on a given connection are presented to the SCSI layer in the order they were sent, regardless of whether or not cmdSN was set to 0. I assume the framing mechanisms that have been discussed for buffer offloading do not affect this behavior. I.e., a fully formed PDU slated for immediate delivery won't be passed to the SCSI layer before a partially complete PDU that was received earlier.
>
> If that's true, immediate delivery seems to have no meaning in a single-connection scenario. What's more, in all cases, the iSCSI layer doesn't really have to be aware of task management semantics -- unless someone decides to intermix immediate and sequential commands in a multi-connection session. Then all bets are off.
>
> Charles
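For comparison, the target-side half of the Ver 6, Pg 93 barrier-list procedure quoted above can be sketched in Python. This is a literal reading under stated assumptions, not reference code from the draft. "Current CmdSN" is taken to be the CmdSN carried by the immediate task management request (which does not consume a number), a placeholder is modeled as a marked CmdSN that has not arrived yet, and the draft's ALL marker is modeled as `None`. The names `BarrierTarget`, `receive_tmf`, `receive_cmd`, and `to_scsi` are hypothetical.

```python
ALL = None  # hypothetical stand-in for the draft's "ALL" LUN marker

class BarrierTarget:
    """Target-side barrier list, following the quoted steps a) to d)."""

    def __init__(self, exp_cmd_sn):
        self.exp_cmd_sn = exp_cmd_sn
        self.queue = {}       # CmdSN -> (lun, name) received but not yet released
        self.cleanup = set()  # CmdSNs marked for cleanup, received or placeholder
        self.barriers = []    # (CmdSN, lun) entries in arrival order
        self.to_scsi = []     # everything handed to the SCSI layer, in order

    def receive_tmf(self, cmd_sn, lun, name):
        """Relevant task management request received for immediate delivery."""
        if cmd_sn != self.exp_cmd_sn:                            # step a
            self.cleanup.update(range(self.exp_cmd_sn, cmd_sn))  # step b
            self.barriers.append((cmd_sn, lun))
        self.to_scsi.append(name)                                # step c

    def receive_cmd(self, cmd_sn, lun, name):
        """Ordinary command; ExpCmdSN advances past any contiguous run."""
        self.queue[cmd_sn] = (lun, name)
        old_exp = self.exp_cmd_sn
        while self.exp_cmd_sn in self.queue:
            self.exp_cmd_sn += 1
        if self.exp_cmd_sn != old_exp:
            self._update_exp_cmd_sn(old_exp)

    def _update_exp_cmd_sn(self, old_exp):
        # Steps a to c: consume every barrier that ExpCmdSN has now reached.
        while self.barriers and self.exp_cmd_sn >= self.barriers[0][0]:
            _, barrier_lun = self.barriers.pop(0)
            for sn in sorted(self.cleanup):
                if sn < self.exp_cmd_sn and sn in self.queue:
                    lun, _ = self.queue[sn]
                    if barrier_lun is ALL or lun == barrier_lun:
                        del self.queue[sn]  # dropped; never reaches SCSI
                        self.cleanup.discard(sn)
        # Step d: release surviving entries between the old and new ExpCmdSN.
        for sn in range(old_exp, self.exp_cmd_sn):
            if sn in self.queue:
                self.to_scsi.append(self.queue.pop(sn)[1])
                self.cleanup.discard(sn)

t = BarrierTarget(exp_cmd_sn=1)
t.receive_cmd(2, lun=0, name="W2")   # arrives ahead of CmdSN 1
t.receive_cmd(3, lun=1, name="R3")
t.receive_tmf(4, lun=0, name="ABORT TASK SET lun 0")
t.receive_cmd(1, lun=0, name="W1")   # gap closes; barrier at 4 is processed
t.receive_cmd(4, lun=0, name="W4")   # issued after the abort
print(t.to_scsi)  # ['ABORT TASK SET lun 0', 'R3', 'W4']
```

In this run, W1 and W2 (LUN 0) were issued before the abort and are dropped once the gap closes. R3 on LUN 1 and W4, which was issued after the abort, still reach SCSI in order. Even this small sketch has to track LUNs and drop commands inside the transport, which is the kind of transport-level work the message above argues against.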
Last updated: Tue Sep 04 01:04:55 2001 (6315 messages in chronological order)