Re: iSCSI: Out of order commands

Mallikarjun,

Could you comment on the concept of OOO with ErrorRecoveryLevel>0?
I had thought that "in order delivery" was part of the detection of
missing PDUs and needed for timely Recovery.  I was wondering if this
changes the way we would use the ExpCmdSN, etc.

I think your opinions on this part of the OOO discussion would be
valuable.  For example, how would you contrast the differences in
detecting a problem and recovering from that problem etc., today vs.
the OOO approach (if any).

.
.
.
John L. Hufferd
Senior Technical Staff Member (STSM)
IBM/SSG San Jose Ca
Main Office (408) 256-0403, Tie: 276-0403,  eFax: (408) 904-4688
Home Office (408) 997-6136, Cell: (408) 499-9702
Internet address: hufferd@us.ibm.com


"Mallikarjun C." <cbm@rose.hp.com>@ece.cmu.edu on 11/07/2001 09:41:05 AM

Please respond to cbm@rose.hp.com

Sent by:  owner-ips@ece.cmu.edu

To:   Santosh Rao <santoshr@cup.hp.com>, ips@ece.cmu.edu
cc:
Subject:  Re: iSCSI: Out of order commands


Santosh,

I have only one comment on your responses.

> Even a single connection target *MUST* implement a scoreboard. The
> reason being that it can see out-of-order arrival of commands due to
> commands being dropped on digest errors. In such a case, it must block
> further command processing until holes are filled.

I made two convenient assumptions if you noticed, :-), one of
which is that target forces session recovery on *any* error that
it sees (ErrorRecoveryLevel=0) - including a dropped command due
to a digest error.  With that assumption, a target can afford not
to implement a scoreboard.

As I said in a private note, I guess what primarily bothers me about
OOO commands on a connection is that it requires the receiver to undo
this "optimization" on its end - most notably on a single connection.
TCP experts may comment on how/if they dealt with a similar issue.

OTOH, you had some valid comments on exceptions to ordering during
connection recovery.  Perhaps we can move on by making Julian's
proposed stipulation a SHOULD....
--
Mallikarjun

Mallikarjun Chadalapaka
Networked Storage Architecture
Network Storage Solutions Organization
MS 5668 Hewlett-Packard, Roseville.
cbm@rose.hp.com


Santosh Rao wrote:
>
> Mallikarjun,
>
> Some comments below.
>
> Regards,
> Santosh
>
> "Mallikarjun C." wrote:
> >
> > Rod and Julian,
> >
> > This has been an interesting thread of discussion.  Some
> > comments -
> >
> > 1.My first reaction was - allowing out-of-order command
> >   transmission on the same connection deprives targets of
> >   an implementation choice.  Targets which support only
> >   single-connection sessions and only support session
> >   recovery (reasonable assumptions in my mind) can no
> >   longer afford *not to* implement a command scoreboard.
>
> Even a single connection target *MUST* implement a scoreboard. The
> reason being that it can see out-of-order arrival of commands due to
> commands being dropped on digest errors. In such a case, it must block
> further command processing until holes are filled.
>
> Thus, there is no getting away from implementing a sequencer at the
> target. Given this, I think it is unreasonable to restrict initiator
> implementation flexibility by imposing a strict ordering requirement
> within the connection.
>
> > 2.Any end-node efficiency that is sought to be achieved
> >   by transmitting CmdSNs out-of-order from the initiator
> >   would be lost on the other end-node, since the target
> >   now must wait for re-ordering the commands.
>
> It has to handle this situation anyway to deal with holes caused by
> digest errors. This scenario occurs even with initiators that issue
> commands in order.
>
> >
> > 3.The flipside is that out-of-order transmission saves
> >   link bandwidth (albeit at the expense of end-node efficiency),
> >   compared to idling the link waiting for outbound DMA.
> >   We have to determine if this is a reasonable trade-off.
> >
> > 4.I can see Rod's point that prefetching all immediate
> >   data can be a burden on the NIC resources.  But, two
> >   questions -
> >       - could the NIC not use unsolicited separate data
> >         PDUs in these cases? [ I realize that InitialR2T
> >         has to be "no" to let it happen... ]
> >       - could the NIC have a memory architecture that
> >         allows data prefetching for the next command (so
> >         this is a non-issue from the protocol perspective)?
> >         This scheme incurs one DMA delay for every new
> >         burst of commands.
> >
> > 5.Another (perhaps radical at this point) option is to do
> >   away with immediate unsolicited data, to stick only with
> >   separate unsolicited data.  I would personally be okay
> >   with the choice, particularly if this feature (that
> >   helps software implementations) starts making hardware
> >   design complicated/expensive.
> >
> > So, to summarize -
> >
> >   option                              immediate       allow
> >                                       data in spec?   out-of-order?
> >
> >   (A) (5) above                       no              no
> >   (B) No real reason to do this.      no              yes
> >   (C) (4) above                       yes             no
> >   (D) pros & cons (1), (2) & (3)      yes             yes
> >
> > From the arguments I heard so far, I am leaning towards
> > option A, and option C in that order.
> >
> > Comments?
> > --
> > Mallikarjun
> >
> > Mallikarjun Chadalapaka
> > Networked Storage Architecture
> > Network Storage Solutions Organization
> > MS 5668 Hewlett-Packard, Roseville.
> > cbm@rose.hp.com
> >
> > Rod Harrison wrote:
> > >
> > > Julian,
> > >
> > >         I don't understand what you are proposing here, what do
> > > you mean by "multiplexed" DMA?
> > >
> > >         The problem is that the DMAs take some time, the more there
> > > are queued the longer the last DMAs queued take to complete. Some
> > > commands require DMAs to complete before they can be sent, i.e.
> > > Writes with immediate data, some commands do not, i.e. Reads and
> > > writes with no immediate data.
> > > The iSCSI HBA wants to be able to send commands as soon as
> > > possible, which for a read after a write can be before the
> > > write's DMA has completed. Maintaining an ordered queue for commands
> > > to be sent on the HBA is expensive and redundant since the target
> > > already knows how to queue commands before committing them to its
> > > SCSI layer.
> > >
> > >         The iSCSI HBA and its host driver are not at liberty to
> > > change the order of commands from the OS, but the DMAs those
> > > commands need are unlikely to complete in the same order, and as I
> > > mentioned some commands need no DMA. If the HBA can't send commands
> > > out of CmdSN order it has to maintain an ordered queue of commands
> > > waiting to be sent, and potentially buffer a lot of data. For an
> > > HBA this makes immediate data almost impossible to support.
> > >
> > >         I don't see the problem with allowing out of order commands
> > > given that the target already has to deal with very similar
> > > problems. I think we are getting into the area of implementation
> > > choices here, which is inappropriate for a specification.
> > >
> > >         - Rod
> > >
> > > -----Original Message-----
> > > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
> > > Julian Satran
> > > Sent: Monday, November 05, 2001 10:06 PM
> > > To: ips@ece.cmu.edu
> > > Subject: Re: iSCSI: Out of order commands, was current UNH Plugfest
> > >
> > > Rod,
> > >
> > > I don't see any reason why DMA operations can't be "multiplexed"
> > > with commands.
> > > If you have scheduled a long outbound DMA you are doomed regardless
> > > of the command ordering.
> > > And if you have scheduled DMA operations piecemeal then you can
> > > insert your commands in correct order.
> > >
> > > Julo
> > >
> > > "Rod Harrison" <rod.harrison@windriver.com>
> > > 05-11-01 20:48
> > > Please respond to "Rod Harrison"
> > >
> > >         To:     Julian Satran/Haifa/IBM@IBMIL, <ips@ece.cmu.edu>
> > >         cc:
> > >         Subject:        iSCSI: Out of order commands, was current
> > > UNH Plugfest
> > >
> > > [ Subject changed ]
> > >
> > > Julian,
> > >
> > >         The ordering difference is introduced between the host side
> > > driver and the iSCSI HBA. The host side driver must present SCSI
> > > commands to the HBA in the order they are received from the OS to
> > > prevent read after write dependency failures. The HBA might reorder
> > > the commands depending on when DMA completes. The reordering can't
> > > be done ahead of time in the host driver since it doesn't know how
> > > long each DMA might take. As long as the HBA assigns CmdSN in the
> > > order it receives commands the desired host ordering is preserved.
> > >
> > >         - Rod
> > >
> > > -----Original Message-----
> > > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
> > > Julian Satran
> > > Sent: Monday, November 05, 2001 12:35 AM
> > > To: ips@ece.cmu.edu
> > > Subject: RE: iSCSI: current UNH Plugfest
> > >
> > > Rod,
> > >
> > > In all the examples given, the point I find hard to understand is
> > > why the ordering on the wire is different from the presentation
> > > order to the initiator. You can get as many overlaps as you want by
> > > presenting the commands to the initiator in the desired order.
> > > What we are considering here is the case in which you want to ship
> > > in an order different than the one you present the commands.
> > >
> > > Julo
> > >
> > > "Rod Harrison" <rod.harrison@windriver.com>
> > > Sent by: owner-ips@ece.cmu.edu
> > > 04-11-01 04:42
> > > Please respond to "Rod Harrison"
> > >
> > >         To:     "Barry Reinhold" <bbrtrebia@mediaone.net>,
> > >                 "Dave Sheehy" <dbs@acropora.rose.agilent.com>,
> > >                 "IETF IP SAN Reflector" <ips@ece.cmu.edu>
> > >         cc:
> > >         Subject:        RE: iSCSI: current UNH Plugfest
> > >
> > > Barry,
> > >
> > >         In general I agree but I don't think this is as much of a
> > > corner case as it at first appears. Targets will have code very
> > > similar to that needed to handle out of order commands to deal with
> > > digest errors. Targets also need to queue commands whilst waiting
> > > for both solicited and unsolicited data to arrive. Queuing out of
> > > order commands seems little extra work.
> > >
> > >         From an initiator's point of view there are efficiency, and
> > > probably performance, gains to be had from sending commands out of
> > > order. Bob Russell gave the example of a read being sent whilst
> > > write data DMA is happening, and a similar situation can arise with
> > > DMA for writes overtaking that of earlier writes if the initiator
> > > has multiple DMA engines. In this case the initiator might be forced
> > > to let the wire go idle if it can't send the data from completed
> > > DMAs as soon as possible.
> > >
> > >         We already have a command queue at the target to enforce
> > > correct serialisation of commands, doing the same thing at the
> > > initiator is redundant.
> > >
> > >         Finally, I don't believe we should be writing a standard to
> > > work around poor coding and test coverage, especially at the cost of
> > > potential efficiency gains.
> > >
> > >         I agree with Dave and Santosh that commands being sent out
> > > of order on a single session should be allowed by the standard.
> > >
> > >         - Rod
> > >
> > > -----Original Message-----
> > > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
> > > Barry Reinhold
> > > Sent: Friday, November 02, 2001 5:24 PM
> > > To: Dave Sheehy; IETF IP SAN Reflector
> > > Subject: RE: iSCSI: current UNH Plugfest
> > >
> > > Using features such as out of order command delivery on a connection
> > > tends to be the sort of thing that leads to interoperability
> > > problems. It is unexpected and probably going to hit poorly tested
> > > code paths even if the standard is written to allow it.
> > >
> > > >-----Original Message-----
> > > >From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
> > > >Dave Sheehy
> > > >Sent: Friday, November 02, 2001 4:19 PM
> > > >To: IETF IP SAN Reflector
> > > >Subject: Re: iSCSI: current UNH Plugfest
> > > >
> > > >
> > > >> 3. Can commands be sent out of order on the same connection?
> > > >>
> > > >>    The behavior of targets is clearly specified in Section 2.2.2.3
> > > >>    on page 25 of draft 8, which says:
> > > >>      "Except for the commands marked for immediate delivery the
> > > >>      iSCSI target layer MUST deliver the commands for execution in
> > > >>      the order specified by CmdSN."
> > > >>
> > > >>    Section 2.2.2.3 on page 26 of draft 8 also says:
> > > >>      "- CmdSN - the current command Sequence Number advanced by 1
> > > >>      on each command shipped except for commands marked for
> > > >>      immediate delivery."
> > > >>    but the meaning of the term "shipped" is vague, and does not
> > > >>    necessarily require that the PDUs arrive on the other end of a
> > > >>    TCP connection in the same order that the CmdSN values were
> > > >>    assigned to these PDUs.
> > > >>
> > > >>    Some initiators have been designed to send commands out of
> > > >>    CmdSN order on one connection.
> > > >>    Consider the situation where there is only one connection and
> > > >>    a high-level dispatcher creates a PDU for a SCSI command that
> > > >>    involves writing immediate data to the target.  This PDU is
> > > >>    enqueued to a lower-level layer which has to setup, start, and
> > > >>    wait-for a DMA operation to move the immediate data into an
> > > >>    onboard buffer before the PDU can be put onto the wire.  While
> > > >>    this is happening, the dispatcher creates another unrelated
> > > >>    PDU for a SCSI read command (for example), and when this PDU
> > > >>    is passed to the lower-level layer it can be sent immediately,
> > > >>    ahead of the previous write PDU and therefore out of order on
> > > >>    this connection.
> > > >>
> > > >>    The standard clearly allows this to happen if the two PDUs
> > > >>    were sent on different connections, and seems to imply that
> > > >>    this can also happen when the two PDUs are sent on the same
> > > >>    connection.
> > > >>
> > > >>    The suggestion is to put in the standard an explicit statement
> > > >>    that this is allowed or not allowed, as appropriate.
> > > >>
> > > >>    If this is allowed, such a statement would avoid the erroneous
> > > >>    assumption being made by some target implementers that within
> > > >>    a single connection, commands will arrive in order.
> > > >>
> > > >>    If this is not allowed, such a statement would avoid the
> > > >>    erroneous assumption being made by some initiator implementers
> > > >>    that within a single connection, commands can be put on the
> > > >>    wire out of order.
> > > >>
> > > >> +++
> > > >>
> > > >> will add an explicit statement saying that this behaviour is
> > > >> forbidden.
> > > >> 2.2.2.1 will contain:
> > > >>
> > > >> On any given connection, the iSCSI initiator MUST send the
> > > >> commands in the order specified by CmdSN.
> > > >>
> > > >> +++
> > > >
> > > >Why do you feel this behavior should be forbidden?
> > > >Targets already have to order commands across the session. I don't
> > > >see why it's a problem to extend that to the connection as well. I,
> > > >for one, believe we should take a liberal stance on this.
> > > >
> > > >Dave Sheehy
> > >
>
> --
> ##################################
> Santosh Rao
> Software Design Engineer,
> HP-UX iSCSI Driver Team,
> Hewlett Packard, Cupertino.
> email : santoshr@cup.hp.com
> Phone : 408-447-3751
> ##################################
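
The target-side "scoreboard" (or "sequencer") that this thread keeps returning to can be sketched roughly as follows. This is only an illustration in Python, not code from any message or from the iSCSI draft: the class name CmdSNScoreboard, the receive method, and the duplicate check are assumptions made for the example, and it leaves out immediate commands, the MaxCmdSN window, and digest-error recovery. It shows the one behaviour the participants agree on: commands are handed to the SCSI layer in CmdSN order, and anything that arrives ahead of a hole is held until the hole is filled.

```python
class CmdSNScoreboard:
    """Hold commands that arrive ahead of ExpCmdSN until the holes are filled."""

    MOD = 2 ** 32  # assumption: CmdSN treated as a 32-bit counter that wraps

    def __init__(self, exp_cmd_sn):
        self.exp_cmd_sn = exp_cmd_sn % self.MOD
        self.pending = {}  # CmdSN -> command held back because of a hole

    def receive(self, cmd_sn, command):
        """Accept one non-immediate command.

        Returns the list of commands that can now be delivered for
        execution, in CmdSN order (possibly empty).
        """
        cmd_sn %= self.MOD
        # Serial-number comparison: a CmdSN "behind" ExpCmdSN was already
        # delivered, so this sketch treats it as a duplicate and ignores it.
        if (cmd_sn - self.exp_cmd_sn) % self.MOD >= self.MOD // 2:
            return []
        self.pending[cmd_sn] = command
        ready = []
        # Drain every command that is now contiguous with ExpCmdSN.
        while self.exp_cmd_sn in self.pending:
            ready.append(self.pending.pop(self.exp_cmd_sn))
            self.exp_cmd_sn = (self.exp_cmd_sn + 1) % self.MOD
        return ready


# The read (CmdSN 2) reaches the target before the write (CmdSN 1),
# as in the read-overtakes-write example described in the thread.
sb = CmdSNScoreboard(exp_cmd_sn=1)
print(sb.receive(2, "READ"))   # [] : CmdSN 1 is still missing
print(sb.receive(1, "WRITE"))  # ['WRITE', 'READ']
```

In this sketch the same code path handles both causes of a hole raised in the thread: a command dropped on a digest error and later retransmitted, and a command that an initiator deliberately put on the wire late. A target that forces session recovery on any error (ErrorRecoveryLevel=0) and assumes in-order transmission on a single connection could do without the pending table, which is the implementation choice Mallikarjun's message is concerned with.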

Last updated: Thu Nov 08 22:17:33 2001