Re: iSCSI: Out of order commands

Somesh,

I see a SHOULD encouraging efficient implementations in this case,
and I think it's desirable to do that. As you yourself argued in
your last email, this allows implementations to optimize for the
most-likely case.

IMHO, a MUST here would be going too far (connection recovery
scenario pointed out by Santosh) unless we start special casing
error recovery cases/multi-connection cases. In effect, that again
comes down to a SHOULD for a general n-connection session case.

Regards.
--
Mallikarjun

Mallikarjun Chadalapaka
Networked Storage Architecture
Network Storage Solutions Organization
MS 5668 Hewlett-Packard, Roseville.
cbm@rose.hp.com

Somesh Gupta wrote:
>
> I think we should either have it as a MUST or not require
> it (at both ends to get the real benefit). SHOULD is one
> of those things that leads to implementation
> burden and confusion, without perhaps the feature being
> used. There are implementation as well as protocol
> considerations mixed in here.
>
> If we are to remove the restriction, we should (SHOULD)
> get the maximum benefit from it, rather than
> accommodate an implementation choice. Out of sequence
> commands, combined with the possibility of digest errors,
> will add substantial complexity on the target side,
> without corresponding benefit in performance. If we change
> this to SHOULD, we should also relax the requirement
> to present commands on the target side to a SHOULD.
>
> > -----Original Message-----
> > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
> > Julian Satran
> > Sent: Wednesday, November 07, 2001 10:00 AM
> > To: ips@ece.cmu.edu
> > Subject: Re: iSCSI: Out of order commands
> >
> >
> > Mallikarjun,
> >
> > I did not see a SINGLE performance improvement that results from OOO
> > shipping.
> > It would be bad engineering to give away the "no-deadlock" mechanism we
> > have now for nothing.
> > I have also the impression that the point about deadlock that I keep
> > repeating is ignored or not understood.
> > As we stand today commands can be shipped with Immediate data or without
> > and an implementer determined
> > to squeeze maximum bandwidth and overlap command start with delivery will
> > choose not to work with immediate data
> > (as you have pointed out) while a low performance software implementation
> > will use immediate data to minimize CPU cycles consumed. However both
> > will be guaranteed to work without deadlock as source and sink use the
> > same ordering.
> > Recovery is still a low probability event and should be handled with a
> > different set of considerations in mind.
> > As for the strictness of the recommendation - yes we could settle on
> > SHOULD.
> >
> > Julo
> >
> >
> > "Mallikarjun C." <cbm@rose.hp.com>
> > Sent by: owner-ips@ece.cmu.edu
> > 07-11-01 19:41
> > Please respond to cbm
> >
> >
> > To: Santosh Rao <santoshr@cup.hp.com>, ips@ece.cmu.edu
> > cc:
> > Subject: Re: iSCSI: Out of order commands
> >
> >
> > Santosh,
> >
> > I have only one comment on your responses.
> >
> > > Even a single connection target *MUST* implement a scoreboard. The
> > > reason being that it can see out-of-order arrival of commands due to
> > > commands being dropped on digest errors. In such a case, it must block
> > > further command processing until holes are filled.
> >
> > I made two convenient assumptions if you noticed, :-), one of which
> > is that target forces session recovery on *any* error that it sees
> > (ErrorRecoveryLevel=0) - including a dropped command due to a digest
> > error. With that assumption, a target can afford not to implement
> > a scoreboard.
> >
> > As I said in a private note, I guess what primarily bothers me about
> > OOO commands on a connection is that it requires the receiver to
> > undo this "optimization" on its end - most notably on a single
> > connection.
> > TCP experts may comment on how/if they dealt with a
> > similar issue.
> >
> > OTOH, you had some valid comments on exceptions to ordering during
> > connection recovery. Perhaps we can move on by making Julian's
> > proposed stipulation a SHOULD....
> > --
> > Mallikarjun
> >
> >
> > Mallikarjun Chadalapaka
> > Networked Storage Architecture
> > Network Storage Solutions Organization
> > MS 5668 Hewlett-Packard, Roseville.
> > cbm@rose.hp.com
> >
> >
> > Santosh Rao wrote:
> > >
> > > Mallikarjun,
> > >
> > > Some comments below.
> > >
> > > Regards,
> > > Santosh
> > >
> > > "Mallikarjun C." wrote:
> > > >
> > > > Rod and Julian,
> > > >
> > > > This has been an interesting thread of discussion. Some
> > > > comments -
> > > >
> > > > 1.My first reaction was - allowing out-of-order command
> > > > transmission on the same connection deprives targets of
> > > > an implementation choice. Targets which support only
> > > > single-connection sessions and only support session
> > > > recovery (reasonable assumptions in my mind) can no
> > > > longer afford *not to* implement a command scoreboard.
> > >
> > > Even a single connection target *MUST* implement a scoreboard. The
> > > reason being that it can see out-of-order arrival of commands due to
> > > commands being dropped on digest errors. In such a case, it must block
> > > further command processing until holes are filled.
> > >
> > > Thus, there is no getting away from implementing a sequencer at the
> > > target. Given this, I think it is unreasonable to restrict initiator
> > > implementation flexibility by imposing a strict ordering requirement
> > > within the connection.
> > >
> > > > 2.Any end-node efficiency that is sought to be achieved
> > > > by transmitting CmdSNs out-of-order from the initiator
> > > > would be lost on the other end-node, since the target
> > > > now must wait for re-ordering the commands.
> > >
> > > It has to handle this situation anyway to deal with holes caused by
> > > digest errors. This scenario occurs even with initiators that issue
> > > commands in order.
> > >
> > > >
> > > > 3.The flipside is that out-of-order transmission saves
> > > > link bandwidth (albeit at the expense of end-node efficiency),
> > > > compared to idling the link waiting for outbound DMA.
> > > > We have to determine if this is a reasonable trade-off.
> > > >
> > > > 4.I can see Rod's point that prefetching all immediate
> > > > data can be a burden on the NIC resources. But, two
> > > > questions -
> > > > - could the NIC not use unsolicited separate data
> > > > PDUs in these cases? [ I realize that InitialR2T
> > > > has to be "no" to let it happen... ]
> > > > - could the NIC have a memory architecture that
> > > > allows data prefetching for the next command (so
> > > > this is a non-issue from the protocol perspective)?
> > > > This scheme incurs one DMA delay for every new
> > > > burst of commands.
> > > >
> > > > 5.Another (perhaps radical at this point) option is to do
> > > > away with immediate unsolicited data, to stick only with
> > > > separate unsolicited data. I would personally be okay
> > > > with the choice, particularly if this feature (that
> > > > helps software implementations) starts making hardware
> > > > design complicated/expensive.
> > > >
> > > > So, to summarize -
> > > >
> > > >    option                            immediate       allow
> > > >                                      data in spec?   out-of-order?
> > > >
> > > >    (A) (5) above                     no              no
> > > >    (B) No real reason to do this.    no              yes
> > > >    (C) (4) above                     yes             no
> > > >    (D) pros & cons (1), (2) & (3)    yes             yes
> > > >
> > > > From the arguments I heard so far, I am leaning towards
> > > > option A, and option C in that order.
> > > >
> > > > Comments?
> > > > --
> > > > Mallikarjun
> > > >
> > > > Mallikarjun Chadalapaka
> > > > Networked Storage Architecture
> > > > Network Storage Solutions Organization
> > > > MS 5668 Hewlett-Packard, Roseville.
> > > > cbm@rose.hp.com > > > > > > > > Rod Harrison wrote: > > > > > > > > > > Julian, > > > > > > > > > > I don't understand what you are proposing here, what do you > > mean by > > > > > "multiplexed" DMA? > > > > > > > > > > The problem is that the DMAs take some time, the more there > > are > > > > > queued the longer the last DMAs queued take to complete. Some > > commands > > > > > require DMAs to complete before they can be sent, i.e. Writes with > > > > > immediate data, some commands do not, i.e. Reads and writes with no > > > > > immediate data. The iSCSI HBA wants to be able to send commands as > > > > > soon a possible, which for a read after a write can be before the > > > > > write's DMA has completed. Maintaining an ordered queue for commands > > > > > to be sent on the HBA is expensive and redundant since the target > > > > > already knows how to queue commands before committing them to its > > SCSI > > > > > layer. > > > > > > > > > > The iSCSI HBA and its host driver are not at liberty to > > change the > > > > > order of commands from the OS, but the DMAs those commands need are > > > > > unlikely to complete in the same order, and as I mentioned some > > > > > commands need no DMA. If the HBA can't send commands out of CmdSN > > > > > order it has to maintain an ordered queue of commands waiting to be > > > > > sent, and potentially buffer a lot of data. For an HBA this makes > > > > > immediate data almost impossible to support. > > > > > > > > > > I don't see the problem with allowing out of order commands > > given > > > > > that the target already has to deal with very similar problems. I > > > > > think we are getting in to the area of implementation choices here, > > > > > which is inappropriate for a specification. 
> > > > > > > > > > - Rod > > > > > > > > > > -----Original Message----- > > > > > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf > > Of > > > > > Julian Satran > > > > > Sent: Monday, November 05, 2001 10:06 PM > > > > > To: ips@ece.cmu.edu > > > > > Subject: Re: iSCSI: Out of order commands, was current UNH Plugfest > > > > > > > > > > Rod, > > > > > > > > > > I don't see any reason why DMA operations cant be "multiplexed" with > > > > > commands. > > > > > If you have scheduled a long outbound DMA you are doomed regardless > > of > > > > > the > > > > > command ordering. > > > > > And if you have scheduled DMA operations piecemeal then you can > > insert > > > > > your commands in correct order. > > > > > > > > > > Julo > > > > > > > > > > "Rod Harrison" <rod.harrison@windriver.com> > > > > > 05-11-01 20:48 > > > > > Please respond to "Rod Harrison" > > > > > > > > > > To: Julian Satran/Haifa/IBM@IBMIL, <ips@ece.cmu.edu> > > > > > cc: > > > > > Subject: iSCSI: Out of order commands, was current > > UNH > > > > > Plugfest > > > > > > > > > > [ Subject changed ] > > > > > > > > > > Julian, > > > > > > > > > > The ordering difference is introduced between the > > > > > host > > > > > side driver > > > > > and the iSCSI HBA. The host side driver must present SCSI commands > > to > > > > > the HBA in the order they are received from the OS to prevent read > > > > > after write dependency failures. The HBA might reorder the commands > > > > > depending on when DMA completes. The reordering can't be done ahead > > of > > > > > time in the host driver since it doesn't know how long each DMA > > might > > > > > take. As long as the HBA assigns CmdSN in the order it receives > > > > > commands the desired host ordering is preserved. 
> > > > > > > > > > - Rod > > > > > > > > > > -----Original Message----- > > > > > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf > > Of > > > > > Julian Satran > > > > > Sent: Monday, November 05, 2001 12:35 AM > > > > > To: ips@ece.cmu.edu > > > > > Subject: RE: iSCSI: current UNH Plugfest > > > > > > > > > > Rod, > > > > > > > > > > I all examples give the point I find hard to understand is why is > > the > > > > > ordering on the wire different from the presentation order to the > > > > > initiator. You can get as many overlaps as you want by presenting > > the > > > > > commands to the initiator in the desired order. > > > > > What we are considering here is the case in which you want to ship > > in > > > > > an > > > > > order different than the one you present the commands. > > > > > > > > > > Julo > > > > > > > > > > "Rod Harrison" <rod.harrison@windriver.com> > > > > > Sent by: owner-ips@ece.cmu.edu > > > > > 04-11-01 04:42 > > > > > Please respond to "Rod Harrison" > > > > > > > > > > To: "Barry Reinhold" <bbrtrebia@mediaone.net>, "Dave > > > > > Sheehy" > > > > > <dbs@acropora.rose.agilent.com>, "IETF IP SAN Reflector" > > > > > <ips@ece.cmu.edu> > > > > > cc: > > > > > Subject: RE: iSCSI: current UNH Plugfest > > > > > > > > > > Barry, > > > > > > > > > > In general I agree but I don't think this is as > > much > > > > > of a > > > > > corner case > > > > > as it at first appears. Targets will have code very similar to that > > > > > needed to handle out of order commands to deal with digest errors. > > > > > Targets also need to queue commands whilst waiting for both > > solicited > > > > > and unsolicited data to arrive. Queuing out of order commands seems > > > > > little extra work. > > > > > > > > > > From an initiators point of view there are > > > > > efficiency, > > > > > and probably > > > > > performance gains to be had from sending commands out of order. 
Bob > > > > > Russell gave the example of a read being sent whilst write data DMA > > is > > > > > happening, and a similar situation can arise with DMA for writes > > > > > overtaking that of earlier writes if the initiator has multiple DMA > > > > > engines. In this case the initiator might be forced to let the wire > > go > > > > > idle if it can't send the data from completed DMAs as soon as > > > > > possible. > > > > > > > > > > We already have a command queue at the target to > > > > > enforce > > > > > correct > > > > > serialisation of commands, doing the same thing at the initiator is > > > > > redundant. > > > > > > > > > > Finally, I don't believe we should be writing a > > > > > standard > > > > > to work > > > > > around poor coding and test coverage, especially at the cost of > > > > > potential efficiency gains. > > > > > > > > > > I agree with Dave and Santosh that commands being > > > > > sent > > > > > out of order > > > > > on a single session should be allowed by the standard. > > > > > > > > > > - Rod > > > > > > > > > > -----Original Message----- > > > > > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf > > Of > > > > > Barry Reinhold > > > > > Sent: Friday, November 02, 2001 5:24 PM > > > > > To: Dave Sheehy; IETF IP SAN Reflector > > > > > Subject: RE: iSCSI: current UNH Plugfest > > > > > > > > > > Using features such as out of order command delivery on a connection > > > > > tend to > > > > > be the sort of things that lead to interoperability problems. It is > > > > > unexpected and probably going to hit poorly tested code paths even > > if > > > > > the > > > > > standard is written to allow it. 
> > > > > > > > > > >-----Original Message----- > > > > > >From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf > > > > > Of > > > > > >Dave Sheehy > > > > > >Sent: Friday, November 02, 2001 4:19 PM > > > > > >To: IETF IP SAN Reflector > > > > > >Subject: Re: iSCSI: current UNH Plugfest > > > > > > > > > > > > > > > > > > > > > > > >> 3. Can commands be sent out of order on the same connection? > > > > > >> > > > > > >> The behavior of targets is clearly specified in Section > > 2.2.2.3 > > > > > on > > > > > >> page 25 of draft 8, which says: > > > > > >> "Except for the commands marked for immediate delivery the > > > > > iSCSI > > > > > >> target layer MUST eliver the commands for execution in the > > > > > order > > > > > >> specified by CmdSN." > > > > > >> > > > > > >> Section 2.2.2.3 on page 26 of draft 8 also says: > > > > > >> "- CmdSN - the current command Sequence Number advanced by 1 > > > > > on > > > > > >> each command shipped except for commands marked for > > immediate > > > > > >> delivery." > > > > > >> but the meaning of the term "shipped" is vague, and does not > > > > > >> necessarily > > > > > >> require that the PDUs arrive on the other end of a TCP > > > > > connection > > > > > >> in the same order that the CmdSN values were assigned to these > > > > > PDUs. > > > > > >> > > > > > >> Some initiators have been designed to send commands out of > > CmdSN > > > > > >> order on one connection. Consider the situation where there > > is > > > > > only > > > > > >> one connection and a high-level dispatcher creates a PDU for a > > > > > SCSI > > > > > >> command that involves writing immediate data to the target. > > > > > This PDU > > > > > >> is enqueued to a lower-level layer which has to setup, start, > > > > > and > > > > > >> wait-for a DMA operation to move the immediate data into an > > > > > onboard > > > > > >> buffer before the PDU can be put onto the wire. 
While this is > > > > > >> happening, the dispatcher creates another unrelated PDU for a > > > > > SCSI > > > > > >> read command (for example), and when this PDU is passed to the > > > > > >> lower-level layer it can be sent immediately, ahead of the > > > > > previous > > > > > >> write PDU and therefore out of order on this connection. > > > > > >> > > > > > >> The standard clearly allows this to happen if the two PDUs > > were > > > > > sent > > > > > >> on different connections, and seems to imply that this can > > also > > > > > happen > > > > > >> when the two PDUs are sent on the same connection. > > > > > >> > > > > > >> The suggestion is to put in the standard an explicit statement > > > > > that > > > > > >> this is allowed or not allowed, as appropriate. > > > > > >> > > > > > >> If this is allowed, such a statement would avoid the erroneous > > > > > >> assumption being made by some target implementers that within > > a > > > > > single > > > > > >> connection, commands will arrive in order. > > > > > >> > > > > > >> If this is not allowed, such a statement would avoid the > > > > > erroneous > > > > > >> assumption being made by some initiator implementers that > > within > > > > > a > > > > > >> single connection, commands can be put on the wire out of > > order. > > > > > >> > > > > > >> +++ > > > > > >> > > > > > >> will add an explicit statement saying that this behaviour is > > > > > forbidden. > > > > > >> 2.2.2.1 will contain: > > > > > >> > > > > > >> On any given connection, the iSCSI initiator MUST send the > > > > > >commands in the > > > > > >> order specified by CmdSN. > > > > > >> > > > > > >> +++ > > > > > > > > > > > >Why do you feel this behavior should be forbidden? Targets already > > > > > have to > > > > > >order commands across the session. I don't see why it's a problem > > to > > > > > extend > > > > > >that to the connection as well. I, for one, believe we should take > > > > > >a liberal > > > > > >stance on this. 
> > > > > > > > > > > >Dave Sheehy > > > > > > > > > > > > -- > > > ################################## > > > Santosh Rao > > > Software Design Engineer, > > > HP-UX iSCSI Driver Team, > > > Hewlett Packard, Cupertino. > > > email : santoshr@cup.hp.com > > > Phone : 408-447-3751 > > > ################################## > > > > > > > >
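
Illustration (not part of the original thread): the "scoreboard" or "sequencer" that Santosh and Mallikarjun discuss can be pictured as a small reorder buffer on the target. The sketch below is only one possible shape for it, under stated assumptions: the class name CmdSNScoreboard, the receive() method, and the use of plain strings for commands are all hypothetical, CmdSN wraparound and the command window are ignored, and commands marked for immediate delivery are assumed to bypass it entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CmdSNScoreboard:
    """Hypothetical target-side sequencer (names are illustrative only).

    Holds non-immediate commands that arrive ahead of a CmdSN hole and
    releases them to the SCSI layer strictly in CmdSN order.
    Simplification: no 32-bit serial-number wraparound, no window checks.
    """

    exp_cmd_sn: int = 1  # next CmdSN that may be handed to the SCSI layer
    pending: Dict[int, str] = field(default_factory=dict)

    def receive(self, cmd_sn: int, command: str) -> List[str]:
        """Accept one command; return the commands now deliverable, in order."""
        if cmd_sn < self.exp_cmd_sn or cmd_sn in self.pending:
            return []  # already delivered or already queued: ignore duplicate
        self.pending[cmd_sn] = command
        deliverable: List[str] = []
        # Drain every consecutive CmdSN starting at the expected one.
        while self.exp_cmd_sn in self.pending:
            deliverable.append(self.pending.pop(self.exp_cmd_sn))
            self.exp_cmd_sn += 1
        return deliverable


board = CmdSNScoreboard(exp_cmd_sn=1)
print(board.receive(2, "READ"))   # [] because CmdSN 1 has not arrived yet
print(board.receive(1, "WRITE"))  # ['WRITE', 'READ'] once the hole is filled
```

The same structure covers both causes of a hole raised in the thread: a command dropped on a digest error and later retransmitted, and a command the initiator put on the wire out of CmdSN order. Under the ErrorRecoveryLevel=0 assumption Mallikarjun describes, a single-connection target could instead drop the session on the first gap and skip the buffer.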
Last updated: Thu Nov 08 08:17:52 2001