
    RE: iSCSI: Out of order commands



    
    
    > -----Original Message-----
    > From: Robert D. Russell [mailto:rdr@mars.iol.unh.edu]
    > Sent: Wednesday, November 07, 2001 1:13 PM
    > To: Somesh Gupta
    > Cc: Julian Satran; ips@ece.cmu.edu
    > Subject: RE: iSCSI: Out of order commands
    >
    >
    > Somesh, Julian:
    >
    > You state that dealing with OOO commands on the target
    > will add substantial complexity on the target.
    > Do you have any basis for that claim?  My impression from the
    > plugfest is that most targets are already dealing with
    > it.  Perhaps we need to hear from someone who is actually
    > building a target for which this would be a real problem.
    
      Since we are making related products, I can assure you
      that this adds complexity. Mallikarjun also added the
      target perspective.
    
    >
    > If anything, what we are hearing from people who really
    > are building initiators is that dealing with the requirement
    > to send commands in order will introduce substantial complexity
    > on the initiator.
    
      I think you heard from one (WindRiver). We don't have a
      problem and it does not add "substantial complexity".
    
    >
    > So why should we be saving complexity on (hypothetically) simple
    > targets yet requiring complexity on real initiators?
    >
    > As far as the deadlock issue is concerned, if the only way
    > that deadlock can occur with OOO commands on the same
    > connection is during the use of immediate data (which is I
    > think what Julian was saying), then shouldn't we change
    > the standard to just say that -- if the initiator sends
    > commands out of order on a single connection, then immediate
    > data MUST NOT be used on that connection in order to avoid deadlock.
    >
    > This gives everybody what they want, since initiators who find
    > it beneficial to deliver commands OOO will just negotiate
    > immediate data off.  Those who really want to use immediate data
    > will have to ensure that commands are sent in order.
    > The tradeoff then becomes an implementation issue, not a
    > standards issue, which is where it belongs.
    
      An initiator that has a problem with this model can avoid
      sending immediate data.
    >
    >
    > Bob Russell
    > InterOperability Lab
    > University of New Hampshire
    > rdr@iol.unh.edu
    > 603-862-3774
    
      Bob,
    
      Sorry to be so blunt, but I say this because of your insinuations
      here. The only one in the debate not building a product
      seems to be you. But I do not disagree with your right
      to add meaningful points as you have done in the past.
    
      Somesh
    >
    >
    > On Wed, 7 Nov 2001, Somesh Gupta wrote:
    >
    > > I think we should either have it as a MUST or not require
    > > it (at both ends to get the real benefit). SHOULD is one
    > > of those things that leads to implementation
    > > burden and confusion, without perhaps the feature being
    > > used. There are implementation as well as protocol
    > > considerations mixed in here.
    > >
    > > If we are to remove the restriction, we should (SHOULD)
    > > get the maximum benefit from it, rather than
    > > accommodate an implementation choice. Out of sequence
    > > commands, combined with the possibility of digest errors,
    > > will add substantial complexity on the target side,
    > > without corresponding benefit in performance. If we change
    > > this to SHOULD, we should also relax the requirement
    > > to present commands on the target side to a SHOULD.
    > >
    > >
    > >
    > > > -----Original Message-----
    > > > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
    > > > Julian Satran
    > > > Sent: Wednesday, November 07, 2001 10:00 AM
    > > > To: ips@ece.cmu.edu
    > > > Subject: Re: iSCSI: Out of order commands
    > > >
    > > >
    > > > Mallikarjun,
    > > >
    > > > I did not see a SINGLE performance improvement that results from OOO
    > > > shipping.
    > > > It would be bad engineering to give away the "no-deadlock" mechanism we
    > > > have now for nothing.
    > > > I also have the impression that the point about deadlock that I keep
    > > > repeating is ignored or not understood.
    > > > As we stand today, commands can be shipped with or without immediate
    > > > data. An implementer determined to squeeze maximum bandwidth and
    > > > overlap command start with delivery will choose not to work with
    > > > immediate data (as you have pointed out), while a low-performance
    > > > software implementation will use immediate data to minimize CPU cycles
    > > > consumed.  However, both will be guaranteed to work without deadlock,
    > > > as source and sink use the same ordering.
    > > > Recovery is still a low probability event and should be handled with a
    > > > different set of considerations in mind.
    > > > As for the strictness of the recommendation - yes we could settle on
    > > > SHOULD.
    > > >
    > > > Julo
    > > >
    > > >
    > > >
    > > >
    > > > "Mallikarjun C." <cbm@rose.hp.com>
    > > > Sent by: owner-ips@ece.cmu.edu
    > > > 07-11-01 19:41
    > > > Please respond to cbm
    > > >
    > > >
    > > >         To:     Santosh Rao <santoshr@cup.hp.com>, ips@ece.cmu.edu
    > > >         cc:
    > > >         Subject:        Re: iSCSI: Out of order commands
    > > >
    > > >
    > > >
    > > > Santosh,
    > > >
    > > > I have only one comment on your responses.
    > > >
    > > > > Even a single connection target *MUST* implement a scoreboard. The
    > > > > reason being that it can see out-of-order arrival of commands due to
    > > > > commands being dropped on digest errors. In such a case, it must block
    > > > > further command processing until holes are filled.
    > > >
    > > > I made two convenient assumptions if you noticed, :-), one of which
    > > > is that target forces session recovery on *any* error that it sees
    > > > (ErrorRecoveryLevel=0) - including a dropped command due to a digest
    > > > error.  With that assumption, a target can afford not to implement
    > > > a scoreboard.
    > > >
    > > > As I said in a private note, I guess what primarily bothers me about
    > > > OOO commands on a connection is that it requires the receiver to
    > > > undo this "optimization" on its end - most notably on a single
    > > > connection.  TCP experts may comment on how/if they dealt with a
    > > > similar issue.
    > > >
    > > > OTOH, you had some valid comments on exceptions to ordering during
    > > > connection recovery.  Perhaps we can move on by making Julian's
    > > > proposed stipulation a SHOULD....
    > > > --
    > > > Mallikarjun
    > > >
    > > >
    > > > Mallikarjun Chadalapaka
    > > > Networked Storage Architecture
    > > > Network Storage Solutions Organization
    > > > MS 5668          Hewlett-Packard, Roseville.
    > > > cbm@rose.hp.com
    > > >
    > > >
    > > > Santosh Rao wrote:
    > > > >
    > > > > Mallikarjun,
    > > > >
    > > > > Some comments below.
    > > > >
    > > > > Regards,
    > > > > Santosh
    > > > >
    > > > > "Mallikarjun C." wrote:
    > > > > >
    > > > > > Rod and Julian,
    > > > > >
    > > > > > This has been an interesting thread of discussion.  Some
    > > > > > comments -
    > > > > >
    > > > > > 1.My first reaction was - allowing out-of-order command
    > > > > >   transmission on the same connection deprives targets of
    > > > > >   an implementation choice.  Targets which support only
    > > > > >   single-connection sessions and only support session
    > > > > >   recovery (reasonable assumptions in my mind) can no
    > > > > >   longer afford *not to* implement a command scoreboard.
    > > > >
    > > > > Even a single connection target *MUST* implement a scoreboard. The
    > > > > reason being that it can see out-of-order arrival of commands due to
    > > > > commands being dropped on digest errors. In such a case, it must block
    > > > > further command processing until holes are filled.
    > > > >
    > > > > Thus, there is no getting away from implementing a sequencer at the
    > > > > target. Given this, I think it is unreasonable to restrict initiator
    > > > > implementation flexibility by imposing a strict ordering requirement
    > > > > within the connection.
    > > > >
    > > > > > 2.Any end-node efficiency that is sought to be achieved
    > > > > >   by transmitting CmdSNs out-of-order from the initiator
    > > > > >   would be lost on the other end-node, since the target
    > > > > >   now must wait for re-ordering the commands.
    > > > >
    > > > > It has to handle this situation anyway to deal with holes caused by
    > > > > digest errors. This scenario occurs even with initiators that issue
    > > > > commands in order.
    > > > >
    > > > > >
    > > > > > 3.The flipside is that out-of-order transmission saves
    > > > > >   link bandwidth (albeit at the expense of end-node efficiency),
    > > > > >   compared to idling the link waiting for outbound DMA.
    > > > > >   We have to determine if this is a reasonable trade-off.
    > > > > >
    > > > > > 4.I can see Rod's point that prefetching all immediate
    > > > > >   data can be a burden on the NIC resources.  But, two
    > > > > >   questions -
    > > > > >         - could the NIC not use unsolicited separate data
    > > > > >           PDUs in these cases? [ I realize that InitialR2T
    > > > > >           has to be "no" to let it happen... ]
    > > > > >         - could the NIC have a memory architecture that
    > > > > >           allows data prefetching for the next command (so
    > > > > >           this is a non-issue from the protocol perspective)?
    > > > > >           This scheme incurs one DMA delay for every new
    > > > > >           burst of commands.
    > > > > >
    > > > > > 5.Another (perhaps radical at this point) option is to do
    > > > > >   away with immediate unsolicited data, to stick only with
    > > > > >   separate unsolicited data.  I would personally be okay
    > > > > >   with the choice, particularly if this feature (that
    > > > > >   helps software implementations) starts making hardware
    > > > > >   design complicated/expensive.
    > > > > >
    > > > > > So, to summarize -
    > > > > >
    > > > > > option                         immediate         allow
    > > > > >                                data in spec?     out-of-order?
    > > > > >
    > > > > > (A) (5) above                  no                no
    > > > > > (B) No real reason to do this. no                yes
    > > > > > (C) (4) above                  yes               no
    > > > > > (D) pros & cons (1), (2) & (3) yes               yes
    > > > > >
    > > > > > From the arguments I heard so far, I am leaning towards
    > > > > > option A, and option C in that order.
    > > > > >
    > > > > > Comments?
    > > > > > --
    > > > > > Mallikarjun
    > > > > >
    > > > > > Mallikarjun Chadalapaka
    > > > > > Networked Storage Architecture
    > > > > > Network Storage Solutions Organization
    > > > > > MS 5668 Hewlett-Packard, Roseville.
    > > > > > cbm@rose.hp.com
    > > > > >
    > > > > > Rod Harrison wrote:
    > > > > > >
    > > > > > > Julian,
    > > > > > >
    > > > > > >         I don't understand what you are proposing here. What do
    > > > > > > you mean by "multiplexed" DMA?
    > > > > > >
    > > > > > >         The problem is that the DMAs take some time; the more
    > > > > > > there are queued, the longer the last DMAs queued take to
    > > > > > > complete. Some commands require DMAs to complete before they can
    > > > > > > be sent, i.e. writes with immediate data; some commands do not,
    > > > > > > i.e. reads and writes with no immediate data. The iSCSI HBA wants
    > > > > > > to be able to send commands as soon as possible, which for a read
    > > > > > > after a write can be before the write's DMA has completed.
    > > > > > > Maintaining an ordered queue for commands to be sent on the HBA
    > > > > > > is expensive and redundant, since the target already knows how to
    > > > > > > queue commands before committing them to its SCSI layer.
    > > > > > >
    > > > > > >         The iSCSI HBA and its host driver are not at liberty to
    > > > > > > change the order of commands from the OS, but the DMAs those
    > > > > > > commands need are unlikely to complete in the same order, and as I
    > > > > > > mentioned some commands need no DMA. If the HBA can't send
    > > > > > > commands out of CmdSN order, it has to maintain an ordered queue
    > > > > > > of commands waiting to be sent, and potentially buffer a lot of
    > > > > > > data. For an HBA this makes immediate data almost impossible to
    > > > > > > support.
    > > > > > >
    > > > > > >         I don't see the problem with allowing out of order
    > > > > > > commands, given that the target already has to deal with very
    > > > > > > similar problems. I think we are getting into the area of
    > > > > > > implementation choices here, which is inappropriate for a
    > > > > > > specification.
    > > > > > >
    > > > > > >         - Rod
    > > > > > >
    > > > > > > -----Original Message-----
    > > > > > > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
    > > > > > > Julian Satran
    > > > > > > Sent: Monday, November 05, 2001 10:06 PM
    > > > > > > To: ips@ece.cmu.edu
    > > > > > > Subject: Re: iSCSI: Out of order commands, was current UNH Plugfest
    > > > > > >
    > > > > > > Rod,
    > > > > > >
    > > > > > > I don't see any reason why DMA operations can't be "multiplexed"
    > > > > > > with commands.
    > > > > > > If you have scheduled a long outbound DMA you are doomed regardless
    > > > > > > of the command ordering.
    > > > > > > And if you have scheduled DMA operations piecemeal then you can
    > > > > > > insert your commands in correct order.
    > > > > > >
    > > > > > > Julo
    > > > > > >
    > > > > > > "Rod Harrison" <rod.harrison@windriver.com>
    > > > > > > 05-11-01 20:48
    > > > > > > Please respond to "Rod Harrison"
    > > > > > >
    > > > > > >         To:     Julian Satran/Haifa/IBM@IBMIL, <ips@ece.cmu.edu>
    > > > > > >         cc:
    > > > > > >         Subject:        iSCSI: Out of order commands, was current UNH Plugfest
    > > > > > >
    > > > > > >                  [ Subject changed ]
    > > > > > >
    > > > > > > Julian,
    > > > > > >
    > > > > > >                  The ordering difference is introduced between the
    > > > > > > host side driver and the iSCSI HBA. The host side driver must
    > > > > > > present SCSI commands to the HBA in the order they are received
    > > > > > > from the OS to prevent read after write dependency failures. The
    > > > > > > HBA might reorder the commands depending on when DMA completes.
    > > > > > > The reordering can't be done ahead of time in the host driver
    > > > > > > since it doesn't know how long each DMA might take. As long as
    > > > > > > the HBA assigns CmdSN in the order it receives commands, the
    > > > > > > desired host ordering is preserved.
    > > > > > >
    > > > > > >                  - Rod
    > > > > > >
    > > > > > > -----Original Message-----
    > > > > > > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
    > > > > > > Julian Satran
    > > > > > > Sent: Monday, November 05, 2001 12:35 AM
    > > > > > > To: ips@ece.cmu.edu
    > > > > > > Subject: RE: iSCSI: current UNH Plugfest
    > > > > > >
    > > > > > > Rod,
    > > > > > >
    > > > > > > In all the examples given, the point I find hard to understand is
    > > > > > > why the ordering on the wire is different from the presentation
    > > > > > > order to the initiator.  You can get as many overlaps as you want
    > > > > > > by presenting the commands to the initiator in the desired order.
    > > > > > > What we are considering here is the case in which you want to ship
    > > > > > > in an order different from the one in which you present the
    > > > > > > commands.
    > > > > > >
    > > > > > > Julo
    > > > > > >
    > > > > > > "Rod Harrison" <rod.harrison@windriver.com>
    > > > > > > Sent by: owner-ips@ece.cmu.edu
    > > > > > > 04-11-01 04:42
    > > > > > > Please respond to "Rod Harrison"
    > > > > > >
    > > > > > >         To:     "Barry Reinhold" <bbrtrebia@mediaone.net>,
    > > > > > > "Dave Sheehy" <dbs@acropora.rose.agilent.com>,
    > > > > > > "IETF IP SAN Reflector" <ips@ece.cmu.edu>
    > > > > > >         cc:
    > > > > > >         Subject:        RE: iSCSI: current UNH Plugfest
    > > > > > >
    > > > > > > Barry,
    > > > > > >
    > > > > > >                  In general I agree, but I don't think this is as
    > > > > > > much of a corner case as it at first appears. Targets will have
    > > > > > > code very similar to that needed to handle out of order commands
    > > > > > > to deal with digest errors. Targets also need to queue commands
    > > > > > > whilst waiting for both solicited and unsolicited data to arrive.
    > > > > > > Queuing out of order commands seems little extra work.
    > > > > > >
    > > > > > >                  From an initiator's point of view there are
    > > > > > > efficiency, and probably performance, gains to be had from sending
    > > > > > > commands out of order. Bob Russell gave the example of a read
    > > > > > > being sent whilst write data DMA is happening, and a similar
    > > > > > > situation can arise with DMA for writes overtaking that of earlier
    > > > > > > writes if the initiator has multiple DMA engines. In this case the
    > > > > > > initiator might be forced to let the wire go idle if it can't send
    > > > > > > the data from completed DMAs as soon as possible.
    > > > > > >
    > > > > > >                  We already have a command queue at the target to
    > > > > > > enforce correct serialisation of commands; doing the same thing at
    > > > > > > the initiator is redundant.
    > > > > > >
    > > > > > >                  Finally, I don't believe we should be writing a
    > > > > > > standard to work around poor coding and test coverage, especially
    > > > > > > at the cost of potential efficiency gains.
    > > > > > >
    > > > > > >                  I agree with Dave and Santosh that commands being
    > > > > > > sent out of order on a single session should be allowed by the
    > > > > > > standard.
    > > > > > >
    > > > > > >                  - Rod
    > > > > > >
    > > > > > > -----Original Message-----
    > > > > > > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
    > > > > > > Barry Reinhold
    > > > > > > Sent: Friday, November 02, 2001 5:24 PM
    > > > > > > To: Dave Sheehy; IETF IP SAN Reflector
    > > > > > > Subject: RE: iSCSI: current UNH Plugfest
    > > > > > >
    > > > > > > Using features such as out of order command delivery on a
    > > > > > > connection tends to be the sort of thing that leads to
    > > > > > > interoperability problems. It is unexpected and probably going to
    > > > > > > hit poorly tested code paths even if the standard is written to
    > > > > > > allow it.
    > > > > > >
    > > > > > > >-----Original Message-----
    > > > > > > >From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
    > > > > > > >Dave Sheehy
    > > > > > > >Sent: Friday, November 02, 2001 4:19 PM
    > > > > > > >To: IETF IP SAN Reflector
    > > > > > > >Subject: Re: iSCSI: current UNH Plugfest
    > > > > > > >
    > > > > > > >
    > > > > > > >
    > > > > > > >> 3. Can commands be sent out of order on the same connection?
    > > > > > > >>
    > > > > > > >>    The behavior of targets is clearly specified in Section 2.2.2.3
    > > > > > > >>    on page 25 of draft 8, which says:
    > > > > > > >>      "Except for the commands marked for immediate delivery the
    > > > > > > >>      iSCSI target layer MUST deliver the commands for execution
    > > > > > > >>      in the order specified by CmdSN."
    > > > > > > >>
    > > > > > > >>    Section 2.2.2.3 on page 26 of draft 8 also says:
    > > > > > > >>      "- CmdSN - the current command Sequence Number advanced by 1
    > > > > > > >>      on each command shipped except for commands marked for
    > > > > > > >>      immediate delivery."
    > > > > > > >>    but the meaning of the term "shipped" is vague, and does not
    > > > > > > >>    necessarily require that the PDUs arrive on the other end of
    > > > > > > >>    a TCP connection in the same order that the CmdSN values were
    > > > > > > >>    assigned to these PDUs.
    > > > > > > >>
    > > > > > > >>    Some initiators have been designed to send commands out of
    > > > > > > >>    CmdSN order on one connection.  Consider the situation where
    > > > > > > >>    there is only one connection and a high-level dispatcher
    > > > > > > >>    creates a PDU for a SCSI command that involves writing
    > > > > > > >>    immediate data to the target.  This PDU is enqueued to a
    > > > > > > >>    lower-level layer which has to set up, start, and wait for a
    > > > > > > >>    DMA operation to move the immediate data into an onboard
    > > > > > > >>    buffer before the PDU can be put onto the wire.  While this
    > > > > > > >>    is happening, the dispatcher creates another unrelated PDU
    > > > > > > >>    for a SCSI read command (for example), and when this PDU is
    > > > > > > >>    passed to the lower-level layer it can be sent immediately,
    > > > > > > >>    ahead of the previous write PDU and therefore out of order on
    > > > > > > >>    this connection.
    > > > > > > >>
    > > > > > > >>    The standard clearly allows this to happen if the two PDUs
    > > > > > > >>    were sent on different connections, and seems to imply that
    > > > > > > >>    this can also happen when the two PDUs are sent on the same
    > > > > > > >>    connection.
    > > > > > > >>
    > > > > > > >>    The suggestion is to put in the standard an explicit
    > > > > > > >>    statement that this is allowed or not allowed, as appropriate.
    > > > > > > >>
    > > > > > > >>    If this is allowed, such a statement would avoid the erroneous
    > > > > > > >>    assumption being made by some target implementers that within
    > > > > > > >>    a single connection, commands will arrive in order.
    > > > > > > >>
    > > > > > > >>    If this is not allowed, such a statement would avoid the
    > > > > > > >>    erroneous assumption being made by some initiator implementers
    > > > > > > >>    that within a single connection, commands can be put on the
    > > > > > > >>    wire out of order.
    > > > > > > >>
    > > > > > > >> +++
    > > > > > > >>
    > > > > > > >> will add an explicit statement saying that this behaviour is
    > > > > > > >> forbidden.
    > > > > > > >> 2.2.2.1 will contain:
    > > > > > > >>
    > > > > > > >> On any given connection, the iSCSI initiator MUST send the
    > > > > > > >> commands in the order specified by CmdSN.
    > > > > > > >>
    > > > > > > >> +++
    > > > > > > >
    > > > > > > >Why do you feel this behavior should be forbidden? Targets already
    > > > > > > >have to order commands across the session. I don't see why it's a
    > > > > > > >problem to extend that to the connection as well. I, for one,
    > > > > > > >believe we should take a liberal stance on this.
    > > > > > > >
    > > > > > > >Dave Sheehy
    > > > > > > >
    > > > >
    > > > > --
    > > > > ##################################
    > > > > Santosh Rao
    > > > > Software Design Engineer,
    > > > > HP-UX iSCSI Driver Team,
    > > > > Hewlett Packard, Cupertino.
    > > > > email : santoshr@cup.hp.com
    > > > > Phone : 408-447-3751
    > > > > ##################################
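
The target-side scoreboard that Santosh describes in the thread above could look roughly like the sketch below: commands that arrive ahead of a hole in the CmdSN sequence are held, and delivery to the SCSI layer resumes once the hole is filled. The class and method names (CmdSNScoreboard, receive) are hypothetical and not taken from any draft or product. The sketch assumes a 32-bit wrapping CmdSN and leaves out commands marked for immediate delivery, command-window checks, and duplicate detection.

```python
CMDSN_MODULUS = 2**32  # assumption: CmdSN is a 32-bit wrapping counter


class CmdSNScoreboard:
    """Hold commands behind a CmdSN hole and release them in CmdSN order."""

    def __init__(self, exp_cmd_sn):
        self.exp_cmd_sn = exp_cmd_sn  # next CmdSN to hand to the SCSI layer
        self.pending = {}             # CmdSN -> command waiting behind a hole

    def receive(self, cmd_sn, command):
        """Accept one command; return the commands now deliverable, in order."""
        self.pending[cmd_sn] = command
        deliverable = []
        # Drain every command that is contiguous with the expected CmdSN.
        while self.exp_cmd_sn in self.pending:
            deliverable.append(self.pending.pop(self.exp_cmd_sn))
            self.exp_cmd_sn = (self.exp_cmd_sn + 1) % CMDSN_MODULUS
        return deliverable
```

The same structure serves both cases discussed in the thread: a hole left by a command dropped on a digest error, and a hole left by an initiator that sent a later CmdSN first.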
    > > >
    > > >
    > > >
    > > >
    > >
    >
    >
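
The initiator-side cost raised by Rod Harrison and Bob Russell can be illustrated with a toy timing model. It is only a sketch: the function name wire_times and the time units are hypothetical, and it ignores link serialization time and everything else about a real HBA. Each command has a time at which it is ready to send (its DMA, if any, has completed), and the two policies differ only in whether a command must also wait for every earlier CmdSN.

```python
from itertools import accumulate


def wire_times(ready_times, in_order):
    """Return the earliest time each command can go on the wire.

    ready_times[i] is when the command holding the i-th CmdSN is ready
    to send, i.e. its DMA (if it needs one) has completed.
    """
    if not in_order:
        # Out-of-order sending: each command leaves as soon as it is ready.
        return list(ready_times)
    # In-order sending: a command also waits for all earlier CmdSNs,
    # so its send time is the running maximum of the ready times.
    return list(accumulate(ready_times, max))


# A write with immediate data (first CmdSN) needs 5 time units of DMA;
# the read that follows (second CmdSN) needs none.
print(wire_times([5, 0], in_order=True))   # [5, 5]
print(wire_times([5, 0], in_order=False))  # [5, 0]
```

Under the in-order rule the read is held behind the write's DMA; without it the read leaves at once and the target has to reorder.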
    
    



Last updated: Wed Nov 07 18:17:38 2001