
    RE: iSCSI: need for new data SNACK code?



    > > The flag per task is not needed - I'd expect the Target to look
    > > at the Data PDUs it would have to resend, check them against the max
    > > Data PDU size for this connection and fail the regular SNACK if any
    > > PDU is too large
    > 
    > I am afraid there's no free lunch here.  In this description, you're now 
    > expecting targets to maintain the PDU size of every PDU that it shipped
    > for each of the tasks, which causes a metadata explosion.  
    
    Targets already have to do that in order to cope with Data SNACKs with
    non-zero BegRun after a PDU size change.
    
    To make this concrete, suppose a target has shipped 4 PDUs with 4K of
    data each (DataSN 0-3) followed by 4 PDUs with 8K each (DataSN 4-7) and
    gets a Data SNACK for PDU 5 to the end - how does it figure out that
    the requested data starts 24K from the beginning?  Skipping 5 x 4K of
    data starts at 20K (middle of PDU 4, wrong), skipping 5 x 8K starts
    at 40K (PDU 7, also wrong).  It looks like the target has to maintain
    the old size (4K), the new size (8K), and the DataSN that the new size
    started at (4), which is just enough metadata to "maintain the PDU
    size of every PDU that it shipped" ...  
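A minimal sketch of the bookkeeping in the example above, assuming every PDU in a run carries the full size in force for that run (as in the 4K/8K example). The function name and the `runs` list are hypothetical, not taken from the draft:

```python
def offset_of_datasn(begrun, runs):
    """Return the byte offset at which Data PDU number `begrun` starts.

    `runs` is a list of (first_datasn, pdu_size) pairs sorted by
    first_datasn, with the first entry starting at DataSN 0.  This is
    the per-task metadata the target would have to keep.
    """
    if not runs or runs[0][0] != 0:
        raise ValueError("runs must start at DataSN 0")
    offset = 0
    for i, (start, size) in enumerate(runs):
        # The run ends where the next run begins (if there is one).
        end = runs[i + 1][0] if i + 1 < len(runs) else None
        if end is None or begrun < end:
            return offset + (begrun - start) * size
        offset += (end - start) * size


KB = 1024
runs = [(0, 4 * KB), (4, 8 * KB)]       # old size, new size, DataSN of change
correct = offset_of_datasn(5, runs)      # 4 x 4K + 1 x 8K = 24K
naive_old = 5 * 4 * KB                   # 20K, middle of PDU 4
naive_new = 5 * 8 * KB                   # 40K, start of PDU 7
```

Each additional size change during the task adds one more entry to `runs`, which is where the metadata growth comes from.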
    
    The only obvious way I see to eliminate this "metadata explosion"
    is to require BegRun to be zero if there's any possibility of
    resegmentation.  That was part of my original comment, but seems
    to have been dropped.  If BegRun=0 is required when resegmentation
    occurs, then a flag per task would be needed in the target to
    replace the "metadata explosion".  OTOH, this is another way
    for an initiator to cause serious problems - if it issues a
    Data SNACK with BegRun != 0 and the target resegments to the
    new PDU size, the initiator gets the wrong data.
    
    > >If the permission is not used, the Initiator's
    > > status SNACK is not needed but does no harm.
    > 
    > Well, the point is - shouldn't the target be detecting these 
    > obvious bugs, and attempting
    > recovery/fix for these errors (it's a clear disconnect b/n 
    > target and initiator state).  Seems like
    > additional complexity on either end - to cover implementation 
    > bugs wrt prior synchronization.
    
    It's not an "obvious bug" or an error.  There are two different
    algorithms for a Data SNACK - one in which target resegmentation is
    prohibited and the initiator does not need to use a Status SNACK
    due to resegmentation (but may still do so for other reasons),
    and one in which target resegmentation is permitted and the
    initiator MUST use a Status SNACK.  Switching from one algorithm
    to the other changes behavior at *both* the target and initiator -
    the additional Data SNACK code serves to ensure that this change
    is synchronized by explicitly communicating which algorithm to use.
    Both algorithms work as long as both parties agree on which one
    is being used.  The additional Data SNACK code puts the initiator
    unambiguously in charge of algorithm selection.
    
    > > As the complexity of a protocol increases, that synchronized
    > > state machine assumption becomes more prone to failure.
    > 
    > I think this is where the major disconnect is between us.  As
    > I responded to Dave
    > Sheehy yesterday, the iSCSI protocol specification *mandates*
    > that a target must 
    > ship "exact replicas" of the data PDUs barring certain header 
    > fields unless the
    > PDU size was changed by an intermediate successful text 
    > negotiation.   What you're
    > suggesting is: despite this mandate, target may resegment 
    > illegally, so let's define a
    > new data SNACK code with identical wire semantics.
    
    Actually, I'm much more concerned about an Initiator that doesn't
    correctly track when the "successful text negotiation" took place
    with respect to the multitude of outstanding commands it has in
    various states.  The negotiation is not synchronous with respect
    to the command stream unless all commands on the connection are
    quiesced, which Mallikarjun does not want to do.  If an initiator
    fails to catch the fact that a target resegmented, the result is
    corrupt/wrong data for a READ or similar command.
      
    > >and for an initiator to expect to be able to do
    > > this with uninterrupted high performance is unrealistic 
    > ..
    > >right sort of incentives in discouraging
    > > initiators from changing the max Data PDU size.
    > 
    > You're making an incorrect assumption here that it's just the 
    > initiators that are likely to change the max PDU size.  Either party
    > can do it [... snip ...]
    
    I don't think so ... MaxRecvDataSegmentLength is "Declarative", so only
    the initiator can change the initiator's MaxRecvDataSegmentLength.  The
    target can change the target's MaxRecvDataSegmentLength, but that's
    not relevant to this discussion because targets don't issue SNACKs
    of any form to initiators.  
    
    > Now, on to your proposal...
    > 
    > > That strikes me as a productive direction that I could see enforcing
    > > An initiator that wants to be able to issue a Data SNACK for
    > > some or all of its commands then has to ensure that no such
    > > commands are outstanding when/while it changes (in particular
    > > reduces) the max Data PDU size. 
    > 
    > I am afraid you may have misunderstood what I was suggesting.
    
    Sorry for being unclear.  I was following the line of thinking behind
    Mallikarjun's proposal a) to a conclusion beyond where he took it.  I
    know Mallikarjun didn't suggest removing resegmenting Data SNACKs, and I
    didn't mean to ascribe that position to him.
    
    > There's one weird corner case in the simplified option you're 
    > suggesting here - when a target wants to initiate a max PDU size
    > change, it cannot know when the initiator is likely to quiesce the
    > I/Os, nor there's a way to tell the initiator to stop.
    
    The "weird corner case" does not exist because MaxRecvDataSegmentLength
    is "Declarative" - see above.
    
    > With that said, let me survey the available options:
    
    Unfortunately, there are a bunch of things wrong with the survey.
     
    > Option.A
    >          - Keep the rev13 text, plus add the two additional
    >		text segments I proposed
    >           on the beginning of this thread (initiators must 
    >		drop status in one case, SNACK
    >           must be issued only before the status is ack'ed), 
    > 		*and* add "no data SNACKs while any text negotiation is on".
    >          Pros:   - No need for a new data SNACK code with 
    >				identical wire semantics
    >                     - Can allow the PDU size change to happen 
    >				with no wait for quiescing
    >                       any long running writes/reads (and
    > 				those operations too benefit from ULPDU
    >                       containment from this changed PDU size).
    
    I disagree with "identical wire semantics" - see discussion
    of two Data SNACK algorithms above, but agree that not introducing a
    new Data SNACK code is a Pro.  ULPDU containment is irrelevant (more
    below), and writes do not need to be quiesced, although reads would.
    
    >          Cons:   - Additional complexity (compared to the 
    >				standard data SNACK) on the 
    >                       initiator to drop the status SNACK; 
    >				and to mark all active tasks while the 
    >                       PDU change had happened, so their 
    >				statuses can be dropped if necessary.
    
    Con: Failure to drop SCSI response with good status when required
    can lead to incorrect/incomplete data being returned.
    
    > Option.B
    >           - Same as Option.A, but add the new resegmenting-Data SNACK code
    >              per David's Last Call comment.
    >           Pros:  - Precludes surprises due to implementation 
    >				errors (also a con, see below)
    
    Pro: allows simpler implementations that don't track the state
    required for Option A.  Places the responsibility for state
    tracking for resegmentation on the target where the resegmentation
    occurs.
    
    >                     - Can allow the PDU size change to happen 
    >				with no wait for quiescing
    >                       any long running writes/reads (and 
    >				those operations too benefit from ULPDU
    >                       containment from this changed PDU size).
    
    Only need to quiesce reads; writes aren't affected.  ULPDU containment
    is irrelevant - see below.
    
    Pro: Required response drop at the initiator is based on less state that
    	is more directly connected with the Data SNACK that creates the need
    	for the response drop.
    
    
    >          Cons:  - Attempt to address an implementation error 
    >				by protocol means, could be a
    >                       slippery slope.
    
    I disagree with this Con based on the login "if in doubt, negotiate" maxim.
    
    >                    - Requires a new data SNACK code which
    > 				both sides have to handle, and which
    >                       conveys completely redundant 
    >				information about the changed PDU size.
    
    I disagree with "completely redundant" statement.  The information
    being conveyed is an instruction from the initiator about whether
    the resegmenting Data SNACK algorithm or the non-resegmenting Data
    SNACK algorithm is in use.
    
    >                    - Additional complexity (compared to the 
    >				standard data SNACK) on the 
    >                       initiator to drop the status SNACK; and 
    >				to mark all active tasks while the 
    >                       PDU change had happened, so their 
    >				statuses can be dropped if necessary.
    
    I disagree with the "and mark ..." text.  Only tasks for which a
    resegmenting Data SNACK has been issued need be marked.  Also,
    
    Con: Failure to drop response with good status when required can lead
    	to incorrect/incomplete data being returned.
    
    > Option.C
    >            - Completely disallow PDU size changes (initiated 
    >			by either party) while any tasks are active.
    
    That's not right because MaxRecvDataSegmentLength is declarative
    (hence "(initiated by either party)" is wrong), and the description
    assumes that the Initiator wants to be able to issue Data SNACKs
    for all tasks (Data SNACKs don't apply to WRITE, etc.).  A correct
    description would be:
    
    		- Data SNACKs cannot be issued for a task if the
    			initiator has changed its size limit on received
    			Data PDUs while the task is outstanding.  Initiators
    			have to ensure that tasks for which Data SNACKs
    			could be issued are not active during such a
    			size limit change.
    
    >			Rev13 text should be stripped of 
    >			the resegmenting discussion.  Any 
    >               data SNACK always gets exact replicas.
    >
    >            Pros:  - Simpler approach, initiators don't need 
    >				to drop status PDUs, nor mark the
    >                        active tasks.  
    >            Cons:  - Active tasks cannot dynamically adapt to 
    >				PMTU degradation, so ULPDU 
    >                       containment isn't always guaranteed - 
    >				particularly painful for long-running tasks
    >                       for either party.
    
    Irrelevant - iSCSI does not require, recommend, or even describe
    ULPDU containment.  TCP ULP framing was removed from the iSCSI
    draft many months ago after a long controversial discussion.  I
    strongly object to attempting to sneak it back in.  Besides,
    adaptation to PMTU degradation is the transport's job, and iSCSI
    should not be trying to do the transport's work for it, as it
    in general does not have direct access to PMTU information.
    
    >                       - Desired changes in max PDU size would 
    >				  need to wait for all tasks to quiesce
    >                         and the statuses be acknowledged, 
    >				  forcing a pause in the I/O activity.
    
    Not "all tasks", rather "reads for which Data SNACK recovery
    is required if errors occur".  Pause in I/O activity is still
    a distinct possibility (e.g., if all/most tasks are such reads).
    
    >                       - Any text negotiation prompted by the 
    >				  target can't be carried on until all 
    >                         active I/Os are quiesced (even if the 
    >				  target intends to negotiate other keys).
    
    Definitely wrong - target can't negotiate this, see above.
    
    > I prefer Option.A, followed by Option.B.  Option.C's cons 
    > appear to outweigh its simplicity, so wouldn't prefer that.
    
    Let's see - the description for option C is wrong, as is one of
    its Cons, and a second Con is irrelevant leaving only one of the
    three ... some additional consideration may be in order ...
    
    I have enough issues with the attempted survey, that I'm going
    to try a comparison table (and also to provide something concise
    for folks to shoot at):
    
    First, here's what I believe the A/B/C descriptions are.  I'm going
    to assume for now that BegRun=0 is required for any resegmentation
    in order to avoid the "metadata explosion" problem.
    
    Applies to all options:
    	- Initiators MUST NOT issue SNACKs after the status for the
    		command is ACKed via ExpStatSN.
    
    A:	- Initiators MUST drop SCSI response and issue a status SNACK
    		when a Data SNACK is issued for a command that was
    		outstanding during an initiator receive data PDU size
    		limit change.
    	- Targets MAY resegment in response to a Data SNACK when the
    		initiator receive data PDU size limit changes while
    		the command is outstanding.
    	- Initiators MUST NOT issue new Data SNACKs during a change
    		to initiator receive data PDU size limit (and wait for
    		existing data SNACKs to complete before making such
    		a change?).  [Aside: It's not clear to me why this is
    		needed, but I'll include it since Mallikarjun asked for it.]
    
    B:	- New Resegmenting Data SNACK code defined to distinguish behavior
    		in "MAY resegment" and "MUST NOT resegment" cases.
    	- Initiators MUST drop response and issue a status SNACK when
    		a Resegmenting Data SNACK is issued for a command.
    	- Targets MUST NOT resegment in response to an ordinary Data
    		SNACK and MAY resegment in response to a Resegmenting
    		Data SNACK.
    	- No limits appear to be needed on when Data SNACKs can be
    		issued.
    
    C:	- Resegmentation is forbidden.  Initiators MUST NOT issue
    		Data SNACKs that require resegmentation, Targets MUST
    		NOT resegment - Data SNACKs always return "exact
    		replicas" of original PDUs.
    	- No new Data SNACK code, no need to drop otherwise good
    		SCSI responses.
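A rough sketch of how option B's rules might look in code, under the BegRun=0 assumption stated above. The names (`SnackKind`, `target_action`, `initiator_must_drop_response`) and the action strings are hypothetical; rejecting an ordinary Data SNACK whose original PDUs no longer fit the size limit follows the suggestion quoted at the top of this message and is an assumption, not draft text:

```python
from enum import Enum


class SnackKind(Enum):
    DATA = "data"                            # ordinary Data SNACK
    RESEGMENTING_DATA = "resegmenting-data"  # proposed new code


def target_action(kind, begrun, pdus_fit_limit):
    """Decide what the target does with a Data SNACK under option B."""
    if kind is SnackKind.DATA:
        # MUST NOT resegment: either replay exact replicas or fail.
        return "replay_exact" if pdus_fit_limit else "reject"
    # Resegmenting Data SNACK: BegRun must be zero, easily checked here.
    if begrun != 0:
        return "reject"
    return "may_resegment"


def initiator_must_drop_response(kinds_issued_for_task):
    """Initiator drops the SCSI response and issues a status SNACK only
    if it issued a Resegmenting Data SNACK for this task."""
    return SnackKind.RESEGMENTING_DATA in kinds_issued_for_task
```

The point of the sketch is that the initiator's decision depends only on which SNACK code it sent, not on tracking when a size limit change happened relative to each outstanding command.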
    
    
    +-----------------+-----------+-----------+-----------+
    | Attribute       |     A     |     B     |     C     |
    +-----------------+-----------+-----------+-----------+
    | SNACK codes     |     1     |     2     |     1     |
    +-----------------+-----------+-----------+-----------+
    | Quiesce reads   |    No     |    No     |    Yes    |
    +-----------------+-----------+-----------+-----------+
    | Command state   | initiator |  target   |    No     |
    +-----------------+-----------+-----------+-----------+
    | Response drop   | cmd state | RD SNACK  |    No     |
    +-----------------+-----------+-----------+-----------+
    | Data reseg      |    Yes    | new code  |    No     |
    +-----------------+-----------+-----------+-----------+
    | Integrity risks |     2     |     1     |     1     |
    +-----------------+-----------+-----------+-----------+
    | Risk removal    | target,1  |    No     | target,1  |
    +-----------------+-----------+-----------+-----------+
    
    Attribute Explanation:
    
    SNACK codes: Number of Data SNACK codes.  Fewer is better.
    Quiesce reads: Need to quiesce reads to which Data SNACK
    	data recovery is applicable in order to change
    	initiator's receive Data PDU size limit.  No is better.
    Command state: Whether/where receive Data PDU size change
    	requires state per Data-SNACK-recoverable command outstanding
    	at the time of the change. No is better, other values
    	indicate where the state is kept.  This is referred to
    	as "command state" for short below.
    Response drop: Whether and how initiator MUST drop
    	and retry SCSI responses to deal with Data SNACK
    	resegmentation.  "cmd state" = based on
    	command state (previous item), "RD SNACK"
    	= based on whether a Resegmenting Data SNACK has
    	been issued.  No is better, as having to drop
    	a SCSI response with good status is peculiar.
    Data reseg: Data can be resegmented in response to some
    	form of Data SNACK.  Not clear what's better here.
    	"new code" = only in response to new Resegmenting
    	Data SNACK code and is as good/bad as Yes.
    Integrity Risks: This is going to be controversial.  It
    	is a count of risks to data integrity in error cases.
    	The error cases are subtle as they're based on errors
    	in error recovery.  The risks are:
    	- (A and B) Failure to drop a response and issue a
    		status SNACK can lead to incomplete data in the
    		face of Data PDU non-delivery (e.g., header corruption).
    		Can be detected but not prevented by the target.
    	- (A only) BegRun != 0 can cause the wrong data to be
    		returned when resegmentation happens.  Preventing
    		this for A requires command state at target.  Not
    		applicable to B because BegRun = 0 is required
    		for the new Resegmenting Data SNACK code and is
    		easily checked by the target.
    	- (C only) If the initiator issues a Data SNACK that
    		causes the target to resegment, bad things
    		happen, as the protocol doesn't support this.
    		Can be prevented by command state at the target.
    
    Risk removal: indicates that the latter two risks can be
    	removed by adding command state to the target to
    	catch the initiator misbehavior.  For A, this would be
    	"Targets MUST check that BegRun=0 when a Data SNACK
    	would result in resegmentation".  For C this would
    	be "Targets MUST check that Data SNACKs do not
    	cause resegmentation".
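The two risk-removal checks could be sketched as follows. This assumes the target's command state is a single per-task flag recording that the initiator's receive size limit changed while the task was outstanding, and treats that flag as a conservative stand-in for "this Data SNACK would result in resegmentation". `TaskState` and both function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    # Set by the target when the initiator's receive Data PDU size
    # limit changes while this task is outstanding (assumed proxy for
    # "a Data SNACK would result in resegmentation").
    limit_changed: bool = False


def option_a_snack_ok(task, begrun):
    """Option A: a Data SNACK that would resegment must have BegRun=0."""
    return not (task.limit_changed and begrun != 0)


def option_c_snack_ok(task):
    """Option C: a Data SNACK must never cause resegmentation."""
    return not task.limit_changed
```

Either check costs one flag per Data-SNACK-recoverable task at the target, rather than the full per-PDU size history.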
    
    If the BegRun = 0 requirement is removed, B goes to No command
    state, and A goes to 1 integrity risk and No risk removal -
    the net effect on the comparison of A and B is close to a
    wash, but C gets a serious plus for no "metadata explosion".
    	
    
    Observations
    
    (1) The new factor in this message is the interaction of
    	BegRun != 0 with Data resegmentation.  Unless I've
    	missed something major, it looks like BegRun MUST
    	be zero when resegmentation happens in
    	order to avoid the "metadata explosion".
    (2) I think the right versions of A and C to compare
    	are the ones with extra state to catch the BegRun
    	and "resegmenting attempted but is forbidden" cases.
    (3) With (2), A has to keep command state on both sides of the
    	connection, B and C only need it at the target.  B pays
    	for less state with the additional Data SNACK type and
    	code to support it.  C has no integrity risks (never needs
    	a status SNACK, hence can't screw it up), and is somewhat
    	simpler (never has to drop a good response), but pays for
    	it by not supporting resegmentation which causes 
    	a need to quiesce data-recoverable reads in order to
    	change the PDU size.
    
    Analysis:
    
    - I prefer the extra Data SNACK type code to the state needed
    	to enforce the BegRun=0 condition for resegmentation at
    	the target.  (A < B).  I prefer C's never having to drop
    	a SCSI response with good status to the additional complexity
    	of resegmentation, even though C forces some commands to
    	be quiesced in the rare case of a Data PDU size change (C > A, B).  
    
    - I strongly object to the introduction of ULPDU containment
    	into this discussion - that all but reopens the framing tarpit
    	that I spent a great deal of time and effort getting to closure.
    
    My conclusion is that I prefer C, with B as a second choice,
    but this is no longer as clear-cut as I initially thought,
    as some of the distinctions above are fine.  The fact that
    resegmentation continues to be difficult to get right
    (e.g., Mallikarjun seems to have missed the "metadata explosion"
    required by his preferred option A), suggests to me that it
    ought to be left out (C) or isolated in a fashion that makes it
    easy not to use (B).
    
    Sorry about the length of this message - this is somewhat subtle
    stuff.
    
    Thanks,
    --David
    
    p.s.  I'm in Yokohama typing this on the IETF wireless network
    - serious round of applause and thanks to the WIDE project
    folks (local hosts) for how easy and convenient this is to use.
    
    ---------------------------------------------------
    David L. Black, Senior Technologist
    EMC Corporation, 42 South St., Hopkinton, MA  01748
    +1 (508) 249-6449            FAX: +1 (508) 497-8018
    black_david@emc.com       Mobile: +1 (978) 394-7754
    ---------------------------------------------------
    



Last updated: Sun Jul 14 00:18:49 2002