RE: iSCSI: need for new data SNACK code?

To: cbm@rose.hp.com
Subject: RE: iSCSI: need for new data SNACK code?
From: Black_David@emc.com
Date: Sat, 13 Jul 2002 20:59:13 -0400
Cc: ips@ece.cmu.edu
Content-Type: text/plain;charset="iso-8859-1"
Sender: owner-ips@ece.cmu.edu
> > The flag per task is not needed - I'd expect the Target to look
> > at the Data PDUs it would have to resend, check them against the max
> > Data PDU size for this connection and fail the regular SNACK if any
> > PDU is too large
> 
> I am afraid there's no free lunch here.  In this description, you're now 
> expecting targets to maintain the PDU size of every PDU that it shipped
> for each of the tasks, which causes a metadata explosion.  

Targets already have to do that in order to cope with Data SNACKs with
non-zero BegRun after a PDU size change.

To make this concrete, suppose a target has shipped 4 PDUs with 4K of
data each (DataSN 0-3) followed by 4 PDUs with 8K each (DataSN 4-7) and
gets a Data SNACK for PDU 5 to the end - how does it figure out that
the requested data starts 24K from the beginning?  Skipping 5 x 4K of
data starts at 20K (middle of PDU 4, wrong), skipping 5 x 8K starts
at 40K (PDU 7, also wrong).  It looks like the target has to maintain
the old size (4K), the new size (8K), and the DataSN that the new size
started at (4), which is just enough metadata to "maintain the PDU
size of every PDU that it shipped" ...  
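A minimal Python sketch of the bookkeeping the example implies. It assumes the target keeps one (first DataSN, PDU size) entry per size change for each task, and that every PDU is full-sized. The names `SizeRun` and `offset_of` are hypothetical, not from the draft:

```python
from __future__ import annotations

from dataclasses import dataclass

K = 1024  # "4K" and "8K" in the example above


@dataclass(frozen=True)
class SizeRun:
    """Hypothetical per-task record: PDUs from first_datasn on were pdu_size bytes."""
    first_datasn: int
    pdu_size: int


def offset_of(datasn: int, history: list[SizeRun]) -> int:
    """Byte offset of the start of Data PDU `datasn`.

    `history` is sorted by first_datasn and its first entry starts at DataSN 0.
    """
    offset = 0
    for i, run in enumerate(history):
        # This run covers DataSNs [run.first_datasn, end).
        end = history[i + 1].first_datasn if i + 1 < len(history) else datasn
        end = min(end, datasn)
        if end <= run.first_datasn:
            break
        offset += (end - run.first_datasn) * run.pdu_size
    return offset


# DataSN 0-3 at 4K, then DataSN 4-7 at 8K, as in the example.
history = [SizeRun(0, 4 * K), SizeRun(4, 8 * K)]
print(offset_of(5, history) // K)      # 24: the correct starting point
print(5 * 4 * K // K, 5 * 8 * K // K)  # 20 40: the two single-size guesses
```

With one size change the history holds the three values named above (old size, new size, and the DataSN where the new size started), plus the implicit starting DataSN 0; each further change adds another entry.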

The only obvious way I see to eliminate this "metadata explosion"
is to require BegRun to be zero if there's any possibility of
resegmentation.  That was part of my original comment, but seems
to have been dropped.  If BegRun=0 is required when resegmentation
occurs, then a flag per task would be needed in the target to
replace the "metadata explosion".  OTOH, this is another way
for an initiator to cause serious problems - if it issues a
Data SNACK with BegRun != 0 and the target resegments to the
new PDU size, the initiator gets the wrong data.

> >If the permission is not used, the Initiator's
> > status SNACK is not needed but does no harm.
> 
> Well, the point is - shouldn't the target be detecting these 
> obvious bugs, and attempting
> recovery/fix for these errors (it's a clear disconnect b/n 
> target and initiator state).  Seems like
> additional complexity on either end - to cover implementation 
> bugs wrt prior synchronization.

It's not an "obvious bug" or an error.  There are two different
algorithms for a Data SNACK - one in which target resegmentation is
prohibited and the initiator does not need to use a Status SNACK
due to resegmentation (but may still do so for other reasons),
and one in which target resegmentation is permitted and the
initiator MUST use a Status SNACK.  Switching from one algorithm
to the other changes behavior at *both* the target and initiator -
the additional Data SNACK code serves to ensure that this change
is synchronized by explicitly communicating which algorithm to use.
Both algorithms work as long as both parties agree on which one
is being used.  The additional Data SNACK code puts the initiator
unambiguously in charge of algorithm selection.
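A hypothetical Python sketch of why both ends must agree: the one field the initiator puts in the SNACK drives both the target's retransmission size and the initiator's obligation to use a Status SNACK. `DataSnackKind` and the helper functions are illustrative names, not part of the draft, and this target always resegments when permitted, which is only one of the behaviors "MAY" allows:

```python
from enum import Enum


class DataSnackKind(Enum):
    """Hypothetical encoding of the two Data SNACK algorithms."""
    NON_RESEGMENTING = 1  # ordinary Data SNACK: exact replicas only
    RESEGMENTING = 2      # proposed new code: target may resegment


def retransmit_pdu_size(kind: DataSnackKind, original_size: int,
                        current_limit: int) -> int:
    """Data PDU size the target uses when answering the SNACK."""
    if kind is DataSnackKind.NON_RESEGMENTING:
        return original_size  # target MUST NOT resegment
    # Resegmentation permitted; this sketch always uses the current limit.
    return current_limit


def initiator_must_status_snack(kind: DataSnackKind) -> bool:
    """Whether resegmentation obliges the initiator to use a Status SNACK.

    The initiator may still issue a Status SNACK for other reasons.
    """
    return kind is DataSnackKind.RESEGMENTING


for kind in DataSnackKind:
    print(kind.name, retransmit_pdu_size(kind, 8192, 4096),
          initiator_must_status_snack(kind))
```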

> > As the complexity of a protocol increases, that synchronized
> > state machine assumption becomes more prone to failure.
> 
> I think this is where the major disconnect is between us.  As
> I responded to Dave
> Sheehy yesterday, the iSCSI protocol specification *mandates*
> that a target must 
> ship "exact replicas" of the data PDUs barring certain header 
> fields unless the
> PDU size was changed by an intermediate successful text 
> negotiation.   What you're
> suggesting is: despite this mandate, target may resegment 
> illegally, so let's define a
> new data SNACK code with identical wire semantics.

Actually, I'm much more concerned about an Initiator that doesn't
correctly track when the "successful text negotiation" took place
with respect to the multitude of outstanding commands it has in
various states.  The negotiation is not synchronous with respect
to the command stream unless all commands on the connection are
quiesced, which Mallikarjun does not want to do.  If an initiator
fails to catch the fact that a target resegmented, the result is
corrupt/wrong data for a READ or similar command.
  
> >and for an initiator to expect to be able to do
> > this with uninterrupted high performance is unrealistic 
> ..
> >right sort of incentives in discouraging
> > initiators from changing the max Data PDU size.
> 
> You're making an incorrect assumption here that it's just the 
> initiators that are likely to change the max PDU size.  Either party
> can do it [... snip ...]

I don't think so ... MaxRecvDataSegmentLength is "Declarative", so only
the initiator can change the initiator's MaxRecvDataSegmentLength.  The
target can change the target's MaxRecvDataSegmentLength, but that's
not relevant to this discussion because targets don't issue SNACKs
of any form to initiators.  

> Now, on to your proposal...
> 
> > That strikes me as a productive direction that I could see enforcing
> > An initiator that wants to be able to issue a Data SNACK for
> > some or all of its commands then has to ensure that no such
> > commands are outstanding when/while it changes (in particular
> > reduces) the max Data PDU size. 
> 
> I am afraid you may have misunderstood what I was suggesting.

Sorry for being unclear.  I was following the line of thinking behind
Mallikarjun's proposal a) to a conclusion beyond where he took it.  I
know Mallikarjun didn't suggest removing resegmenting Data SNACKs, and I
didn't mean to ascribe that position to him.

> There's one weird corner case in the simplified option you're 
> suggesting here - when a target wants to initiate a max PDU size
> change, it cannot know when the initiator is likely to quiesce the
> I/Os, nor is there a way to tell the initiator to stop.

The "weird corner case" does not exist because MaxRecvDataSegmentLength
is "Declarative" - see above.

> With that said, let me survey the available options:

Unfortunately, there are a bunch of things wrong with the survey.
 
> Option.A
>    - Keep the rev13 text, plus add the two additional text segments
>      I proposed on the beginning of this thread (initiators must drop
>      status in one case, SNACK must be issued only before the status
>      is ack'ed), *and* add "no data SNACKs while any text negotiation
>      is on".
>    Pros: - No need for a new data SNACK code with identical wire
>            semantics
>          - Can allow the PDU size change to happen with no wait for
>            quiescing any long running writes/reads (and those
>            operations too benefit from ULPDU containment from this
>            changed PDU size).

I disagree with "identical wire semantics" - see the discussion
of the two Data SNACK algorithms above, but agree that not introducing a
new Data SNACK code is a Pro.  ULPDU containment is irrelevant (more
below), and writes do not need to be quiesced, although reads would.

>    Cons: - Additional complexity (compared to the standard data
>            SNACK) on the initiator to drop the status SNACK; and to
>            mark all active tasks while the PDU change had happened,
>            so their statuses can be dropped if necessary.

Con: Failure to drop SCSI response with good status when required
can lead to incorrect/incomplete data being returned.

> Option.B
>    - Same as Option.A, but add the new resegmenting-Data SNACK code
>      per David's Last Call comment.
>    Pros: - Precludes surprises due to implementation errors (also a
>            con, see below)

Pro: allows simpler implementations that don't track the state
required for Option A.  Places the responsibility for state
tracking for resegmentation on the target where the resegmentation
occurs.

>          - Can allow the PDU size change to happen with no wait for
>            quiescing any long running writes/reads (and those
>            operations too benefit from ULPDU containment from this
>            changed PDU size).

Only need to quiesce reads; writes aren't affected.  ULPDU containment
is irrelevant - see below.

Pro: Required response drop at the initiator is based on less state that
	is more directly connected with the Data SNACK that creates the need
	for the response drop.


>    Cons: - Attempt to address an implementation error by protocol
>            means, could be a slippery slope.

I disagree with this Con based on the login "if in doubt, negotiate" maxim.

>          - Requires a new data SNACK code which both sides have to
>            handle, and which conveys completely redundant information
>            about the changed PDU size.

I disagree with "completely redundant" statement.  The information
being conveyed is an instruction from the initiator about whether
the resegmenting Data SNACK algorithm or the non-resegmenting Data
SNACK algorithm is in use.

>          - Additional complexity (compared to the standard data
>            SNACK) on the initiator to drop the status SNACK; and to
>            mark all active tasks while the PDU change had happened,
>            so their statuses can be dropped if necessary.

I disagree with the "and mark ..." text.  Only tasks for which a
resegmenting Data SNACK has been issued need to be marked.  Also:

Con: Failure to drop response with good status when required can lead
	to incorrect/incomplete data being returned.

> Option.C
>    - Completely disallow PDU size changes (initiated by either
>      party) while any tasks are active.

That's not right.  MaxRecvDataSegmentLength is declarative (hence
"(initiated by either party)" is wrong), and the description
assumes that the Initiator wants to be able to issue Data SNACKs
for all tasks (Data SNACKs don't apply to WRITE, etc.).  A correct
description would be:

		- Data SNACKs cannot be issued for a task if the
			initiator has changed its size limit on received
			Data PDUs while the task is outstanding.  Initiators
			have to ensure that tasks for which Data SNACKs
			could be issued are not active during such a
			size limit change.

>      Rev13 text should be stripped of the resegmenting discussion.
>      Any data SNACK always gets exact replicas.
>
>    Pros: - Simpler approach, initiators don't need to drop status
>            PDUs, nor mark the active tasks.
>    Cons: - Active tasks cannot dynamically adapt to PMTU degradation,
>            so ULPDU containment isn't always guaranteed - particularly
>            painful for long-running tasks for either party.

Irrelevant - iSCSI does not require, recommend, or even describe
ULPDU containment.  TCP ULP framing was removed from the iSCSI
draft many months ago after a long controversial discussion.  I
strongly object to attempting to sneak it back in.  Besides,
adaptation to PMTU degradation is the transport's job, and iSCSI
should not be trying to do the transport's work for it, as it
in general does not have direct access to PMTU information.

>          - Desired changes in max PDU size would need to wait for all
>            tasks to quiesce and the statuses be acknowledged, forcing
>            a pause in the I/O activity.

Not "all tasks", rather "reads for which Data SNACK recovery
is required if errors occur".  Pause in I/O activity is still
a distinct possibility (e.g., if all/most tasks are such reads).

>          - Any text negotiation prompted by the target can't be
>            carried on until all active I/Os are quiesced (even if the
>            target intends to negotiate other keys).

Definitely wrong - target can't negotiate this, see above.

> I prefer Option.A, followed by Option.B.  Option.C's cons 
> appear to outweigh its simplicity, so wouldn't prefer that.

Let's see - the description for option C is wrong, as is one of
its Cons, and a second Con is irrelevant, leaving only one of the
three ... some additional consideration may be in order ...

I have enough issues with the attempted survey that I'm going
to try a comparison table (and also to provide something concise
for folks to shoot at):

First, here's what I believe the A/B/C descriptions are.  I'm going
to assume for now that BegRun=0 is required for any resegmentation
in order to avoid the "metadata explosion" problem.

Applies to all options:
	- Initiators MUST NOT issue SNACKs after the status for the
		command is ACKed via ExpStatSN.
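A minimal Python sketch of that check. iSCSI compares sequence numbers using 32-bit serial number arithmetic (RFC 1982), and an ExpStatSN of n acknowledges every status with StatSN below n. The function names are hypothetical, and `exp_stat_sn` is assumed to be the latest ExpStatSN the initiator has already sent:

```python
from typing import Optional

SERIAL_BITS = 32
MOD = 1 << SERIAL_BITS
HALF = 1 << (SERIAL_BITS - 1)


def serial_lt(a: int, b: int) -> bool:
    """RFC 1982 comparison a < b for 32-bit serial numbers (wraps at 2**32)."""
    return a != b and (b - a) % MOD < HALF


def may_issue_snack(stat_sn: Optional[int], exp_stat_sn: int) -> bool:
    """False once the command's status has been acknowledged via ExpStatSN.

    stat_sn is None while no status has arrived for the command.
    """
    if stat_sn is None:
        return True
    return not serial_lt(stat_sn, exp_stat_sn)


print(may_issue_snack(41, 41))         # True: status 41 not yet acknowledged
print(may_issue_snack(41, 42))         # False: ExpStatSN 42 acknowledges it
print(may_issue_snack(0xFFFFFFFF, 0))  # False: acknowledged across the wrap
```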

A:	- Initiators MUST drop SCSI response and issue a status SNACK
		when a Data SNACK is issued for a command that was outstanding
		during an initiator receive data PDU size limit change.
	- Targets MAY resegment in response to a Data SNACK when the
		initiator receive data PDU size limit changes while
		the command is outstanding.
	- Initiators MUST NOT issue new Data SNACKs during a change
		to initiator receive data PDU size limit (and wait for
		existing data SNACKs to complete before making such
		a change?).  [Aside: It's not clear to me why this is
		needed, but I'll include it since Mallikarjun asked for it.]
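The initiator-side command state that A requires can be sketched as a hypothetical Python model keyed by Initiator Task Tag (ITT); the class and method names are illustrative only. The same marking also expresses the corrected description of C given earlier, under which a marked task simply may not receive a Data SNACK:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InitiatorState:
    """Hypothetical per-connection initiator bookkeeping (keyed by ITT)."""
    max_recv_dsl: int  # current receive Data PDU size limit
    outstanding: set[int] = field(default_factory=set)
    spanned_change: set[int] = field(default_factory=set)

    def start_command(self, itt: int) -> None:
        self.outstanding.add(itt)

    def retire_command(self, itt: int) -> None:
        # Called once the command's status has been acknowledged.
        self.outstanding.discard(itt)
        self.spanned_change.discard(itt)

    def change_max_recv_dsl(self, new_limit: int) -> None:
        # Every command outstanding across the change is marked.
        self.max_recv_dsl = new_limit
        self.spanned_change |= self.outstanding

    def must_drop_response(self, itt: int, data_snack_issued: bool) -> bool:
        # A: drop the SCSI response and issue a Status SNACK.
        return data_snack_issued and itt in self.spanned_change

    def data_snack_allowed_option_c(self, itt: int) -> bool:
        # C: no Data SNACK for a task that spanned a size change.
        return itt not in self.spanned_change


conn = InitiatorState(max_recv_dsl=8192)
conn.start_command(1)
conn.change_max_recv_dsl(4096)
conn.start_command(2)
print(conn.must_drop_response(1, True), conn.must_drop_response(2, True))
# True False
```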

B:	- New Resegmenting Data SNACK code defined to distinguish behavior
		in "MAY resegment" and "MUST NOT" resegment cases.
	- Initiators MUST drop response and issue a status SNACK when
		a Resegmenting Data SNACK is issued for a command.
	- Targets MUST NOT resegment in response to an ordinary Data
		SNACK and MAY resegment in response to a Resegmenting
		Data SNACK.
	- No limits appear to be needed on when Data SNACKs can be
		issued.

C:	- Resegmentation is forbidden.  Initiators MUST NOT issue
		Data SNACKs that require resegmentation, Targets MUST
		NOT resegment - Data SNACKs always return "exact
		replicas" of original PDUs.
	- No new Data SNACK code, no need to drop otherwise good
		SCSI responses.


+-----------------+-----------+-----------+-----------+
| Attribute       | A         | B         | C         |
+-----------------+-----------+-----------+-----------+
| SNACK codes     | 1         | 2         | 1         |
+-----------------+-----------+-----------+-----------+
| Quiesce reads   | No        | No        | Yes       |
+-----------------+-----------+-----------+-----------+
| Command state   | initiator | target    | No        |
+-----------------+-----------+-----------+-----------+
| Response drop   | cmd state | RD SNACK  | No        |
+-----------------+-----------+-----------+-----------+
| Data reseg      | Yes       | new code  | No        |
+-----------------+-----------+-----------+-----------+
| Integrity risks | 2         | 1         | 1         |
+-----------------+-----------+-----------+-----------+
| Risk removal    | target,1  | No        | target,1  |
+-----------------+-----------+-----------+-----------+

Attribute Explanation:

SNACK codes: Number of Data SNACK codes.  Fewer is better.
Quiesce reads: Need to quiesce reads to which Data SNACK
	data recovery is applicable in order to change the
	initiator's receive Data PDU size limit.  No is better.
Command state: Whether/where receive Data PDU size change
	requires state per Data-SNACK-recoverable command outstanding
	at the time of the change. No is better, other values
	indicate where the state is kept.  This is referred to
	as "command state" for short below.
Response drop: Whether and how initiator MUST drop
	and retry SCSI responses to deal with Data SNACK
	resegmentation.  "cmd state" = based on
	command state (previous item), "RD SNACK"
	= based on whether a Resegmenting Data SNACK has
	been issued.  No is better, as having to drop
	a SCSI response with good status is peculiar.
Data reseg: Data can be resegmented in response to some
	form of Data SNACK.  Not clear what's better here.
	"new code" = only in response to new Resegmenting
	Data SNACK code and is as good/bad as Yes.
Integrity Risks: This is going to be controversial.  It
	is a count of risks to data integrity in error cases.
	The error cases are subtle as they're based on errors
	in error recovery.  The risks are:
	- (A and B) Failure to drop a response and issue a
		status SNACK can lead to incomplete data in the
		face of Data PDU non-delivery (e.g., header corruption).
		Can be detected but not prevented by the target.
	- (A only) BegRun != 0 can cause the wrong data to be
		returned when resegmentation happens.  Preventing
		this for A requires command state at target.  Not
		applicable to B because BegRun = 0 is required
		for the new Resegmenting Data SNACK code and is
		easily checked by the target.
	- (C only) If the initiator issues a Data SNACK that
		causes the target to resegment, bad things
		happen, as the protocol doesn't support this.
		Can be prevented by command state at the target.

Risk removal: indicates that the latter two risks can be
	removed by adding command state to the target to
	catch the initiator misbehavior.  For A, this would be
	"Targets MUST check that BegRun=0 when a Data SNACK
	would result in resegmentation".  For C this would
	be "Targets MUST check that Data SNACKs do not
	cause resegmentation".
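Both target-side checks can be sketched in Python using the single per-task flag mentioned earlier (whether answering a Data SNACK would require resegmentation) instead of a full PDU size history. The function name and the string option labels are hypothetical:

```python
def target_accepts_data_snack(option: str, beg_run: int,
                              would_resegment: bool) -> bool:
    """Hypothetical target-side validation of an incoming Data SNACK.

    would_resegment is the per-task flag: True when the task's data can no
    longer be replayed as exact replicas, e.g. because the initiator's
    receive Data PDU size limit changed while the task was outstanding.
    """
    if not would_resegment:
        return True  # exact replicas: nothing to check
    if option == "A":
        return beg_run == 0  # resegmentation is allowed only from BegRun 0
    if option == "C":
        return False  # resegmentation is forbidden outright
    raise ValueError(f"unknown option: {option!r}")


print(target_accepts_data_snack("A", 0, True))  # True
print(target_accepts_data_snack("A", 3, True))  # False
print(target_accepts_data_snack("C", 0, True))  # False
```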

If the BegRun = 0 requirement is removed, B goes to No command
state, and A goes to 1 integrity risk and No risk removal -
the net effect on the comparison of A and B is close to a
wash, but C gets a serious plus for no "metadata explosion".
	

Observations

(1) The new factor in this message is the interaction of
	BegRun != 0 with Data resegmentation.  Unless I've
	missed something major, it looks like BegRun MUST
	be zero when resegmentation happens in
	order to avoid the "metadata explosion".
(2) I think the right versions of A and C to compare
	are the ones with extra state to catch the BegRun
	and "resegmenting attempted but is forbidden" cases.
(3) With (2), A has to keep command state on both sides of the
	connection, B and C only need it at the target.  B pays
	for less state with the additional Data SNACK type and
	code to support it.  C has no integrity risks (never needs
	a status SNACK, hence can't screw it up), and is somewhat
	simpler (never has to drop a good response), but pays for
	it by not supporting resegmentation which causes 
	a need to quiesce data-recoverable reads in order to
	change the PDU size.

Analysis:

- I prefer the extra Data SNACK type code to the state needed
	to enforce the BegRun=0 condition for resegmentation at
	the target.  (A < B).  I prefer C's never having to drop
	a SCSI response with good status to the additional complexity
	of resegmentation, even though C forces some commands to
	be quiesced in the rare case of a Data PDU size change (C > A, B).  

- I strongly object to the introduction of ULPDU containment
	into this discussion - that all but reopens the framing tarpit
	that I spent a great deal of time and effort getting to closure.

My conclusion is that I prefer C, with B as a second choice,
but this is no longer as clear-cut as I initially thought,
as some of the distinctions above are fine.  The fact that
resegmentation continues to be difficult to get right
(e.g., Mallikarjun seems to have missed the "metadata explosion"
required by his preferred option A), suggests to me that it
ought to be left out (C) or isolated in a fashion that makes it
easy not to use (B).

Sorry about the length of this message - this is somewhat subtle
stuff.

Thanks,
--David

p.s.  I'm in Yokohama typing this on the IETF wireless network
- serious round of applause and thanks to the WIDE project
folks (local hosts) for how easy and convenient this is to use.

---------------------------------------------------
David L. Black, Senior Technologist
EMC Corporation, 42 South St., Hopkinton, MA  01748
+1 (508) 249-6449            FAX: +1 (508) 497-8018
black_david@emc.com       Mobile: +1 (978) 394-7754
---------------------------------------------------