RE: iSCSI: need for new data SNACK code?

> > The flag per task is not needed - I'd expect the Target to look
> > at the Data PDUs it would have to resend, check them against the max
> > Data PDU size for this connection and fail the regular SNACK if any
> > PDU is too large
>
> I am afraid there's no free lunch here. In this description, you're now
> expecting targets to maintain the PDU size of every PDU that it shipped
> for each of the tasks, which causes a metadata explosion.

Targets already have to do that in order to cope with Data SNACKs with non-zero BegRun after a PDU size change. To make this concrete, suppose a target has shipped 4 PDUs with 4K of data each (DataSN 0-3) followed by 4 PDUs with 8K each (DataSN 4-7) and gets a Data SNACK for PDU 5 to the end - how does it figure out that the requested data starts 24K from the beginning? Skipping 5 x 4K of data starts at 20K (middle of PDU 4, wrong), skipping 5 x 8K starts at 40K (PDU 7, also wrong). It looks like the target has to maintain the old size (4K), the new size (8K), and the DataSN that the new size started at (4), which is just enough metadata to "maintain the PDU size of every PDU that it shipped" ...

The only obvious way I see to eliminate this "metadata explosion" is to require BegRun to be zero if there's any possibility of resegmentation. That was part of my original comment, but seems to have been dropped. If BegRun=0 is required when resegmentation occurs, then a flag per task would be needed in the target to replace the "metadata explosion". OTOH, this is another way for an initiator to cause serious problems - if it issues a Data SNACK with BegRun != 0 and the target resegments to the new PDU size, the initiator gets the wrong data.

> > If the permission is not used, the Initiator's
> > status SNACK is not needed but does no harm.
>
> Well, the point is - shouldn't the target be detecting these
> obvious bugs, and attempting recovery/fix for these errors
> (it's a clear disconnect b/n target and initiator state). Seems like
> additional complexity on either end - to cover implementation
> bugs wrt prior synchronization.

It's not an "obvious bug" or an error. There are two different algorithms for a Data SNACK - one in which target resegmentation is prohibited and the initiator does not need to use a Status SNACK due to resegmentation (but may still do so for other reasons), and one in which target resegmentation is permitted and the initiator MUST use a Status SNACK. Switching from one algorithm to the other changes behavior at *both* the target and initiator - the additional Data SNACK code serves to ensure that this change is synchronized by explicitly communicating which algorithm to use. Both algorithms work as long as both parties agree on which one is being used. The additional Data SNACK code puts the initiator unambiguously in charge of algorithm selection.

> > As the complexity of a protocol increases, that synchronized
> > state machine assumption becomes more prone to failure.
>
> I think this is where the major disconnect is between us. As
> I responded to Dave Sheehy yesterday, the iSCSI protocol specification
> *mandates* that a target must ship "exact replicas" of the data PDUs
> barring certain header fields unless the PDU size was changed by an
> intermediate successful text negotiation. What you're suggesting is:
> despite this mandate, target may resegment illegally, so let's define a
> new data SNACK code with identical wire semantics.

Actually, I'm much more concerned about an Initiator that doesn't correctly track when the "successful text negotiation" took place with respect to the multitude of outstanding commands it has in various states.
The negotiation is not synchronous with respect to the command stream unless all commands on the connection are quiesced, which Mallikarjun does not want to do. If an initiator fails to catch the fact that a target resegmented, the result is corrupt/wrong data for a READ or similar command.

> > and for an initiator to expect to be able to do
> > this with uninterrupted high performance is unrealistic
> ..
> > right sort of incentives in discouraging
> > initiators from changing the max Data PDU size.
>
> You're making an incorrect assumption here that it's just the
> initiators that are likely to change the max PDU size. Either party
> can do it [... snip ...]

I don't think so ... MaxRecvDataSegmentLength is "Declarative", so only the initiator can change the initiator's MaxRecvDataSegmentLength. The target can change the target's MaxRecvDataSegmentLength, but that's not relevant to this discussion because targets don't issue SNACKs of any form to initiators.

> Now, on to your proposal...
>
> > That strikes me as a productive direction that I could see enforcing
> > An initiator that wants to be able to issue a Data SNACK for
> > some or all of its commands then has to ensure that no such
> > commands are outstanding when/while it changes (in particular
> > reduces) the max Data PDU size.
>
> I am afraid you may have misunderstood what I was suggesting.

Sorry for being unclear. I was following the line of thinking behind Mallikarjun's proposal a) to a conclusion beyond where he took it. I know Mallikarjun didn't suggest removing resegmenting Data SNACKs, and I didn't mean to ascribe that position to him.

> There's one weird corner case in the simplified option you're
> suggesting here - when a target wants to initiate a max PDU size
> change, it cannot know when the initiator is likely to quiesce the
> I/Os, nor is there a way to tell the initiator to stop.

The "weird corner case" does not exist because MaxRecvDataSegmentLength is "Declarative" - see above.
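The rule quoted above (an initiator that wants Data SNACK recovery must ensure no such commands are outstanding while it changes its max Data PDU size) can be sketched as an initiator-side check. This is a minimal Python sketch under stated assumptions: `Task`, `InitiatorConnection`, `blocking_tasks` and `try_declare_new_limit` are hypothetical names, not taken from any iSCSI implementation, and the method models only the go/no-go decision, not the Text request/response exchange that actually declares a new MaxRecvDataSegmentLength. Because that key is Declarative, only the initiator runs this check for its own limit.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    """Hypothetical per-command record kept by an initiator."""
    itt: int                 # Initiator Task Tag
    is_read: bool            # per the discussion, writes are not affected
    wants_data_snack: bool   # initiator intends to use Data SNACK recovery


@dataclass
class InitiatorConnection:
    """Hypothetical connection state; only the initiator declares its own limit."""
    max_recv_data_segment_length: int
    outstanding: Dict[int, Task] = field(default_factory=dict)

    def blocking_tasks(self) -> List[Task]:
        # Reads that may need Data SNACK recovery must not span a limit change
        # when the target may only resend exact replicas of the original PDUs.
        return [t for t in self.outstanding.values()
                if t.is_read and t.wants_data_snack]

    def try_declare_new_limit(self, new_limit: int) -> bool:
        """Adopt new_limit only if no Data-SNACK-recoverable read is outstanding."""
        if self.blocking_tasks():
            return False  # caller must let those reads complete first
        self.max_recv_data_segment_length = new_limit
        return True
```

In this sketch a WRITE, or a read for which the initiator does not plan Data SNACK recovery, never blocks the change, which matches the point that only such reads need to be quiesced.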
> With that said, let me survey the available options:

Unfortunately, there are a bunch of things wrong with the survey.

> Option.A
>   - Keep the rev13 text, plus add the two additional text segments I
>     proposed at the beginning of this thread (initiators must drop
>     status in one case, SNACK must be issued only before the status is
>     ack'ed), *and* add "no data SNACKs while any text negotiation is on".
>   Pros: - No need for a new data SNACK code with identical wire
>           semantics
>         - Can allow the PDU size change to happen with no wait for
>           quiescing any long running writes/reads (and those
>           operations too benefit from ULPDU containment from this
>           changed PDU size).

I disagree with "identical wire semantics" - see discussion of two Data SNACK algorithms above, but agree that not introducing a new Data SNACK code is a Pro. ULPDU containment is irrelevant (more below), and writes do not need to be quiesced, although reads would.

>   Cons: - Additional complexity (compared to the standard data SNACK)
>           on the initiator to drop the status SNACK; and to mark all
>           active tasks while the PDU change had happened, so their
>           statuses can be dropped if necessary.

Con: Failure to drop SCSI response with good status when required can lead to incorrect/incomplete data being returned.

> Option.B
>   - Same as Option.A, but add the new resegmenting-Data SNACK code per
>     David's Last Call comment.
>   Pros: - Precludes surprises due to implementation errors (also a
>           con, see below)

Pro: allows simpler implementations that don't track the state required for Option A. Places the responsibility for state tracking for resegmentation on the target where the resegmentation occurs.

>         - Can allow the PDU size change to happen with no wait for
>           quiescing any long running writes/reads (and those
>           operations too benefit from ULPDU containment from this
>           changed PDU size).

Only need to quiesce reads; writes aren't affected. ULPDU containment is irrelevant - see below.
Pro: Required response drop at the initiator is based on less state that is more directly connected with the Data SNACK that creates the need for the response drop.

>   Cons: - Attempt to address an implementation error by protocol
>           means, could be a slippery slope.

I disagree with this Con based on the login "if in doubt, negotiate" maxim.

>         - Requires a new data SNACK code which both sides have to
>           handle, and which conveys completely redundant information
>           about the changed PDU size.

I disagree with the "completely redundant" statement. The information being conveyed is an instruction from the initiator about whether the resegmenting Data SNACK algorithm or the non-resegmenting Data SNACK algorithm is in use.

>         - Additional complexity (compared to the standard data SNACK)
>           on the initiator to drop the status SNACK; and to mark all
>           active tasks while the PDU change had happened, so their
>           statuses can be dropped if necessary.

I disagree with the "and mark ..." text. Only tasks for which a resegmenting Data SNACK has been issued need be marked. Also, Con: Failure to drop response with good status when required can lead to incorrect/incomplete data being returned.

> Option.C
>   - Completely disallow PDU size changes (initiated by either party)
>     while any tasks are active.

That's not right because MaxRecvDataSegmentLength is declarative (hence "(initiated by either party)" is wrong), and the description assumes that the Initiator wants to be able to issue Data SNACKs for all tasks (Data SNACKs don't apply to WRITE, etc.). A correct description would be:

  - Data SNACKs cannot be issued for a task if the initiator has changed its size limit on received Data PDUs while the task is outstanding. Initiators have to ensure that tasks for which Data SNACKs could be issued are not active during such a size limit change.

>     Rev13 text should be stripped of the resegmenting discussion. Any
>     data SNACK always gets exact replicas.
>
>   Pros: - Simpler approach, initiators don't need to drop status
>           PDUs, nor mark the active tasks.
>   Cons: - Active tasks cannot dynamically adapt to PMTU degradation,
>           so ULPDU containment isn't always guaranteed - particularly
>           painful for long-running tasks for either party.

Irrelevant - iSCSI does not require, recommend, or even describe ULPDU containment. TCP ULP framing was removed from the iSCSI draft many months ago after a long controversial discussion. I strongly object to attempting to sneak it back in. Besides, adaptation to PMTU degradation is the transport's job, and iSCSI should not be trying to do the transport's work for it, as it in general does not have direct access to PMTU information.

>         - Desired changes in max PDU size would need to wait for all
>           tasks to quiesce and the statuses be acknowledged, forcing a
>           pause in the I/O activity.

Not "all tasks", rather "reads for which Data SNACK recovery is required if errors occur". Pause in I/O activity is still a distinct possibility (e.g., if all/most tasks are such reads).

>         - Any text negotiation prompted by the target can't be carried
>           on until all active I/Os are quiesced (even if the target
>           intends to negotiate other keys).

Definitely wrong - target can't negotiate this, see above.

> I prefer Option.A, followed by Option.B. Option.C's cons
> appear to outweigh its simplicity, so wouldn't prefer that.

Let's see - the description for option C is wrong, as is one of its Cons, and a second Con is irrelevant, leaving only one of the three ... some additional consideration may be in order ...

I have enough issues with the attempted survey that I'm going to try a comparison table (and also to provide something concise for folks to shoot at):

First, here's what I believe the A/B/C descriptions are. I'm going to assume for now that BegRun=0 is required for any resegmentation in order to avoid the "metadata explosion" problem.
Applies to all options:
  - Initiators MUST NOT issue SNACKs after the status for the command is ACKed via ExpStatSN.

A:
  - Initiators MUST drop SCSI response and issue a status SNACK when a Data SNACK is issued for a command that was outstanding during an initiator receive data PDU size limit change.
  - Targets MAY resegment in response to a Data SNACK when the initiator receive data PDU size limit changes while the command is outstanding.
  - Initiators MUST NOT issue new Data SNACKs during a change to initiator receive data PDU size limit (and wait for existing data SNACKs to complete before making such a change?). [Aside: It's not clear to me why this is needed, but I'll include it since Mallikarjun asked for it.]

B:
  - New Resegmenting Data SNACK code defined to distinguish behavior in "MAY resegment" and "MUST NOT resegment" cases.
  - Initiators MUST drop response and issue a status SNACK when a Resegmenting Data SNACK is issued for a command.
  - Targets MUST NOT resegment in response to an ordinary Data SNACK and MAY resegment in response to a Resegmenting Data SNACK.
  - No limits appear to be needed on when Data SNACKs can be issued.

C:
  - Resegmentation is forbidden. Initiators MUST NOT issue Data SNACKs that require resegmentation; Targets MUST NOT resegment.
  - Data SNACKs always return "exact replicas" of original PDUs.
  - No new Data SNACK code, no need to drop otherwise good SCSI responses.
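The target-side rules for A, B and C can be compared in a short Python sketch. It assumes, as above, that BegRun=0 is required whenever resegmentation happens, and that the target keeps the length of every Data PDU it originally sent for the task (`original_sizes`, indexed by DataSN), which is exactly the per-task record behind the "metadata explosion". `SnackType`, `segment` and `retransmit` are hypothetical names, RunLength = 0 ("from BegRun to the end") is assumed, and Option A is simplified so that the target resegments only when an exact replica would exceed the current limit.

```python
from enum import Enum
from typing import List, Tuple


class SnackType(Enum):
    DATA = "data"                            # ordinary Data SNACK
    RESEGMENTING_DATA = "resegmenting-data"  # hypothetical new code (Option B only)


def segment(total_len: int, max_size: int) -> List[int]:
    """Split total_len bytes into Data PDU lengths of at most max_size bytes."""
    sizes = []
    remaining = total_len
    while remaining > 0:
        sizes.append(min(remaining, max_size))
        remaining -= sizes[-1]
    return sizes


def retransmit(option: str, snack_type: SnackType, beg_run: int,
               original_sizes: List[int], current_max: int) -> Tuple[int, List[int]]:
    """Return (starting byte offset, PDU lengths) the target would resend.

    original_sizes[n] is the data length of the Data PDU originally sent with
    DataSN n. Raises ValueError where the option does not allow the SNACK.
    """
    if option not in ("A", "B", "C"):
        raise ValueError("unknown option")
    replicas = original_sizes[beg_run:]
    # With mixed PDU sizes, the start offset needs every earlier PDU's length.
    replica_offset = sum(original_sizes[:beg_run])
    fits = all(size <= current_max for size in replicas)

    if option == "C" or (option == "B" and snack_type is SnackType.DATA):
        # Exact replicas only; resegmentation is not permitted here.
        if not fits:
            raise ValueError("Data SNACK would require resegmentation")
        return replica_offset, replicas

    if option == "A" and fits:
        return replica_offset, replicas  # nothing forces resegmentation

    # Option A after a limit reduction, or an Option B Resegmenting Data SNACK.
    if beg_run != 0:
        raise ValueError("resegmentation requires BegRun = 0")
    return 0, segment(sum(original_sizes), current_max)
```

For the example earlier in this message, with `sizes = [4096] * 4 + [8192] * 4` (DataSN 0-3 at 4K, DataSN 4-7 at 8K), `retransmit("C", SnackType.DATA, 5, sizes, 8192)` returns `(24576, [8192, 8192, 8192])`, i.e. the 24K starting offset, while the same SNACK after the limit drops to 4096 is refused under A because BegRun is not zero.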
+-----------------+-----------+-----------+-----------+
| Attribute       | A         | B         | C         |
+-----------------+-----------+-----------+-----------+
| SNACK codes     | 1         | 2         | 1         |
+-----------------+-----------+-----------+-----------+
| Quiesce reads   | No        | No        | Yes       |
+-----------------+-----------+-----------+-----------+
| Command state   | initiator | target    | No        |
+-----------------+-----------+-----------+-----------+
| Response drop   | cmd state | RD SNACK  | No        |
+-----------------+-----------+-----------+-----------+
| Data reseg      | Yes       | new code  | No        |
+-----------------+-----------+-----------+-----------+
| Integrity risks | 2         | 1         | 1         |
+-----------------+-----------+-----------+-----------+
| Risk removal    | target,1  | No        | target,1  |
+-----------------+-----------+-----------+-----------+

Attribute Explanation:

SNACK codes: Number of Data SNACK codes. Fewer is better.

Quiesce reads: Need to quiesce reads to which Data SNACK data recovery is applicable in order to change the initiator's receive Data PDU size limit. No is better.

Command state: Whether/where a receive Data PDU size change requires state per Data-SNACK-recoverable command outstanding at the time of the change. No is better; other values indicate where the state is kept. This is referred to as "command state" for short below.

Response drop: Whether and how the initiator MUST drop and retry SCSI responses to deal with Data SNACK resegmentation. "cmd state" = based on command state (previous item), "RD SNACK" = based on whether a Resegmenting Data SNACK has been issued. No is better, as having to drop a SCSI response with good status is peculiar.

Data reseg: Data can be resegmented in response to some form of Data SNACK. Not clear what's better here. "new code" = only in response to the new Resegmenting Data SNACK code, and is as good/bad as Yes.

Integrity risks: This is going to be controversial. It's a count of risks to data integrity in error cases.
The error cases are subtle as they're based on errors in error recovery. The risks are:

  - (A and B) Failure to drop a response and issue a status SNACK can lead to incomplete data in the face of Data PDU non-delivery (e.g., header corruption). Can be detected but not prevented by the target.
  - (A only) BegRun != 0 can cause the wrong data to be returned when resegmentation happens. Preventing this for A requires command state at target. Not applicable to B because BegRun = 0 is required for the new Resegmenting Data SNACK code and is easily checked by the target.
  - (C only) If the initiator issues a Data SNACK that causes the target to resegment, bad things happen, as the protocol doesn't support this. Can be prevented by command state at the target.

Risk removal: indicates that the latter two risks can be removed by adding command state to the target to catch the initiator misbehavior. For A, this would be "Targets MUST check that BegRun=0 when a Data SNACK would result in resegmentation". For C this would be "Targets MUST check that Data SNACKs do not cause resegmentation".

If the BegRun = 0 requirement is removed, B goes to No command state, and A goes to 1 integrity risk and No risk removal - the net effect on the comparison of A and B is close to a wash, but C gets a serious plus for no "metadata explosion".

Observations

(1) The new factor in this message is the interaction of BegRun != 0 with Data resegmentation. Unless I've missed something major, it looks like BegRun MUST be zero when resegmentation happens in order to avoid the "metadata explosion".

(2) I think the right versions of A and C to compare are the ones with extra state to catch the BegRun and "resegmenting attempted but is forbidden" cases.

(3) With (2), A has to keep command state on both sides of the connection, B and C only need it at the target. B pays for less state with the additional Data SNACK type and code to support it.
C has no integrity risks (never needs a status SNACK, hence can't screw it up), and is somewhat simpler (never has to drop a good response), but pays for it by not supporting resegmentation, which causes a need to quiesce data-recoverable reads in order to change the PDU size.

Analysis:

  - I prefer the extra Data SNACK type code to the state needed to enforce the BegRun=0 condition for resegmentation at the target (A < B). I prefer C's never having to drop a SCSI response with good status to the additional complexity of resegmentation, even though C forces some commands to be quiesced in the rare case of a Data PDU size change (C > A, B).
  - I strongly object to the introduction of ULPDU containment into this discussion - that all but reopens the framing tarpit that I spent a great deal of time and effort getting to closure.

My conclusion is that I prefer C, with B as a second choice, but this is no longer as clear-cut as I initially thought, as some of the distinctions above are fine. The fact that resegmentation continues to be difficult to get right (e.g., Mallikarjun seems to have missed the "metadata explosion" required by his preferred option A) suggests to me that it ought to be left out (C) or isolated in a fashion that makes it easy not to use (B).

Sorry about the length of this message - this is somewhat subtle stuff.

Thanks,
--David

p.s. I'm in Yokohama typing this on the IETF wireless network - serious round of applause and thanks to the WIDE project folks (local hosts) for how easy and convenient this is to use.

---------------------------------------------------
David L. Black, Senior Technologist
EMC Corporation, 42 South St., Hopkinton, MA 01748
+1 (508) 249-6449            FAX: +1 (508) 497-8018
black_david@emc.com         Mobile: +1 (978) 394-7754
---------------------------------------------------
Last updated: Sun Jul 14 00:18:49 2002