Re: iSCSI - Change Proposal X bit

Santosh,

The draft DOES NOT ALLOW commands to be reissued on another connection without an intervening logout.

Julo

Santosh Rao <santoshr@cup.hp.com>
Sent by: santoshr@cup.hp.com
25-10-01 01:44
Please respond to Santosh Rao

To: Julian Satran/Haifa/IBM@IBMIL
cc:
Subject: Re: iSCSI - Change Proposal X bit

Julian,

The one scenario where I do see stale CmdSNs creating a problem is the following :

- 2 connections.
- CmdSN 1 issued on connection 1.
- Initiator does not get CmdSN ack for 1 and re-issues CmdSN 1 on connection 2.
- CmdSN ack for 1 received on connection 2. All further I/O traffic on connection 2 only (nothing on connection 1) and CmdSN sequence wraps around.
- Finally, connection 1 clears up and CmdSN 1 is delivered on connection 1.

In the above situation, the stale CmdSN 1 is a problem. However, the problem only occurs if the draft allows previously issued commands to be re-issued on a different connection without first logging out the previous connection.

Why is it not possible to disallow commands to be re-issued on a different connection unless the original connection used for that command was first logged out successfully ? This would avoid this stale CmdSN on wrap around problem, as well as avoid sporadic ULP timeout of I/Os due to a race condition between the initiator re-issuing the command on another connection and the target sending a CmdSN update on the first connection.

Thanks,
Santosh

ps : I still don't see how your scenario below holds good. When the target ack's CmdSNs up to 8 on connection 3, it has received all CmdSNs. Hence, there can be no stalled commands on connection 2. However, a different scenario which does cause a problem is the one I describe above.

Julian Satran wrote:
>
> Santosh,
>
> There is nothing in a command that arrives late on a link (as in the
> example in which it was sent redundantly) to distinguish it from a new
> (valid) command.
>
> This wraparound problem exists in all protocols - even in TCP, and we use
> the CmdSN per session in the same fashion TCP uses sequence numbers per
> connection - and it is solved in different ways (TCP uses time-stamps).
>
> The NOP is meant to solve that wrap-around problem.
>
> I am sure that when rereading the example you will see the issue.
>
> Julo
>
> Santosh Rao <santoshr@cup.hp.com>
> Sent by: santoshr@cup.hp.com
> 24-10-01 18:29
> Please respond to Santosh Rao
>
> To: Julian Satran/Haifa/IBM@IBMIL
> cc: ips@ece.cmu.edu
> Subject: Re: iSCSI - Change Proposal X bit
>
> Julian,
>
> Some comments on the below quoted scenarios :
>
> > session has 3 connections
> > on connection 1 I->T c1,c2,c3,c6
> > on connection 2 I->T c4,c5,c7,c8
> > Target receives 1,2,4,5,7,8 (miss 3 and 6) and acks 1 & 2
> > Initiator closes 1 and resends c3, c4, c5,c7,c8 on connection 2 and 6
> > on connection 3
> > target receives all and starts executing and acks 8 on connection 3 but
> > connection 2 stalls after c3 for a LONG TIME
> > then (after 2 full sequence wraps) connection 2 comes alive and
> > delivers c4,c5 etc (that are now valid)
>
> When the target acks CmdSN 8 on connection 3, it has, in effect, sent
> CmdSN ack's for CmdSNs 3,4,5,6,7,8. This implies that the commands with
> CmdSN 3, 4, 5, 7, & 8 were received by the target on connection 2 and
> their processing was commenced.
>
> Hence, the following does not make sense :
>
> > connection 2 stalls after c3 for a LONG TIME
> > then (after 2 full sequence wraps) connection 2 comes alive and
> > delivers c4,c5 etc (that are now valid)
>
> c4, c5, etc were already delivered to the target and are not being
> re-delivered. There is no problem in this case. (??).
>
> Take the next scenario :
>
> > 2 connections:
> >
> > connection1 I->T c3,c4,c5
> > status of 3 contains ack up to 6 and it and all other statuses are
> > lost
> > connection2 resend c3, c4 & c5 (no logout) and those are executed!
>
> Since the initiator got CmdSN ack's up to 6, the initiator should not be
> re-issuing these I/Os ??
>
> I still don't see justification to require that initiators send an
> immediate NOP-OUT in the manner being advocated.
>
> On a more fundamental note, I see some issues with the initiator being
> allowed to re-issue the commands on a different connection without
> having first logged out the previous connection successfully. I see
> nothing in the draft that suggests such behaviour, while at the same
> time, it is not forbidden.
>
> By resorting to command retries on a different connection in an attempt
> to plug the hole, without first logging out the previous connection, the
> initiator is susceptible to encountering I/O failure of that I/O due to
> ULP timeout.
>
> Here's the scenario why such recovery should not be allowed :
> - Initiator sends CmdSN 3 on connection 1.
> - No CmdSN updates for a while and initiator re-sends CmdSN 3 on
> connection 2.
> - At the same time, target has sent CmdSN ack's for CmdSN 3 on
> connection 1.
>
> - Initiator has transferred the command allegiance on its side from
> connection 1 to connection 2 and is attempting the command on connection
> 2. However, the command does not go through, since the (ExpCmdSN,
> MaxCmdSN) window has advanced and the target discards the command.
>
> - Target sends in data and/or R2T and/or status for CmdSN 3 on
> connection 1. Since the initiator is not expecting any traffic for that
> I/O on connection 1, it discards any PDUs received on that connection 1
> for which no I/O state existed.
>
> In the above scenario, initiator will never get a CmdSN ack on
> connection 2 and will never be able to plug the hole despite repeated
> retries, finally, causing a ULP timeout, followed by session recovery.
>
> Given the above scenario, I suggest that the initiator must only
> re-issue commands on the same connection, and can re-issue them on
> another connection only following a successful logout.
>
> Comments ?
>
> Thanks,
> Santosh
>
> Julian Satran wrote:
> >
> > Santosh,
> >
> > The scenarios I am talking about are all derivatives of an initiator trying
> > to plug-in holes and switching connections.
> > As the initiator does not know the "extent" of a hole it can send out commands
> > that it did not have to.
> > I have sent the attached note to Mallikarjun a while ago. I think that
> > there might be many of this kind. I am also aware that X bit by itself
> > might have some bad scenarios but the new proposal fixes them all.
> >
> > Julo
> >
> > _____________________________
> >
> > Mallikarjun,
> >
> > Take the following sequence scenario:
> >
> > session has 3 connections
> > on connection 1 I->T c1,c2,c3,c6
> > on connection 2 I->T c4,c5,c7,c8
> > Target receives 1,2,4,5,7,8 (miss 3 and 6) and acks 1 & 2
> > Initiator closes 1 and resends c3, c4, c5,c7,c8 on connection 2 and 6
> > on connection 3
> > target receives all and starts executing and acks 8 on connection 3 but
> > connection 2 stalls after c3 for a LONG TIME
> > then (after 2 full sequence wraps) connection 2 comes alive and
> > delivers c4,c5 etc (that are now valid)
> >
> > That is not a very likely scenario, I admit, but it is possible.
> > With X bit I could not find any such scenario since an X either follows a
> > good one on the same connection or can be safely discarded.
> > I suspect that there are some more scenarios that involve immediate
> > commands or commands that carry their own ack in the status and are acked
> > like:
> >
> > 2 connections:
> >
> > connection1 I->T c3,c4,c5
> > status of 3 contains ack up to 6 and it and all other statuses are
> > lost
> > connection2 resend c3, c4 & c5 (no logout) and those are executed!
> >
> > I think we can avoid those by requiring a NOP exchange before reissuing a
> > command on a new connection or reissue the command with a task management
> > (that has an implied ordering) but why do it if X is an obvious and safe
> > solution.
> >
> > Julo
> >
> > Regards,
> > Julo
> >
> > "Mallikarjun C." <cbm@rose.hp.com>
> > 08-10-01 21:45
> > Please respond to cbm
> >
> > To: Julian Satran/Haifa/IBM@IBMIL
> > cc:
> > Subject: Re: iscsi : X bit in SCSI Command PDU.
> >
> > Julian,
> >
> > We currently have the following specified in section 2.2.2.1 -
> >
> > "The target MUST NOT transmit a MaxCmdSN that is more than
> > 2**31 - 1 above the last ExpCmdSN."
> >
> > It appears to me that the above is sufficient to ward off the
> > accidents of the sort you describe. Do you think otherwise?
> > --
> > Mallikarjun
> >
> > Mallikarjun Chadalapaka
> > Networked Storage Architecture
> > Network Storage Solutions Organization
> > MS 5668 Hewlett-Packard, Roseville.
> > cbm@rose.hp.com
> >
> > Julian Satran wrote:
> > >
> > > Mallikarjun,
> > >
> > > There is at least one theoretical scenario in which an "old" command
> > > may appear in a "new window" and be reinstantiated.
> > > At 10Gbs and several connections that does not take months. With X the
> > > probability is far lower (not 0). I have no other strong arguments
> > > but I am still thinking. Matt Wakeley that insisted on it (against
> > > me) had some other argument that I am trying to find (I am not
> > > remembering).
> > >
> > > Julo
> > >
> > > "Mallikarjun C." <cbm@rose.hp.com>
> > > 08-10-01 20:39
> > > Please respond to cbm
> > >
> > > To: Julian Satran/Haifa/IBM@IBMIL
> > > cc:
> > > Subject: Re: iscsi : X bit in SCSI Command PDU.
> > >
> > > Julian,
> > >
> > > Now that you put me on the spot, :-), my response -
> > >
> > > Santosh argued with me privately that X-bit no longer serves a
> > > useful purpose after the advent of task management commands to
> > > reassign. My response was that it never was a requirement per se,
> > > but always a "courtesy" extended by the initiator to help the
> > > target. I also suggested that X-bit may be considered for its
> > > usefulness in debugging.
> > >
> > > He still had some (very reasonable) comments for simplification
> > > - the most appealing of which (to me) was the opportunity to do
> > > away with the X-bit checking for *every* command PDU that the target
> > > has to endure now.
> > >
> > > If I missed a legitimate use of X-bit, please comment. Do you
> > > think it is a protocol requirement per se? I couldn't justify
> > > to myself so far (except the Login).
> > >
> > > Regards.
> > > --
> > > Mallikarjun
> > >
> > > Mallikarjun Chadalapaka
> > > Networked Storage Architecture
> > > Network Storage Solutions Organization
> > > MS 5668 Hewlett-Packard, Roseville.
> > > cbm@rose.hp.com
> > >
> > > Julian Satran wrote:
> > > >
> > > > Santosh,
> > > >
> > > > I am not sure you went through all scenarios. A conversation with your
> > > > colleague - Mallikarjun - and getting through the state table may go a
> > > > long way to clarify the need for X.
> > > >
> > > > And I am sure that by now you found yourself several.
> > > >
> > > > Julo
> > > >
> > > > Santosh Rao <santoshr@cup.hp.com>
> > > > Sent by: owner-ips@ece.cmu.edu
> > > > 06-10-01 01:56
> > > > Please respond to Santosh Rao
> > > >
> > > > To: IPS Reflector <ips@ece.cmu.edu>
> > > > cc:
> > > > Subject: iscsi : X bit in SCSI Command PDU.
> > > >
> > > > All,
> > > >
> > > > With the elimination of command relay from iscsi [in the interests of
> > > > simplification ?], I believe that the X bit in the SCSI Command PDU can
> > > > also be removed. As it exists today, the X bit is only being used for
> > > > command restart, which is an attempt by the initiator to plug a
> > > > potential hole in the CmdSN sequence at the target. It does this on
> > > > failing to get an ExpCmdSN ack for a previously sent command within some
> > > > timeout period.
> > > >
> > > > Given the above usage of command restart, no X bit is required to be set
> > > > in the SCSI Command PDU when command re-start is done.
> > > >
> > > > Either :
> > > > (a) the target had dropped the command earlier due to a digest error, in
> > > > which case, the command restart plugs the CmdSN hole in the target.
> > > >
> > > > [OR]
> > > >
> > > > (b) the target had received the command and was working on it, when the
> > > > initiator timed out too soon and attempted a command restart to plug
> > > > [what it thought was] a possible hole in the CmdSN sequence.
> > > >
> > > > In case (a), no X bit was required, since the target knows nothing of
> > > > the original command. In case (b), no X bit is required again, since the
> > > > (ExpCmdSN, MaxCmdSN) window would have advanced and the target can
> > > > silently discard the received retry and continue working on the original
> > > > command received.
> > > >
> > > > Removal of the X bit in the SCSI Command PDU has the following benefits :
> > > >
> > > > a) The CmdSN rules at the target are simplified. No need to look at X
> > > > bit, only validate received CmdSN with (ExpCmdSN, MaxCmdSN) window.
> > > >
> > > > b) The reject reason code "command already in progress" can be removed.
> > > > There's no need for this reject reason code anymore, since X bit itself
> > > > is not required, and the targets can silently discard commands outside
> > > > the command window and continue to work on the original instance of the
> > > > command already being processed at the target.
> > > >
> > > > c) Less work for the target and less resources consumed since it no
> > > > longer needs to generate a Reject PDU of type "command in progress". It
> > > > can just silently discard any command PDU outside the (ExpCmdSN,
> > > > MaxCmdSN) window.
> > > >
> > > > d) Less code for the target, since it does not need :
> > > > - any Reject code paths when it receives X bit command PDUs that are
> > > > already in progress.
> > > > - No special casing of CmdSN checking rules.
> > > > - No overheads of verifying a received command based on its initiator
> > > > task tag, to check if the task is currently active, prior to sending a
> > > > Reject response with "command in progress".
> > > >
> > > > Comments ?
> > > >
> > > > Thanks,
> > > > Santosh
> > > >
> > > > --
> > > > ##################################
> > > > Santosh Rao
> > > > Software Design Engineer,
> > > > HP-UX iSCSI Driver Team,
> > > > Hewlett Packard, Cupertino.
> > > > email : santoshr@cup.hp.com
> > > > Phone : 408-447-3751
> > > > ##################################
> >
> > Santosh Rao <santoshr@cup.hp.com>
> > Sent by: owner-ips@ece.cmu.edu
> > 23-10-01 22:50
> > Please respond to Santosh Rao
> >
> > To: IPS Reflector <ips@ece.cmu.edu>
> > cc:
> > Subject: Re: iSCSI - Change Proposal X bit
> >
> > Julian Satran wrote:
> > >
> > > However in order to drop "old" commands that might be in the pipe on a
> > > sluggish connection - removing the X bit will require the initiator to
> > > issue an immediate NOP requiring a NOP response on every open connection
> > > whenever CmdSN wraps around (becomes equal to InitCmdSN).
> >
> > Julian,
> >
> > Can you please explain further the corner case you are describing above
> > ? Are you suggesting that special action should be taken every time
> > CmdSN wraps around, in case there were holes in the CmdSN sequence at
> > the wrap time ? Why is that ?
> >
> > Here's my understanding of how this plays out :
> >
> > Rule 1)
> > The CmdSN management rules at the target should be handling CmdSN wrap
> > case and the initiator cannot issue more than 2^31 - 1 commands beyond
> > the last ExpCmdSN update it has received from the target, since the
> > target MUST NOT transmit a MaxCmdSN that is more than 2**31 - 1 above
> > the last ExpCmdSN. (per Section 2.2.2.1)
> >
> > Rule 2)
> > Any holes that occur in the CmdSN sequence are attempted to be plugged
> > by the initiator by re-issuing the original command. If the CmdSN never
> > got acknowledged and the I/O's ULP timeout expired, the initiator MUST
> > perform session recovery. (per Section 8.6)
> >
> > Thus, going by the above 2 rules, if the CmdSN sequence wraps up to
> > ExpCmdSN, the initiator will not be able to issue further commands,
> > since the target will keep the CmdSN window closed. The window can only
> > re-open when the CmdSN holes are plugged allowing ExpCmdSN and thereby,
> > MaxCmdSN to advance. (rule 1 above).
> >
> > Under the above circumstances, the initiator will possibly try to plug
> > the CmdSN hole by re-issuing the original command. It may do this 1 or
> > more times before its ULP timeout expires. Either the holes get plugged
> > and the window re-opens, or ULP timeout occurs without the corresponding
> > CmdSN for that I/O having been acknowledged, resulting in session
> > logout. (rule 2 above).
> >
> > What is required over and beyond the above ? Why does removal of X-bit
> > require an immediate NOP to be issued every time CmdSN wraps and a hole
> > exists in the CmdSN sequence (??).
> >
> > Regards,
> > Santosh

--
##################################
Santosh Rao
Software Design Engineer,
HP-UX iSCSI Driver Team,
Hewlett Packard, Cupertino.
email : santoshr@cup.hp.com
Phone : 408-447-3751
##################################
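
The (ExpCmdSN, MaxCmdSN) window check that the thread keeps returning to can be sketched in a few lines. This is only an illustration under stated assumptions, not text from the draft: CmdSN is taken to be a 32-bit wrapping counter, the comparison uses a serial-number style rule that is meaningful only within half the number space, and the names `sn_lte` and `in_window` are hypothetical. It shows a late retry being discarded once the window has advanced, and the same stale CmdSN 1 landing back inside the window after the sequence wraps.

```python
MOD = 1 << 32    # assumption: CmdSN is a 32-bit counter that wraps around
HALF = 1 << 31   # comparisons are only meaningful within half the space


def sn_lte(a: int, b: int) -> bool:
    """Return True if a <= b under wrapping 32-bit comparison (assumed rule)."""
    return (b - a) % MOD < HALF


def in_window(cmd_sn: int, exp_cmd_sn: int, max_cmd_sn: int) -> bool:
    """Hypothetical target-side check: ExpCmdSN <= CmdSN <= MaxCmdSN."""
    return sn_lte(exp_cmd_sn, cmd_sn) and sn_lte(cmd_sn, max_cmd_sn)


# Case (b) from the thread: a retry of CmdSN 3 arrives after the window
# has advanced to (10, 20), so the target can silently discard it.
retry_accepted = in_window(3, 10, 20)

# Stale-copy scenario: CmdSN 1 was retried and acked on connection 2, so the
# copy still stuck on connection 1 is outside the window right after the ack.
stale_before_wrap = in_window(1, 2, 33)

# After the sequence wraps, the window straddles zero and the same stale
# copy can no longer be told apart from a new command.
stale_after_wrap = in_window(1, 0xFFFFFFF0, 0x00000010)

print(retry_accepted, stale_before_wrap, stale_after_wrap)
```

The window values (10, 20), (2, 33) and (0xFFFFFFF0, 0x10) are made up for the example. The last case is the one that a logout before re-issue, a NOP exchange, or the X bit would each be trying to rule out.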