Re: iscsi : changes involving tgt portal group tag.

John Hufferd wrote:
>
> 1. Not specifying a *port* in the Login dialogue explicitly
> is something I am concerned could cause surprises down
> the road. Given that a Login is meant to establish an I_T
> nexus to a port (not to a node), I am rather surprised to see
> the opposition simply because the proposal is coming late.
> [Huff/]
> based on my previous note, I do not buy this as a problem, since I do
> not think this occurs without manual intervention and a significant
> time interval (and most likely a power down). This means that it would
> seem to be a natural thing for the initiator to attempt to rediscover
> the connection. It seems that simple wordage that Jim Hafner has
> suggested for the draft meets this issue.

John,

The procedure to re-config a target portal group is specific to each
product, and while it may be reasonable for some product installation
manuals to recommend that all sessions be terminated and the target be
taken offline for a re-config, I don't believe the spec should base its
correctness on this requirement.

After all, with its multi-connection session architecture, iSCSI allows
the target to continue servicing active session traffic while individual
NICs are decommissioned and re-assigned to other portal groups. Consider
also that such a network portal re-assignment may be only a logical admin
operation and does not always require the target to be taken offline or
powered off.

Since there is no iSCSI protocol-specified async notification and
authentication mechanism that prevents connections from being
accidentally established to incorrect portal groups, there is a
possibility that high-end arrays that advertise 24 x 7 support and
online re-config capabilities will cause initiators to accidentally log
into the wrong portal group during such re-configs.
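The failure mode described above can be sketched roughly as follows. This is only a toy model, not anything taken from the draft: the `Target` class, the portal names, and the tag values are all hypothetical, and the login here deliberately carries no portal group tag.

```python
# Toy model (hypothetical names throughout): an initiator logs in using
# stale discovery data after an online portal group re-config.

class Target:
    def __init__(self, portal_to_group):
        # Maps each network portal to its current portal group tag.
        self.portal_to_group = dict(portal_to_group)

    def reassign(self, portal, new_tpgt):
        # Online re-config: move one portal to another portal group.
        self.portal_to_group[portal] = new_tpgt

    def login(self, portal):
        # The login names no portal group, so the target binds the
        # session to whatever group the portal belongs to right now.
        return self.portal_to_group[portal]


target = Target({"portal-A": 1, "portal-B": 2})

# The initiator caches what discovery told it.
discovered = dict(target.portal_to_group)

# The admin re-assigns portal-B while the target stays online.
target.reassign("portal-B", 1)

expected_group = discovered["portal-B"]   # what the initiator believes
actual_group = target.login("portal-B")   # where the session really lands
silently_wrong = expected_group != actual_group
```

In this model nothing in the login exchange lets either side notice that `expected_group` and `actual_group` differ.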
This can be solved in 2 steps:

a) Have a new async PDU reason code that says "portal group
   re-configured", which allows currently logged-in initiator sessions
   to be notified and, in turn, triggers re-discovery.

b) Send the TPGT as a part of the login and require the target port to
   authenticate the port name/identifier upon login.

I don't see these as major changes to the spec. They will block
initiators from accidentally logging into the wrong portal groups, which
needs to be protected against, since it can result in a number of side
effects.

If we want to minimize the changes, perhaps the TPGT could be introduced
as a login key instead of being in the login PDU header, thereby causing
no change in the login PDU format.

>
> One of the reasons that I am concerned about late proposals, is that
> the full review of impacts tends not to be done adequately. All my
> experience has shown me that the largest number of errors and retrofits
> occur with the last items added to a product, or spec. In fact I
> believe there can be a strong correlation between time of arrival of a
> change, and the probability of unforeseen impacts. So yes, I would
> hate to make changes this late for a problem that I am not sure even
> exist, and if it does, a rediscovery fixes the problem.

I agree with your risk assessment. However, we do have a correctness
issue in that the protocol does not authenticate the port
name/identifier upon login and does not have an async notification
scheme to existing initiators that would prevent accidental [re-]login
to incorrect portal groups.

Depending on Unit Attentions to solve this problem is insufficient for
the following reasons:

a) The "REPORTED LUNS DATA HAS CHANGED" UA can get cleared if the target
   is power cycled prior to I/O activity from the initiator.

b) UAs can get cleared if several other UA conditions cause the target
   to exceed the number of concurrent UAs it can queue and deliver.
c) Requiring that the initiator's legacy SCSI ULP stacks be modified to
   react to these UAs, in order to address an iSCSI-specific problem, is
   not a good idea, since iSCSI drivers must not require changes in the
   O.S. SCSI ULPs. Further, iSCSI driver writers may not control the
   O.S. SCSI ULPs, so the change may not be theirs to make.

by the time the next I/O comes in from an initiator, and reacting to UAs
requires a change in the legacy SCSI ULPs of the O.S' that will run
iscsi, or requires all the iscsi initiators to be

It is common for all other serial SCSI transports (FCP, SRP) to perform
port name/identifier authentication upon login.

> [\Huff]
>
> 2.
> manual reconfiguration (including a probable power down), that the
> Target
>
> will maintain this key state ..
> This and a lot of your other text below dwells on the unlikelihood of
> target not maintaining the state - I agree with you. My point is
> *not* that a target would, but the need to design the quickest and
> most reliable way to communicate the loss of state back to the
> initiator.
> I believe addition of TPGT to the Login Request PDU accomplishes that.
>
> [Huff/]
> Since I feel this type of thing is rare if a problem at all,

This is debatable, since I can envision a field engineer using the
portal group re-config as a quick customer-site workaround upon
detecting a bug in the multi-connection session implementation in a
target, or a bug in the co-operation of multiple network portal types in
supporting a multi-connection session. Without losing the connectivity
of the target, it can be converted from a (2 x 4) connectivity array to
a (1 x 8) connectivity array, causing minimal degradation in its
performance and no downtime of the customer's data.
(m x n => no. of portal groups x no. of network portals).

Initial implementations of a new protocol are not without their share of
bugs, and it would be a useful feature not to have to bring down the
target to perform such re-configs.
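As a rough illustration of the (2 x 4) to (1 x 8) re-config above, combined with the login-time TPGT check proposed in this message, consider the sketch below. It is only a sketch under stated assumptions: the helper names, the portal naming scheme, the tag numbering, and the accept/reject behaviour are all hypothetical and are not taken from the draft.

```python
# Hypothetical sketch: an (m x n) portal layout and a login check in
# which the initiator states the TPGT it believes it is logging into.

def build_groups(m, n):
    """Return {portal_name: tpgt} for m portal groups of n portals each."""
    return {
        f"portal{g * n + p}": g + 1   # tags numbered from 1 (assumption)
        for g in range(m)
        for p in range(n)
    }


def login(portal_to_group, portal, claimed_tpgt):
    """Accept only if the claimed TPGT matches the portal's current group."""
    return portal_to_group.get(portal) == claimed_tpgt


before = build_groups(2, 4)   # (2 x 4): two groups of four portals
after = build_groups(1, 8)    # (1 x 8): one group of eight portals

# An initiator that discovered portal5 before the re-config holds a
# stale tag for it.
stale_tpgt = before["portal5"]

rejected = not login(after, "portal5", stale_tpgt)        # mismatch caught
accepted_after_rediscovery = login(after, "portal5", after["portal5"])
unaffected_portal_ok = login(after, "portal0", before["portal0"])
```

With a check of this kind, an initiator holding a stale tag would be refused at login and could re-run discovery, while portals whose group did not change keep working.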
> I think
> that documentation about not affecting the TPG if state is outstanding,
> and a suggestion to the Initiator that if an unusual amount of time
> goes by with the Session Down, that a Rediscovery should be done (as if
> they would not do that anyway). So, because of it being rare, if a
> problem at all, I am not convinced that the right approach is to
> optimize the response time to restart a session that has been down for
> a long time anyway. If it take an extra discovery, I do not think this
> is a problem.
> [\Huff]

We seem to be talking about different scenarios here! I have called out
an issue regarding the re-config of portal groups without requiring
downtime in the storage (i.e. no disruption to existing sessions), while
you seem to be referring above to a session that has been down for a
long time.

Again, I agree that a product installation guide can resolve this issue
by requiring all initiators to be quiesced and the storage to be taken
offline for any re-config. However, this limitation should not be
imposed on a SCSI transport protocol to ensure its correctness, and it
should not limit an implementation's capability to provide 24x7 uptime.

Thanks in advance for considering all aspects of this issue.

Regards,
Santosh

--
##################################
Santosh Rao
Software Design Engineer,
HP-UX iSCSI Driver Team,
Hewlett Packard, Cupertino.
email : santoshr@cup.hp.com
Phone : 408-447-3751
##################################