SORT BY:

LIST ORDER
THREAD
AUTHOR
SUBJECT


SEARCH

IPS HOME


    [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

    Re: Review of the 07 draft



    
    Robert,
    
    Thanks for you kind opening and for taking the time to read it.
    Comments in text.
    
    Julo
    
    "Robert Griswold" <rgriswold@crossroads.com> on 03-08-2001 22:33:56
    
    Please respond to "Robert Griswold" <rgriswold@crossroads.com>
    
    To:   Julian Satran/Haifa/IBM@IBMIL
    cc:   "Bill Moody" <bmoody@crossroads.com>, <ips@ece.cmu.edu>
    Subject:  Review of the 07 draft
    
    
    
    Julian:
    
    First, great work on this initiative and please forgive my huge email.
    
    We have undertaken a review of the 07 version, and even though
    Crossroads will be at the meetings in London, we thought it would be
    prudent to post these observations on the current specification.  If we
    have completely misread the document, or have missed resolutions that
    have come across the reflector, we apologize up front.  I have also sent
    you some editorial only comments for your review.
    
    ++++++++++
    
    Section 1.2.2.1 (and others):
    The third paragraph up from the bottom of page 19 refers to the action
    that a target should take when receiving commands outside of a range,
    that are not correctly marked as immediate, with the right flags set.
    The question is in regards to the MUST instruction on "silently ignore".
    Assuming the I_T nexus is established, the devices are communicating and
    the target receives what could only correctly be termed a protocol error
    from the initiator it has established communication with, why would the
    target not respond with an error back?  If in fact the error is
    transport related, and the initiator really did send the correct PDU,
    then the target being mute on the reception of the this command leaves
    the initiator guessing.  There are other points in the specification
    where the target is instructed to "silently ignore."
    
    +++ This point has a long history remains of which you can find in the mail
    archives.  I and several other authors where inclined to "silently discard"
    whatever can't be repaired (e.g., bugs in the initiator, delayed packets or
    packets arriving late) and explicitly reject only what can be repaired.
    Debugging ease can be achieved by other means (event logs etc.). However
    many of our colleagues on the list wanted error rejects to be explicit
    whenever it is feasible and that is what you have now except for the cases
    in which discards are due to a "legitimate packet" arriving late (e.g.,
    from a previous transmission attempt that is already recovered).
    Discards are also better in preventing DOS attacks by stream replay++++
    
    Section 1.2.2.2:
    The fifth paragraph indicates that a "large difference" between StatSN
    and ExpStatSN may indicate a problem, but the number required by the
    specification as the maximum before an initiator must take action is 2.1
    Billion!  This seems like a very big number, and most implementers would
    want to take action long before this limit was reached.  What is the
    thinking behind this number?  Is there something we aren't
    understanding?
    
    +++ That is very much dependent on implementation and infrastructure (round
    trip delay). We assumed that at several hundreds you will start to worry :
    -)) +++
    
    
    Section 1.2.5:
    On page 25, the second full paragraph indicates that the definition of
    the use of target tags is not specifically specified by the protocol.
    While we see possible uses for Target Transfer Tags, we would like to
    understand if the author's feel that limiting the interpretation of its
    use to that described in 2.17.4 would be helpful.  Clearly this 32-bit
    field plays little role in the overall usefulness iSCSI command
    protocol, so someone must have felt that it would have been a great idea
    to have this in the specification.  Can that be described?
    
    +++ We envisioned TTTs to be used by the target to identify parts of the
    data stream that arrive in response of an R2T. Additionally they identify
    NOP-Outs that are sent in response to a polling Nop-IN and in general when
    valid indicate that the target expects something and identify the response
    (link it to the "cause") for the target. We mentioned that the protocol
    does not care about the value mainly to stress that, unlike ITT, the
    protocol has no uniqueness requirement for them and whenever they appear
    the LUN is also present and can serve as an additional differentiator.
    ++++
    
    Section 1.2.6
    This section covers the topic of connection termination, which is also
    covered in the discussion of the Logout command in section 2.14.  There
    seems to be some mingling of the concept of TCP logout with the concepts
    around termination of a communication channel for storage protocols.
    The concept of sending a logout, then waiting for the receiving end to
    do some clean-up, flushing, and or additional communication seems odd to
    the command.  The Logout directive should require no further
    communication, and should put neither initiator nor target into the
    state of expecting any such communication.  We would like to see the
    Logout command (not the request for a logout) be free of any
    expectations once sent, or received.  It seems logical that since the
    initiator is the responsible party for sending this command (we would
    also like to begin some discussion on a target only Logout command, like
    SRP is working on now) that it would have done all of the required
    cleanup and preparation before issuing the command.  Implementation of
    target code to make decisions on what outstanding commands it should or
    shouldn't give some credence to before breaking the connection seems out
    of place.
    
    +++ Logout has also history. We started without a logout but realized soon
    that we will need a synchronized logout to enable command allegiance change
    for recovery (that is possible only after logout). Besides it is a simple
    and painless way to do a clean shutdown of a connection. As for a target
    only logout that is what the Asynch Message does (after some timeouts)
    Please keep in mind that there is also a fine interplay between logout and
    recovery and that both initiator and target may decide to start some
    recovery action+++
    
    Section 1.2.7
    On page 28, in the fourth to last paragraph in this section, there is a
    description on how a target can assume an iSCSI Initiator Name, as
    learned through authentication.  This is a confusing paragraph, and we
    cannot determine if it is implying that the target should use the
    initiator name it has been given through some service, or that it was
    authorized to use by an initiator that discovered its iSCSI functions.
    Can this paragraph be further expanded to clarify how the name was
    determined?
    
    +++ This is getting thrown out (it is already) considering that T10 has
    gone a long way to expand the addressing of third party command source and
    destinatiantion +++
    
    Section 1.2.9.1
    The last paragraph on this section has as 42 words in it's first
    sentence, with the word "message" being used six times.  Suggested
    rewording:
    "Relying solely on the "message length" information from the iSCSI
    message header may make it impossible to find iSCSI message boundaries
    in subsequent TCP segments; due to the loss of a TCP segment containing
    the iSCSI message length."
    
    ++ OK +++
    
    Section 1.2.9.2
    The second full paragraph's last two sentences have odd wording.
    Suggested rewording:
    "In practice, though, iSCSI is not expected to handle infinitely long
    streams; stream addressing will wrap around at 2**32-1."
    
    
    +++ OK +++
    
    The first sentence of the next paragraph is difficult to interpret.
    Suggested rewording:
    "This model assumes that the iSCSI layer will deliver complete PDUs to
    underlying layers in single (atomic) operations."
    You may also want to consider adding words that explain the reasoning
    behind this assumption, as some who don't care about those layers may
    not understand why it would be a reasonable assumption.
    
    
    +++ I've made it:
    
    This model assumes that the iSCSI layer will deliver complete PDUs to
    underlying layers in single (atomic) operations.  The underlying layer doe
    not need to examine the stream content to discover the PDU boundaries.
    
    But please note that this whole sectionmight go away :-)
    
    ++++
    
    The rest of this section uses the word MUST a few times in the
    description of actions or implementation assumptions.  While the use of
    the word MUST may make complete sense to the topic, and the points they
    are being applied to, it seems out of place for an optional layer of the
    model.  If the Synch and Steering layer is truly optional, then how can
    the specification make claims to what MUST happen?
    
    +++ it says that IF you doit that MUST be the way to do it ++++
    
    Section 1.5.3
    The second paragraph on page 39 uses the term nexuses, but does not
    clearly build the relationship that the ISID rule is attempting to
    explain.  Can the level (I_T, I_T_L, I_T_L_x) of nexus be identified
    that is being referred to here, to help clarify the ISID rule mentioned
    in the previous paragraph?
    +++ I_T - will fix the text +++
    
    Section 2.2.2
    The definition of the AHS (see below, 2.2.3.1) seems to be covered as a
    global PDU property, but only op-code 0x01 (SCSI Command) seems to make
    use of it.  Since the AHS seems to be only needed to encapsulate an
    extended CDB, then why define its possible existence for all PDUs?  Can
    the AHS be used in other commands?
    +++ There is another one - BiDirectional command fields. +++
    
    Section 2.2.2.4
    This section defines the behavior of non-specific fields in op-code use,
    but not completely.  One portion of the field is explained (P/F bit),
    but the use of the rest of the field is left to the reader to find in
    the other op-codes.  Would it make more sense to pull the definition of
    the P/F bit out of this section, then simply tell the reader that the
    other bits are defined in the op-code specific sections?  Maybe tabulate
    the P/F usage in this section, to the op-code specific use?
    
    +++ Good point I'll remove the P/F text - it appears wherever it is used.
    +++
    
    Section 2.2.2.7
    The second sentence of this paragraph defines that the LUN field can be
    used for "some other purpose."  While there may be a good reason for
    overloading this field, we would like to see some more information on
    the nature of this use.  What about inserting "opcode-specific" in from
    of the word "purpose".
    
    +++ I've changed it to:
    
    1.1.1.1   LUN
    
       Some opcodes operate on a specific Logical Unit. The Logical Unit Number
       (LUN) field identifies which Logical Unit.  If the opcode does not
       relate to a Logical Unit, this field either is ignored or may be used in
       an opcode specific way.  The LUN field is 64-bits and it is to be
       formatted in accordance with [SAM2].
    
     ++++
    
    Section 2.2.3.1
    The inclusion of a bit that identifies if the CDB is extended should not
    live in an optional header.  Expecting that all implementations would
    correctly code the use of an optional header to return critical data
    that should be identified with the required portion of the CDB seems out
    of place.  We would suggest that the use of a reserved bit or a bit in a
    reserved byte be used, from the BHS, be used to identify an extended
    CDB.  Putting the spillover of the CDB into the AHS (section 2.3.6) is
    not an issue, but flagging if there is spillover should not be in the
    AHS.
    
    +++ That must be a "misreading" - We are talking about a coded field one of
    the values is Extended CDB. The whole code is the AHSType.  And if you have
    an extended CDB this header is not optional - it is the way to code the
    extended CDB (no other way is decribed)
    ++++
    
    Section 2.4.3
    This section has seen a lot of discussion, and we agree with the
    position that Rob Elliot and David Black took on the disassociation of
    the iSCSI responses from the ASC/ASCQ SCSI status codes.  While the
    discussion of ACA and it's relationship to iSCSI may not be resolved
    quickly, there needs to be a way to avoid polluting the expectation of
    the SAM status to the devices with the iSCSI response codes.
    
    +++ It is going to change. We have to find also a good code (ASC/ASQ) for
    Insufficient immediate data
    ++++
    
    Section 2.5.1
    This section seems to build on the work done in FCP for out of CDB
    delivery of task management.  But the behavior of a target on receipt of
    a Target Cold Reset should be similar to that of a Logout (assuming
    behavior as explained above in comments on section 1.2.6), where the
    target returns as little information as possible, and does not attempt
    to clean-up or respond upon reception of the command.  Also, this is the
    only section in the document where the term MIB is used, and since
    management of iSCSI is outside of the scope of this document, we would
    prefer to see the last sentence in the paragraph on Target Cold Reset
    removed.
    
    +++ I can refer to some "opaque" management structure instead of MIB
    As for not  returning anything this is acceptable for Target Cold Reset -
    where the TCP connections are teared down but not for those that leave the
    connections
    standing +++
    
    Section 2.8
    While the section on Bookmarks is brand new, and will probably go
    through more definition, the double definition of this field should be
    removed.  If it stays as Bookmark, and it is optional, then it is not
    reserved.
    
    +++ The Bookmark field is a bookmark if the B bit is 1 else it is reserved
    +++
    
    Section 2.8.5
    This section allows (in the first paragraph) large binary items to be
    encoded as hex or decimal.  This discussion is ongoing now, and we
    believe the encoding should be hex.
    
    +++ We think that hex will be the preferred encoding used but sometimes
    people will copy decimal numbers. Once you have to have a decimal
    conversion routine it makes little sense to force hexadecimal only by
    dictate
    +++
    
    The third to last sentence in this section alludes to the ability to use
    key=value pairs to perform active operations.  Since iSCSI active
    operations are interpreted by us as related to SCSI operations, what is
    being said here?  Does this mean that an initiator could perform
    mode-parameter or task management operations inside of Text Messages?
    If so, then more definition of this behavior is needed.
    
    +++ I will change it to "long lasting operations" and those can be vendor
    specific +++
    
    Section 2.9.5
    The second paragraph attempts to instruct the reader to order response
    key=value pairs in the same order they were received, but provides
    little direction on why, or what possible consequences would be if it
    were not done in that order.  Why is this sentence needed, except for
    instruction on best practices?
    
    +++ It is not a MUST but it was thought by many (with some vehemence) as
    helping matching.  IMHO it has no meaning and would suggest removing it.
    +++
    
    Section 2.10.1
    The instructions to targets on supporting multiple connections run up
    against the minimum requirement of targets to support only one
    connection.  If this section is a better reason to require at least two
    connections at a minimum, then the specification should call that the
    minimum.  If initiators are going to expect to be able to use this
    command, but targets that only support one connection would refuse it,
    than what good is this instruction going to be?  Would a target designer
    who develops an iSCSI target create the ability to open two connections
    to support only this situation, and not expand the interface to run two
    connections in Full Feature Mode?
    
    +++ The text only says that if you are going to support connection recovery
    you should be able to support opening a second connection in order to clean
    the first connection and close it. It also explicitly says that you don't
    have to support 2 connections in full feature phase. And even on 2 TCP
    connections it is just a recommendation - using IP-over-pidgeons can be an
    good alternative for forcing a TCP connection to close. On a more serious
    note the second TCP connection can be on the same physical connection.
    +++
    
    Section 2.12.4
    If the limit of reflected data is 4096 bytes, then the instruction to
    the initiator should not be "avoid sending", it should be "MUST NOT"
    send more than 4096 bytes.  Is there a reason why the initiator would
    want to send more than those number of bytes?
    
    +++ The implications of saying MUST NOT would be that the target has to
    check and to reject with a format error. We don't want this do we? +++
    
    Section 2.14
    This section was discussed above in relationship to section 1.2.6.  The
    Logout command should be simple and immediate, with no further work
    needed by the target then the clearing of internal buffers, and the
    reporting on the next Login (CHECK CONDITION) that the Logout was
    performed.
    
    +++ the answer is the same as before +++
    
    Section 2.16
    The last sentence in this section seems at odds with good error
    detection for targets, and at odds with section 2.19.1.  If a SNACK
    received has and error, whether or not it was a miss-directed PDU, or a
    transit error, why would a target just silently ignore it?  Should it
    use the SNACK Rejected code of the Reject response, or at least tell the
    initiator something?  Besides, the use of the word "must" seems
    inconsistent with what this sentence is trying to say.  If there is a
    good reason for a target to ignore this, then it should be mentioned,
    and the directive should be "MUST" if the expectations of the initiator
    are set that way.
    
    +++ A SNACK that is not supported is answered either by an Asynch Message
    asking logout - for failover - or total silence - in which case we assume
    SCSI will timeout and do its thing (abort, reissue the command or whatever
    neded.
    There no need to answer. A badly formed SNACK is rejected as any othe badly
    formed PDU (although even this is perceived by some as excessive when it
    does not help the initiator take repair action).
    ++++
    
    Section 2.16.1
    The last paragraph of this section is confusing.  We cannot determine
    how it should be rewritten, since the intent is not clear.  Why would
    the target issue a message requesting Logout simply because it cannot
    process as SNACK that is optional anyway?  How does the initiator know
    if the target supports in-connection or out-of-connection recovery with
    regards to what the target does after discarding the SNACK?
    +++ The target should request logout if it supports connection recovery but
    not
    data snack +++
    
    Section 2.17.3
    What is the purpose of the last sentence on page 87?  It seems to
    indicate that there is a good reason (besides the obvious) why the
    Desired Data Length should not be zero, but falls short of explaining,
    or directing the reader with a "MUST," or "SHOULD."
    
    +++ the reason is in 2.17.1 the number of R2Ts in a command should be less
    than x'ffffffff'
    +++
    
    Section 2.17.4
    This topic was discussed above in section 1.2.5.
    
    +++ I hope the answer was adequate +++
    
    Section 3
    We would like to see the explanation in this section clarified with
    regards to the use of mode parameter changes for SCSI settings, versus
    iSCSI settings.  Suggested rewording for the first sentence of the
    second paragraph:
    "The mode parameters for iSCSI behavior cannot be set by SCSI mode-set
    but can be retrieved by SCSI mode-sense commands."
    Suggested additional wording:
    "SCSI mode parameters (non-iSCSI specific) cannot be set or retrieved
    from within iSCSI mode page commands."  -  Or something like that.
    +++ OK +++
    Section 4.2
    Under the key SecurityContextComplete=yes, there is a description of
    what the party sending this key should do, finishing with the phrase
    "until the handshake is complete."  What is the scenario for avoiding a
    lock condition in this handshake?  Should the party use some standard
    timeout? Where is this defined?
    
    +++ I think we can leave this to be implementation dependent. If I would
    have to chose a "way out" would say limited number of text exchanges rather
    than a timeout.  ++++
    
    Section 7
    The second paragraph in this section defines an assumption of the
    specification.  Potential manufacturers of iSCSI targets will want to
    know under what circumstances will the saving of sense and status
    information differ than those already done for FC or SCSI operation;
    specifically with regards to iSCSI Logout and iSCSI task management
    commands.  Can this paragraph have more details added?
    
    +++ The status and sense are saved for command recovery while the status is
    not acknowledged. Te recovery may involve status only, or some lost data
    (detected) when status arrived) or the whole command replay. Details are
    ... I think we have now more than 50 pages of detail +++
    
    Section 7.1c
    This description of the use of the Retry bit seems to put the onus of
    interpretation of the Retry bit on the target for command replay.  Is it
    at all possible for a Replay request to get issued, within the scope of
    an initiator delaying its acknowledgement, such that the command
    actually did get completed, but the Retry bit was set too soon?  Would
    one assume that an initiator just avoids doing something like this?
    
    +++ I will come back to this later. +++
    
    Section 7.6
    This section uses the term "ULP timeout", but there is no definition or
    reference to where it can be found.
    
    +++ It is not defined here - it is assumed that when we let SCSI commands
    dry out (drop data packets, status etc.) the SCSI drivers will recover
    after a timeout and do some cleanup +++
    
    Section 7.9
    By recommending (in the first paragraph) that a connection be tested in
    HA environments with iSCSI NOP-ping command, aren't the authors
    recommending that designers use bandwidth consuming methods for
    determining this?  If you assume a multiple path connection (fabric) for
    such iSCSI implementations, then couldn't a bunch of initiators
    potentially flood the fabric with such commands to "recognize a failing
    connection?"
    
    +++ We don't think that a a NOP once a second or once 100ms is a bandwidth
    consuming event.  +++
    
    Section 7.11.1 and 7.11.2
    Specifically, what time-outs are being referred to here?  The time a
    target or initiator should or might wait for acknowledgement from the
    other node that error recovery is being attempted?
    
    +++ Yes and the values are fabric and device dependent +++
    
    Section 7.11.3
    This section seems to not agree with the methods for completing a Logout
    command as discussed earlier.  Item number (2) in this section indicates
    that the initiator should treat the reception of an Asynchronous Message
    requesting a Logout command to that of a TCP failure, yet the target is
    instructed (section 2.14) to treat the reception of a Logout command as
    an opportunity to complete any commands "it deems necessary."  Again,
    the need for a Target Logout and Initiator Logout seems apparent.
    
    +++ Item 2 says only that the recovery procedure - if the parties are
    capable
    and willing to do within session recovery - should be initiated (like on a
    complete TCP failure).  +++
    
    Thanks,
    Bob
    
    Robert Griswold
    Technologist
    Crossroads Systems, Inc.
    512-928-7272
    
    
    
    
    


Home

Last updated: Tue Sep 04 01:04:06 2001
6315 messages in chronological order