
    Re: iSCSI: more on StatRN



    
    
    1. Whether to hit with the big hammer at the first error is a matter of
    designer judgement.  The designer can do it whenever he wants, as long as
    we don't force him to do it at the first error by mandating it.
    
    2. Dropped PDUs and the DoS attack: dropping PDUs sent while not in full
    feature phase will certainly help avoid DoS.
    We also have to consider the case of a bad initiator shelling the target
    with badly formed PDUs.  If the initiator is not able to recover, what is
    the point of reporting the error?  It should be logged for maintenance,
    but nothing else would help.
    
    3. I am aware of the pitfalls of keep-alive, but it is a cheap way of
    detecting link failures early, unless you use the iSCSI ping (which is
    slightly more expensive).
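Point 3 refers to the ordinary TCP keep-alive socket option. A minimal Python sketch of enabling it on a connection socket, assuming a Linux-like stack, follows; `enable_tcp_keepalive` is a hypothetical helper, and the idle, interval, and probe-count values are illustrative assumptions rather than figures proposed in this thread.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket, idle_s: int = 60,
                         interval_s: int = 10, probes: int = 5) -> None:
    """Enable TCP keep-alive on one connection socket.

    SO_KEEPALIVE is portable. The idle/interval/count knobs are
    platform-specific, so each is set only if this Python build exposes it
    (e.g. on Linux). The default values are illustrative, not recommended.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

Probes start only after the connection has been idle for `idle_s` seconds, so a connection that is actively carrying traffic generates no extra keep-alive segments.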
    
    Thanks,
    Julo
    
    Stephen Bailey <steph@cs.uchicago.edu> on 25/10/2000 01:03:43
    
    Please respond to Stephen Bailey <steph@cs.uchicago.edu>
    
    To:   ips@ece.cmu.edu
    cc:
    Subject:  Re: iSCSI: more on StatRN
    
    
    
    
    Julian,
    
    > The reason I suggested dropping connections after several format errors was
    > tolerance to software "glitches".
    
    'tolerating' software glitches usually means detecting them where
    possible and making sure that you don't go off in the weeds as a
    result of them.  Unfortunately, most (? should we vote by distinct
    glitches, glitch occurrences, or maybe the amount of time (wall clock?
    programmer?) wasted by glitches %^) software glitches are not
    recoverable by mere retry.  They require explicit work-around.
    Therefore, I think the appropriate stance is to specify that the
    detector should hit the source of the glitch with the biggest possible
    hammer (connection reset) immediately.
    
    Obviously, work-arounds will happen, and as a result, they'll violate
    the SHALLs in the spec, but the fact is they're already addressing
    other violations of the SHALLs.  No big deal.
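The disagreement boils down to one parameter: how many malformed PDUs a receiver tolerates before resetting the connection. In the hypothetical Python sketch below (the class and field names are illustrative, not from any draft), a threshold of 1 is the reset-on-first-error stance argued here, and a larger threshold is the designer-chosen tolerance Julian describes.

```python
from dataclasses import dataclass


@dataclass
class FormatErrorPolicy:
    """Hypothetical receiver policy for malformed (format-error) PDUs.

    max_errors == 1 resets the connection on the first error (the
    "biggest hammer immediately" stance); a larger value tolerates a few
    glitches before resetting, leaving the choice to the implementer.
    """
    max_errors: int = 1
    seen: int = 0

    def __post_init__(self) -> None:
        if self.max_errors < 1:
            raise ValueError("max_errors must be at least 1")

    def on_format_error(self) -> bool:
        """Count one malformed PDU; return True if the connection should be reset."""
        self.seen += 1
        return self.seen >= self.max_errors
```

With the default, the first call returns True; with `FormatErrorPolicy(max_errors=3)` the first three calls return False, False, True.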
    
    > The Check Condition is meant for cases in which SCSI can act - and yes from
    > the transport POV the command has finished.
    
    I guess the only point I'm trying to make is that I don't think SCSI
    status should be used for conditions which are not already defined in
    SAM/T10.  FCP and SST both define a `response' status mechanism which
    is used to report conditions which can be reported in-line, but are
    not SCSI generic.  For example, conflicting option flag settings in
    the CMD PDU (other than those in the CDB).  A key point (of which
    you're probably already aware) is that any error which CAN be
    reported in-line should be reported in-line, to improve overall
    responsiveness.
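The split between SCSI status and a transport-level `response' can be sketched as follows. This is a hypothetical Python illustration: the `Response` codes and the read/write/bidirectional flags stand in for "conflicting option flag settings in the CMD PDU" and are assumptions, not FCP, SST, or iSCSI encodings. Only the two SAM status values are real (GOOD = 0x00, CHECK CONDITION = 0x02).

```python
from enum import Enum
from typing import Callable, Optional, Tuple

# SAM status codes defined by T10 (only the two used here).
GOOD = 0x00
CHECK_CONDITION = 0x02


class Response(Enum):
    """Hypothetical transport-level response codes, kept apart from SCSI status."""
    COMMAND_COMPLETE = 0   # the SCSI layer executed the command; see SCSI status
    CONFLICTING_FLAGS = 1  # option flags in the CMD PDU contradict each other


def complete_command(read: bool, write: bool, bidi_supported: bool,
                     execute: Callable[[], int]) -> Tuple[Response, Optional[int]]:
    """Return (transport response, SCSI status or None).

    A transport-level problem is reported in-line with a response code and no
    SCSI status; only conditions raised by the SCSI layer carry SAM status.
    """
    if read and write and not bidi_supported:
        # Not a SCSI condition: answer in-line instead of inventing a status.
        return Response.CONFLICTING_FLAGS, None
    return Response.COMMAND_COMPLETE, execute()
```

In this sketch a command whose flags conflict never reaches `execute`, so no CHECK CONDITION is fabricated for a problem SCSI cannot act on.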
    
    If you're already on top of all that, and I'm preaching to the choir,
    right on.  If not, there it is.
    
    > Dropped PDUs will help us avoid DoS attacks with badly formed PDUs.
    
    What's the DoS attack that this addresses?  Certainly PDUs outside a
    connection will be dropped, but at the TCP layer before iSCSI ever
    sees them.  Once an iSCSI connection is established, I don't see how
    you're any more open to, or protected from, a DoS attack.  Specifically,
    you initiate a TCP connection close on the first bogus PDU, and while
    you're closing you ignore everything that's not part of the close
    protocol, right?
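The receive-side behaviour discussed here (drop PDUs before full feature phase, log a malformed PDU for maintenance rather than report it, start closing on the first bogus PDU, then ignore everything while the TCP close proceeds) can be modelled as a small state machine. This is a hypothetical Python sketch: login is collapsed into a single "login" PDU, and the state names and PDU type strings are assumptions.

```python
from enum import Enum, auto
from typing import List


class State(Enum):
    LOGIN = auto()         # before full feature phase
    FULL_FEATURE = auto()
    CLOSING = auto()       # TCP close initiated; iSCSI input is ignored


class Connection:
    """Hypothetical receive path for one iSCSI connection (illustrative only)."""

    def __init__(self) -> None:
        self.state = State.LOGIN
        self.maintenance_log: List[str] = []  # malformed PDUs are logged, not reported

    def receive(self, pdu_type: str, well_formed: bool = True) -> str:
        if self.state is State.CLOSING:
            return "ignored"                  # only the TCP close proceeds now
        if not well_formed:
            self.maintenance_log.append(pdu_type)
            self.state = State.CLOSING        # close on the first bogus PDU
            return "closing"
        if self.state is State.LOGIN:
            if pdu_type != "login":
                return "dropped"              # not allowed before full feature phase
            self.state = State.FULL_FEATURE
            return "logged_in"
        return "delivered"
```

Because nothing is sent back for a bogus PDU, a misbehaving initiator gains no extra work from the target beyond the single close.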
    
    > And I will suggest activating the TCP keep alive option for early detection
    > of link failures.
    
    TCP keep alive has a chequered history, and may not be the right thing
    here.  Stevens said somewhere (TCP/IP Illustrated, Vol. 1, I think)
    that it's more chic to
    have the ULP do keep alive if desired, which is where this whole
    discussion started.
    
    As long as you have no outstanding operations on a connection, neither
    end probably needs (or wants, if you believe Stevens' arguments) a
    keep alive.  Once you have operations in progress, the initiator is
    already keeping timers on every operation, so connection failure can
    initially be detected in that way.
    
    The reason why we specified a connection viability check on operation
    timeout in SST is to improve responsiveness during link failures.  You
    don't NEED to do the viability test at all, in which case, each
    operation will fail under its own timeout.  However, badly engineered
    FC implementations have shown that it's important to detect failure as
    early as possible wherever possible.  Otherwise the system can get
    extremely sluggish.
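A connection viability check on operation timeout might look like the hypothetical asyncio sketch below. When one operation's timer expires, the initiator sends a liveness probe (`ping` stands in for something like the iSCSI ping; the callable, the dictionary of outstanding operations keyed by task tag, and the timeout value are all assumptions). If the probe fails, every outstanding operation is failed at once instead of each waiting out its own timer; if it succeeds, only the timed-out operation fails.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def on_operation_timeout(task_tag: int,
                               outstanding: Dict[int, asyncio.Future],
                               ping: Callable[[], Awaitable[None]],
                               ping_timeout_s: float) -> str:
    """Hypothetical handler run when one operation's timer expires."""
    try:
        await asyncio.wait_for(ping(), ping_timeout_s)
    except (asyncio.TimeoutError, ConnectionError):
        # Connection judged dead: fail all outstanding operations now.
        for fut in outstanding.values():
            if not fut.done():
                fut.set_exception(ConnectionError("connection failed viability check"))
        outstanding.clear()
        return "connection_failed"
    # Link is alive, so only this operation is considered failed.
    fut = outstanding.pop(task_tag, None)
    if fut is not None and not fut.done():
        fut.set_exception(TimeoutError(f"operation {task_tag} timed out"))
    return "operation_failed"
```

The probe costs one round trip, but it is sent only when an operation has already timed out, which is exactly when early failure detection pays off.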
    
    And then there's the issue of the target recovering resources in a
    bounded amount of time.  In SST we specified that the target shall
    perform keep alives for this reason.  In iSCSI, I would suggest that
    it would be appropriate to specify that targets MAY perform an iSCSI
    keep alive when they have live commands on a connection if they care
    about recovering their resources.
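The suggested target rule (probe only when commands are holding resources) reduces to a simple predicate. In this hypothetical Python sketch, `idle_limit_s` and the bookkeeping fields are assumptions about how a target might track a connection, not anything specified in SST or iSCSI.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class TargetConnection:
    """Hypothetical target-side view of one connection."""
    live_commands: Set[int] = field(default_factory=set)  # task tags holding resources
    last_heard_s: float = 0.0  # time the initiator was last heard from

    def should_probe(self, now_s: float, idle_limit_s: float) -> bool:
        """Probe only if commands hold resources and the initiator has gone quiet."""
        return bool(self.live_commands) and now_s - self.last_heard_s >= idle_limit_s
```

An idle connection with no live commands never triggers a probe, which keeps the many inactive connections mentioned next free of keep-alive traffic.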
    
    The key thing to remember about keep alives is that iSCSI endpoints
    may have an extremely high degree of connectivity, but are likely to have
    many inactive connections.  Having everybody banging away on each
    other with keep alives could have a substantial cost (or was everybody
    planning to hardware accelerate the keep alives :-?)
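A back-of-the-envelope model makes the cost argument concrete. It assumes one keep-alive message per probed connection per interval; the function name and any numbers plugged into it are hypothetical.

```python
def keepalive_rate(connections: int, active_connections: int,
                   interval_s: float, only_active: bool) -> float:
    """Keep-alive messages per second sent by one endpoint (illustrative model)."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    probed = active_connections if only_active else connections
    return probed / interval_s
```

For example, with a hypothetical 10,000 connections of which 500 have commands outstanding and a 60 s interval, probing everything costs about 166.7 messages per second per endpoint, while probing only active connections costs about 8.3.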
    
    Steph
    
    
    
    



Last updated: Tue Sep 04 01:06:35 2001
6315 messages in chronological order