Re: iSCSI: more on StatRN

1. Hitting with the big hammer at the first error is a question of designer judgement. And he can do it whenever he wants, as long as we don't force him to do it at the first error by mandating it.

2. Dropped PDUs and the DOS attack: dropping PDUs sent while not in full feature phase will certainly help avoid DOS. We also have to consider the case of a bad initiator shelling the target with badly formed PDUs. If the initiator is not able to recover, what is the point of reporting the error? It should be logged for maintenance, but nothing else would help.

3. I am aware of the pitfalls of keep alive, but it is a cheap way to detect link failures early, unless you use the iSCSI ping (which is slightly more expensive).

Thanks,
Julo

Stephen Bailey <steph@cs.uchicago.edu> on 25/10/2000 01:03:43

Please respond to Stephen Bailey <steph@cs.uchicago.edu>
To: ips@ece.cmu.edu
cc:
Subject: Re: iSCSI: more on StatRN

Julian,

> The reason I suggested dropping connections after several format errors was
> tolerance to software "glitches".

'Tolerating' software glitches usually means detecting them where possible and making sure that you don't go off in the weeds as a result of them. Unfortunately, most (? should we vote by distinct glitches, glitch occurrences, or maybe the amount of time (wall clock? programmer?) wasted by glitches %^) software glitches are not recoverable by mere retry. They require explicit work-arounds.

Therefore, I think the appropriate stance is to specify that the detector should hit the source of the glitch with the biggest possible hammer (connection reset) immediately. Obviously, work-arounds will happen, and as a result they'll violate the SHALLs in the spec, but the fact is they're already addressing other violations of the SHALLs. No big deal.
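The two positions above (reset on the first format error versus tolerating a few glitches first) differ only in a single threshold. A minimal Python sketch, assuming a hypothetical per-connection `FormatErrorPolicy` class and `max_format_errors` knob that appear nowhere in the thread or in any spec: a threshold of 1 is the "big hammer" stance, and a larger value is the tolerant one left to designer judgement. Malformed PDUs are only logged for maintenance, never reported back to the sender.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Action(Enum):
    ACCEPT = auto()  # well-formed PDU: process normally
    DROP = auto()    # discard the PDU; it is only logged for maintenance
    RESET = auto()   # the "big hammer": tear down the connection


@dataclass
class FormatErrorPolicy:
    # Hypothetical knob: 1 resets on the first format error; larger values
    # tolerate a few glitches before resetting (designer judgement).
    max_format_errors: int = 1
    errors_seen: int = field(default=0, init=False)
    log: list = field(default_factory=list, init=False)

    def on_pdu(self, well_formed: bool, description: str = "") -> Action:
        if well_formed:
            return Action.ACCEPT
        self.errors_seen += 1
        self.log.append(description)  # never reported back to the sender
        if self.errors_seen >= self.max_format_errors:
            return Action.RESET
        return Action.DROP


strict = FormatErrorPolicy(max_format_errors=1)
print(strict.on_pdu(False, "bad opcode"))  # Action.RESET

tolerant = FormatErrorPolicy(max_format_errors=3)
print([tolerant.on_pdu(False).name for _ in range(3)])  # ['DROP', 'DROP', 'RESET']
```

Mandating the first-error reset in a spec amounts to fixing `max_format_errors` at 1; leaving it to the designer means leaving the knob open.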
> The Check Condition is meant for cases in which SCSI can act - and yes from
> the transport POV the command has finished.

I guess the only point I'm trying to make is that I don't think SCSI status should be used for conditions which are not already defined in SAM/T10. FCP and SST both define a `response' status mechanism which is used to report conditions which can be reported in-line but are not SCSI generic, for example, conflicting option flag settings in the CMD PDU (other than those in the CDB).

A key point (of which you're probably already aware) is that any error which CAN be reported in-line should be reported in-line, to improve overall responsiveness. If you're already on top of all that, and I'm preaching to the choir, right on. If not, there it is.

> Dropped PDUs will help us avoid DOS attacks with badly formed PDUs.

What's the DOS attack that this addresses? Certainly PDUs outside a connection will be dropped, but at the TCP layer, before iSCSI ever sees them. Once an iSCSI connection is established, I don't see how you're any more open to, or protected from, a DOS attack. Specifically, you initiate a TCP connection close on the first bogus PDU, and while you're closing you ignore everything that's not part of the close protocol, right?

> And I will suggest activating the TCP keep alive option for early detection
> of link failures.

TCP keep alive has a chequered history, and may not be the right thing here. Stevens said somewhere (TCP/IP Illustrated, I think) that it's more chic to have the ULP do keep alive if desired, which is where this whole discussion started.

As long as you have no outstanding operations on a connection, neither end probably needs (or wants, if you believe Stevens' arguments) a keep alive. Once you have operations in progress, the initiator is already keeping timers on every operation, so connection failure can initially be detected in that way.
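The receive-side behaviour discussed above (silently dropping PDUs that arrive before full feature phase, starting a TCP close on the first bogus PDU, and ignoring everything above TCP while that close completes) can be sketched as a small state machine. This is a hypothetical Python sketch: the `Phase` and `Connection` names, the string PDU kinds and the `close_tcp` callback are illustrative assumptions, and it assumes the TCP stack itself carries out the close handshake.

```python
from enum import Enum, auto


class Phase(Enum):
    LOGIN = auto()         # before full feature phase
    FULL_FEATURE = auto()
    CLOSING = auto()       # TCP close initiated; waiting for it to finish


class Connection:
    """Hypothetical receive path; PDU kinds are plain strings here."""

    def __init__(self, close_tcp):
        self.phase = Phase.LOGIN
        self.close_tcp = close_tcp  # callback that starts the TCP close
        self.ignored = 0            # PDUs discarded without any reply

    def login_complete(self) -> None:
        if self.phase is Phase.LOGIN:
            self.phase = Phase.FULL_FEATURE

    def receive(self, kind: str, well_formed: bool = True) -> bool:
        """Return True if the PDU is passed on for processing."""
        if self.phase is Phase.CLOSING:
            # Nothing above TCP is processed while the close completes.
            self.ignored += 1
            return False
        if not well_formed:
            # First bogus PDU: start the close immediately.
            self.phase = Phase.CLOSING
            self.close_tcp()
            self.ignored += 1
            return False
        if self.phase is Phase.LOGIN and kind != "login":
            # Non-login PDUs before full feature phase are silently dropped.
            self.ignored += 1
            return False
        return True


conn = Connection(close_tcp=lambda: print("closing TCP"))
conn.receive("login")
conn.login_complete()
conn.receive("scsi_cmd", well_formed=False)  # prints "closing TCP"
```

Because a bogus PDU costs the sender its connection and nothing is sent back, the target does no extra work per malformed PDU beyond the close it was going to perform anyway.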
The reason why we specified a connection viability check on operation timeout in SST is to improve responsiveness during link failures. You don't NEED to do the viability test at all, in which case each operation will fail under its own timeout. However, badly engineered FC implementations have shown that it's important to detect failure as early as possible, wherever possible. Otherwise the system can get extremely sluggish.

And then there's the issue of the target recovering resources in a bounded amount of time. In SST we specified that the target shall perform keep alives for this reason. In iSCSI, I would suggest that it would be appropriate to specify that targets MAY perform an iSCSI keep alive when they have live commands on a connection, if they care about recovering their resources.

The key thing to remember about keep alives is that iSCSI endpoints may have an extremely high connectivity degree, but are likely to have many inactive connections. Having everybody banging away on each other with keep alives could have a substantial cost (or was everybody planning to hardware accelerate the keep alives :-?)

Steph
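A rough Python sketch of the two timer policies in the message above: the initiator runs a viability check (an iSCSI-level ping) only when an operation times out, failing every outstanding operation at once if the peer does not answer, and the target picks keep-alive candidates only among connections that have live commands. `Conn`, `on_timer`, `connections_to_probe` and the `ping` callable are hypothetical names, and time is passed in explicitly as plain numbers to keep the sketch deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Conn:
    name: str
    # task tag -> absolute deadline for that operation
    outstanding: Dict[int, float] = field(default_factory=dict)

    def start(self, tag: int, timeout: float, now: float) -> None:
        self.outstanding[tag] = now + timeout


def on_timer(conn: Conn, now: float, ping: Callable[[Conn], bool]) -> List[int]:
    """Initiator side: return the task tags failed at time `now`."""
    expired = [t for t, deadline in conn.outstanding.items() if deadline <= now]
    if not expired:
        return []  # no timeout, so no viability check is sent
    if ping(conn):
        # Peer answered: only the timed-out operations fail.
        for t in expired:
            del conn.outstanding[t]
        return expired
    # No answer: treat the link as dead and fail everything at once
    # instead of waiting for each remaining operation's own timeout.
    failed = list(conn.outstanding)
    conn.outstanding.clear()
    return failed


def connections_to_probe(conns: List[Conn]) -> List[Conn]:
    """Target side: only connections with live commands get a keep alive."""
    return [c for c in conns if c.outstanding]


c = Conn("a")
c.start(1, timeout=5, now=0)
c.start(2, timeout=30, now=0)
print(on_timer(c, now=6, ping=lambda _: False))  # [1, 2]
```

Idle connections are never probed, so the keep-alive cost scales with the number of connections that have live commands rather than with an endpoint's connectivity degree.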