RE: iSCSI: Keep alive

John,

I do not agree with your assessment of the need to establish a connection failure timeout. To achieve a desired detection interval within the tens of seconds, a mechanism must be in place. Phrases such as "contemplating dropping", "suspicious", and "beyond some Time-out value" fail to provide a deterministic means of assessing the status of the connection.

I agree there is no point in pinging every 10 seconds under the following conditions:

1) There is already communication traffic confirming the connection.
2) The connection is idle and no status is pending.

This makes a SCSI ping every ten seconds a rare event. It does, however, allow expeditious detection of a failure within a time interval suitable for preventing overlapping retry mechanisms. As a SCSI ping will be a rare event used only during periods of very low utilization, requiring the ping to be serviced by the task process improves the reliability of failure detection.

Doug

> Team,
> Let me see if I can now boil down the thoughts that have occurred on the
> keep-alive thread and some of my input:
>
> It has been stated that there may be value in ensuring that a link has
> gone down and re-establishing it, without the SCSI application being aware.
> Assuming that this is valid:
>
> 1. If for whatever reason an iSCSI session thinks something has gone wrong
> and is contemplating dropping the connection, it should Ping before
> dropping a session that may be down, as a possible confirmation that the
> link is down. (This is not a 100% guarantee that this will always detect
> that the connection is still active, but if responded to, that will
> guarantee that the connection is still up.)
> 2. No point to pinging every 10 Seconds or so. If nothing is outstanding,
> or missing, then why Ping? Perhaps the implementation can have time-out
> values etc. that it can use to determine if a suspicious connection is
> still up.
> This seems to be an implementation issue. I think that an implementer
> note in the draft would be sufficient. Something like "The implementation
> may consider the discovery of dropped connections by use of a Ping, at the
> point the implementation is suspicious of failure. (should not be done
> regularly). A suspicion may be raised by an outstanding expected response,
> beyond some Time-out value."
> 3. We do not need to find out if no one is home at the SCSI task layer;
> that is the job of the SCSI Task layer. We just need to ensure that the
> iSCSI transport is OK.
> 4. It has been pointed out that the ping can be returned by the HW,
> without respect to things going on higher in the various layers, but that
> with SW implementations, it may be blocked behind other unprocessed stuff.
> This also depends on the implementation of the SW, and the buffer
> handling.
> So I would suggest that the very most a ping can help with, in the SW
> implementation, is a premature line drop -- by the pinging side -- that
> would have otherwise occurred without the ping.
> 5. The ping can be useful in sorting out the difference between a long
> SCSI operation and a connection hang. That is, if it is a long SCSI
> response time, the ping will return, and the connection dropping can be
> avoided.
> 6. The only time the adequacy of the above approach is an issue is when
> stuff has backed up in the iSCSI buffers and SW TCP/IP buffers --
> undelivered to SCSI -- for such a long time that it is noticed on the
> other end. We have other flow control things to control this problem.
> Therefore, a ping at (and only at) the time of suspicion and to avoid an
> inappropriate connection drop is a valid approach.
>
> John L. Hufferd
> Senior Technical Staff Member (STSM)
> IBM/SSG San Jose Ca
> (408) 256-0403, Tie: 276-0403
> Internet address: hufferd@us.ibm.com
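
The ping policy debated in this thread can be sketched in a few lines. This is only an illustration of the decision rule, not text from any iSCSI draft: the names `ConnState`, `next_action`, and `on_receive`, and the `ping_timeout` value, are hypothetical. The 10-second `interval` is the figure used in the discussion. A ping is sent only when status is pending and the peer has been silent for `interval` seconds; with a longer `interval`, the same rule approximates the "ping only at the time of suspicion" approach in the quoted message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnState:
    last_rx: float                        # time the last PDU arrived from the peer
    pending: int                          # commands still awaiting status
    ping_sent_at: Optional[float] = None  # set while a ping is unanswered


def next_action(conn: ConnState, now: float,
                interval: float = 10.0, ping_timeout: float = 10.0) -> str:
    """Return "none", "ping", "wait", or "drop" for one connection."""
    if conn.ping_sent_at is not None:
        # A ping is outstanding: give the peer ping_timeout seconds to answer
        # (ping_timeout is an assumed value, not one named in the thread).
        return "drop" if now - conn.ping_sent_at >= ping_timeout else "wait"
    if conn.pending == 0:
        return "none"   # idle and no status pending: nothing to confirm
    if now - conn.last_rx < interval:
        return "none"   # recent traffic already confirms the connection
    return "ping"       # status pending and the peer has been silent


def on_receive(conn: ConnState, now: float) -> None:
    """Any PDU from the peer, including a ping response, confirms the link."""
    conn.last_rx = now
    conn.ping_sent_at = None
```

A caller would evaluate `next_action` on a timer, record `ping_sent_at` when it actually sends a ping, and call `on_receive` for every arriving PDU. Because both "none" cases suppress the ping, pings occur only during periods of low utilization with work outstanding.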
Last updated: Tue Sep 04 01:06:31 2001