RE: questions about FCIP connection failure detection

With multiple levels of reporting, how do you handle cascading error conditions and reporting, and ensure that the single high-level timer catches the right conditions and reports correctly? This implies that once an error is reported at one level, there is a notification mechanism for the other layers to reset their timers, right?

/m.

-----Original Message-----
From: Fraser, Don [mailto:Don.Fraser@compaq.com]
Sent: Tuesday, April 30, 2002 12:17 PM
To: Chong Peng; ips@ece.cmu.edu
Subject: RE: questions about FCIP connection failure detection

Hi;

I don't think you missed anything in your understanding of the FCIP keep-alive timer. And yes, it is true that for it to work, both sides must use the same messages.

Please remember that there are several other link outage detection techniques intended to detect outages much faster than 2 hours. All of these are listed in the table at the very end of the FCIP draft. Upon detecting an outage of the TCP connection, the FCIP entity is to report it to the FC entity, which in turn informs the FC fabric of the FCIP link outage.

In addition, at the Fibre Channel fabric level, one should also expect the basic Fibre Channel hello protocol to periodically test the status of the same TCP connection, but between FCIP switching elements, not at the TCP stack level.

Don

-----Original Message-----
From: Chong Peng [mailto:ChongPeng@MaXXan.com]
Sent: Friday, April 26, 2002 5:25 PM
To: Fraser, Don; ips@ece.cmu.edu
Subject: RE: questions about FCIP connection failure detection

Don:

Thanks for the explanation. But I do have another question. Here is my understanding:

A TCP connection can fail in two different situations.

(1) The TCP connection fails while data is flowing across it.
(2) The TCP connection fails while no data is flowing across it.
For example, one end of a TCP connection crashes or reboots while no data is being exchanged across the connection.

Failure (1) is relatively easy to detect. For example, after TCP retransmits a few times, it will send a RST. So, eventually, both ends of the TCP connection will notice the failure.

Failure (2) is relatively hard to handle. When one end gets rebooted, there is a possibility that the other end never notices the failure. This is especially true when the end that gets rebooted is the TCP client, because usually, when the TCP clients do not send service requests to the TCP servers, the TCP servers do not send anything to the TCP clients. That is why the TCP keep-alive timer, although not defined in RFC 793, comes into play in some TCP implementations. I believe the purpose of the TCP keep-alive timer is to guarantee that both ends of the TCP connection eventually detect failure (2), even though it is after a long time (max two hours).

Now look at the TCP failures in the context of FC over TCP/IP. The first paragraph of Section 9.4 in the FCIP spec basically says that, in order to detect failure (2) in FC over TCP/IP, means other than the TCP keep-alive timer are needed, because two hours is too long. The spec then suggests that "In order to facilitate faster detection of loss of connectivity, FC Entities SHOULD implement some form of Fibre Channel connection failure detection (see FC-BB-2 [4])". Here, my understanding is that the spec suggests some sort of "keep-alive like" scheme can be implemented in the FC entity.

The question is: how can we keep interoperability among FCIP devices from different vendors if we let vendors implement their own "keep-alive like" scheme in the FC entity? My understanding is that any "keep-alive like" scheme involves message exchanges between the two ends; in other words, for any "keep-alive like" scheme to work properly, both ends of the connection have to talk the same language.
Am I misunderstanding this, or missing something here?

chong peng

-----Original Message-----
From: Fraser, Don [mailto:Don.Fraser@compaq.com]
Sent: Friday, April 26, 2002 7:29 AM
To: Chong Peng; ips@ece.cmu.edu
Subject: RE: questions about FCIP connection failure detection

Hi:

> In idle mode, a TCP Connection "keep alive" option of TCP is normally used to keep a connection alive. However, this timeout is fairly large and may prevent early detection of loss of connectivity. In order to facilitate faster detection of loss of connectivity, FC Entities SHOULD implement some form of Fibre Channel connection failure detection (see FC-BB-2 [4]).

This is not required in order to interoperate with other FCIP gateway devices, and it is not in error. A vendor may choose to implement their own keep-alive, to be used whenever no traffic has been received for the keep-alive time interval.

> When an FCIP Entity discovers that TCP connectivity has been lost, the FCIP Entity SHALL notify the FC Entity of the failure including information about the reason for the failure.

On the other hand, the FCIP entity is closer to the TCP stack than the FC entity and is therefore able to detect and report the loss of TCP connectivity. The method of reporting this loss to the FC entity is left up to the implementer.

In a revision of FC-BB-2 made at the last T11 meeting in Vancouver, it was approved to add the following as a new clause in section 16.3:

16.3.x FCIP Error Reporting

The FC entity will receive notifications from the FCIP entity due to a number of errors detected by the FCIP entity. As a result, the E_Port implementation of the FC entity must report those errors to the local FC switch element via the local VE_port (see Fig 23). Similarly the B_Port implementation must report the error to the local VB_access port (see figure 26). In addition the FC entity may pass these error reports to the local PMM for inclusion in a local event log.
In both cases, the FC entity shall convert the error message received from the FCIP entity into a Registered Link Incident Report (FC-FS RLIR). It is the RLIR that is forwarded from the FC entity to either the VE_Port (figure 23) or VB_Access (figure 26). On receipt of the message from the FC Entity, the VE_Port or VB_Access shall immediately forward the RLIR to the FC Switch Entity.

As a minimum the FC Entity shall accept the following messages from the FCIP entity and shall transfer them as an RLIR to the FC Switching Element by the VE_Port or to the FC Network by the VB_Access:

FCIP RFC Section 6.6.2.3: Loss of FC frame synchronization
FCIP RFC Section 9.1.2.3: Failure to setup TCP connection
FCIP RFC Section 9.1.3: TCP connect request timeout or Duplicate connect request
FCIP RFC Section 9.2: Successful completion of FC Entity request to close TCP connection
FCIP RFC Section 9.4: Loss of TCP connectivity
FCIP RFC Section 10.4.3: Excessive number of dropped datagrams or Any confidentiality violations
FCIP RFC Section 10.4.4: SA parameter mis-match

Don Fraser
Contributor to FCIP

-----Original Message-----
From: Chong Peng [mailto:ChongPeng@MaXXan.com]
Sent: Thursday, April 25, 2002 2:48 PM
To: ips@ece.cmu.edu
Subject: questions about FCIP connection failure detection

Hi, all

Section 9.4 (TCP Connection Considerations) of draft-ietf-ips-fcovertcpip-09 says:

   In idle mode, a TCP Connection "keep alive" option of TCP is normally used to keep a connection alive. However, this timeout is fairly large and may prevent early detection of loss of connectivity. In order to facilitate faster detection of loss of connectivity, FC Entities SHOULD implement some form of Fibre Channel connection failure detection (see FC-BB-2 [4]).

   When an FCIP Entity discovers that TCP connectivity has been lost, the FCIP Entity SHALL notify the FC Entity of the failure including information about the reason for the failure.

I have a couple of questions regarding this section:

1.
The first paragraph states that the FC entity is responsible for discovering the connection failure. But the second paragraph implies that the FCIP entity discovers the connection failure first and then notifies the FC entity. Is there an editorial error?

2. If we let the application protocol on top of TCP discover the connection failure, what scheme are we going to use? Are we planning to define some "FCIP keep alive" frames in the future? I checked FC-BB-2; in the section related to discovery (13.2.2.4.2), it says "TBD".

Chong Peng

This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized use, review, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by return email and destroy all copies of the original message. Copyright © 2002 MaXXan Systems, Inc. All rights reserved.
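
For readers following the thread, the "keep-alive like" scheme discussed above (send something when the link has been idle, declare the link lost if the peer stays silent) can be sketched roughly as below. This is only an illustration under stated assumptions: the class name `IdleLinkMonitor`, the `Action` values, and the two interval parameters are hypothetical and are not taken from the FCIP draft or FC-BB-2, and the probe message format is deliberately left out because, as noted in the thread, both ends would have to agree on it.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"              # nothing to do on this poll
    SEND_PROBE = "send_probe"  # link idle long enough to send a keep-alive
    LINK_DOWN = "link_down"    # peer silent past the dead interval


@dataclass
class IdleLinkMonitor:
    """Hypothetical idle-link detector; not defined by FCIP or FC-BB-2."""
    probe_interval: float     # seconds of silence before a probe is sent
    dead_interval: float      # seconds of silence before declaring failure
    last_rx: float = 0.0      # time anything was last received from the peer
    last_probe: float = 0.0   # time the most recent probe was sent
    down: bool = False        # True once the failure has been reported

    def on_receive(self, now: float) -> None:
        # Any inbound traffic (data or a probe reply) shows the peer is alive.
        self.last_rx = now
        self.down = False

    def poll(self, now: float) -> Action:
        silent_for = now - self.last_rx
        if silent_for >= self.dead_interval:
            if self.down:
                return Action.NONE      # failure already reported once
            self.down = True
            return Action.LINK_DOWN
        idle = silent_for >= self.probe_interval
        probe_due = now - self.last_probe >= self.probe_interval
        if idle and probe_due:
            self.last_probe = now
            return Action.SEND_PROBE
        return Action.NONE


if __name__ == "__main__":
    mon = IdleLinkMonitor(probe_interval=5.0, dead_interval=15.0)
    mon.on_receive(0.0)
    for t in (3.0, 5.0, 10.0, 15.0):
        print(t, mon.poll(t).value)
```

In this sketch, the caller supplies the clock, so the same logic works with a real timer or in a test. The `LINK_DOWN` result marks the point at which, per Section 9.4, a notification toward the FC Entity would be raised; how that notification is delivered is left to the implementer, as the thread points out.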