RE: questions about FCIP connection failure detection

To: "'Fraser, Don'" <Don.Fraser@compaq.com>, Chong Peng <ChongPeng@MaXXan.com>, ips@ece.cmu.edu
Subject: RE: questions about FCIP connection failure detection
From: Michel Maddux <Michel.Maddux@mcdata.com>
Date: Tue, 30 Apr 2002 12:51:43 -0600
Content-Type: text/plain;charset="iso-8859-1"
Sender: owner-ips@ece.cmu.edu
With multiple levels of reporting, how do you handle cascading error
conditions and reporting, and ensure that the single high-level timer
catches the right conditions and reports correctly?

This implies that once an error is reported at one level, there is a
notification mechanism for the other layers to reset their timers,
right? /m.
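
To make the question concrete, here is a minimal sketch of the kind of
notification mechanism being asked about. Nothing in it comes from the FCIP
draft or FC-BB-2: the class name, the layer names, and the "first report
wins, other layers reset their timers" policy are all assumptions made for
illustration.

```python
class FailureCoordinator:
    """Hypothetical coordinator: forwards the first outage report, suppresses the rest."""

    def __init__(self, layers):
        self.layers = list(layers)
        self.active_report = None  # (layer, reason) of the outage already reported
        # Counts how often each layer was told to reset its detection timer.
        self.timer_resets = {name: 0 for name in self.layers}

    def report(self, layer, reason):
        """Return True if the report is forwarded, False if it is a duplicate."""
        if layer not in self.timer_resets:
            raise ValueError(f"unknown layer: {layer}")
        if self.active_report is not None:
            # Another layer already reported this outage; do not cascade.
            return False
        self.active_report = (layer, reason)
        for other in self.layers:
            if other != layer:
                # Stand-in for notifying the other layer to reset its timer.
                self.timer_resets[other] += 1
        return True

    def link_restored(self):
        """Clear the outage so that a later failure is reported again."""
        self.active_report = None
```

With layers "tcp", "fcip" and "fc", a report from "fcip" is forwarded and
the other two layers are told to reset; a later report from "fc" for the
same outage is suppressed until link_restored() is called.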

-----Original Message-----
From: Fraser, Don [mailto:Don.Fraser@compaq.com]
Sent: Tuesday, April 30, 2002 12:17 PM
To: Chong Peng; ips@ece.cmu.edu
Subject: RE: questions about FCIP connection failure detection


Hi;

I don't think you missed anything in your understanding of the FCIP
keep-alive timer.  And yes, it is true that for it to work, both sides must
use the same messages.

Please remember that there are several other link outage detection
techniques intended to detect outages much faster than 2 hours.  All of
these are listed in the table at the very end of the FCIP draft.  Upon
detecting an outage of the TCP connection, the FCIP entity is to report it
to the FC entity, which in turn informs the FC fabric of the FCIP link
outage.  In addition, at the Fibre Channel fabric level, one should also
expect the basic Fibre Channel hello protocol to periodically test the
status of the same TCP connection, but between FCIP switching elements, not
at the TCP stack level.

Don

-----Original Message-----
From: Chong Peng [mailto:ChongPeng@MaXXan.com]
Sent: Friday, April 26, 2002 5:25 PM
To: Fraser, Don; ips@ece.cmu.edu
Subject: RE: questions about FCIP connection failure detection


Don:

Thanks for the explanation. But I do have another question.

Here is my understanding:

A TCP connection can fail in two different situations.

(1) The TCP connection fails while data is flowing across it.
(2) The TCP connection fails while no data is flowing across it. For
    example, one end of the TCP connection crashes or reboots while no
    data is being exchanged across the connection.

Failure (1) is relatively easy to detect. For example, after TCP has
retransmitted a few times, it will send a RST. So, eventually,
both ends of the TCP connection will notice the failure.

Failure (2) is relatively hard to handle. When one end is rebooted, there
is a possibility that the other end never notices the failure. This is
especially true when the end that is rebooted is the TCP client, because
usually, when the TCP clients do not send service requests to the TCP
servers, the TCP servers do not send anything to the TCP clients. That is
why the TCP keep-alive timer, although not defined in RFC 793, has come
into use in some TCP implementations. I believe the purpose of the TCP
keep-alive timer is to guarantee that both ends of the TCP connection
eventually detect failure (2), even though it is after a long time (on the
order of two hours by default).
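
As an illustration of failure (2), here is a minimal sketch of turning on
TCP keep-alive on a socket and estimating how long an idle failure can go
unnoticed. The function names and default values are assumptions. The
TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are only set where the
platform exposes them, and the formula assumes the stack waits one idle
period and then sends a fixed number of evenly spaced probes.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=3):
    """Turn on TCP keep-alive and, where supported, shorten its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These per-socket tuning options are platform-dependent, so look them
    # up by name and skip any that this platform does not provide.
    for name, value in (
        ("TCP_KEEPIDLE", idle),        # seconds of idle time before the first probe
        ("TCP_KEEPINTVL", interval),   # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),       # unanswered probes before giving up
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


def worst_case_detection(idle, interval, probes):
    """Seconds from the last traffic until the probe sequence gives up."""
    return idle + interval * probes
```

With the commonly cited Linux defaults (7200 s idle, 75 s between probes,
9 probes), worst_case_detection(7200, 75, 9) gives 7875 seconds, which is
why the draft looks for something faster at the FC level.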

Now look at TCP failures in the context of FC over TCP/IP. The first
paragraph of Section 9.4 in the FCIP spec basically says that, in order to
detect failure (2) in FC over TCP/IP, some means other than the TCP
keep-alive timer is needed, because two hours is too long. The spec then
suggests that "In order to facilitate faster detection of loss of
connectivity, FC Entities SHOULD implement some form of Fibre Channel
connection failure detection (see FC-BB-2 [4])". Here, my understanding is
that the spec suggests that some sort of "keep-alive like" scheme can be
implemented in the FC entity. The question is: how can we keep
interoperability among FCIP devices from different vendors if we let
vendors implement their own "keep-alive like" scheme in the FC entity? My
understanding is that any "keep-alive like" scheme involves message
exchanges between the two ends. In other words, for any "keep-alive like"
scheme to work properly, both ends of the connection have to speak the same
language.

Am I misunderstanding this, or missing something here?

chong peng

-----Original Message-----
From: Fraser, Don [mailto:Don.Fraser@compaq.com]
Sent: Friday, April 26, 2002 7:29 AM
To: Chong Peng; ips@ece.cmu.edu
Subject: RE: questions about FCIP connection failure detection


Hi:

> In idle mode, a TCP Connection "keep alive" option of TCP is
   normally used to keep a connection alive. However, this timeout is
   fairly large and may prevent early detection of loss of
   connectivity. In order to facilitate faster detection of loss of
   connectivity, FC Entities SHOULD implement some form of Fibre
   Channel connection failure detection (see FC-BB-2 [4]).

Implementing this is not required for interoperability with other FCIP
gateway devices, and the text is not in error.  A vendor may choose to
implement their own keep-alive to be used whenever no traffic has been
received for the keep-alive time interval.
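
A vendor keep-alive of the kind described above could be driven by an idle
timer along these lines. This is only a sketch under stated assumptions:
the class name, the "ok" / "probe" / "down" states, and the rule that the
link is declared down after max_missed idle intervals are all hypothetical,
and the probe message itself is left undefined because neither the FCIP
draft nor FC-BB-2 specifies one here.

```python
class IdleLinkMonitor:
    """Hypothetical idle timer: decides when to probe and when to give up."""

    def __init__(self, keepalive_interval, max_missed, now=0.0):
        self.keepalive_interval = keepalive_interval  # seconds of silence per probe
        self.max_missed = max_missed                  # silent intervals tolerated
        self.last_rx = now                            # time of the last received frame

    def on_frame_received(self, now):
        # Any received traffic, data or keep-alive reply, shows the peer is alive.
        self.last_rx = now

    def poll(self, now):
        """Return 'ok', 'probe' or 'down' for the given time."""
        idle = now - self.last_rx
        if idle >= self.keepalive_interval * self.max_missed:
            return "down"   # report loss of connectivity to the layer above
        if idle >= self.keepalive_interval:
            return "probe"  # nothing heard for one interval: send a keep-alive
        return "ok"
```

Because the timer restarts on any received frame, probes are only sent when
the link is otherwise idle, which matches the "whenever no traffic has been
received" condition above.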

> When an FCIP Entity discovers that TCP connectivity has been lost,
   the FCIP Entity SHALL notify the FC Entity of the failure including
   information about the reason for the failure.

On the other hand, the FCIP entity is closer to the TCP stack than the FC
entity and is therefore able to detect and report the loss of TCP
connectivity.  The method of reporting this loss to the FC entity is left up
to the implementer.  In a revision of FC-BB-2 made at the last T11
meeting in Vancouver, it was approved to add the following as a new clause
in section 16.3:

16.3.x  FCIP Error Reporting

The FC entity will receive notifications from the FCIP entity due to a
number of errors detected by the FCIP entity. As a result, the E_Port
implementation of the FC entity must report those errors to the local FC
switch element via the local VE_Port (see figure 23).  Similarly, the B_Port
implementation must report the error to the local VB_Access port (see figure
26). In addition, the FC entity may pass these error reports to the local PMM
for inclusion in a local event log.

In both cases, the FC entity shall convert the error message received from
the FCIP entity into a Registered Link Incident Report (FC-FS RLIR).  It is
the RLIR that is forwarded from the FC entity to either the VE_Port (figure
23) or VB_Access (figure 26).  On receipt of the message from the FC Entity,
the VE_Port or VB_Access shall immediately forward the RLIR to the FC Switch
Entity.

As a minimum the FC Entity shall accept the following messages from the FCIP
entity and shall transfer them as an RLIR to the FC Switching Element by the
VE_Port or to the FC Network by the VB_Access:
	FCIP RFC Section 6.6.2.3: Loss of FC frame synchronization
	FCIP RFC Section 9.1.2.3: Failure to set up TCP connection
	FCIP RFC Section 9.1.3: TCP connect request timeout or duplicate
	    connect request
	FCIP RFC Section 9.2: Successful completion of FC Entity request to
	    close TCP connection
	FCIP RFC Section 9.4: Loss of TCP connectivity
	FCIP RFC Section 10.4.3: Excessive number of dropped datagrams or
	    any confidentiality violations
	FCIP RFC Section 10.4.4: SA parameter mismatch
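
The list above could be held as a lookup table in the FC entity, roughly as
sketched below. The dictionary keys and descriptions come from the list; the
to_rlir function and the fields of the record it returns are hypothetical
and are not the RLIR format defined in FC-FS.

```python
# FCIP section number -> condition the FC entity must accept (from the list above).
FCIP_NOTIFICATIONS = {
    "6.6.2.3": "Loss of FC frame synchronization",
    "9.1.2.3": "Failure to set up TCP connection",
    "9.1.3": "TCP connect request timeout or duplicate connect request",
    "9.2": "Successful completion of FC Entity request to close TCP connection",
    "9.4": "Loss of TCP connectivity",
    "10.4.3": "Excessive number of dropped datagrams or any confidentiality violations",
    "10.4.4": "SA parameter mismatch",
}

# E_Port implementations forward via VE_Port, B_Port implementations via VB_Access.
PORT_TYPES = ("VE_Port", "VB_Access")


def to_rlir(section, port_type):
    """Build an illustrative report record (not the real RLIR wire format)."""
    if port_type not in PORT_TYPES:
        raise ValueError(f"unknown port type: {port_type}")
    if section not in FCIP_NOTIFICATIONS:
        raise KeyError(f"no notification defined for section {section}")
    return {
        "type": "RLIR",
        "forward_via": port_type,
        "fcip_section": section,
        "incident": FCIP_NOTIFICATIONS[section],
    }
```

For example, to_rlir("9.4", "VE_Port") yields a record whose incident is
"Loss of TCP connectivity", to be forwarded through the VE_Port.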

Don Fraser
Contributor to FCIP

-----Original Message-----
From: Chong Peng [mailto:ChongPeng@MaXXan.com] 
Sent: Thursday, April 25, 2002 2:48 PM
To: ips@ece.cmu.edu
Subject: questions about FCIP connection failure detection


Hi, all

Section 9.4 (TCP Connection Considerations) of
draft-ietf-ips-fcovertcpip-09 says:
 
   In idle mode, a TCP Connection "keep alive" option of TCP is
   normally used to keep a connection alive. However, this timeout is
   fairly large and may prevent early detection of loss of
   connectivity. In order to facilitate faster detection of loss of
   connectivity, FC Entities SHOULD implement some form of Fibre
   Channel connection failure detection (see FC-BB-2 [4]).
 
   When an FCIP Entity discovers that TCP connectivity has been lost,
   the FCIP Entity SHALL notify the FC Entity of the failure including
   information about the reason for the failure.

I have a couple of questions regarding this section:

1. The first paragraph states that the FC entity is responsible for
   discovering the connection failure. But the second paragraph implies
   that the FCIP entity discovers the connection failure first and then
   notifies the FC entity. Is there an editorial error?
2. If we let the application protocol on top of TCP discover the
   connection failure, what scheme are we going to use? Are we planning to
   define some "FCIP keep alive" frames in the future? I checked FC-BB-2,
   and in the section related to discovery (13.2.2.4.2), it says "TBD".

Chong Peng


This email message is for the sole use of the intended recipient(s) and may
contain confidential information. Any unauthorized use; review, disclosure
or distribution is prohibited. If you are not the intended recipient, please
contact the sender by return email and destroy all copies of the original
message. 
Copyright © 2002 MaXXan Systems, Inc. All rights reserved.

