RE: iSCSI: remove recovery from transport-layer connection failure(?)

Somesh,

Are you referring to 802.1Q, 802.1D, or HSRP or just EtherChannel in
general?

Doug

> -----Original Message-----
> From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu] On Behalf Of
> GUPTA,SOMESH (HP-Cupertino,ex1)
> Sent: Monday, October 02, 2000 1:48 PM
> To: julian_satran@il.ibm.com; ips@ece.cmu.edu
> Subject: RE: iSCSI: remove recovery from transport-layer connection
> failure(?)
>
> Julian,
>
> If the scenario you point out is correct (a single command lasting
> for such a long time), then of course we need a mechanism where
> we can restart the command from the approximate point of failure.
> However that would be failures lasting for "more than a fraction
> of a sec".
>
> First of all, a TCP connection does not indicate a failure that
> quickly. Secondly, there are ways to recover from a path failure
> and still preserve a TCP connection in High-Availability environments.
> I am sure most system vendors would be implementing such techniques.
>
> Somesh
>
> -----Original Message-----
> From: julian_satran@il.ibm.com [mailto:julian_satran@il.ibm.com]
> Sent: Sunday, October 01, 2000 1:52 AM
> To: ips@ece.cmu.edu
> Subject: Re: iSCSI: remove recovery from transport-layer connection
> failure(?)
>
> Steph,
>
> Assume that in the new wonderful SAN world you have started a
> disk-to-tape (or disk-to-disk) long third party copy. The SAN is fine
> and the copy proceeds for an hour but the lousy
> initiator-to-copy-manager link (on which accidentally no data
> transfer took place) fails for a fraction of a second.
> Should we restart the command under-the-cover or drop it or ask
> the parties to provide state information to a specific SCSI
> restart driver?
>
> And we can build many similar scenarios.
>
> I think that whatever we can do to simplify exception handling we should do
> (the same arguments that hold for multiple connections hold here too).
>
> I would add that in an ideal world I would like to have the transport
> "splice" a new TCP connection with an old TCP connection, but failing
> that (again, is SCTP doing it already or not?) we should take care that
> simple events like a cable taken out in some obscure part of the network
> will only seldom affect higher layers.
>
> Julo
>
> Stephen Bailey <steph@cs.uchicago.edu> on 27/09/2000 07:16:12
>
> Please respond to Stephen Bailey <steph@cs.uchicago.edu>
>
> To: ips@ece.cmu.edu
> cc: (bcc: Julian Satran/Haifa/IBM)
> Subject: Re: iSCSI: remove recovery from transport-layer connection
> failure(?)
>
> > Currently, iSCSI is spec'ed to recover from transport-layer
> > connection failures.
> >
> > The main motivation for this decision was to support tape backup
> > applications that are quite sensitive to any failures that get
> > propagated to their layer.
> >
> > So, perhaps we can remove the requirement of recovering from
> > transport-layer connection failures in iSCSI. This would simplify
> > the protocol somewhat.
> >
> > Thoughts?
>
> I'm all for eliminating command recovery.
>
> There seem to be several reasons advanced for command recovery.
>
> The first seems to be based upon an inappropriate analogy to FCP.
> Command recovery had to be added to FCP-2 because the FC layer is
> unreliable. A single dropped FC frame leads to a failed FCP command.
> This clearly upsets tape operation even when the link is performing
> nominally. In FCP, without command recovery, with some observable
> frequency, you will get an expected error that leads to complete,
> irrecoverable failure of a transfer stream. The other thing that
> makes FCP-2 command recovery work well is when you are doing a write,
> which is 90% (maybe it's 99%?)
> of tape operation, the target can
> return an early indication of most frame drops, rather than waiting
> for a timer to expire.
>
> TCP's reliability solves this problem in another way. By the time you
> get a TCP connection failure, you have already exhausted a set of
> reliability mechanisms which guarantee, with high certainty, that
> further data cannot be transferred between the two endpoints.
>
> The phrase `the two endpoints' suggests the other reason advanced for
> command recovery. That is, to permit path failover for commands which
> are not idempotent, such as tape write sequential. The
> problem with this is that it is not clear HOW iSCSI command recovery
> can actually work properly, given a TCP connection failure indication.
> It takes a long time for a TCP connection to fail, and by that time,
> I'm not sure recovery would reasonably be possible. Perhaps I'm in
> error on this assumption. Can a tape guru (Joe from Exabyte?) comment
> on whether recovery would be possible after many seconds (tens,
> hundreds) have elapsed?
>
> The SCSI layer has never been solely responsible for ensuring reliable
> backup. Macro scale things go wrong with tape (run off the end, get
> eaten, etc.) with relatively high frequency. A low level backup
> engine like tar or dump will fail on a SCSI error, and that's OK.
> There must also be a higher level software component like Amanda,
> which manages retries, including operator intervention, to ensure
> reliable backup.
>
> It seems like whether iSCSI has a command recovery mechanism should be
> a function of whether somebody can stand up and say for sure that it
> solves a real problem. So far it only seems like it MIGHT solve a
> problem. Who can say `this solves MY problem!'?
>
> Steph