RE: iSCSI: remove recovery from transport-layer connection failure(?)

Steph,

Any IP-SCSI spec could be seen both as a connection to a controller and as a bridge to existing drives. With 5+ millisecond network latency, controllers will remain adjacent to the client as protection against this latency, in much the same way that a controller shields the host from drive latency. Because both modes of operation are legitimate, assumptions about the transport should be tempered by these possibilities.

SCTP does provide for more immediate recovery. It would also be irresponsible to promote modifications to TCP to support features already found in SCTP.

Doug

> -----Original Message-----
> From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
> Stephen Bailey
> Sent: Tuesday, September 26, 2000 9:16 PM
> To: ips@ece.cmu.edu
> Subject: Re: iSCSI: remove recovery from transport-layer connection
> failure(?)
>
> > Currently, iSCSI is spec'ed to recover from transport-layer
> > connection failures.
> >
> > The main motivation for this decision was to support tape backup
> > applications that are quite sensitive to any failures that get
> > propagated to their layer.
> >
> > So, perhaps we can remove the requirement of recovering from
> > transport-layer connection failures in iSCSI. This would simplify
> > the protocol somewhat.
> >
> > Thoughts?
>
> I'm all for eliminating command recovery.
>
> There seem to be several reasons advanced for command recovery.
>
> The first seems to be based upon an inappropriate analogy to FCP.
> Command recovery had to be added to FCP-2 because the FC layer is
> unreliable. A single dropped FC frame leads to a failed FCP command.
> This clearly upsets tape operation even when the link is performing
> nominally. In FCP, without command recovery, with some observable
> frequency, you will get an expected error that leads to complete,
> irrecoverable failure of a transfer stream. The other thing that
> makes FCP-2 command recovery work well is that when you are doing a
> write, which is 90% (maybe it's 99%?) of tape operation, the target
> can return an early indication of most frame drops, rather than
> waiting for a timer to expire.
>
> TCP's reliability solves this problem in another way. By the time you
> get a TCP connection failure, you have already exhausted a set of
> reliability mechanisms which guarantee, with high certainty, that
> further data cannot be transferred between the two endpoints.
>
> The phrase `the two endpoints' suggests the other reason advanced for
> command recovery: to permit path failover for commands which are not
> idempotent, such as tape write sequential. The problem with this is
> that it is not clear HOW iSCSI command recovery can actually work
> properly, given a TCP connection failure indication. It takes a long
> time for a TCP connection to fail, and by that time, I'm not sure
> recovery would reasonably be possible. Perhaps I'm in error on this
> assumption. Can a tape guru (Joe from Exabyte?) comment on whether
> recovery would be possible after many seconds (tens, hundreds) have
> elapsed?
>
> The SCSI layer has never been solely responsible for ensuring reliable
> backup. Macro-scale things go wrong with tape (run off the end, get
> eaten, etc.) with relatively high frequency. A low-level backup
> engine like tar or dump will fail on a SCSI error, and that's OK.
> There must also be a higher-level software component like Amanda,
> which manages retries, including operator intervention, to ensure
> reliable backup.
>
> It seems like whether iSCSI has a command recovery mechanism should be
> a function of whether somebody can stand up and say for sure that it
> solves a real problem. So far it only seems like it MIGHT solve a
> problem. Who can say `this solves MY problem!'?
>
> Steph
>
Last updated: Tue Sep 04 01:07:02 2001