Re: Towards Consensus on TCP Connections

> On recovery, a big concern was the tape backup issue. Do SCSI
> applications recover gracefully today from failed SCSI connections?
> My understanding was that many tape backups program abort the backup.

Tape is hard. The main reason is that when an error occurs on a {READ, WRITE} SEQUENTIAL, you don't really know what state the tape is in. Maybe the tape has advanced by the length of the failed operation. Maybe not. Maybe the tape has been eaten. Maybe the tape has been ejected. It's really hard to do anything at ANY layer except go into heavy-duty recovery (rewind and try again).

There are two distinct applications of tape with different requirements: backup and streaming data recording. Backup is by far the more common application. Many backup applications don't attempt recovery because they assume that correcting the problem will probably require operator intervention. Amanda is an example of a backup application that does recovery correctly; in essence, it operates a layer above the backup applications that actually touch the tape. It is responsible for buffering the data (on a disk), notifying the operator of the failure, and trying again on some arranged schedule or on operator request.

The best thing you can do to improve tape behavior in either the backup or the streaming application is to improve the reliability of the data transport, which is exactly what iSCSI does simply by using TCP. The problem FC had was that when you write an arbitrary amount of data, eventually you WILL get a media layer error, and then you're lost. With FC error rates, this is usually only a problem for the streaming data application. Nonetheless, although the streaming data application is the minority, its customers are high profile and have huge installations.
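To make the layering concrete, here is a rough sketch of the buffer, notify, and retry pattern described above. It is not Amanda's actual code: `TapeError`, `spool_to_disk`, `backup_with_retry`, and the `write_to_tape` and `notify_operator` callbacks are all hypothetical names invented for illustration, and the sketch assumes the tape writer does its own rewind or repositioning before each attempt.

```python
import os
import tempfile
import time
from typing import Callable


class TapeError(Exception):
    """Hypothetical error raised by the tape writer when a write fails."""


def spool_to_disk(data: bytes, spool_dir: str) -> str:
    """Buffer the backup image on disk so a failed tape write loses nothing."""
    fd, path = tempfile.mkstemp(dir=spool_dir, suffix=".spool")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


def backup_with_retry(
    data: bytes,
    write_to_tape: Callable[[bytes], None],
    notify_operator: Callable[[str], None],
    spool_dir: str,
    max_attempts: int = 3,
    retry_delay: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Spool data, then try the tape write up to max_attempts times.

    Assumption: write_to_tape rewinds/repositions the tape itself, since
    the tape state after a failed write is unknown.
    Returns True on success. On failure the spool file is left in place
    so the write can be retried later on operator request.
    """
    path = spool_to_disk(data, spool_dir)
    for attempt in range(1, max_attempts + 1):
        try:
            with open(path, "rb") as f:
                write_to_tape(f.read())
        except TapeError as exc:
            # Tell the operator; the fix may need a human (new tape, etc.).
            notify_operator(f"tape write attempt {attempt} failed: {exc}")
            if attempt < max_attempts:
                sleep(retry_delay)
        else:
            os.remove(path)  # data is safely on tape, drop the spool copy
            return True
    return False
```

The point of the design is that the spooled copy on disk, not the tape, is the authoritative source during recovery, so every retry can start again from a known state.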
> Related to recovery, when a TCP/SCSI connection closes, what ramifications
> does it have on device state (like mode pages, PREVENT/ALLOW REMOVAL,
> RESERVE/RELEASE, etc.)? Where does SCSI specify this?

This is a good question. FC sorta blew this one originally. Reservations did not even persist across hot plugs of uninvolved equipment in FC-AL. As a result, you have clustering software that re-reserves every few seconds "just in case". In the case of FC, the obvious solution was to reserve by node name (the WWN, which is unique to the device, as opposed to the port name, which is unique to the attachment point).

The mistake was drawing too direct an analogy between parallel SCSI and FC. Parallel SCSI had limited addressing and a relatively stable topology, while FC had wider addressing and a much more dynamic topology.

Currently FC is somewhat mired in backward compatibility issues with respect to these recovery topics. Hopefully iSCSI will follow a more enlightened course.

Steph