Re: iSCSI: error recovery

At 02:46 PM 10/30/00 -0800, Matt Wakeley wrote:
>julian_satran@il.ibm.com wrote:
>
> > Matt,
> >
> > I think I read your note and I still maintain that the target will fare
> > better and the initiator does not have to do anything different.
> >
> > When failing over the initiator will reissue the command (including all
> > scatter gather lists) to the new HBA. It is the target that will send only
> > the buffers he has and as long as the initiator is not scoreboarding it
> > does not have to do anything different the second time than first.
>
>You are making the *big* assumption that an iSCSI initiator will "confirm" the
>receipt of this "numbered" data after the data has been transferred to
>initiator host memory. What if it's buffered on the card somewhere, and the
>card dies and the system fails over to a different card? (or perhaps an I/O
>subsystem fails and the system "fails over" to a standby subsystem) How is the
>initiator going to be absolutely sure that the "partial" I/O on the first card
>plus the "partial" I/O on the second card equal a complete error free I/O?

Acknowledgements should not be generated until a responder has received the data and placed it into the fault zone, i.e., the location where, if a failure occurs, the session is aborted. If the NIC generates the acknowledgement, then either it has already delivered the data to host memory or, when the NIC's failure is detected, the host fails the session. To do anything else adds undue complexity with little real application benefit.

Hence, for fail-over from one set of hardware to another, there should be a clean indication of where one restarts the operation. In general, a sequence number on all data units can provide a faster recovery by not repeating the retransmission of the entire data set. Is this worth it? For large transfers, i.e., those measured in MB, yes; for small transfers, no.
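As a rough illustration of the restart-point idea above, here is a minimal Python sketch. Everything in it is hypothetical and made up for the example (the `Receiver` class, the `transfer` function, the `fail_at` parameter); it is not iSCSI wire behavior. It assumes the responder acknowledges a data unit only after committing it to host memory, so the cumulative acknowledgement doubles as the clean restart point after a fail-over.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical responder: a unit is acknowledged only once it is in host memory."""
    host_memory: dict = field(default_factory=dict)
    next_expected: int = 0  # lowest sequence number not yet committed

    def deliver(self, seq: int, payload: bytes) -> int:
        # Commit to host memory first, then advance the acknowledgement point.
        self.host_memory[seq] = payload
        while self.next_expected in self.host_memory:
            self.next_expected += 1
        return self.next_expected  # cumulative acknowledgement


def transfer(units, receiver, fail_at=None):
    """Send units from the receiver's restart point; optionally simulate a path failure.

    Returns (number of units sent, whether the transfer completed).
    """
    sent = 0
    for seq in range(receiver.next_expected, len(units)):
        if fail_at is not None and seq == fail_at:
            return sent, False  # path died before this unit was delivered
        receiver.deliver(seq, units[seq])
        sent += 1
    return sent, True


# Ten data units; the first path fails before unit 6 is delivered.
units = [bytes([i]) * 4 for i in range(10)]
rx = Receiver()
first = transfer(units, rx, fail_at=6)
# After fail-over, the second path resumes at the acknowledged restart point.
second = transfer(units, rx)
```

In this sketch the second attempt resends only the units past the acknowledged point rather than the whole data set, which is where the saving for multi-megabyte transfers would come from.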
Again, there should be only one way to accomplish this in the spec. My preference would be to always sequence-number all of these transactions and let the command interpretation decide whether to enforce that sequence number and the recovery starting point upon failure. This simplifies hardware and provides future flexibility.

Mike