SNACK and recovery

This turns out to be a matter not just of rarity, but also of consequences. As Mark points out, for tapes and similar devices the consequences are disastrous: the backup aborts, and when "those in charge" come in the next morning, they have no usable backup tape and are very unhappy. While Jon says "streams devices must support abort and retry for extreme errors in any case", the abort may well be the entire backup and the retry might be next weekend ... not a good situation.

Over in the Fibre Channel world, FCP-2 contains recovery support. It resulted from the discovery that failure to deliver a frame happens often enough that a recovery mechanism is needed to avoid tape backup aborts and the like, despite the fact that non-delivery of a Fibre Channel frame (Class 2 or 3 - it doesn't matter which) is "extremely rare":

- Buffer overrun is prevented by both link and end-to-end buffer usage controls.
- FC switches are engineered to not drop frames to the maximum extent possible, due in part to these consequences.
- There's a 32-bit CRC covering the entire FC frame.

Unlike TCP, Fibre Channel has no built-in retransmit mechanism.

In contrast to Fibre Channel, we are dealing with something rarer, because TCP retransmit will take care of most things that can go wrong in switches, and there's a 16-bit checksum whose failure will trigger retransmits. What this appears to come down to is:

- Does a 16-bit TCP checksum catch enough of the corruption events to make it acceptable to take drastic measures like aborting a backup when a 32-bit CRC fails on a response that made it through the 16-bit checksum?

The discussion's been a bit convoluted. Some simple yes/no answers to the above question, accompanied by short reasoning, would be appreciated. I think Julian's said "no" and quoted a filesystem number that we're awaiting a reference to.
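To make the question concrete, here is a minimal Python sketch of one corruption class that slips past a 16-bit one's-complement checksum but not past a 32-bit CRC: two 16-bit words exchanged in transit. The helper names (internet_checksum, swap_words) and the sample payload are made up for illustration, and zlib.crc32 is only a readily available stand-in for a 32-bit CRC, not necessarily the polynomial iSCSI would use. It says nothing about how often such events happen in practice, which is the real question.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style TCP uses (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def swap_words(data: bytes, i: int, j: int) -> bytes:
    """Return a copy of data with 16-bit words i and j exchanged."""
    buf = bytearray(data)
    buf[2 * i:2 * i + 2], buf[2 * j:2 * j + 2] = (
        buf[2 * j:2 * j + 2],
        buf[2 * i:2 * i + 2],
    )
    return bytes(buf)


# Hypothetical payload; any data whose first two 16-bit words differ will do.
original = b"SCSI response: GOOD status"
corrupted = swap_words(original, 0, 1)

# The sum is order-independent, so the 16-bit checksum cannot see the swap.
print("checksums equal:", internet_checksum(original) == internet_checksum(corrupted))
# The CRC depends on bit position, so the 32-bit CRC changes.
print("CRCs equal:     ", zlib.crc32(original) == zlib.crc32(corrupted))
```

Because the checksum is a plain sum of 16-bit words, any reordering of aligned words leaves it unchanged, while the swap here is confined to a 32-bit span that a 32-bit CRC is guaranteed to detect.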
Just to muddy the waters further, let me point out that tape targets tend to be less complex than disk targets. Tapes don't reorder commands, and often don't even queue them. Saving the last N responses is not that difficult when the responses go out in the order that the commands came in (easier to organize saving them), and the initiator has to be very careful about the number of commands in flight to avoid disasters caused by dropped commands (should lead to reasonable results from relatively small values of N).

--David

---------------------------------------------------
David L. Black, Senior Technologist
EMC Corporation, 42 South St., Hopkinton, MA 01748
+1 (508) 435-1000 x75140     FAX: +1 (508) 497-8500
black_david@emc.com       Mobile: +1 (978) 394-7754
---------------------------------------------------
Last updated: Tue Sep 04 01:05:10 2001