Re: SNACK and recovery

> - Does a 16-bit TCP checksum catch enough of
> the corruption events to make it acceptable to
> take drastic measures like aborting a backup
> when a 32 bit CRC fails on a response that
> made it through the 16 bit checksum?

Absolutely. Events which create end-to-end integrity check errors are caught as handily by the TCP checksum as by a CRC. Link errors are caught by link integrity checks, so those are not for the e2e check to protect against. The remaining errors which are detectable by an e2e check have a signature that most any check that's not blind stupid will detect.

For example, back in the day, VMS's clustering software ran on Ethernet, and there were many problems as a result of an early generation Ethernet controller (my group...) corrupting data. So, the VMS folks said, to heck with performance, we're going to put a checksum on every cluster packet. Problem absolutely solved. I don't know what the checksum algorithm was, but it was not a CRC. It was more like the TCP checksum.

The TCP checksum escape evidence in the papers seems to be primarily in paths which are not actually protected by it (host end points).

Looking at it from the other direction, backups have historically always had to handle occasional problems, which has resulted in the implementation of high-level recovery mechanisms.

Who can say with absolute certainty, and first-hand experience, that there WILL be a high frequency of checksum escapes which don't also escape a CRC? It seems a somewhat unlikely scenario, and my concern is that we're making complicated, incremental improvements for handling a situation which will not occur.

It would be one thing if there were NO e2e check, or if the e2e check also had to protect against link errors, or if the existing e2e check were completely trivial, but that is just not the case here.

Steph
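
For readers who want to experiment with the two checks being compared, here is a minimal Python sketch. It assumes the 16-bit check is the ones'-complement sum described in RFC 1071 (computed over the payload only, without the TCP pseudo-header) and uses zlib's CRC-32 as a stand-in for the 32-bit CRC; the CRC actually under discussion may use a different polynomial. The names internet_checksum and detects, and the sample payloads, are hypothetical and chosen only for illustration.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071.

    Assumption: payload only, no TCP pseudo-header.
    """
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def detects(original: bytes, corrupted: bytes) -> dict:
    """Report which check notices that `corrupted` differs from `original`."""
    return {
        "checksum": internet_checksum(original) != internet_checksum(corrupted),
        "crc32": zlib.crc32(original) != zlib.crc32(corrupted),
    }


payload = bytes(range(64))  # hypothetical sample data

# Case 1: a single flipped bit.
flipped = bytearray(payload)
flipped[10] ^= 0x04

# Case 2: a 4-byte burst overwritten with 0xFF.
burst = bytearray(payload)
burst[20:24] = b"\xff" * 4

# Case 3: two adjacent 16-bit words exchanged (same words, different order).
swapped = bytearray(payload)
swapped[0:2], swapped[2:4] = payload[2:4], payload[0:2]

for name, bad in [("bit flip", flipped), ("burst", burst), ("word swap", swapped)]:
    print(name, detects(payload, bytes(bad)))
```

In this sketch both checks flag the bit flip and the burst. The word swap leaves the additive sum unchanged, so only the CRC flags it. How often patterns of that last kind arise on a real path is the open question in this thread; the sketch does not settle it.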
Last updated: Tue Sep 04 01:05:08 2001