RE: I-D ACTION:draft-cavanna-iscsi-crc-vs-cksum-00.txt

To: julian_satran@il.ibm.com
Subject: RE: I-D ACTION:draft-cavanna-iscsi-crc-vs-cksum-00.txt
From: vince_cavanna@agilent.com
Date: Wed, 7 Mar 2001 11:17:05 -0700
Cc: pat_thaler@agilent.com, ips@ece.cmu.edu, Dafna_Sheinwald@il.ibm.com, vince_cavanna@agilent.com
Content-Type: text/plain;charset="ISO-8859-1"
Sender: owner-ips@ece.cmu.edu
Hi Julian,

I will attempt to answer your questions about the probability of undetected
error since your question strikes to the heart of what I was trying to point
out in my I-D on CRCs vs. Checksums, i.e. where CRCs shine and where they do
are not as good as many people think.

For a link where random noise with Gaussian probability distribution and
White spectral density is the source of errors, it is easy to determine the
probability, Pu, of undetected error. This type of link is not only amenable
to analysis but is a good model for many well designed links.

For this type of link the BER (bit error ratio) is determined by the
probability that the noise amplitude will exceed the signal amplitude and
can be easily shown to be BER(z) = 1/2 erfc(z/sqroot(2)) where erfc is the
complementary error function and z is the SNR (signal to noise ratio), that
is, the signal amplitude as a multiple of the standard deviation [note1] of
the noise. For example, at a SNR of 7, the BER is 10^(-12). With this noise
mechanism and small BER,  patterns with a small number of erroneous bits are
far more probable than those with large numbers or errors and the
probability of getting a certain number of errors, x,  decreases
monotonically and rapidly with increasing x. I have made a mathcad worksheet
showing this dramatically in graphical form. Ask me if you want it.

The number of errors in a pattern is a random variable with binomial
distribution. The probability of x errors in n bits, where the probability
of each bit being in error is BER, may be gotten from the binomial
distribution with parameters n and p. The number of independent trials, n,
is the number of bits; the probability of success in each trial, p, is the
BER.

The binomial distribution allows us to calculate the probability of a
pattern with a certain number of errors and we can use it to find a bound
for the probability of x errors at a given BER [note 2].

Let's take an ethernet-like link using some kind of group encoding such as
8B:10B. It is trivial to design a CRC polynomial that will catch all two bit
errors. Two or less bits in error result in no more than 2 bursts of size 8
(due to 8B:10B encoding). If the CRC is designed to catch all such double
bursts (easy to do and the CRC-32 used in ethernet does) all such errors are
caught. An undetected error thus occurs only if there are 3 or more bits in
error on the link within a frame - this is conservative since most such
errors will also be caught but I will assume none are. I am also neglecting
the fact that it is easy to design a CRC to catch all 3 bit errors. I don't
want to take advantage of it because 3 bit errors may (again due to group
encoding) result in 3 bursts which are not so easy to detect.

With this error mechanism large bursts happen because of the group encoding.
Without group encoding it would be extremely unlikely to get a burst of size
3 or larger. 

The probability of 3 or more bits is approximately equal to the probability
of exactly 3 bits since as long as p << 0.5 the probability of a pattern
having x errors decreases rapidly as x increases.

For a BER of 10^(-12) the prob of 3 or more  errors in 1500*8 bits (ethernet
frame) is approximately the probability of exactly 3 errors in 1500*8 bits
which is 5.62 x 10^(-25). At 10 gig this represents a mean time between
undetected errors of 5.6 x 10^6 years. This assumes we catch all 2 bit
errors. If we could only catch single bit errors the mean time between
undetected errors would be 10 days! What a difference a bit makes!! Also the
probability of exactly 4 errors is 2.1 x10^(-33). Again, what a difference a
bit makes! I have a mathcad spreadsheet where I have done this analysis for
10 gig ethernet. Let me know if you are interested in a copy.

This is why CRC is so nice for a link with this error source. We can claim
practically complete protection. Ther are no holes.

For cases, such as iSCSI protection environment, where the BER is not
determined by the SNR then we may not be able to claim that a small number
of erroneous bits are far more probable than a large number of them and the
analysis is not as simple  and in such cases CRCs are not necessarily better
than checksums - it depends on how probable the patterns of errors that are
undetected are.

Vince


notes:

[1] strictly speaking I should have said rms value instead of standard
deviation but for a random process for which the time statistics are the
same as the ensemble statistics (an ergodic process) the two are equivalent.

[2] Binomial distribution wiht parameters n and p: P(x,n,p) =
(n!/(x!*(n-x))*(p^x(1-p)^(n-x))

|-----Original Message-----
|From: THALER,PAT (A-Roseville,ex1) 
|Sent: Tuesday, March 06, 2001 4:47 PM
|To: julian_satran@il.ibm.com; ips@ece.cmu.edu
|Cc: THALER,PAT (A-Roseville,ex1); ips@ece.cmu.edu;
|Dafna_Sheinwald@il.ibm.com; CAVANNA, VICENTE
|Subject: RE: I-D ACTION:draft-cavanna-iscsi-crc-vs-cksum-00.txt
|
|
|Julian,
|
|Good summary. One correction. "codes shorter than m" is 
|shorter than m values not m bits and m is about 64 K. 
|Therefore, it is shorter than about 63 kbytes for Adler and 
|shorter than 127 kbytes for Fletcher. 
|
|I'm thinking about your question about Pud but it will take me 
|a bit to come up with an answer. For the three bit error case, 
|I think the formulas used for CRC bit errors will be 
|approximately correct. For the burst errors and the errors 
|where two (Fletcher only) or three non-contiguous bytes are 
|corrupted, I'm not sure what to use for the probability that a 
|byte gets corrupted.
|
|Regards,
|Pat
|
|-----Original Message-----
|From: julian_satran@il.ibm.com [mailto:julian_satran@il.ibm.com]
|
|Pat,
|
|Thanks. I am not going to check every line for awhile and I 
|assume it is
|correct.
|For the benefit of the list the summary of the statement is that:
|
|-the burst protection is for 16 bit (I recall the text saying 32)
|-the numbers to be used for low noise symmetric channel protection are:
|               - dmin= 3 for codes shorter than m (approx 8kbytes)
|               - dmin= 2 for codes longer than m
|
|Do you have a formula to evaluate Pud for low noise channels 
|with BER = e?
|Any formula to evaluate the conditional probability of an 
|undetected error
|on bursts of 17, 18, 19 bits?
|
|Is there any reason to believe hat we can approximate them to 
|ones used for
|CRC?
|
|I would like to have all the codes readily comparable.
|
|Regards,
|Julo
|
|"THALER,PAT (A-Roseville,ex1)" <pat_thaler@agilent.com> on 06/03/2001
|19:24:13
|
|Please respond to "THALER,PAT (A-Roseville,ex1)" 
|<pat_thaler@agilent.com>
|
|To:   Julian Satran/Haifa/IBM@IBMIL, "THALER,PAT (A-Roseville,ex1)"
|      <pat_thaler@agilent.com>
|cc:   ips@ece.cmu.edu, Dafna_Sheinwald@il.ibm.com
|Subject:  RE: I-D ACTION:draft-cavanna-iscsi-crc-vs-cksum-00.txt
|
|
|
|
|Julian,
|
|It is relatively easy to investigate the Hamming distance and burst
|detection of Adler32 and Fletcher32 using mathematics. In the argument
|below, I will use "value" to identify the quantities that get 
|summed since
|Adler sums bytes and Fletcher32 sums 16-bits.
|
|Both Adler and Fletcher calculate for each value:
|
| S1 = (S1 + value) mod m
| S2 = (S2 + S1) mod m
|
|where m is 65521 for Adler and 65535 for Fletcher.
|
|Which means that for an n value string of values V1 to Vn:
| S1 = (V1 + V2 + ... + Vn (+ 1 for Adler)) mod m
| S2 = (n * V1 + (n-1) * V2 + ... + Vn (+ n for Adler)) mod m
|
|A two bit error can only change two values. For a two bit error to be
|undetectable the two changed values must produce no change to 
|S1 and S2.
|
|Let
| x be the change to the first errored value
| y be the change to the second errored value
| k be the separation between the first and second errored values.
|
|Then for an undetectable error:
|
| delta S1 = (x + y) mod m = 0
| delta S2 = ((k + 1) * x + y) mod m = 0
|          = (k * x + x + y) mod m =0
|          = (k * x) mod m = 0          because (x + y) mod m = 0
|
|Therefore, for an undetectable 2 bit error, k * x must be a 
|multiple of m.
|
|For Adler, m is prime and the maximum value of x is 255 so the 
|only way to
|satisfy the condition is for k to equal 65521. Then x can be 
|any value so
|it
|could be a one bit error such as changing the lsb of the first 
|value from 0
|to 1. y would then need to be the inverse changing the lsb of 
|the second
|value from 1 to 0.
|
|For Fletcher, m is factorable to 3, 5, 17, 257. For x to provide one or
|more
|of those factors, it would have to have more than a 1 bit change and y
|would
|have to have the inverse of that change which would mean an error of at
|least 4 bits. The only way to get a two bit undetectable error 
|is for k to
|equal 65535.
|
|So, Fletcher and Adler do have 2-bit undetectable errors if 
|the messages
|are
|longer than the modulus.
|
|What about burst protection? For Fletcher, we know that a 
|change of a value
|from all 0's to all 1's produces an an undetectable error so 
|we know that
|burst protection is less than 16. k = 1 for a burst spread across two
|consecutive values. Therefore, the only way for a burst spread 
|across two
|consecutive values to produce an undetectable error is for x to equal
|65535.
|A burst shorter than 16 can't do it.
|
|For Adler, corrupting two consecutive bytes can't produce an 
|undetectable
|error because that would be the case above where k equals 1. 
|There is no
|way
|for a burst across two consectutive values to be undetectable in Adler
|since
|x is less than the modulus.
|
|For a burst across three values:
|
|let x equal the error in the first value
|    y equal the error in the second value
|    z equal the error in the third value.
|
|For the burst to be undetectable
|
| Delta S1 = (x + y + z) mod m = 0
| Delta S2 = (3x + 2y + z) mod m = 0
|
|Since the maximum values of x, y, and z are 255 and m is more 
|than 6 times
|that, the sums above will always be less than m and we can 
|drop the mod m.
|
| Delta S2 = (3x + 2y + z) = 2x + y + (x + y + z) = 0
|          = 2x + y = 0                    because x + y + z = 0
|
|        y = - 2x
|
| Delta S1 = x + y + z = 0
|          = x - 2x + z = 0
|          = -x + z
|        z = x
|
|So an undetectable error is created when in three consecutive 
|values the
|errors are
|    x
|  -2x
|    x
|
|This could for instance be errors of 1, -2, and 1. Since the 
|error to the
|first and third values are the same, the error must span at 
|least 2 * the
|length of the value + 1. So for Adler, it must be a 17 bit 
|burst at least.
|
|Also, note that this can be created by a 3 bit error and is equally
|applicable to Adler and Fletcher.
|
|Therefore, for messages shorter than the modulus + 1, the 
|Hamming distance
|of Adler32 and Fletcher 32 is 3 and, for messages longer than that, the
|Hamming distance is 2.
|
|Regards,
|Pat Thaler
|
|
|
|
|
|-----Original Message-----
|From: julian_satran@il.ibm.com [mailto:julian_satran@il.ibm.com]
|
|Pat,
|
|I did not run a check. That is very expensive.
|With CRCs there are methods that enable you to run a check on the
|complement code
|(that has only 2**32 different patterns for any block length) 
|and derive
|from there the distances (Fujiwara has done this in 1989 for the IEEE
|CRC-32 and about some more recent experiments I'll get back to 
|the list).
|
|And that is the trouble with this whole line of argument.
|There are no numbers to prove Adler32 or Fletcher32 and there 
|are plenty
|for CRCs.
|
|The big question is then is there anybody out there that wants 
|to build a
|modern bridge based only on its beauty?
|
|Regards,
|Julo
|
|
|pat_thaler@agilent.com on 05/03/2001 23:27:49
|
|Please respond to pat_thaler@agilent.com
|
|To:   Julian Satran/Haifa/IBM@IBMIL
|cc:
|Subject:  RE: I-D ACTION:draft-cavanna-iscsi-crc-vs-cksum-00.txt
|
|
|
|
|Julian,
|
|I know that Hamming distance gets down to 2 for errors that 
|are separated
|by
|the modulus (or a multiple of it). Is there another case?
|
|Pat
|
|> - Adler and Fletcher are weak and there is no theory behind your
|> distribution statements, nor any simulation results as far 
|as I know.  We
|> found that on very simple sequences the Hamming distance gets
|> down to 2 (or
|> lower) and the burst protection is probably not better than 16 bit.
|There
|> is even a simple formula for what sequences will get you 
|false codes (see
|> bellow for a reference)
|>
|
|
|
|
|
Prev by Date: RE: iSCSI: Naming and Discovery Draft...
Next by Date: FW: PKI tech talk continued
Prev by thread: RE: I-D ACTION:draft-cavanna-iscsi-crc-vs-cksum-00.txt
Next by thread: RE: I-D ACTION:draft-cavanna-iscsi-crc-vs-cksum-00.txt
Index(es):
- Date
- Thread
Home
Last updated: Tue Sep 04 01:05:26 2001
6315 messages in chronological order