
    RE: I-D ACTION:draft-cavanna-iscsi-crc-vs-cksum-00.txt



    
    
    Vince,
    
    We now have a good handle on, and a plethora of references for, CRC
    behavior in low-noise symmetric channels (the ones you consider) and on
    bursts.
    
    To complete the comparison we will need formulas for the checksums you
    propose (Adler-32 and Fletcher-32).
    
    We will corroborate it with information from other sources wherever
    available.
    
    Regards,
    Julo
    
    vince_cavanna@agilent.com on 07/03/2001 20:17:05
    
    Please respond to vince_cavanna@agilent.com
    
    To:   Julian Satran/Haifa/IBM@IBMIL
    cc:   pat_thaler@agilent.com, ips@ece.cmu.edu, Dafna_Sheinwald@il.ibm.com,
          vince_cavanna@agilent.com
    Subject:  RE: I-D ACTION:draft-cavanna-iscsi-crc-vs-cksum-00.txt
    
    
    
    
    Hi Julian,
    
    I will attempt to answer your questions about the probability of undetected
    error, since your question strikes at the heart of what I was trying to
    point out in my I-D on CRCs vs. checksums, i.e. where CRCs shine and where
    they are not as good as many people think.
    
    For a link where random noise with a Gaussian probability distribution and
    white spectral density is the source of errors, it is easy to determine the
    probability, Pu, of undetected error. This type of link is not only
    amenable to analysis but is also a good model for many well designed links.
    
    For this type of link the BER (bit error ratio) is determined by the
    probability that the noise amplitude will exceed the signal amplitude and
    can easily be shown to be BER(z) = 1/2 erfc(z/sqrt(2)), where erfc is the
    complementary error function and z is the SNR (signal to noise ratio), that
    is, the signal amplitude as a multiple of the standard deviation [note 1]
    of the noise. For example, at an SNR of 7, the BER is about 10^(-12). With
    this noise mechanism and small BER, patterns with a small number of
    erroneous bits are far more probable than those with large numbers of
    errors, and the probability of getting a certain number of errors, x,
    decreases monotonically and rapidly with increasing x. I have made a
    mathcad worksheet showing this dramatically in graphical form. Ask me if
    you want it.
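    The BER formula above can be checked numerically; a minimal Python sketch
    (an editor's illustration, not part of the original worksheet):

```python
import math

def ber(snr):
    """Probability that Gaussian noise exceeds the signal amplitude:
    BER(z) = 1/2 * erfc(z / sqrt(2))."""
    return 0.5 * math.erfc(snr / math.sqrt(2))

# At an SNR of 7 the BER is on the order of 10^-12, as stated above.
print(ber(7))
```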
    
    The number of errors in a pattern is a random variable with binomial
    distribution. The probability of x errors in n bits, where the probability
    of each bit being in error is BER, may be gotten from the binomial
    distribution with parameters n and p. The number of independent trials, n,
    is the number of bits; the probability of success in each trial, p, is the
    BER.
    
    The binomial distribution allows us to calculate the probability of a
    pattern with a certain number of errors and we can use it to find a bound
    for the probability of x errors at a given BER [note 2].
    
    Let's take an ethernet-like link using some kind of group encoding such as
    8B:10B. It is trivial to design a CRC polynomial that will catch all two
    bit errors. Two or fewer bits in error result in no more than 2 bursts of
    size 8 (due to the 8B:10B encoding). If the CRC is designed to catch all
    such double bursts (easy to do, and the CRC-32 used in ethernet does), all
    such errors are caught. An undetected error thus occurs only if there are
    3 or more bits in error on the link within a frame - this is conservative
    since most such errors will also be caught, but I will assume none are. I
    am also neglecting the fact that it is easy to design a CRC to catch all 3
    bit errors. I don't want to take advantage of that because 3 bit errors
    may (again due to the group encoding) result in 3 bursts, which are not so
    easy to detect.
    
    With this error mechanism large bursts happen because of the group
    encoding. Without group encoding it would be extremely unlikely to get a
    burst of size 3 or larger.
    
    The probability of 3 or more bit errors is approximately equal to the
    probability of exactly 3 bit errors since, as long as p << 0.5, the
    probability of a pattern having x errors decreases rapidly as x increases.
    
    For a BER of 10^(-12) the probability of 3 or more errors in 1500*8 bits
    (an ethernet frame) is approximately the probability of exactly 3 errors,
    which is 5.62 x 10^(-25). At 10 gig this represents a mean time between
    undetected errors of 5.6 x 10^6 years. This assumes we catch all 2 bit
    errors. If we could only catch single bit errors the mean time between
    undetected errors would be 10 days! What a difference a bit makes!! Also,
    the probability of exactly 4 errors is 2.1 x 10^(-33). Again, what a
    difference a bit makes! I have a mathcad spreadsheet where I have done
    this analysis for 10 gig ethernet. Let me know if you are interested in a
    copy.
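    These figures follow from the binomial distribution of note [2]. A short
    Python check (an editor's sketch): the quoted 5.62e-25 and 2.1e-33 are
    reproduced if the frame length is taken as the 15000 line bits of a
    1500-byte frame after 8B:10B encoding - an assumption on the editor's
    part, not stated in the message.

```python
import math

n = 1500 * 10   # assumed: 1500-byte frame = 15000 line bits after 8B:10B
ber = 1e-12

def p_exact(x, n=n, p=ber):
    """Binomial probability of exactly x bit errors in n bits."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

print(p_exact(3))   # ~5.62e-25
print(p_exact(4))   # ~2.1e-33
```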
    
    This is why CRC is so nice for a link with this error source. We can claim
    practically complete protection. There are no holes.
    
    For cases, such as the iSCSI protection environment, where the BER is not
    determined by the SNR, we may not be able to claim that a small number of
    erroneous bits is far more probable than a large number of them. The
    analysis is not as simple, and in such cases CRCs are not necessarily
    better than checksums - it depends on how probable the undetected error
    patterns are.
    
    Vince
    
    
    notes:
    
    [1] strictly speaking I should have said rms value instead of standard
    deviation but for a random process for which the time statistics are the
    same as the ensemble statistics (an ergodic process) the two are
    equivalent.
    
    [2] Binomial distribution with parameters n and p:
    P(x,n,p) = (n!/(x!*(n-x)!)) * p^x * (1-p)^(n-x)
    
    |-----Original Message-----
    |From: THALER,PAT (A-Roseville,ex1)
    |Sent: Tuesday, March 06, 2001 4:47 PM
    |To: julian_satran@il.ibm.com; ips@ece.cmu.edu
    |Cc: THALER,PAT (A-Roseville,ex1); ips@ece.cmu.edu;
    |Dafna_Sheinwald@il.ibm.com; CAVANNA, VICENTE
    |Subject: RE: I-D ACTION:draft-cavanna-iscsi-crc-vs-cksum-00.txt
    |
    |
    |Julian,
    |
    |Good summary. One correction. "codes shorter than m" is
    |shorter than m values not m bits and m is about 64 K.
    |Therefore, it is shorter than about 63 kbytes for Adler and
    |shorter than 127 kbytes for Fletcher.
    |
    |I'm thinking about your question about Pud but it will take me
    |a bit to come up with an answer. For the three bit error case,
    |I think the formulas used for CRC bit errors will be
    |approximately correct. For the burst errors and the errors
    |where two (Fletcher only) or three non-contiguous bytes are
    |corrupted, I'm not sure what to use for the probability that a
    |byte gets corrupted.
    |
    |Regards,
    |Pat
    |
    |-----Original Message-----
    |From: julian_satran@il.ibm.com [mailto:julian_satran@il.ibm.com]
    |
    |Pat,
    |
    |Thanks. I am not going to check every line for awhile and I
    |assume it is
    |correct.
    |For the benefit of the list the summary of the statement is that:
    |
    |-the burst protection is for 16 bit (I recall the text saying 32)
    |-the numbers to be used for low noise symmetric channel protection are:
    |               - dmin= 3 for codes shorter than m (approx 8kbytes)
    |               - dmin= 2 for codes longer than m
    |
    |Do you have a formula to evaluate Pud for low noise channels
    |with BER = e?
    |Any formula to evaluate the conditional probability of an
    |undetected error
    |on bursts of 17, 18, 19 bits?
    |
    |Is there any reason to believe that we can approximate them to
    |ones used for
    |CRC?
    |
    |I would like to have all the codes readily comparable.
    |
    |Regards,
    |Julo
    |
    |"THALER,PAT (A-Roseville,ex1)" <pat_thaler@agilent.com> on 06/03/2001
    |19:24:13
    |
    |Please respond to "THALER,PAT (A-Roseville,ex1)"
    |<pat_thaler@agilent.com>
    |
    |To:   Julian Satran/Haifa/IBM@IBMIL, "THALER,PAT (A-Roseville,ex1)"
    |      <pat_thaler@agilent.com>
    |cc:   ips@ece.cmu.edu, Dafna_Sheinwald@il.ibm.com
    |Subject:  RE: I-D ACTION:draft-cavanna-iscsi-crc-vs-cksum-00.txt
    |
    |
    |
    |
    |Julian,
    |
    |It is relatively easy to investigate the Hamming distance and burst
    |detection of Adler32 and Fletcher32 using mathematics. In the argument
    |below, I will use "value" to identify the quantities that get summed,
    |since Adler sums bytes and Fletcher32 sums 16-bit values.
    |
    |Both Adler and Fletcher calculate for each value:
    |
    | S1 = (S1 + value) mod m
    | S2 = (S2 + S1) mod m
    |
    |where m is 65521 for Adler and 65535 for Fletcher.
    |
    |Which means that for an n value string of values V1 to Vn:
    | S1 = (V1 + V2 + ... + Vn (+ 1 for Adler)) mod m
    | S2 = (n * V1 + (n-1) * V2 + ... + Vn (+ n for Adler)) mod m
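    A small Python sketch (an editor's illustration) confirming that the
    per-value recurrence and the closed-form weighted sums above agree:

```python
def running_sums(values, m, adler=False):
    """S1, S2 computed by the per-value recurrence above
    (Adler initializes S1 to 1)."""
    s1, s2 = (1 if adler else 0), 0
    for v in values:
        s1 = (s1 + v) % m
        s2 = (s2 + s1) % m
    return s1, s2

def closed_form(values, m, adler=False):
    """S1, S2 from the weighted-sum formulas above."""
    n = len(values)
    s1 = (sum(values) + (1 if adler else 0)) % m
    s2 = (sum((n - i) * v for i, v in enumerate(values))
          + (n if adler else 0)) % m
    return s1, s2

vals = [17, 250, 3, 99]
assert running_sums(vals, 65521, adler=True) == closed_form(vals, 65521, adler=True)
assert running_sums(vals, 65535) == closed_form(vals, 65535)
```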
    |
    |A two bit error can only change two values. For a two bit error to be
    |undetectable the two changed values must produce no change to
    |S1 and S2.
    |
    |Let
    | x be the change to the first errored value
    | y be the change to the second errored value
    | k be the separation between the first and second errored values.
    |
    |Then for an undetectable error:
    |
    | delta S1 = (x + y) mod m = 0
    | delta S2 = ((k + 1) * x + y) mod m = 0
    |          = (k * x + x + y) mod m =0
    |          = (k * x) mod m = 0          because (x + y) mod m = 0
    |
    |Therefore, for an undetectable 2 bit error, k * x must be a
    |multiple of m.
    |
    |For Adler, m is prime and the maximum value of x is 255, so the only way
    |to satisfy the condition is for k to equal 65521. Then x can be any
    |value, so it could be a one bit error such as changing the lsb of the
    |first value from 0 to 1. y would then need to be the inverse, changing
    |the lsb of the second value from 1 to 0.
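    This can be demonstrated with Python's standard zlib.adler32 (an editor's
    sketch): two bytes 65521 positions apart are changed by +1 and -1, and
    the checksum does not change.

```python
import zlib

good = bytearray(b'\x01' * 65522)
bad = bytearray(good)
bad[0] += 1          # x = +1
bad[65521] -= 1      # y = -1, separation k = 65521
# Both S1 and S2 are unchanged, so the corruption is undetected:
assert zlib.adler32(bytes(good)) == zlib.adler32(bytes(bad))

# With any smaller separation the same +1/-1 pair is detected:
near = bytearray(good)
near[0] += 1
near[1000] -= 1
assert zlib.adler32(bytes(good)) != zlib.adler32(bytes(near))
```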
    |
    |For Fletcher, m is factorable into 3, 5, 17, 257. For x to provide one
    |or more of those factors, it would have to have more than a 1 bit change
    |and y would have to have the inverse of that change, which would mean an
    |error of at least 4 bits. The only way to get a two bit undetectable
    |error is for k to equal 65535.
    |
    |So, Fletcher and Adler do have 2-bit undetectable errors if the messages
    |are longer than the modulus.
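    For Fletcher the same condition can be checked with a toy implementation
    (an editor's sketch over 16-bit values; `fletcher32` here is a simplified
    illustration, not a production checksum):

```python
M = 65535  # Fletcher modulus

def fletcher32(values):
    """Simplified Fletcher over a sequence of 16-bit values."""
    s1 = s2 = 0
    for v in values:
        s1 = (s1 + v) % M
        s2 = (s2 + s1) % M
    return (s2 << 16) | s1

good = [1] * 65536
bad = list(good)
bad[0] += 1        # x = +1
bad[65535] -= 1    # y = -1, separation k = 65535
# k * x = 65535 is a multiple of m, so the error is undetected:
assert fletcher32(good) == fletcher32(bad)
```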
    |
    |What about burst protection? For Fletcher, we know that a change of a
    |value from all 0's to all 1's produces an undetectable error, so we know
    |that burst protection is less than 16. k = 1 for a burst spread across
    |two consecutive values. Therefore, the only way for a burst spread
    |across two consecutive values to produce an undetectable error is for x
    |to equal 65535. A burst shorter than 16 can't do it.
    |
    |For Adler, corrupting two consecutive bytes can't produce an undetectable
    |error because that would be the case above where k equals 1. There is no
    |way for a burst across two consecutive values to be undetectable in
    |Adler, since x is less than the modulus.
    |
    |For a burst across three values:
    |
    |let x equal the error in the first value
    |    y equal the error in the second value
    |    z equal the error in the third value.
    |
    |For the burst to be undetectable
    |
    | Delta S1 = (x + y + z) mod m = 0
    | Delta S2 = (3x + 2y + z) mod m = 0
    |
    |Since the maximum values of x, y, and z are 255 and m is more
    |than 6 times
    |that, the sums above will always be less than m and we can
    |drop the mod m.
    |
    | Delta S2 = (3x + 2y + z) = 2x + y + (x + y + z) = 0
    |          = 2x + y = 0                    because x + y + z = 0
    |
    |        y = - 2x
    |
    | Delta S1 = x + y + z = 0
    |          = x - 2x + z = 0
    |          = -x + z
    |        z = x
    |
    |So an undetectable error is created when in three consecutive values the
    |errors are
    |    x
    |  -2x
    |    x
    |
    |This could for instance be errors of 1, -2, and 1. Since the errors to
    |the first and third values are the same, the error must span at least
    |2 * the length of the value + 1 bits. So for Adler, it must be at least
    |a 17 bit burst.
    |
    |Also, note that this can be created by a 3 bit error and is equally
    |applicable to Adler and Fletcher.
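    Using zlib.adler32 again (an editor's sketch), the x, -2x, x pattern with
    x = 1 is indeed undetected; the byte values are chosen so that each of
    the three corrupted bytes differs from the original in exactly one bit,
    showing the 3-bit-error case.

```python
import zlib

good = bytearray(64)
good[20:23] = b'\x04\x07\x04'
bad = bytearray(good)
bad[20:23] = b'\x05\x05\x05'   # errors of +1, -2, +1; one bit flip per byte
# delta S1 = 1 - 2 + 1 = 0 and delta S2 = 0, so Adler misses it:
assert zlib.adler32(bytes(good)) == zlib.adler32(bytes(bad))
```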
    |
    |Therefore, for messages shorter than the modulus + 1, the Hamming
    |distance of Adler32 and Fletcher32 is 3 and, for messages longer than
    |that, the Hamming distance is 2.
    |
    |Regards,
    |Pat Thaler
    |
    |
    |
    |
    |
    |-----Original Message-----
    |From: julian_satran@il.ibm.com [mailto:julian_satran@il.ibm.com]
    |
    |Pat,
    |
    |I did not run a check. That is very expensive.
    |With CRCs there are methods that enable you to run a check on the
    |complement code (which has only 2**32 different patterns for any block
    |length) and derive from there the distances (Fujiwara did this in 1989
    |for the IEEE CRC-32; I'll get back to the list about some more recent
    |experiments).
    |
    |And that is the trouble with this whole line of argument.
    |There are no numbers to prove Adler32 or Fletcher32 and there
    |are plenty
    |for CRCs.
    |
    |The big question, then, is: is there anybody out there who wants to
    |build a modern bridge based only on its beauty?
    |
    |Regards,
    |Julo
    |
    |
    |pat_thaler@agilent.com on 05/03/2001 23:27:49
    |
    |Please respond to pat_thaler@agilent.com
    |
    |To:   Julian Satran/Haifa/IBM@IBMIL
    |cc:
    |Subject:  RE: I-D ACTION:draft-cavanna-iscsi-crc-vs-cksum-00.txt
    |
    |
    |
    |
    |Julian,
    |
    |I know that Hamming distance gets down to 2 for errors that
    |are separated
    |by
    |the modulus (or a multiple of it). Is there another case?
    |
    |Pat
    |
    |> - Adler and Fletcher are weak and there is no theory behind your
    |> distribution statements, nor any simulation results as far
    |as I know.  We
    |> found that on very simple sequences the Hamming distance gets
    |> down to 2 (or
    |> lower) and the burst protection is probably not better than 16 bit.
    |There
    |> is even a simple formula for what sequences will get you
    |false codes (see
    |> below for a reference)
    |>
    |
    |
    |
    |
    |
    
    
    
    

