Some Thoughts on Digests

To: <ips@ece.cmu.edu>
Subject: Some Thoughts on Digests
From: "Jim Williams" <jimw@giganet.com>
Date: Tue, 5 Dec 2000 12:41:04 -0500
Sender: owner-ips@ece.cmu.edu


draft-ietf-ips-iSCSI-02b.txt states:


>   CRC32 is effective only for limited data lengths (the probability of 
>   an error going undetected grows linearly with data length). When 
>   using CRC32-2K the digest size increases with data length.

I believe this is an oversimplification, and I will elaborate.

The absolute probability of an undetected error will of course
increase at least linearly with message length because the probability
of an error occurring in the first place will increase linearly.
This is true also in the case where the message is chopped into
2K segments and a CRC is calculated for each segment.  If the
probability that a given segment has an undetected error is P,
then the probability that at least one of N segments has an
undetected error is approximately N*P (for small P).
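
The N*P approximation can be checked numerically.  The following is
only a rough sketch: it assumes the segments fail independently, and
the value of P is a made-up placeholder, not a measured error rate.

```python
import math

def prob_any_segment_bad(p, n):
    """Probability that at least one of n independent segments has an
    undetected error, given per-segment probability p.

    This is 1 - (1 - p)**n, computed with log1p/expm1 so that it
    stays accurate when p is tiny.
    """
    return -math.expm1(n * math.log1p(-p))

p = 1e-12  # hypothetical per-segment probability, for illustration only
for n in (1, 32, 1024):
    # Print the exact value next to the linear approximation n * p.
    print(n, prob_any_segment_bad(p, n), n * p)
```

For small P the two printed columns should agree closely, which is
all the "approximately N*P" statement claims.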

What is the conditional probability that if there is an error
it will be undetected?  This is more complicated.  If the total
extent of the error is less than or equal to 32 bits, then the
probability it will be undetected by the CRC is zero.  Therefore
if one assumes that individual error events will be of extent
less than or equal to 32 bits, then the only way an undetected
error can occur is for at least two independent errors to 
occur in the same segment.  The probability of this occurring
will increase approximately linearly with the segment size.
I would say this with two caveats, however.  First, the assumption
that individual error events will have extent less than or
equal to 32 bits is questionable.  Second, no matter
how long the segment, the conditional probability that
a segment containing an error goes undetected by
the CRC will NEVER be more than 2^-32.
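
The claim that a single error of extent 32 bits or less is always
caught can be spot-checked in software.  The sketch below uses
Python's zlib.crc32 (the Ethernet polynomial); the message length,
trial count, seed, and helper names are arbitrary choices of mine.
Bits are numbered least-significant-bit first within each byte, which
is my assumption about the order in which this CRC consumes them.

```python
import random
import zlib

def flip_burst(data, start_bit, pattern):
    """XOR an error pattern into data, starting at bit start_bit.

    Bit i of `pattern` lands on message bit start_bit + i, with bits
    counted LSB-first inside each byte.
    """
    buf = bytearray(data)
    i = 0
    while pattern >> i:
        if (pattern >> i) & 1:
            bit = start_bit + i
            buf[bit // 8] ^= 1 << (bit % 8)
        i += 1
    return bytes(buf)

def count_undetected_bursts(trials=2000, msg_len=4096, seed=1):
    """Count random error bursts of at most 32 bits that leave the CRC unchanged."""
    rng = random.Random(seed)
    msg = bytes(rng.randrange(256) for _ in range(msg_len))
    good = zlib.crc32(msg)
    undetected = 0
    for _ in range(trials):
        pattern = rng.randrange(1, 1 << 32)      # nonzero, at most 32 bits wide
        start = rng.randrange(msg_len * 8 - 32)  # keep the burst inside the message
        if zlib.crc32(flip_burst(msg, start, pattern)) == good:
            undetected += 1
    return undetected

print(count_undetected_bursts())
```

The count should be zero however long the message is made, since a
burst no wider than the CRC cannot be a multiple of the generator
polynomial.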

Based on this, I would argue that the added complexity of
segmenting the message into 2K blocks for CRC computation
is not justified, and that the CRC should NOT be considered
ineffective for large blocks of data.  (Unless you are
prepared to argue that a digest that misses one in 2^32
errors is ineffective.)

--------------

CRC Polynomial

I would argue that of all the prime polynomials of degree 32,
the one selected for iSCSI is the WORST one.

If a block of data is protected by a CRC-32 and the result is
embedded inside another block which is also protected by
a CRC-32, then the combined protection is effectively 64 bits:
the maximum probability of an undetected error is 2^-64.  That
holds UNLESS both CRCs use the same polynomial, in which case
the combined protection is no better than the protection of
only the outer CRC.

The proposed CRC of iSCSI uses the same polynomial as the 
Ethernet CRC.

Since iSCSI data will typically be contained inside Ethernet
frames, the iSCSI CRC should use a different polynomial than
Ethernet.
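
The same-polynomial problem can be illustrated in software.  In the
sketch below both the inner and the outer check use zlib.crc32 (the
Ethernet polynomial).  The header, trailer, payloads, and helper names
are hypothetical, and the framing is a simplification rather than
real iSCSI or Ethernet framing.  The "error" is a corruption that
turns one valid inner block into another valid inner block of the
same length.

```python
import struct
import zlib

def with_crc(payload):
    """Append the CRC-32 of payload, little-endian, as an inner digest."""
    return payload + struct.pack("<I", zlib.crc32(payload))

def inner_ok(block):
    """Check the inner digest of a block produced by with_crc()."""
    payload, stored = block[:-4], struct.unpack("<I", block[-4:])[0]
    return zlib.crc32(payload) == stored

header, trailer = b"HDR-", b"-TRL"        # hypothetical outer framing
inner_a = with_crc(b"A" * 16)             # what was sent
inner_b = with_crc(b"B" * 16)             # what arrived: wrong data, valid digest

frame_a = header + inner_a + trailer
frame_b = header + inner_b + trailer

# The inner check cannot tell the two blocks apart...
print(inner_ok(inner_a), inner_ok(inner_b))
# ...and the outer CRC over the whole frame is identical as well.
print(zlib.crc32(frame_a) == zlib.crc32(frame_b))
```

Any error pattern the inner CRC misses is a multiple of the shared
generator polynomial, so the outer CRC computed with that same
polynomial misses it too, regardless of the surrounding header and
trailer bytes.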

The CRC section of
http://www.ietf.org/internet-drafts/draft-dicecco-vitcp-01.txt
contains an example of a better CRC.

Arguments against using a CRC algorithm other than CCITT include
the following:

1.  The CCITT CRC-32 has been studied extensively and using
    a different algorithm would incur unnecessary risk.

2.  Using a standard CRC algorithm allows use of existing 
    hardware and software implementations.

3.  Referencing an existing CRC algorithm saves work in 
    adequately documenting the algorithm.

With respect to #1, I would hope this could be referred to
some acknowledged expert in the CRC field.  I am sure you
will find no studies on the effectiveness of the CRC which
depend on any properties of the polynomial not shared by
other polynomials such as the one called out in the above
example.

With respect to #2, it is unlikely that any existing hardware
can be used, and likely that any iSCSI implementation will require
building new ASICs.  Designing the polynomial-specific
section of a high-speed CRC unit should take no more than
a few days.  Having done this, I can speak from experience.
For software implementations, the arguments are similar,
but the work is a lot less.

With respect to #3, since the entire algorithm stays the
same except for one constant (the polynomial), documenting it
should not be much work.  New test vectors would of course need
to be generated.
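
To make the "one constant" point concrete, here is a minimal
table-driven sketch in which the polynomial is the only parameter.
The second polynomial is the Castagnoli CRC-32C constant, which I
picked purely for illustration; it is not necessarily the one in the
draft cited above.  Using the same bit order, initial value, and final
XOR for both is also my assumption.

```python
import zlib

def make_crc32(poly_reflected):
    """Build a CRC-32 function for the given bit-reflected polynomial."""
    table = []
    for byte in range(256):
        reg = byte
        for _ in range(8):
            reg = (reg >> 1) ^ (poly_reflected if reg & 1 else 0)
        table.append(reg)

    def crc(data):
        reg = 0xFFFFFFFF                      # all-ones initial value
        for b in data:
            reg = (reg >> 8) ^ table[(reg ^ b) & 0xFF]
        return reg ^ 0xFFFFFFFF               # final inversion
    return crc

crc32_ethernet = make_crc32(0xEDB88320)       # same polynomial as zlib.crc32
crc32_castagnoli = make_crc32(0x82F63B78)     # illustrative alternative

vector = b"123456789"                         # conventional check string
print(hex(crc32_ethernet(vector)), hex(zlib.crc32(vector)))
print(hex(crc32_castagnoli(vector)))
```

Only the constant passed to make_crc32 differs between the two
functions; the new test vectors are simply the outputs for agreed
inputs such as the check string above.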

---------------

With respect to the HMAC functions, do we need both SHA
and MD5?  I would expect hardware vendors may choose one
or the other to implement in hardware.  It would be nice
if different suppliers chose the same one.

Arguably this is outside the scope of the standard, but
recommendations as to the preferred digest algorithms
to implement in hardware might result in better
interoperability of the resulting products that emerge
from the standard.
