Choice of ESP alg. for IPS/IPSec - 3DES-CBC vs. 3DES-CBC-I

Hello,

Re: Choice of ESP alg. in http://www.ietf.org/internet-drafts/draft-ietf-ips-security-06.txt

Question:

As noted, we need an algorithm that can be implemented in hardware at speeds of up to 10Gbps and that is also efficient in software at speeds of 100Mbps or slower. AES-CTR is an excellent solution, but it will take time to get approved and further time to get "time tested" before being adopted. Even after adoption of AES-CTR, 3DES-CBC will need to co-exist for many years to come.

3DES-CBC does not scale gracefully to 10Gbps for two reasons:

1. Frequent rekeying at 10Gbps: This issue is discussed in depth in the draft. Although very inconvenient, state-of-the-art IKE stacks (esp. when running on an off-load processor) can deal with it.

2. Lack of pipeline-ability: The feedback loop dictated by CBC prohibits a pipelined high-speed VLSI implementation of the 3DES-CBC engine.

The ANSI standard X9.52-1998, which specifies 3DES-CBC (TCBC), also specifies an equally standard variant called TCBC-I (say, 3DES-CBC-Interleaved) with the same security properties. The effort required to enhance existing software and VLSI implementations of 3DES-CBC to 3DES-CBC-I is "minor". 3DES-CBC can be realized simply through a degenerate usage of the 3DES-CBC-I module. On the positive side, it brings "substantial" savings in multi-gig VLSI implementation.

Was the candidate ESP algorithm 3DES-CBC-I (a superset of 3DES-CBC) considered for the SHOULD implement option? Eventually something like AES-CTR will pervade, but for the interim this is indeed a low-cost option to get to speeds up to 10Gbps.

Comments on the VLSI implementation:

A 3DES (not 3DES-CBC) engine by itself is highly pipeline-able and can pump 10Gbps even on an FPGA. However, for 3DES-CBC, one has to wait for 3DES to be completed on a given 64-bit symbol before commencing 3DES on the next symbol.
As a result, the max throughput of a "single" 3DES-CBC engine is somewhere above 1Gbps, depending on the process technology. As usual, there is a brute-force solution to the problem, which requires the use of multiple 3DES-CBC units. These engines take up significant silicon real estate. The implementation complexity is not just due to the multiplicity of 3DES-CBC units but more so due to all the "incidental" kitchen-sinks and bath-tubs that get thrown into the cauldron to support the multiplicity: scheduler, buffers per engine (think jumbo frames), keeping track of contexts (10Gbps traffic could all belong to one connection or to multiple connections), latency, power, ...

3DES-CBC-I partitions the symbol stream into three sub-streams so that a single engine with three pipeline stages can pump 3X throughput and hence bring about a 3X reduction in the kitchen-sink count and complexity.

Furthermore: At the time 3DES-CBC-I was conceived, multi-gig throughput at the network end-point was probably not anticipated (my guess). As a result, they stopped at tri-partitioning, or 3 levels of interleaving (my guess). After all, it is only the IP Storage application that is pioneering multi-gig IPSec throughput at the end point. If we used 8 levels of interleaving, we could pump all 10Gbps of throughput through a single engine using current process technologies. No kitchen-sinks, no bath-tubs!

Thoughts, Comments, Concerns?

-Shridhar Mukund
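
For illustration only, the interleaving idea discussed in the message can be sketched in a few lines of Python. Everything named here is an assumption made for the sketch and is not taken from X9.52 or from the draft: toy_encrypt_block is an insecure Feistel stand-in for a real 3DES block operation, the per-lane IVs are simply passed in by the caller (the standard's own IV rules are not reproduced), and engine_cycles is an idealized model in which one block may enter the pipeline per cycle but a block must wait until the previous block of its own sub-stream has left the pipeline.

```python
import hashlib

BLOCK = 8  # bytes; 3DES works on 64-bit symbols


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(half: bytes, key: bytes, r: int) -> bytes:
    # Round function for the toy Feistel cipher (NOT 3DES, NOT secure).
    return hashlib.sha256(key + bytes([r]) + half).digest()[:4]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:4], block[4:]
    for r in range(4):
        left, right = right, xor(left, _round(right, key, r))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:4], block[4:]
    for r in reversed(range(4)):
        left, right = xor(right, _round(left, key, r)), left
    return left + right


def cbc_interleaved_encrypt(key: bytes, ivs: list, plaintext: bytes) -> bytes:
    # Block i belongs to lane i mod k; each lane is an independent CBC chain.
    # With a single IV (k = 1) this is ordinary CBC.
    k = len(ivs)
    prev = list(ivs)
    out = []
    for i in range(0, len(plaintext), BLOCK):
        lane = (i // BLOCK) % k
        c = toy_encrypt_block(key, xor(plaintext[i:i + BLOCK], prev[lane]))
        prev[lane] = c
        out.append(c)
    return b"".join(out)


def cbc_interleaved_decrypt(key: bytes, ivs: list, ciphertext: bytes) -> bytes:
    k = len(ivs)
    prev = list(ivs)
    out = []
    for i in range(0, len(ciphertext), BLOCK):
        lane = (i // BLOCK) % k
        c = ciphertext[i:i + BLOCK]
        out.append(xor(toy_decrypt_block(key, c), prev[lane]))
        prev[lane] = c
    return b"".join(out)


def engine_cycles(n_blocks: int, depth: int, lanes: int) -> int:
    # Idealized cycle count for one pipelined engine with `depth` stages.
    issue = []
    for i in range(n_blocks):
        t = issue[-1] + 1 if issue else 0          # one issue slot per cycle
        if i >= lanes:
            t = max(t, issue[i - lanes] + depth)   # wait for own lane's feedback
        issue.append(t)
    return issue[-1] + depth if issue else 0


if __name__ == "__main__":
    for lanes in (1, 3):
        print(lanes, "lane(s):", engine_cycles(1000, 3, lanes), "cycles")
```

Under this model a three-stage engine with a single chain sits idle two cycles out of three, while three interleaved chains keep every stage busy, which is the 3X effect described in the message. The plaintext length is assumed to be a multiple of 8 bytes; padding is left out of the sketch.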
Last updated: Sat Dec 01 01:17:58 2001 (7967 messages in chronological order)