RE: Wot I `know' about COWS in hardware
I think COBS/COWS has more potential than the markers proposal or ULP
framing without COBS. There is one case where it does add more overhead,
but the question is how prevalent that scenario is: outbound zero copy is
enabled/possible, and the NIC does checksum offload and cannot be changed
to do COBS. Of course changes are required :-).
 
I also hope that data centers will use accelerated iSCSI/Clustering
HBAs/NICs rather than the current solutions. The current solutions MAY be
useful for desktops/laptops, where hopefully there are plenty of spare
cycles to do COBS in software.
 
COBS also has alignment benefits: the header can be aligned with the ULP
PDU, the ULP PDU can be aligned with the TCP header, and there are no false
positives. The alignment with the TCP header may not always happen (the
mythical box in the middle that does TCP resegmentation), but this can be
detected. In the presence of such a box, performance could drop to the
levels encountered when IP fragmentation happens.
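
As a rough illustration of the "no false positives" point, here is a minimal
software sketch of byte stuffing. It assumes that COBS here means classic
Consistent Overhead Byte Stuffing with 0x00 as the reserved delimiter; the
function names cobs_encode and cobs_decode are made up for this example, and
the framing actually proposed for the ULP may differ in its details.

```python
def cobs_encode(data: bytes) -> bytes:
    """Return an encoding of `data` that contains no 0x00 bytes."""
    out = bytearray([0])      # placeholder for the first code byte
    code_idx = 0              # where the pending code byte lives
    code = 1                  # distance to the next zero (or block end)
    for b in data:
        if b == 0:
            out[code_idx] = code          # close the current block
            code_idx = len(out)
            out.append(0)                 # placeholder for the next code
            code = 1
        else:
            out.append(b)
            code += 1
            if code == 0xFF:              # block is full (254 data bytes)
                out[code_idx] = code
                code_idx = len(out)
                out.append(0)
                code = 1
    out[code_idx] = code
    return bytes(out)


def cobs_decode(enc: bytes) -> bytes:
    """Invert cobs_encode; raises ValueError on malformed input."""
    out = bytearray()
    i = 0
    while i < len(enc):
        code = enc[i]
        if code == 0:
            raise ValueError("zero byte inside an encoded frame")
        block = enc[i + 1:i + code]
        if len(block) != code - 1:
            raise ValueError("truncated frame")
        out += block
        i += code
        # A code below 0xFF stands for a zero in the original data,
        # except for the final block of the frame.
        if code != 0xFF and i < len(enc):
            out.append(0)
    return bytes(out)
```

Because the encoded bytes never contain 0x00, a receiver that sees 0x00 on
the wire knows it is a frame delimiter and not payload. The cost is at most
one extra byte per 254 bytes of data, plus one.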
 
I think it is better to have a couple of interoperable implementations
that demonstrate the benefit of any of the alternate proposals (especially
markers vs. COBS) before selecting one.
  
Jim, 
  
There are some things attractive about COWS:
1. The hard work, touching every data word, has to be done only by the
   sender (on the normal path) and can easily be included in NICs with
   accelerator cards, which seem to do a good job on the send side.
2. If you are doing CRC or IPsec on a client in software, there is no
   additional penalty (provided you can include the code in the right
   layer of software), as no data gets moved.
3. It does not have to be associated with TCP packet alignment, and can
   work in the face of TCP segmentation.
Julo 
  
"Jim Pinkerton" 
  <jpink@microsoft.com> wrote on 17-12-2001 17:32:04:
>
> My main concern with this approach is that we could kill the product but
> win the spec wars. Specifically, this approach means that an
> end-customer has one of two choices in deploying the technology:
>
>    1) Upgrade both ends, and they'll see the full benefit
>    2) Upgrade only the server side, and see roughly 2-4 times the CPU
>       utilization on the client if their current implementation is
>       optimized on the client side (a mere 2x if they are doing
>       significant receives that already require a copy, more like 4x if
>       they are primarily doing sends, which currently has no bcopy in
>       many OS implementations).
> 
> This means that if they pick option 2) and their machines are CPU bound,
> that the data center capacity to handle requests will actually
> *decrease* if they deploy the technology. If the front end has enough
> idle CPU cycles, then they probably could select option 2).
> 
> In my experience, we need to make sure we have a volume solution to
> enable NIC vendors to make enough profit to fund the next generation
> (otherwise RDMA/TOE is a one-shot deal and won't keep up with the CPU
> architecture). This means we need a path to the front-end boxes in the
> data center. My concern is that there is no half-measure in the above
> scenario - the IT manager must upgrade the thousand front-end boxes at
> the same time as they upgrade the back end, or deploy asymmetric configs
> where some front end boxes are upgraded. I'm not sure how attractive
> this deployment scenario is.
>
> Howard and Uri, can you comment on this issue?
>
> Jim
>
> > -----Original Message-----
> > From: Stephen Bailey [mailto:steph@cs.uchicago.edu]
> > Sent: Monday, December 17, 2001 7:13 AM
> > To: uri@broadcom.com; howard.c.herbert@intel.com
> > Cc: csapuntz@cisco.com; Jim Pinkerton; Julian_Satran@il.ibm.com;
> >     allyn@cisco.com
> > Subject: Wot I `know' about COWS in hardware
> > 
> > Hi,
> > 
> > I haven't gotten a chance to do a full implementation yet, but here's
> > some architectural properties I believe to be true of a hardware COWS
> > implementation:
> > 
> >   1) can be implemented `in line' on receive
> >   2) requires an MTU-sized RAM on send
> >   3) expected touches to send RAM is 2 per data word (just the `fifo'
> >      write and read ops, no editing), assuming the headers are merged
> >      on the outbound side.
> >   4) worst case touches to send RAM is 3 per data word (assuming every
> >      word must be edited)
> >   5) eliminates the need for the funky `make sure you don't send
> >      anything that's false positive under non-nominal conditions'
> >      behavior of the key/length proposal (I kinda doubted hardware
> >      impls were going to do this anyway, since it was a SHOULD).
> >
> > Basically, it looks OK to me.  Slowing the sender is much better than
> > slowing the receiver.  Theoretically, we could reverse the pointer
> > chain and allow in-line send, but RAM on receive, but that seems
> > stupid to me.
> > 
> > It's clearly a design tradeoff whether you chose to use the COWS
> > send-side RAM for other purposes, or not.
> > 
> > I'm hoping you guys can tell me whether you think this blows your
> > budget, or other noteworthy (unfortunate) properties.  As you can
> > tell, I have the utmost enthusiasm for the mission (Dr. Chandra...).
> > 
> > Thanks,
> >   Steph
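
Properties 1) to 4) in Steph's message, and the remark about reversing the
pointer chain, can be illustrated with a small software model. This is only a
sketch under stated assumptions, not the COWS wire format: the reserved word
value MARKER, the two-word header, the pointer encoding, and the names stuff
and unstuff are all hypothetical. The idea modeled is a forward pointer
chain, where each payload word equal to the reserved word is replaced by the
distance to the next such word. The sender cannot fill in a pointer until it
has seen the next occurrence, so it buffers the frame; the receiver only
needs a countdown.

```python
MARKER = 0x00000000  # hypothetical reserved word; pointers are always >= 1


def stuff(words):
    """Sender side. Returns (frame, touches of the send buffer)."""
    buf = []          # stands in for the MTU-sized send RAM
    touches = 0
    prev = -1         # index of the slot awaiting its pointer (-1 = header)
    head = None
    for i, w in enumerate(words):
        buf.append(w)                 # fifo write: 1 touch per word
        touches += 1
        if w == MARKER:
            if prev < 0:
                head = i + 1          # header points at first occurrence
            else:
                buf[prev] = i - prev  # edit: 1 extra touch
                touches += 1
            prev = i
    n = len(buf)
    if prev < 0:
        head = n + 1                  # no occurrences: point past the end
    else:
        buf[prev] = n - prev          # last occurrence points past the end
        touches += 1
    touches += n                      # fifo read: 1 touch per word
    return [MARKER, head] + buf, touches


def unstuff(frame):
    """Receiver side. State is a single countdown, so it can run in line."""
    it = iter(frame)
    if next(it) != MARKER:
        raise ValueError("frame does not start with the marker")
    countdown = next(it)
    out = []
    for w in it:
        countdown -= 1
        if countdown == 0:
            out.append(MARKER)        # restore the reserved word
            countdown = w             # stuffed word was the next pointer
        else:
            out.append(w)
    if countdown != 1:
        raise ValueError("pointer chain does not end at the frame boundary")
    return out
```

In this model the send buffer sees two touches per word when no word needs
editing and three per word when every word does, and the reserved word
appears only at the start of a frame. Reversing the chain (each pointer
naming the previous occurrence) would swap the roles: the sender could emit
words as they arrive, and the receiver would have to buffer.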