Subject: Re: TCP RDMA option to accelerate NFS, CIFS, SCSI, etc.

> From: "David R. Cheriton" <cheriton@cisco.com>
> ...
> Clearly, data is being received from hardware and software does not
> get to touch it until it has been stored to some memory. My
> assumption is that the storage system memory is arranged in fixed
> size pages of disk/file pages. Without hardware RDMA to the storage
> level, I believe one requires an extra copy, from whatever the
> hardware delivers to what the storage system expects. Either
> you use twice the bandwidth in the storage system memory system or
> else you have a separate memory system for the network, and
> have software/processor power adequate to copy between at wire
> speed (with all the associated support facilities for this processor.)
> Unless there is something wrong with this reasoning,
> it seems like a cost issue of providing the above hardware resources
> vs. providing a NIC chip that can RDMA.

Depending on how you are counting copies, that reasoning has been wrong
in commercial UNIX systems for more than 10 years.

Do you use the RDMA bits before the IP checksum, the TCP checksum, and
the medium FCS or checksum have been checked? If not, that is, if you
receive the entire link-layer frame into some kind of temporary buffer
or FIFO, probably in the "network interface card/controller," to check
the trailing FCS before using the RDMA bits, then commercial UNIX
systems have been doing as you say to save copies since the late 1980's.
As I said before, such systems were part of what killed Protocol
Engines Inc.

If you do use the RDMA bits in the TCP header after 50-60 bytes of the
frame have arrived, but before the frame FCS, aren't you worried about
bit rot in the RDMA?

> My guesstimate is that the software-only approach would be easily
> 10 times more expensive here at the higher speed rates, of 10 Gbps.
> If there is serious doubt about the merits of real hardware support,
> we should try to quantify costs further at these speed ranges, IMHO.

By "expensive," are you talking about dollars or bits/second?
Regardless, if you look at the number of CPU cycles or gates in custom
silicon required to support incoming page flipping in old, existing
implementations, I bet you'll find that they are less "expensive" than
any likely RDMA implementation. Power-of-2 modular arithmetic is
awfully cheap compared to parsing and validating TCP options.

> ...
> It would help me to have a more careful definition of the types of
> attacks you have in mind. In an unsecure network with intruders,
> presumably I can end up with bad data in the right buffer
> or right data in the wrong buffer without using RDMA.
> Do you view we have made things worse, and if so, how?
> Or are you objecting to us not making things better?

Is it possible for a bad guy to use RDMA to put bad data into memory
that is not a buffer? If the RID does no more than choose from a safe
list of buffers, then how does RDMA usefully differ from the old FDDI,
ATM, and HIPPI implementations that put incoming page-flippable data
into buffers that reach user space with the data having crossed the
system bus the absolute minimum number of times for any scheme,
including RDMA: once?

Systems I've worked on have done mbuf allocation in the network
interface hardware, including putting page-flippable payloads into
page-mbufs that can eventually be flipped into user space, and, of
course, taking care of the TCP or UDP checksum. Given the recently
described extensions to readv(), absolutely all data received by a
system like that would be page-flippable, without needing the silicon
or CPU cycles to parse RDMA options, and without requiring the sender
to send RDMA options or even know that the receiver is being fast.

Vernon Schryver    vjs@rhyolite.com