Re: TCP RDMA option to accelerate NFS, CIFS, SCSI, etc.

] From: Charles Esson <charlese@cvs.com.au>

] I must have missed something.
]
] If we don't have this, you can take the destination port, convert to a
] table address, use the sequence number,
] do some calculations and come up with a buffer address and an offset. If
] you want to mess up the layering
] of your stack, they are all things you can do now.

Standards committees don't like hashing.  It looks complicated and
insufficiently deterministic on an overhead projector.

] or using RDMA
] ...
] --->An attacker plays with the data so returned.
] ...

That's a very good point.  The tagging in RDMA cannot be used until after
it has been validated by the receiver.  The validating consists of looking
at sequence numbers, RPC/XDR headers, etc. to figure out where the data
can and should go, and then checking that the sender guessed right.
Why not skip the last part and ignore the RDMA tag?  Then why send the
RDMA tag?

.......

> From: Pete Zaitcev <zaitcev@metabyte.com>

> Very well, but what about its companion document (SCOT)?
> http://search.ietf.org/internet-drafts/draft-satran-scot-00.txt
> It is published, isn't it? It was somewhat disturbing to see the
> notice, but on the other hand it was honest. IBM could just as
> easily come up silently with a silly software patent for RDMA option
> or for SCSI over TCP idea as such.

The IETF's protections against patent games are well intended, but nothing
to worry about if you want to play them and nothing to rely upon if you
don't.  The history of IETF patent games demonstrates that the IETF is
powerless to limit them (or worse), and that they're harder to play than
the players hope.  (E.g. PPP CCP and PPP 48-bit FCS, respectively)

......

} From: "Justin T. Gibbs" <gibbs@FreeBSD.org>

} ...
} >Can you elaborate on this?
} >Suppose TCP "blindly" does zero copy everything to
} >an app's buffer (for example, to a web browser's receive buffer) without
} >RDMA. Then the browser app looks at the data and displays it. What is the
} >difference RDMA makes in this case? Yes, RDMA can separate different messages
} >in the buffer. But this can also be done by the browser app, not by TCP.
}
} You seem to be saying that in the common case zero copy is achievable.
} Most implementations I've seen require the network driver to make
} a guess about where the payload will be in an incoming packet so the header
} can be stripped off and the payload dmaed to an aligned area. A page
} flip is then performed to get the data where the user wants it,
} imposing the restriction that your payload be page sized so you don't
} leave gaps in the user's destination buffer.

That is required only if you stick to the current API.  Obvious, minor
changes in the direction of some operating systems that existed before
UNIX are sufficient to relax the page boundary requirement.
To use RDMA, you have to change the API.

} Certainly, with a more
} intelligent network adapter that knows every protocol you can determine
} exactly where the data is in each packet. If you add connection tracking
} and sequence number sniffing to the nic with a mechanism to register user
} buffers to connections, you can get zero copy every time*. Unfortunately
} this is not very general purpose solution.

Only standards committees and some academics care about "every protocol"
or optimizing absolutely every application.  The rest of us (including
academics) only care about optimizing the important stuff.  Also as you
say, looking at sequence numbers in the interface and relaxing the sockets
API rules about not touching any bytes in the buffer except those that are
actually received lets you avoid copies all of the time.  I don't see why
that is not a general purpose solution, if you want one.
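
As an illustration of the sequence-number approach discussed above, here is
a minimal sketch, not anyone's actual implementation.  It assumes a
hypothetical per-connection buffer registered with the interface
(RegisteredBuffer), a table keyed by the address/port 4-tuple (connections),
and a receive() hook; none of these names come from the thread.  The slice
assignment stands in for the DMA a real interface would perform.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers wrap modulo 2**32


class RegisteredBuffer:
    """Hypothetical receive buffer registered for one connection."""

    def __init__(self, size, initial_seq):
        self.data = bytearray(size)
        self.initial_seq = initial_seq  # sequence number of the byte at offset 0

    def place(self, seq, payload):
        """Put a segment's payload at the offset implied by its sequence number.

        Returns the offset used, or None if the segment does not fit.
        """
        offset = (seq - self.initial_seq) % SEQ_MOD
        if offset + len(payload) > len(self.data):
            return None
        # Stand-in for DMA: the payload goes straight to its final position.
        self.data[offset:offset + len(payload)] = payload
        return offset


# Hypothetical connection table; the dict hashes the 4-tuple for the lookup.
connections = {}


def receive(four_tuple, seq, payload):
    """Place an incoming segment, or return None to use the ordinary path."""
    buf = connections.get(four_tuple)
    if buf is None:
        return None
    return buf.place(seq, payload)
```

Segments that arrive out of order still land in the right place, because
the offset depends only on the sequence number and not on arrival order.
No tag from the sender is involved.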
} The point of RDMA seems to be
} to allow nic manufacturers to add support for a single tcp option that, at
} the very least, allows the nic to align the payload for you. Add RID
} registration with the nic and you get the payload exactly where you want it
} too. All without too much state information kept by the nic.

Since the mid-1980's I've been hearing proposals to do TSP lookups in the
network interface instead of software because it is so incredibly difficult
to find the right TSP quickly in software.  I think those ideas are similar
to the RDMA idea.  They assume facts not in evidence: that there is a
problem that needs to be solved, and that the solution is not worse than
the nominal problem.  There are reasons why such proposals appear in
standards committees before implementations.

......

] From: Lloyd Wood <l.wood@eim.surrey.ac.uk>

] Note the mentions of SCSI and SCSI/TCP and the tie-in with the
] proposed IP Storage efforts (recent ietf general list discussion).
]
] I'd still like to know _why_.
] ...
] SCSI DMA over TCP? What _is_ all this aiming for - trying to build
] distributed RAID arrays with really poor performance that are subject
] to WAN outages and DoS attacks?

Why put SCSI over a protocol that measures RTT's, worries about congestion
in routers, and that expects the error rates that come with 5000 miles of
wire and 20 routers in the path?  Does anyone really think that TCP/IP
or even IP with its 64K byte packet limit is remotely close to the right
protocol, particularly given the existing and commercially available
alternatives?

A standards committee is the venue of first and last resort for such ideas,
especially a committee that is related to currently trendy things like
the SuperInfoHypeWay.

....

) From: julian_satran@il.ibm.com

) That is not completely accurate. You will need appreciably more silicon to
) do what you suggest. And you can do it only with information that "passes
) through the protocol" .
Significantly more silicon than what, to do what?  Since the comment was
addressed to me, I'll assume one 'what' was looking at sequence numbers,
port numbers, and so forth to page flip.  Clearly it takes more silicon to
support page flipping in hardware than to not support page flipping in
hardware.  I will not agree that the required silicon is a big deal, not
because I have a clue about floor plans and so forth (I don't), but because
at a previous employer I fought to keep the hardware guys from throwing in
gates to do it.  They had the silicon to spare and had heard so much about
the wonderfulness of page flipping that they wanted to get in on the fun.

Doing things in hardware is ok only if you absolutely must.  Software is
always better when it is good enough, because it is soft.

) The good thing about the proposal is that it can TAG whatever the
) application wants (and that can be several layers away from the protocol).
) You can't "page-flip" to buffers that you are not aware of. And page
) flipping wherever is applicable assumes also page boundaries for buffers.

That's important only if you stick close to the sockets or UNIX read() API.
If you are not ultra-conservative, and if you know a little of the history
of file and device I/O API's, or if you think about such things for 10
seconds, then RDMA tagging becomes less interesting.  To use RDMA tagging,
you must abandon the UNIX read() API.  If you change the API, then you
may as well think about the whole problem instead of only a corner.  If you
let the operating system tell the application where the incoming data
arrived, then you don't need elaborate hints from the sender to the
receiver's hardware to say where the receiving software will want the data.

) Vernon Schryver <vjs@calcite.rhyolite.com> on 25/02/2000 04:23:47
)
) Please respond to Vernon Schryver <vjs@calcite.rhyolite.com>

I did not write that!

.....
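
The remark above about letting the operating system tell the application
where the incoming data arrived can be sketched roughly as follows.  This is
only an illustration under stated assumptions: ReceivePool, deliver() and
receive() are hypothetical names, not an existing API, and the slice
assignment in deliver() stands in for the interface writing the data.

```python
class ReceivePool:
    """Hypothetical memory pool shared by the OS and the application."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.next_free = 0
        self.arrived = []  # (offset, length) of data not yet seen by the app

    def deliver(self, payload):
        """Interface side: put data wherever is convenient and record where."""
        offset = self.next_free
        if offset + len(payload) > len(self.memory):
            raise MemoryError("pool full")
        self.memory[offset:offset + len(payload)] = payload
        self.next_free = offset + len(payload)
        self.arrived.append((offset, len(payload)))

    def receive(self):
        """Application side: learn where the next data landed, with no copy.

        Unlike read(), the caller does not say where the data must go; it
        gets back a view of the place where the data already is.
        """
        if not self.arrived:
            return None
        offset, length = self.arrived.pop(0)
        return memoryview(self.memory)[offset:offset + length]
```

The application never names a destination address, so there is no page
alignment to satisfy and nothing for the sender to tag.  A real design would
also need a way to hand buffers back to the pool, which this sketch omits.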
) From: Alan Cox <alan@lxorguk.ukuu.org.uk>
)
) > flip is then performed to get the data where the user wants it,
) > imposing the restriction that your payload be page sized so you don't
) > leave gaps in the user's destination buffer. Certainly, with a more
)
) Perhaps its about time the world put together an official, sane, ring buffer
) style mmap socket api. A lot of the requirement to align data is coming
) from the existing socket API.

The IETF should not get involved in API's.  There are plenty of other
standards committees in that arena, as well as big commercial outfits
including one in the U.S. Pacific Northwest.  In other words, do you think
the IETF would be more successful arguing with Microsoft about winsock than
the IETF has been in dealing with Microsoft's obviously completely stupid
and wrong PPP ideas?

If you do get involved in standardizing such things, then *PLEASE* don't
limit yourself to #$%$#@! ring buffers!  The ancient Excelan and preceding
(I've a mental block against the name starting with 'I') ring buffer
notion was ok as an initial hack, but WRONG for something to go fast.
To start, you don't need pointers or indices that must be written by
both the interface and the host.


Vernon Schryver    vjs@rhyolite.com