Re: (iSCSI) A question on Zero Copy

It's not that bad. iSCSI has a "built-in" RDMA that works as long as you don't lose an iSCSI header. The trouble with it is that it is not generic (and many of us, including myself, would love a generic mechanism).

As for binding: if you do iSCSI hardware it does not matter, but if you want plain vanilla TCP with a generic RDMA, nada. There is no such thing.

Julo

"Randall R. Stewart" <randall@stewart.chicago.il.us> on 05/12/2000 22:43:59

Please respond to "Randall R. Stewart" <randall@stewart.chicago.il.us>

To: Stephen Byan <Stephen.Byan@quantum.com>
cc: iSCSI <ips@ece.cmu.edu>
Subject: Re: (iSCSI) A question on Zero Copy

Stephen (and all others who have replied)

Thanks for the confirmation.. I thought it was type (B)... See some comments below...

Stephen Byan wrote:
>
> Randall R. Stewart [mailto:randall@stewart.chicago.il.us] wrote:
>
> > Does the iSCSI layer want:
> >
> > A) Plain Zero copy, where the upper layer (iSCSI) asks
> > to read the next available "message" from the wire
> > into a buffer passed to the transport by iSCSI?
> >
> > <OR>
> >
> > B) A directed Zero Copy, where the upper layer (iSCSI) asks
> > to read a particular request to a specific buffer?
>
> I think most folks implementing iSCSI want class B zero copy, but it is
> restricted to the case of solicited data. Commands and status can be class A
> zero copy, or even just copied.
>
> I don't know what people are thinking about unsolicited data; it seems to me
> that it must be buffered anonymously, and thence copied, but the
> resource-poor environments with which I am familiar would opt not to support
> unsolicited data at all.
>
> It's possible to imagine iSCSI implementations that use another kind of
> zero-copy, where the iSCSI application simply lives with a scatter-gather
> list of anonymous buffers allocated by the network stack.
> But I think it's
> rather hard to implement iSCSI application code on top of the indirection of
> scatter-gather lists. It's much easier to think about your [file system|disk
> controller] cache blocks as named, contiguous regions of (possibly virtual)
> memory, rather than a random collection of bits of anonymous buffers. I
> think the anonymous buffer approach also has a memory utilization penalty,
> and so is not too good in memory-constrained environments. So I vote for
> class B zero-copy, which lets my application manage memory as named
> contiguous buffers.
>
> I haven't the faintest idea how to achieve class B zero copy, without
> putting the entire fast-path TCP processing and some of the iSCSI processing
> into hardware state-machines running at wire-speed.
>

These were exactly my thoughts.. how does one achieve this without merging TCP and iSCSI together... since in order to get a class B, at any moment one must:

A) Be able to tell what buffer a particular segment coming off the wire belongs with
<and>
B) Be able to always maintain the framing.

Now with TCP I am faced with a stream of bytes. So unless you have some sort of option (the RDMA proposal) in the TCP header <OR>, in the buffer being sent itself, a direction as to what buffer address this goes with, the TCP stack has no idea what buffer to shove the incoming segment in. In fact if you don't have the RDMA option you are stuck unless you totally merge TCP into iSCSI... since the TCP stack itself must become "iSCSI" aware... very bad in my view.

Even in an SCTP stack, I don't see how this would work. You do have more flexibility with the streams and could do some sort of stream negotiation to say that stream N is going to supply data for this buffer.. but again there is no provision for the SCTP stack itself to do this in the API yet. We have no way of doing a "threaded blocking read of a stream number" which is what would be required.
Now I know that this is not disallowed by RFC 2960, but I don't know of anyone's stack heading this way... nor did we put it in the sockets mapping draft...

Hmm, this is a very interesting problem.

> Absent such wire-speed parsing of the headers, I think we're really talking
> about a "copy-once" approach on receive, where the packets land in anonymous
> buffers (possibly located on the ethernet PCI adapter), and then software
> (possibly running on a processor located on the ethernet PCI adapter) parses
> the IP, TCP, and iSCSI headers and then sets up a hardware DMA engine to
> copy the payload to a buffer in main memory, and simultaneously perform the
> checksum checking. Think of an Alteon Tigon ethernet chip on steroids,
> running the TCP/IP fast-path code and some iSCSI application-specific code.
>
> I'd appreciate comments, critiques, and info on other approaches to the
> problem :-)
>
> Regards,
> -Steve
>
> Steve Byan
> <stephen.byan@quantum.com>
> Design Engineer
> MS 1-3/E23
> 333 South Street
> Shrewsbury, MA 01545
> (508)770-3414
> fax: (508)770-2604

--
Randall R. Stewart
randall@stewart.chicago.il.us or rrs@cisco.com
815-342-5222 (cell) 815-477-2127 (work)
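
Illustrative note: the two requirements discussed in this message (know which buffer incoming data belongs to, and never lose the framing) and Steve's "copy-once" receive path can be sketched in software roughly as follows. This is only a toy model under stated assumptions. The 12-byte header (tag, offset, length), the DirectedReceiver class, and its register/feed methods are all hypothetical names invented for this sketch; the header is not the iSCSI PDU layout and nothing here is the RDMA proposal or any real stack's API. The bytes first land in an anonymous staging buffer, software parses the header, and the payload is then copied once into the named, contiguous buffer.

```python
import struct

# Hypothetical header: buffer tag, offset into that buffer, payload length.
# It only stands in for "a direction in the data itself"; it is NOT iSCSI.
HEADER = struct.Struct("!III")


class DirectedReceiver:
    """Toy copy-once receiver: anonymous staging, then directed placement."""

    def __init__(self):
        self.buffers = {}           # tag -> named, contiguous bytearray
        self.pending = bytearray()  # anonymous bytes not yet framed

    def register(self, tag, size):
        """The application names a buffer before the data arrives."""
        buf = bytearray(size)
        self.buffers[tag] = buf
        return buf

    def feed(self, segment):
        """Accept one transport segment of any size; return messages placed."""
        self.pending += segment
        placed = 0
        while len(self.pending) >= HEADER.size:
            tag, offset, length = HEADER.unpack_from(self.pending)
            end = HEADER.size + length
            if len(self.pending) < end:
                break  # framing: wait until the whole message has arrived
            target = self.buffers.get(tag)
            if target is None or offset + length > len(target):
                raise ValueError("no registered buffer fits this message")
            # The single copy: payload goes straight to its named buffer.
            target[offset:offset + length] = self.pending[HEADER.size:end]
            del self.pending[:end]
            placed += 1
        return placed


rx = DirectedReceiver()
block = rx.register(tag=7, size=8)
stream = HEADER.pack(7, 0, 4) + b"abcd" + HEADER.pack(7, 4, 4) + b"efgh"
# Deliver in 5-byte pieces to mimic TCP ignoring message boundaries.
for i in range(0, len(stream), 5):
    rx.feed(stream[i:i + 5])
print(bytes(block))  # prints b'abcdefgh'
```

Note that the sketch depends entirely on finding every header in the byte stream: if one header is lost or misparsed, all later placement goes wrong, which is the fragility Julo mentions above. A true class B zero copy would have to do the same parsing before the data lands anywhere, which is why the thread concludes it needs the transport itself (or hardware) to understand the header.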