Re: (iSCSI) A question on Zero Copy

To: ips@ece.cmu.edu
Subject: Re: (iSCSI) A question on Zero Copy
From: julian_satran@il.ibm.com
Date: Wed, 6 Dec 2000 11:19:28 +0200
Content-Disposition: inline
Content-type: text/plain; charset=us-ascii
Sender: owner-ips@ece.cmu.edu

It's not that bad. iSCSI has a "built-in" RDMA that works as long as you
don't lose an iSCSI header. The trouble with it is that it is not generic
(and many of us, including myself, would love a generic mechanism). As for
binding - if you do iSCSI hardware it does not matter, but if you want plain
vanilla TCP with a generic RDMA - nada - there is no such thing.

Julo

"Randall R. Stewart" <randall@stewart.chicago.il.us> on 05/12/2000 22:43:59

Please respond to "Randall R. Stewart" <randall@stewart.chicago.il.us>

To:   Stephen Byan <Stephen.Byan@quantum.com>
cc:   iSCSI <ips@ece.cmu.edu>
Subject:  Re: (iSCSI) A question on Zero Copy

Stephen (and all others who have replied)

Thanks for the confirmation.. I thought it was
type (B)... See some comments below...

Stephen Byan wrote:
>
> Randall R. Stewart [mailto:randall@stewart.chicago.il.us] wrote:
>
> > Does the iSCSI layer want:
> >
> > A) Plain Zero copy, where the upper layer (iSCSI) asks
> >    to read the next available "message" from the wire
> >    into a buffer passed to the transport by iSCSI?
> >
> > <OR>
> >
> > B) A directed Zero Copy, where the upper layer (iSCSI) asks
> >    to read a particular request to a specific buffer?
>
> I think most folks implementing iSCSI want class B zero copy, but it is
> restricted to the case of solicited data. Commands and status can be
> class A zero copy, or even just copied.
>
> I don't know what people are thinking about unsolicited data; it seems to
> me that it must be buffered anonymously, and thence copied, but the
> resource-poor environments with which I am familiar would opt not to
> support unsolicited data at all.
>
> It's possible to imagine iSCSI implementations that use another kind of
> zero-copy, where the iSCSI application simply lives with a scatter-gather
> list of anonymous buffers allocated by the network stack. But I think it's
> rather hard to implement iSCSI application code on top of the indirection
> of scatter-gather lists. It's much easier to think about your [file
> system|disk controller] cache blocks as named, contiguous regions of
> (possibly virtual) memory, rather than a random collection of bits of
> anonymous buffers. I think the anonymous buffer approach also has a memory
> utilization penalty, and so is not too good in memory-constrained
> environments. So I vote for class B zero-copy, which lets my application
> manage memory as named contiguous buffers.
>
> I haven't the faintest idea how to achieve class B zero copy, without
> putting the entire fast-path TCP processing and some of the iSCSI
> processing into hardware state-machines running at wire-speed.
>

These were exactly my thoughts.. how does one achieve this without merging
TCP and iSCSI together? In order to get a class B, at any moment
one must:

A) Be able to tell what buffer a particular segment coming off
   the wire belongs with
<and>

B) Be able to always maintain the framing.

Now with TCP I am faced with a stream of bytes. So unless you
have some sort of option (the RDMA proposal) in the TCP
header <OR>, in the data being sent itself, a direction as
to what buffer address it goes with, the TCP stack has no
idea what buffer to shove the incoming segment in. In fact, if
you don't have the RDMA option you are stuck unless you totally
merge TCP into iSCSI... since the TCP stack itself must
become "iSCSI" aware... very bad in my view.
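
To make the point concrete, here is a rough sketch of what "a direction
as to what buffer this goes with" buys the receiver. Everything in it is
made up for illustration: the Segment fields (tag, offset), the
DirectedReceiver class, and its register/place methods are hypothetical
and are not taken from the RDMA proposal or from any iSCSI draft.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    tag: int        # hypothetical: names the destination buffer
    offset: int     # hypothetical: byte offset inside that buffer
    payload: bytes  # data as it came off the wire


class DirectedReceiver:
    """Toy transport that places tagged data straight into named buffers."""

    def __init__(self):
        self.buffers = {}  # tag -> bytearray registered by the upper layer

    def register(self, tag, size):
        # The upper layer (e.g. iSCSI) names a buffer before data arrives.
        buf = bytearray(size)
        self.buffers[tag] = buf
        return buf

    def place(self, seg):
        # Without the tag there is nothing to look up here; the stack
        # could only queue the bytes anonymously and let someone copy later.
        buf = self.buffers.get(seg.tag)
        if buf is None:
            raise KeyError(f"no buffer registered for tag {seg.tag}")
        end = seg.offset + len(seg.payload)
        if end > len(buf):
            raise ValueError("segment overruns the registered buffer")
        buf[seg.offset:end] = seg.payload
        return end


rx = DirectedReceiver()
cache_block = rx.register(tag=7, size=8)
# Arrival order does not matter once every segment carries tag and offset.
rx.place(Segment(tag=7, offset=4, payload=b"WXYZ"))
rx.place(Segment(tag=7, offset=0, payload=b"abcd"))
```

Take the tag and offset away and place() has nothing to key on, which is
the position a plain TCP stack is in today.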

Even in an SCTP stack, I don't see how this would work. You do
have more flexibility with the streams and could do some sort
of stream negotiation to say that stream N is going to supply
data for this buffer.. but again there is no provision for the
SCTP stack itself to do this in the API yet. We have no way
of doing a "threaded blocking read of a stream number" which
is what would be required. Now I know that this is not disallowed
by RFC 2960, but I don't know of anyone's stack heading this way... nor
did we put it in the sockets mapping draft...
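
For what it's worth, the stream idea would look something like the
sketch below. To be clear, this is NOT an existing SCTP API: the
StreamBoundReceiver class and its bind/deliver calls are invented here
purely to show the shape of "stream N supplies data for this buffer",
and it assumes ordered delivery within a stream.

```python
class StreamBoundReceiver:
    """Toy model: a stream number is bound to one named buffer."""

    def __init__(self):
        self.bound = {}      # stream number -> [buffer, write cursor]
        self.anonymous = []  # messages that arrived on unbound streams

    def bind(self, stream_no, buf):
        # Hypothetical result of a stream negotiation between the peers.
        self.bound[stream_no] = [buf, 0]

    def deliver(self, stream_no, message):
        entry = self.bound.get(stream_no)
        if entry is None:
            # No binding: fall back to anonymous buffering (copy later).
            self.anonymous.append((stream_no, bytes(message)))
            return False
        buf, cursor = entry
        end = cursor + len(message)
        if end > len(buf):
            raise ValueError("message overruns the bound buffer")
        # In-order delivery per stream lets us simply append at the cursor.
        buf[cursor:end] = message
        entry[1] = end
        return True


sctp_rx = StreamBoundReceiver()
block = bytearray(6)
sctp_rx.bind(3, block)
sctp_rx.deliver(3, b"abc")     # lands directly in block
sctp_rx.deliver(5, b"status")  # unbound stream, buffered anonymously
sctp_rx.deliver(3, b"def")     # continues where stream 3 left off
```

The stream number plays the role of the buffer tag, so no per-message
address is needed, but the stack still has to expose the binding
somehow, and that is the piece missing from the API.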

Hmm this is a very interesting problem.

> Absent such wire-speed parsing of the headers, I think we're really
> talking about a "copy-once" approach on receive, where the packets land in
> anonymous buffers (possibly located on the ethernet PCI adapter), and then
> software (possibly running on a processor located on the ethernet PCI
> adapter) parses the IP, TCP, and iSCSI headers and then sets up a hardware
> DMA engine to copy the payload to a buffer in main memory, and
> simultaneously perform the checksum checking. Think of an Alteon Tigon
> ethernet chip on steroids, running the TCP/IP fast-path code and some
> iSCSI application-specific code.
>
> I'd appreciate comments, critiques, and info on other approaches to the
> problem :-)
>
> Regards,
> -Steve
>
> Steve Byan
> <stephen.byan@quantum.com>
> Design Engineer
> MS 1-3/E23
> 333 South Street
> Shrewsbury, MA 01545
> (508)770-3414
> fax: (508)770-2604
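
The "copy-once" receive path Steve describes can be sketched in software
roughly as below. The 14-byte header layout (tag, offset, length, CRC32)
and the make_frame/copy_once helpers are assumptions made up for this
example; they are not the iSCSI PDU format, and CRC32 merely stands in
for whatever checksum the real path would verify.

```python
import struct
import zlib

# Hypothetical framing header: tag, offset, payload length, CRC32 of payload.
HDR = struct.Struct(">IIHI")


def make_frame(tag, offset, payload):
    """Build a toy frame as it would sit in an anonymous receive buffer."""
    return HDR.pack(tag, offset, len(payload), zlib.crc32(payload)) + payload


def copy_once(frame, named_buffers):
    """Parse the header, verify the payload, copy it exactly once."""
    view = memoryview(frame)  # no copy of the anonymous buffer
    tag, offset, length, crc = HDR.unpack_from(view, 0)
    payload = view[HDR.size:HDR.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    # A DMA engine could checksum during the copy; here it is a separate pass.
    if zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch")
    target = named_buffers[tag]
    if offset + length > len(target):
        raise ValueError("payload overruns named buffer")
    target[offset:offset + length] = payload  # the single copy
    return length


named = {9: bytearray(8)}
copy_once(make_frame(9, 2, b"data"), named)
```

The application still sees a named, contiguous buffer; the price is one
copy out of the anonymous landing area.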

--
Randall R. Stewart
randall@stewart.chicago.il.us or rrs@cisco.com
815-342-5222 (cell) 815-477-2127 (work)
