RE: A Transport Protocol Without ACK

To: "'Douglas Otis'" <dotis@sanlight.net>, Jim McGrath <Jim.McGrath@quantum.com>, "'Randall Stewart'" <rrs@cisco.com>
Subject: RE: A Transport Protocol Without ACK
From: Jim McGrath <Jim.McGrath@quantum.com>
Date: Wed, 20 Sep 2000 17:59:38 -0700
Cc: "'Y P Cheng'" <ycheng@advansys.com>, "'Ips@Ece. Cmu. Edu'" <ips@ece.cmu.edu>
Content-Type: text/plain;charset="iso-8859-1"
Sender: owner-ips@ece.cmu.edu

Doug,

I agree that FCP offers some instructive ideas.  I would like to decouple
the allocation of initial credits from the login process (as per a previous
message) and allow the target to really dynamically allocate them on a per
initiator basis.  

On the general BB credit model, the only real issue I have is that in FC
the credits are for frames (in FC often 2K bytes), not bytes.  I agree
that commands can just be stored by the target like any other data, but
there is a big difference in size between a command frame and a user data
frame.  We don't need byte level granularity, but keeping the "credit unit"
to something like 512 bytes (or smaller) would allow for more efficient
target memory management at modest controller complexity.  Note you could
still send things like 2K byte payloads; you would just end up using four
512 byte credits rather than a single frame credit.  It was the coupling of
1 credit per frame, and then the need for large frames for efficient bus
utilization, that got us into trouble.
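
As a rough illustration of the credit-unit arithmetic above, the sketch
below counts how many fixed-size credits a payload consumes and how much
target buffer those credits tie up.  The function names and the 64 byte
command size are assumptions made for illustration; only the 512 byte and
2K byte figures come from the discussion.

```python
def credits_needed(payload_bytes: int, credit_unit: int = 512) -> int:
    """Number of fixed-size credits consumed by one payload (rounded up)."""
    if payload_bytes < 0 or credit_unit <= 0:
        raise ValueError("payload must be >= 0 and credit unit must be > 0")
    return -(-payload_bytes // credit_unit)  # ceiling division


def buffer_reserved(payload_bytes: int, credit_unit: int) -> int:
    """Target memory tied up by the credits that cover one payload."""
    return credits_needed(payload_bytes, credit_unit) * credit_unit


# A 2K byte data payload uses four 512 byte credits.
print(credits_needed(2048, 512))

# A small command (64 bytes is an assumed size) reserves one 512 byte
# credit, versus a whole 2K byte buffer under one-credit-per-frame.
print(buffer_reserved(64, 512), buffer_reserved(64, 2048))
```

With a smaller credit unit the memory reserved for a short command tracks
its real size more closely, while large payloads simply consume several
credits.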

In FC one objection to making many more, smaller credits is the number of
primitive tokens you would have to send (since each maps to one credit),
but here we need control packets anyway, so we can free up and use credits
in bunches rather than individually (similar to what is done in FCP).
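
One way to picture credits being granted and freed in bunches is the
sketch below.  It is only a toy model under stated assumptions: the
`CreditPool` class, its method names, and the pool size of 16 are all
hypothetical and are not taken from FCP or any draft.

```python
class CreditPool:
    """Toy target-side pool that hands out buffer credits per initiator."""

    def __init__(self, total_credits: int):
        self.free = total_credits   # credits not yet promised to anyone
        self.granted = {}           # initiator -> credits it may still use
        self.in_use = 0             # credits backing data not yet drained

    def grant(self, initiator: str, wanted: int) -> int:
        """Grant up to `wanted` credits in a single control message."""
        n = min(wanted, self.free)
        self.free -= n
        self.granted[initiator] = self.granted.get(initiator, 0) + n
        return n

    def consume(self, initiator: str, credits: int) -> None:
        """Record that an initiator sent data occupying `credits` credits."""
        if credits > self.granted.get(initiator, 0):
            raise ValueError("initiator exceeded its granted credits")
        self.granted[initiator] -= credits
        self.in_use += credits

    def release(self, credits: int) -> None:
        """Target drained its buffers; return a bunch of credits at once."""
        if credits > self.in_use:
            raise ValueError("cannot release more credits than are in use")
        self.in_use -= credits
        self.free += credits


pool = CreditPool(16)
print(pool.grant("initiator-A", 8))   # one message grants eight credits
print(pool.grant("initiator-B", 12))  # only the remaining credits are granted
pool.consume("initiator-A", 4)        # A sends a 2K payload (4 x 512 bytes)
pool.release(4)                       # target frees four credits in one step
print(pool.grant("initiator-B", 4))   # freed credits can go to B
```

The point of the sketch is that each grant or release moves several
credits per control message, so the message count does not grow with the
number of credits.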

In theory an initiator could send down multiple commands and then start
sending down data sort of randomly between the commands, creating potential
starvation issues.  But no initiator that I know of does anything like that.
I've never seen one that will send down some write data, jump to another
command and send data, and then go back to the first command (maybe someone
else has?).  As long as the amount of data you can send with credits is
smaller than the TCP window size, you should never get starvation as
far as I can tell (am I missing something?).

Jim

 


-----Original Message-----
From: Douglas Otis [mailto:dotis@sanlight.net]
Sent: Wednesday, September 20, 2000 12:17 AM
To: Jim McGrath; 'Randall Stewart'
Cc: 'Y P Cheng'; 'Ips@Ece. Cmu. Edu'
Subject: RE: A Transport Protocol Without ACK


Jim,

I understand a desire to stick with what works.  Regardless of the IP
transport, the traffic to each LUN will require some other flow control
mechanism.  Yes, we could re-invent the wheel as it applies to SCSI, but if
you examine FC Class 3 FCP, you will see an appropriate flow control
mechanism in place.  It uses Buffer-to-Buffer credit tokens generated by
comma frame delimiters.  These frame delimiters are defined within the
FC-encapsulation documentation.  Simple, direct and easy.  See
http://search.ietf.org/internet-drafts/draft-otis-fc-sctp-ip-00.txt. To
facilitate processing frame delimiters within software, both could be
presented before the frame rather than as shown in the rough draft.  Perhaps
even a Null CRC option could be added if one trusts SCTP checksum for
software implementations.

Once flow control is in place, there is little need for extending command
tags, CRN, or anything associated with the LUN as these structures are then
independent of transport bandwidth.  I doubt there is a great benefit in
having more than 256 commands pending on a nexus.  Expanding any field only
makes converting to a drive interface stateful, difficult and far less
reliable.  With the FCP flow control mechanism, T10 does not need to
redefine SCSI for initiators that overwhelm the target. The target would
have adequate control of resources.

Should IP-SCSI be driven by controller design?  Caching, volume management,
reservations, and nearly every feature offered by a controller is
significantly reduced in value should the controller be placed next to the
drive.  If you are in a facility 35 miles from a location holding drives,
you may find 50 miles of fiber traversed, creating some 800+ microseconds
of round-trip time simply due to the speed of light.  You may shudder to
think about any NIC buffer.  Where would you want the controller and where
would you want the drive?  The controller must remain on the client side of
the network.  As such, drive design should steer the IP-SCSI standard.  At
least with FCP, the drive manufacturers have already spoken.  Those making
controllers will just have to make more of them and develop controller
locking protocols should this controller be part of a remote cluster.
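
The round-trip figure above can be checked with a quick calculation.  The
sketch below assumes a fiber refractive index of 1.5 (an assumption, not
stated in the message) and treats the 50 miles as the one-way fiber
length; the function name is hypothetical.

```python
MILES_TO_KM = 1.609344
C_VACUUM_KM_PER_S = 299_792.458
FIBER_INDEX = 1.5  # assumed refractive index of the fiber


def round_trip_us(one_way_miles: float, index: float = FIBER_INDEX) -> float:
    """Propagation-only round-trip time in microseconds over fiber."""
    one_way_km = one_way_miles * MILES_TO_KM
    speed_km_per_s = C_VACUUM_KM_PER_S / index  # light is slower in glass
    return 2 * one_way_km / speed_km_per_s * 1e6


print(f"{round_trip_us(50):.0f} microseconds round trip")
```

Under these assumptions the result lands slightly above 800 microseconds,
before any switching or buffering delay is added.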

If you examine FCP documentation, you will find that you can send data with
the command as an option.  You can also send the response at the end of data
as an option.  Every vital feature used to justify tossing FCP structures
becomes moot.  Should just an 8M byte FIFO buffer be placed between an IP
agent and a FC agent, as much as 65 milliseconds of latency can be created.
This additional latency alone will go a long way toward accommodating rate
differences between these two agents.  FCP flow control and burst limits
could easily finish the task.
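
The latency a full FIFO can add is its size divided by the rate at which
it drains.  The message does not state a link rate; the sketch below
assumes 1 Gbit/s, which gives a figure close to the 65 milliseconds quoted
above.  The function name is hypothetical.

```python
def fifo_latency_ms(fifo_bytes: int, drain_bits_per_s: float) -> float:
    """Worst-case time in milliseconds to drain a full FIFO."""
    return fifo_bytes * 8 / drain_bits_per_s * 1e3


ASSUMED_LINK_RATE = 1e9  # 1 Gbit/s, an assumption not given in the message

# "8M byte" read as 8,000,000 bytes and as 8 * 2**20 bytes.
print(f"{fifo_latency_ms(8_000_000, ASSUMED_LINK_RATE):.1f} ms")
print(f"{fifo_latency_ms(8 * 2**20, ASSUMED_LINK_RATE):.1f} ms")
```

Either reading of "8M byte" gives a value in the mid-60 millisecond range
at the assumed rate; a faster or slower drain rate scales the latency
inversely.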

You speak of TCP as a proven technology, but TCP is not being suggested for
IP-SCSI.  TCP with some other mechanism is used to solve ills created by a
persistent single byte stream.  This is not proven technology, nor likely to
function properly without major tweaking.  At least if you wish to have a
hand at creating a suitable API for multi-object-streams far and away more
suitable for SCSI, now is the time.  Perhaps either Randall Stewart's U-SCTP
or a stale frame timer should be added to prevent overlapping retry
mechanisms if this protocol is used as a bridge to FC.

As far as the configuration effort, convert these requirements into LDAP
structures.  This would allow a single database to manage all aspects of
configuration.  Stuffing this information across the transport only weakens
security.  It is a bad idea, and it makes deciding who manages what
difficult.  Networks
will always have a means to identify equipment in some binary fashion, and
LDAP and DHCP servers combine this information into meaningful structures
with meaningful names.  All values required for the various transport layers
would be derived from these standard servers.

As far as what to do with Stream 0: revision negotiations, FC-domain
mapping, and SRC-DST filtering done in purely binary form would be the best
means of getting equipment to accept commands without a high overhead.  The
equipment does not care what the binary number represents.  As far as a
clever means of doing remote DNS, SCTP has that covered.  Again, this
information comes from an LDAP server accessed by the driver and not the SAM
interface or the SCSI transport layer.

Yes, there are many options within FC that should be excluded.  If FCP
structures can be used, perhaps while holding one's nose, they should be.
There are far too many benefits for doing so, and too few benefits for not.
In the end, a better product would have a common set of structures to speak
SAN.  If you wish to make round wheels out of square blocks, don't let me
stop you.  I think I see a set of wheels already.

Doug

> -----Original Message-----
> From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
> Jim McGrath
> Sent: Tuesday, September 19, 2000 4:05 PM
> To: 'Randall Stewart'; Jim McGrath
> Cc: 'Y P Cheng'; 'Ips@Ece. Cmu. Edu'
> Subject: RE: A Transport Protocol Without ACK
>
>
>
> Actually the burden of proof issue is why I suggest we look at some things
> that are actually being used today (since you don't have to guess how they
> behave).  That is one of TCP's great strengths, and a bit of a weakness for
> SCTP (no offense to SCTP supporters, but it certainly does not have a big
> and long "track record" yet, and so I can understand the concerns others may
> have as to whether things would work out as well in practice as they do in
> proposal).
>
> Jim
>
> PS Personally, I'm a big believer in copying stuff that works, making the
> minimum amount of required changes, and then doing a rapid but controlled
> deployment (I've been involved in a lot of those sorts of things in ATA and
> SCSI).  Having been involved in both these sorts of endeavors and the
> opposite (big, clean sheet of paper efforts, like 1394 (no offense to
> 1394/Firewire supporters, but I was working on it a decade ago)), I know how
> easy it is to underestimate the work required by the latter, and to be
> turned off by the "inelegance" of the former.  For me, life has become too
> short - I'm willing to accept inelegance as the price for speed of
> deployment.
>
>
> -----Original Message-----
> From: Randall Stewart [mailto:rrs@cisco.com]
> Sent: Tuesday, September 19, 2000 4:38 AM
> To: Jim McGrath
> Cc: 'Y P Cheng'; 'Ips@Ece. Cmu. Edu'
> Subject: Re: A Transport Protocol Without ACK
>
>
> Jim:
>
> Any transport protocol proposal is ok. As long as it can be seen and
> reviewed. So far I have seen only two: TCP and SCTP.
>
> Oh, a little side note, any transport protocol proposed MUST be able to
> show TCP like behavior in the face of congestion. And I think, IMHO, that
> this means that if it is NOT using RFC2581 procedures it MUST show that
> it does back off and share with TCP. It also has a HEAVY burden of proof to
> show this facility, at least in my mind and I would think in the IESG's mind
> as well...
>
> R
>
>
> Jim McGrath wrote:
>
> > I would expand your search to include non standard protocols (i.e.
> > proprietary ones) as well if they offered something and were adequately
> > understood by the outside world.  We do that in storage quite a lot -
> > indeed, some standard protocols are direct descendants of what were once
> > proprietary protocols (e.g. ATA, the most widely used desktop disk
> > interface, and ESCON, a dominant mainframe class interface (both of which
> > originated from IBM proprietary technologies)).
> >
> > Jim
> >
> > -----Original Message-----
> > From: Y P Cheng [mailto:ycheng@advansys.com]
> > Sent: Monday, September 18, 2000 5:52 PM
> > To: 'Ips@Ece. Cmu. Edu'
> > Subject: RE: A Transport Protocol Without ACK
> >
> > From: randall@stewart.chicago.il.us
> > > I see no viable transport protocol here and I don't see this
> > > conversation of any use unless you get exact details AND point
> > > to an internet draft that defines EXACTLY how it works (or possibly
> > > some other standards document).
> >
> > Both I2O and VI are transport protocols which define the format of a
> > request to a transport service provider, i.e. an adapter card.  I2O is
> > used for, but not limited to, delivering SCSI requests, and VI is used for
> > any payload including IP packets.  VI is mapped into FC with the device
> > headers between the FC header and data payload.  VI can certainly be used
> > for delivery of SCSI requests too.  Both protocols require the service
> > provider to have reliable delivery and reception.  VI defines different
> > QoS.
> >
> > > > I don't claim any credit about this transport layer protocol. Every
> > > > fibre channel and Infiniband adapter designer knows about this
> > > > protocol -- although there is no standard.  I am sure the TCP
> > > > accelerator card is doing the same.  This protocol is a great
> > > > alternative to the use of TCP/IP and should be incorporated into iSCSI.
> > >
> > > No it is not. You are not offering an alternative yet..
> >
> > I did not imply iSCSI should use I2O or VI.  In fact, the purpose of iSCSI
> > is to map SCSI requests into IP packets as well as to define the delivery.
> > It seems to me that the working group has set its mind on TCP/IP and
> > believes this is the only solution.  The consensus seems to be that if
> > there are any other solutions that address flow control and congestion,
> > they would end up like TCP/IP.  I am simply pointing out that if we keep an
> > iSCSI request as a single atomic transaction, without separating it into
> > the TCP/IP-stream-oriented Writes and Reads that each deal with a single
> > DU, then the deadlock problem goes away.  While the working group thinks we
> > should take advantage of the flow control and congestion management of
> > TCP/IP, there are alternatives known as BB-credit and EE-credit management.
> > The fibre channel adapters perform reliable delivery, lost packet
> > detection, and retransmission without TCP/IP.
> >
> > Randall, you are right, I did not spend time to provide the working group
> > a draft defining such a transaction-oriented protocol.  All I have provided
> > is the idea that, besides TCP/IP, the designers of SCSI and fibre channel
> > adapters have solved the head-of-queue blocking, congestion, and
> > retransmission problems.  The transaction-oriented WRITE-REQUEST and
> > READ-RESPONSE, in my humble opinion, allow us to implement iSCSI more
> > simply than WRITE and READ stream requests do.  The performance cost of
> > requiring ACKs on every DU with size greater than the MTU on a network
> > with long latency is very high.  Defining a greater ACK granularity is an
> > attempt to solve this performance problem.  If we do wish to ACK on every
> > DU, then, on a long latency network, we must have a method to stream the
> > PDUs to ensure the performance.  The method should not consume a large
> > amount of memory space.  One should never ignore the TCP/IP
> > memory-to-memory copy overhead when the backbone will be running at OC-192
> > speed in the near future.  Finally, please don't ever ask two NIC cards to
> > synchronize with each other.  It is really hard to do, as those of us in
> > the business of designing NIC cards can testify.
> >
> > Y.P. Cheng, CTO, ConnectCom Solutions Corp.
>