RE: A Transport Protocol Without ACK

(My apology for this long reply. I hope it is worth your reading.)

> From: randall@stewart.chicago.il.us
> [mailto:randall@stewart.chicago.il.us]
>
> I am a bit confused by the above Y.P. you state "by the returning of
> status PDU."... Both SCTP and TCP will carry a piggyback ACK with
> that PDU, so you end up accomplishing the same thing. What are you
> trying to say that I am missing???

The piggybacked ACK saves extra PDUs but does not solve the buffer
requirement for long latency. In my example, if we have 20 milliseconds
of round-trip time on an IP network with a gigabit backbone, then to
keep the data streaming on the net we must have 2MB of buffer just in
case we need to retransmit the data.

By the way, in SCSI read/write data transfer, the receiver sends
nothing back until the status phase. Therefore, there is nothing to
piggyback on. The SCSI protocol for data transfer is basically
half-duplex, not full-duplex. Sending ACKs requires extra PDUs.

> Y.P. please enumerate the protocols that have this property that also
> provide TCP friendly congestion control. If you could enumerate the
> exact protocols and pointers to the specifications I would be more
> than glad to have a look at these and see if I can support them.
> Making vague references to "not limiting itself to TCP/IP" does not
> do anything for me and I think nothing for the WG. We need specific
> transport protocols listed that are capable of transporting iSCSI AND
> have TCP friendly congestion control principles built into them...

I am not an expert at making a transport protocol proposal. However,
let me use a bottom-up approach by describing how an iSCSI transport
layer should work. Other people can help in making it an IETF proposal.
I participated in this discussion with the intention of providing the
working group information on the latest NIC adapter technology, so that
the iSCSI proposal can better serve the NIC adapter industry as well as
the community that uses TCP/IP.

In this response, I will address two topics:

1) The inefficiency of using TCP/IP to implement iSCSI
2) What we can do in the transport layer to overcome the inefficiency
   of iSCSI on TCP/IP (here I am stealing ideas from VI, TCP/RDMA, and
   FCP)

(Disclaimer: my apology in advance if my view on TCP/IP is incorrect
herein. After all, I am a career adapter designer.)

For iSCSI to use TCP/IP, it uses SOCKET, CONNECT, or BIND to first make
a connection point, which is an (IP address, TCP port) pair. The
asymmetric model provides a second TCP port; a multi-path to another
node has a second IP address. After connecting -- with one or more
connection points and paths -- the iSCSI layer creates multiple PDUs:
command, data, and status. A SCSI initiator uses a WRITE call to tell
an IP NIC to send the PDUs. The iSCSI driver is aware that there could
be multiple NIC cards. A SCSI target LISTENs for the incoming PDUs. It
may listen on multiple NIC cards.

I will not repeat the queuing and blocking problems of the iSCSI driver
in dealing with multiple application programs with many TCP/IP ports,
or the issues of connecting to multiple targets or initiators. We will
address only the performance issue of the stream- and
connection-oriented delivery of TCP/IP.
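To make the buffer arithmetic concrete, here is a small C sketch of the
bandwidth-delay calculation. It assumes the round numbers used in this
post -- roughly 100 MB/s of payload on a 1 gigabit link and a 20
millisecond round trip -- not measured values:

    /* Back-of-the-envelope check of the buffer figures in this post. */
    #include <stdio.h>

    int main(void)
    {
        double payload_rate = 100e6;  /* bytes/sec, assumed effective rate */
        double rtt          = 0.020;  /* seconds, round-trip time */
        double frame_size   = 1024.0; /* bytes per data frame */

        /* Bandwidth-delay product: data in flight awaiting an ACK. */
        double bdp = payload_rate * rtt;
        printf("retransmit buffer needed: ~%.0f KB (~%.0f 1K frames)\n",
               bdp / 1024.0, bdp / frame_size);

        /* With only 200 1K buffers, the sender stalls waiting for ACKs. */
        double window      = 200 * frame_size;  /* bytes in flight */
        double utilization = window / bdp;      /* fraction of link used */
        printf("200-buffer window runs at %.0f%% of maximum\n",
               utilization * 100.0);
        return 0;
    }

This prints roughly 2MB (about 2000 1K frames) and 10%, matching the
figures used in the example below.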
As in the example of my previous posting, to keep write data streaming
on a 1 gigabit connection with a 20 millisecond round-trip latency, the
initiator must have 2000 1K buffers hanging around for retransmitting
lost data packets. If it has only 200 1K buffers allocated for a
target, the initiator can send 200K of data in 2 milliseconds and must
then wait 18 milliseconds for the first ACK to come back. Therefore, it
runs at 10% of the possible maximum throughput. A target uses R2T to
control how much of its resources each initiator can consume. However,
it has no choice but to provide 2000 1K buffers to receive the incoming
data for the maximum possible performance.

To get TCP/IP data, the target uses READs to get data from the IP NIC
cards. The memory-to-memory copying done while processing the TCP
stack, looking for a TCP port number in each IP packet, is the greatest
culprit of all in the TCP/IP performance problem. Companies like
Alacritech build special TCP adapters that address this by doing the
port lookup in the adapter. The good news is that the above performance
problem has already been addressed by the VI and FCP implementations in
the latest NIC adapters.

Here is my proposed iSCSI transport layer protocol: a TRANSACTION
ORIENTED WITH BULK ACKNOWLEDGMENT protocol.

Instead of using READs and WRITEs on data streams as with TCP/IP, an
iSCSI driver should send a SCSI request or response to the transport
layer using SEND-REQUEST and RECEIVE-RESPONSE messages. These messages
contain the IP end-point connection, the SCSI command bytes, and data
buffer descriptors supplied by the application software. Each message
describes a transaction EXCHANGE, which can have an exchange ID. (iSCSI
calls this the Initiator Task Tag, although a task can have multiple
SCSI commands.) The iSCSI driver still uses SOCKET, CONNECT, and BIND
to create connections.

It is true that if we used a totally connectionless protocol like UDP
to transmit 10 megabytes of data on a busy Internet, we would be
forever retransmitting due to lost-frame errors. However, instead of
sending an ACK for every data frame, we can steal the idea from Fibre
Channel of breaking a transaction exchange down into data sequences,
each a collection of data frames. The receiver needs only to
acknowledge a sequence, which has a unique sequence ID. A sequence with
a lost data frame will be retransmitted. Using a sliding window,
multiple sequences can be in flight at once. This is how we keep data
frames streaming on a network with a long latency time. The size of a
data sequence is, of course, network dependent.

Having the data descriptors provided by the application software is the
greatest benefit of this proposal for a transport layer. There is no
data buffering as with TCP/IP. The transport layer does not have to
allocate a huge buffer to keep data frames streaming on a network with
a long latency delay. It uses the buffers provided by the application
software. It can always retransmit a data frame, because the
application software must stay around until the transaction exchange is
complete. In VI, the application software allocates a memory segment,
gives it a handle, and passes the handle to a remote node to allow
remote DMA. Therefore, the data descriptors of this transport protocol
can simply be a memory handle for a memory segment previously created.
TCP/RDMA is copying this idea.

Each transaction exchange is executed by the NIC driver atomically.
Hundreds or even thousands of SEND-REQUESTs and RECEIVE-RESPONSEs can
be outstanding in the driver.
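For illustration only, here is one way the SEND-REQUEST message and its
descriptors might be laid out in C. Every name and field here is
hypothetical -- a sketch of the idea, not taken from any iSCSI, VI, or
FCP specification:

    #include <stdint.h>

    /* VI-style handle to a memory segment that the application has
     * registered in advance, so the NIC can DMA directly to or from
     * the application's own buffers -- no transport-layer copy. */
    typedef struct {
        uint32_t mem_handle; /* handle returned at registration time */
        uint64_t offset;     /* offset into the registered segment */
        uint32_t length;     /* bytes covered by this descriptor */
    } buf_desc_t;

    /* One transaction exchange, handed to the NIC driver atomically. */
    typedef struct {
        uint32_t   exchange_id;  /* like the Initiator Task Tag */
        uint32_t   endpoint;     /* index of an (IP address, port) pair */
        uint8_t    scsi_cdb[16]; /* the SCSI command bytes */
        uint16_t   n_desc;       /* number of data buffer descriptors */
        buf_desc_t desc[8];      /* application-supplied buffers */
    } send_request_msg_t;

    /* Bulk acknowledgment: the receiver ACKs whole sequences, never
     * individual frames; a bad sequence is retransmitted as a unit. */
    typedef struct {
        uint32_t exchange_id; /* which transaction exchange */
        uint32_t sequence_id; /* the sequence being acknowledged */
        uint8_t  ok;          /* 0 = frame lost, retransmit sequence */
    } sequence_ack_t;

Because the descriptors point into memory the application registered up
front, the driver can retransmit any sequence at any time without
keeping its own copy of the data.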
After sending the SCSI command PDUs, the data and status PDUs are
handled on demand by the NIC driver. There are no queuing and deadlock
problems. The detection of lost data frames is a function of the
transport layer, which specifies the QoS (Quality of Service). Flow
control is done by EE-credit granting by a receiver, so no one can
overflow its resources. This is the same as the Max---RN discussed in
iSCSI. Congestion control is managed by alternative NICs or IP
endpoints. Both should be a part of the transport protocol.

I don't claim any credit for this transport layer protocol. Every Fibre
Channel and InfiniBand adapter designer knows about this protocol --
although there is no standard. I am sure the TCP accelerator cards are
doing the same. This protocol is a great alternative to the use of
TCP/IP and should be incorporated into iSCSI.
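P.S. As a sketch of the EE-credit idea above, with hypothetical names
throughout: the receiver grants credits as its buffers free up, and the
sender may only put a sequence on the wire while it holds an unused
credit, so it can never overflow the receiver's resources:

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        uint32_t granted;  /* total credits granted by the receiver */
        uint32_t consumed; /* sequences this sender has transmitted */
    } ee_credit_t;

    /* Sender side: may another sequence be transmitted right now? */
    static bool credit_available(const ee_credit_t *c)
    {
        return c->consumed < c->granted;
    }

    static void consume_credit(ee_credit_t *c)
    {
        c->consumed++; /* one credit per sequence on the wire */
    }

    /* Receiver side: grant more credits as receive buffers free up. */
    static void grant_credits(ee_credit_t *c, uint32_t freed_buffers)
    {
        c->granted += freed_buffers;
    }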