iSCSI single control channel

To: ips@ece.cmu.edu
Subject: iSCSI single control channel
From: meth@il.ibm.com
Date: Fri, 18 Aug 2000 10:26:19 +0300
Sender: owner-ips@ece.cmu.edu

This is a repost of my proposal (from the end of June) for a single control
channel with multiple data channels, as per Julian's request that I repost
this memo. It simplifies the current iSCSI specification proposal in that
it does away with the reference numbers and simplifies recovery.

- Kalman

-------------------------------------------------------------------------------------------------------------------

Proposal to support single Control Channel with multiple Data Channels
in the iSCSI protocol.

by Kalman Meth
27 June 2000

In our discussions on the iSCSI protocol, we came to the conclusion that
we needed to send data over multiple channels in order to make best use
of the available network resources. We also were inclined to
have all of the channels acting in a symmetric manner so as to simplify
the protocol by not having to deal differently with some channels.
This allows vendors to introduce uniform iSCSI NICs for all of the
network connections that will be exploited by iSCSI.

We decided on allowing commands to be sent over any of the multiple
connections, with the command's data and status being sent in the same
channel that was used to issue the command.
The use of multiple channels to pass commands introduced a complication:
servicing the commands on the receiving end in the original order
in which they were issued. We had a further complication when one
of the connections failed: how do we determine which commands were lost
on a broken connection, and what actions are required to recover from
the failed connection? The solution we found to these problems
(introducing a Command Reference Number and placing the commands back
in order on the receiver's end) introduced flow control problems,
such as maintaining a window on commands to ensure that we don't overrun
the reference count, and that we don't block up all of the channels
just because one channel failed and its lost command causes us to fill
up the command queue on the target (while we wait for the lost command
to arrive).

I would like us to go back and consider a variation of the model we
originally proposed with one Command Channel and multiple Data Channels.
Some ideas that came up during our discussions are included below and
also apply to the symmetric model.

Session establishment: as in existing draft.
Naming: as in existing draft with adjustments from design discussions.
Security: as decided in design discussions.
(0) none
(1) challenge/response
(2) IPSec or SSL

Normal case:

An iSCSI session between an initiator and a target consists of a
number of TCP connections. Each TCP connection between initiator and
target requires an iSCSI login. The first established connection of a
session between initiator and target (numbered 0) is the Control
Connection (also called Control Channel).
Subsequent connections between the same initiator and target can be
added to an existing session upon request of the initiator during login.
These connections are numbered 1, 2, 3, etc., and are called Data
Connections (also called Data Channels).
An initiator may establish several sessions with the same target, each
session having its own Control Channel and its own set of Data Channels.

All SCSI commands and task management messages will go over the Control
Connection. Order is maintained within a single session by virtue of all
commands going through the same TCP connection.
The iSCSI packets for RTT and Data may go over any of the channels.
iSCSI Login must be performed on each of the connections.
iSCSI Ping may be performed over any of the connections.
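
The connection rules above can be sketched in a few lines of Python. This
is only an illustration, not part of any draft: the class name Session,
the method names, and the packet kind strings are all hypothetical, and
the sketch assumes connections are never removed from a session.

```python
from dataclasses import dataclass, field

CONTROL_ID = 0  # the memo numbers the Control Connection 0


@dataclass
class Session:
    """Hypothetical model of one session; not part of any draft."""
    connections: list = field(default_factory=list)  # logged-in connection numbers

    def login(self) -> int:
        """Log in one more TCP connection and return its number.

        The first login creates the Control Connection (0); later logins
        create Data Connections 1, 2, 3, ...
        """
        conn_id = len(self.connections)
        self.connections.append(conn_id)
        return conn_id

    def allowed_connections(self, kind: str) -> list:
        """Return the connections on which a packet of this kind may travel."""
        if kind in ("command", "task_management"):
            # Ordering comes from keeping these on a single TCP connection.
            return [c for c in self.connections if c == CONTROL_ID]
        if kind in ("rtt", "data", "ping"):
            return list(self.connections)
        raise ValueError(f"unknown packet kind: {kind}")
```

With three logins, allowed_connections("command") yields only connection
0, while allowed_connections("data") yields all three connections.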

It is recommended that large data transfers be performed on the Data
Channels (rather than the Control Channel) so as to ensure that the
Control Channel is always free. It is permissible, however, to
establish a single connection and perform all iSCSI operations on that
single channel.

On a READ or WRITE command, the initiator specifies on which channel it
expects to perform the data transfer. This gives the initiator and
target a chance to set up buffers for DMA ahead of time.
Once a data transfer for a particular SCSI command begins on a
particular Data Channel, all subsequent data that is transferred for the
same SCSI command is to be transferred over the same Data Channel.
On RTT, the target confirms on which channel it is expecting the data
transfer. An RTT request will be sent over the same channel as the
expected data transfer (as was specified by the initiator).
If the target decides (for whatever reason) that it wants to receive the
data transfer on another channel, it sends the RTT over the Control
Channel with an indication as to which Data Channel it wants to use.
It is understood that this may entail a performance
cost on the initiator's side to now move the data transfer to another
Data Channel (which may be another NIC, thus requiring DMA to be set
up all over again). A target will usually change the connection for
a data transfer only in case of some problem it has with the originally
specified connection (an unresponsive connection, an inability to handle
a large amount of data on the specified connection, etc.).
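
As a rough illustration of the RTT routing rule, the target's choice
might look like the sketch below. The function name choose_rtt_route and
the idea of a "usable" set of connections are assumptions made for the
example; how a target decides that a connection is unusable, and which
alternative it prefers, is left open by this proposal.

```python
def choose_rtt_route(requested: int, usable: set, control: int = 0) -> tuple:
    """Return (connection the RTT is sent on, connection the data should use).

    requested: the Data Channel named by the initiator in the command.
    usable:    connections the target is currently willing to receive on
               (hypothetical input; the memo does not define this test).
    """
    if requested in usable:
        # Normal case: confirm the initiator's choice on that same channel.
        return requested, requested
    # The target wants another channel: the RTT goes over the Control
    # Channel and names the Data Channel to use instead. Picking the
    # lowest-numbered usable Data Channel is an arbitrary choice here.
    alternatives = sorted(usable - {control}) or [control]
    return control, alternatives[0]
```

For example, choose_rtt_route(2, {0, 1, 2}) confirms channel 2 on channel
2, while choose_rtt_route(2, {0, 1}) sends the RTT on channel 0 and asks
for the data on channel 1.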

Commands may be sent with immediate data (in the Control Channel) if the
immediate data is small (say less than 8K), thereby avoiding the need to
later match up the data with the corresponding command. A bit in the
iSCSI command header indicates that there is immediate data.
An initiator may also send unsolicited data (no RTT) over the Data
Channels, in case the initiator and target have agreed (during login
on the Control Channel) to not use RTT.
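
The choice between immediate, unsolicited, and RTT-driven data could be
sketched as follows. The 8K limit is only the "say less than 8K" figure
from the text above, not a fixed protocol value, and the function name
and return strings are hypothetical.

```python
IMMEDIATE_LIMIT = 8 * 1024  # "say less than 8K"; an illustrative value only


def write_data_mode(length: int, rtt_in_use: bool) -> str:
    """Decide how an initiator sends the data of one WRITE command."""
    if length < IMMEDIATE_LIMIT:
        # Small payload: carry it with the command on the Control Channel.
        return "immediate"
    if rtt_in_use:
        # Wait for the target's RTT before sending on a Data Channel.
        return "wait_for_rtt"
    # Both sides agreed at login not to use RTT.
    return "unsolicited"
```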

The initiator and target may renegotiate the use (or non-use) of RTT
between commands, using an iSCSI Text command.
The initiator sends the request to the target and does not send any
other commands to the target until the target has responded.
The change in using RTT will take effect with the command following the
response of the target.

The status of a READ command is sent with the last data packet,
thus allowing hardware implementations to perform a single interrupt
when the entire data transfer has completed.
Similarly, a flag in a data packet sent from initiator to target
indicates the last data buffer in an unsolicited WRITE operation.
If the initiator sends unsolicited data for a WRITE operation
(i.e. without an RTT) over one of the Data Channels, it is possible
that the data will arrive before the command arrived on the Control
Channel. It is also possible that the target will not have enough
buffers to receive the unsolicited data. The target has the option of
placing the unsolicited data in reserve buffers or of completely
discarding the data. If the target discards the data, the target will
later issue an RTT to instruct the initiator to resend the data.
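
The target's handling of unsolicited data that may arrive before its
command can be sketched as below. This is a minimal model under stated
assumptions: data is matched to its command by the Initiator Task Tag,
reserve buffers are counted in bytes, and the class and method names are
invented for the example.

```python
class UnsolicitedDataTarget:
    """Hypothetical target-side bookkeeping for unsolicited WRITE data."""

    def __init__(self, reserve_bytes: int):
        self.free = reserve_bytes  # reserve buffer space still available
        self.held = {}             # task tag -> data kept in reserve buffers
        self.discarded = set()     # task tags whose data was thrown away

    def on_unsolicited_data(self, tag: int, data: bytes) -> str:
        """Data arrived on a Data Channel, possibly ahead of its command."""
        if len(data) <= self.free:
            self.free -= len(data)
            self.held[tag] = self.held.get(tag, b"") + data
            return "buffered"
        self.discarded.add(tag)
        return "discarded"

    def on_command(self, tag: int) -> str:
        """The WRITE command arrived on the Control Channel."""
        if tag in self.discarded:
            self.discarded.discard(tag)
            # Release any partial data and ask for a full resend.
            self.free += len(self.held.pop(tag, b""))
            return "send_rtt"
        if tag in self.held:
            return "use_buffered_data"
        return "await_data"
```

In this sketch a target with 4 bytes of reserve space buffers a 4-byte
transfer, discards the next one, and later answers the second command
with an RTT.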

Multiple iSCSI NICs:

One argument to support the symmetric model was to allow having
identical iSCSI NICs to handle all iSCSI connections. In the
symmetric model, since all channels look alike, all of the (identical)
NICs can be fully utilized.

We argue that even in the model with one Control Connection and many
Data Connections, we can still utilize the NICs to their maximum.

The main operations to be implemented by iSCSI NICs will be to send
data packets and RTTs. Data Channels can be spread across these iSCSI
NICs. The less frequent iSCSI operations (and especially recovery)
can be performed in software in a device driver.
Note also that a Control Channel and a Data Channel can
go over the same wire (NIC) even if they are different TCP connections.
In order to handle additional iSCSI operations in hardware,
vendors can introduce fancier NICs that also handle some other iSCSI
operations.

A target may use one NIC to handle the Control Channel from one
initiator, and another NIC to handle the Control Channel from another
initiator. Thus, even if all NICs can handle the entire set of iSCSI
operations,
they can still be utilized to the maximum by using each NIC for the
Control Channel of a different session. Similarly, if an initiator has
devices on several targets, it can use each NIC to handle the Control
Connection of a different session.
An initiator can also open multiple sessions with the same target,
using a different NIC for the Control Channel of each session.

Recovery:

An initiator must hold on to data it has sent via a WRITE operation
until it has received the status for the corresponding command.
Even if the initiator sends immediate data (in the Control Channel) or
unsolicited data (in one of the Data Channels), the target may discard
the data in case it didn't have the resources to handle the data at that
instant. The target may then request that the data be resent with an
RTT.
A target need not keep a copy of the data buffers it has sent, if
such data can be regenerated from the storage device.
However, the target must keep around the status information until it has
been acknowledged by the initiator. The initiator sends Status Ack info
(a new iSCSI message type) over the Control Channel.
If strict ordering between commands is needed (such as reading and
writing of the same device) then the application must perform the
proper synchronization by not issuing the second command until it has
received the status of the first command (as in linked commands).
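
The rule that a target keeps status information until it is acknowledged
amounts to a small table keyed by command. The sketch below assumes the
key is the Initiator Task Tag; the class name StatusLog and its methods
are hypothetical, and the status values are plain strings for
illustration.

```python
class StatusLog:
    """Hypothetical target-side store of unacknowledged command status."""

    def __init__(self):
        self._pending = {}  # task tag -> status not yet acknowledged

    def record(self, tag: int, status: str) -> None:
        """Remember the status when it is sent to the initiator."""
        self._pending[tag] = status

    def ack(self, tag: int) -> None:
        """A Status Ack arrived on the Control Channel; the entry can go."""
        self._pending.pop(tag, None)

    def query(self, tag: int):
        """Return the retained status, or None if nothing is retained."""
        return self._pending.get(tag)
```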

If it seems that a connection has stopped functioning, then either
the initiator or the target may issue an iSCSI Ping command to determine
if the connection is still alive. (A bit in the Ping header indicates
which side initiated the ping operation.) If the Ping operation times
out, then it may be assumed that the connection is not functioning
properly. When a Ping operation fails, the connection should immediately
be closed.

Note: It is not required to support iSCSI level recovery.
It is sufficient for the initiator to report failure for the commands
that did not complete and let the upper layer protocol handle the
recovery.
In this case, all channels of the session should be closed, all data
structures should be cleaned up, and a new session may be established
between the initiator and target.

There is an advanced recovery mechanism that MAY be implemented by
the initiator and target, as described below.

Data that was sent over a failed Data Connection will have to be
resent over another Data Connection.
On a WRITE operation, the target will eventually issue an RTT over the
Control Connection to inform the initiator as to which other Data
Connection to use. (Is it OK to wait for the target to figure out that
the connection is down? Can the initiator somehow bring this to the
target's attention?)
On a READ operation, the initiator will indicate to the target which
data it wants resent from the failed data transfer. This is done
using an RTT (sent from the initiator to the target over the Control
Channel) to resend the data from some previous READ operation.
(This is the only time an RTT is sent from the initiator to a target.)
(Since the status of that READ operation did not arrive at the
initiator and it was never acknowledged, the target will have kept
the relevant information about the corresponding command.)
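
One way an initiator might work out which data to ask for again after a
failed READ transfer is sketched below. It assumes that each data packet
identifies its position as an (offset, length) pair within the transfer,
which this memo does not specify; the function name is also invented.

```python
def missing_ranges(total_len: int, received: list) -> list:
    """Return the (offset, length) gaps not covered by received packets.

    received: (offset, length) pairs for data that did arrive, in any order.
    """
    gaps = []
    pos = 0  # first byte not yet known to be covered
    for offset, length in sorted(received):
        if offset > pos:
            gaps.append((pos, offset - pos))
        pos = max(pos, offset + length)
    if pos < total_len:
        gaps.append((pos, total_len - pos))
    return gaps
```

Each gap returned here would correspond to one resend request sent from
the initiator to the target over the Control Channel.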

If the Control Channel stops working properly (again, determined by a
time out on an iSCSI Ping operation) then the initiator must know
which commands made it to the target and were not lost, which commands
were completed whose status got lost, and which commands never made it
to the target.

Upon setting up a new session (by establishing a Control Channel), the
initiator may specify whether it is in fact starting a new session or
taking over an existing session. When taking over an existing
session, the initiator must specify the identifiers of the session to
be taken over.
The target then stops transmitting on the old Control Channel, and
transfers all of the old session resources to the new Control Channel.

The target returns to the initiator the Initiator Task Tag of the last
command it received on the old Control Channel.

For each command that was sent to the target before the specified
Initiator Task Tag, the initiator queries (new iSCSI Query command)
about the status of that command. (The information about the status of
those commands will not have been discarded since the target never
received an Ack about them from the initiator.)
Some of the commands may have had incomplete data transfers (indicated
by a special iSCSI status code), and the target and initiator will
re-issue RTTs to
recover the data from those commands. Once the initiator has received
(and acknowledged) the status of all pending commands, the initiator
sends an iSCSI Sync message to the target to inform it that they are
back in sync, and that all commands before the specified Initiator Task
Tag have been satisfactorily accounted for.
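
The initiator's side of the Control Channel takeover can be sketched as a
simple classification of its commands. The sketch assumes that the
initiator keeps its task tags in issue order, and that the command named
by the returned tag is itself among those to be queried; the function
name and argument names are hypothetical.

```python
def classify_after_takeover(issued: list, acked: set, last_received):
    """Split commands into those to query and those that never arrived.

    issued:        Initiator Task Tags in the order they were sent.
    acked:         tags whose status was already received and acknowledged.
    last_received: tag returned by the target, or None if the target
                   received no command on the old Control Channel.
    """
    if last_received is None:
        cut = 0
    else:
        # Raises ValueError if the target names a tag we never issued.
        cut = issued.index(last_received) + 1
    # Commands the target saw but whose status we have not acknowledged:
    # ask about each one with the proposed iSCSI Query command.
    to_query = [tag for tag in issued[:cut] if tag not in acked]
    # Commands sent after the last one the target saw were lost in transit.
    never_arrived = issued[cut:]
    return to_query, never_arrived
```

Once every tag in to_query has a status that has been received and
acknowledged, the initiator would send the iSCSI Sync message. What the
initiator does with the commands in never_arrived is not covered here.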
