single vs multiple channels for iSCSI commands

Proposal to support single Control Channel with multiple Data Channels in the iSCSI protocol.
by Kalman Meth
27 June 2000

In our discussions on the iSCSI protocol, we came to the conclusion that we needed to send data over multiple channels in order to make best use of the available network resources. We were also inclined to have all of the channels act in a symmetric manner, so as to simplify the protocol by not having to treat some channels differently. This allows vendors to introduce uniform iSCSI NICs for all of the network connections that will be exploited by iSCSI. We decided to allow commands to be sent over any of the multiple connections, with the command's data and status being sent in the same channel that was used to issue the command.

The use of multiple channels to pass commands introduced a complication: servicing the commands on the receiving end in the original order in which they were issued. A further complication arises when one of the connections fails: how do we determine which commands were lost on the broken connection, and what actions are required to recover from the failed connection? The solution we found to these problems (introducing a Command Reference Number and placing the commands back in order on the receiver's end) introduced flow control problems, such as maintaining a window on commands to ensure that we don't overrun the reference count, and that we don't block all of the channels just because one channel failed and its lost command causes us to fill up the command queue on the target (while we wait for the lost command to arrive).

I would like us to go back and consider a variation of the model we originally proposed, with one Command Channel and multiple Data Channels. Some ideas that came up during our discussions are included below and also apply to the symmetric model.
Session establishment: as in existing draft.

Naming: as in existing draft with adjustments from design discussions.

Security: as decided in design discussions.
(0) none
(1) challenge/response
(2) IPSec or SSL

Normal case:

An iSCSI session between an initiator and a target consists of a number of TCP connections. Each TCP connection between initiator and target requires an iSCSI login. The first established connection of a session between initiator and target (numbered 0) is the Control Connection (also called Control Channel). Subsequent connections between the same initiator and target can be added to an existing session upon request of the initiator during login. These connections are numbered 1, 2, 3, etc., and are called Data Connections (also called Data Channels). An initiator may establish several sessions with the same target, each session having its own Control Channel and its own set of Data Channels.

All SCSI commands and task management messages go over the Control Connection. Order is maintained within a single session by virtue of all commands going through the same TCP connection. The iSCSI packets for RTT and Data may go over any of the channels. iSCSI Login must be performed on each of the connections. iSCSI Ping may be performed over any of the connections.

It is recommended that large data transfers be performed on the Data Channels (rather than the Control Channel) so as to ensure that the Control Channel is always free. It is permissible, however, to establish a single connection and perform all iSCSI operations on that single channel.

On a READ or WRITE command, the initiator specifies the channel on which it expects to perform the data transfer. This gives the initiator and target a chance to set up buffers for DMA ahead of time. Once a data transfer for a particular SCSI command begins on a particular Data Channel, all subsequent data transferred for the same SCSI command must go over the same Data Channel.
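The channel rules of the normal case can be sketched in a few lines of Python. This is only an illustration of the model described above, not a wire format: the `Session` class, its method names, and the use of a task tag as the dictionary key are all assumptions made for the example.

```python
from dataclasses import dataclass, field

CONTROL = 0  # connection 0 is the Control Channel, as in the proposal


@dataclass
class Session:
    """Hypothetical bookkeeping for one iSCSI session (illustrative names only)."""
    connections: list = field(default_factory=lambda: [CONTROL])
    data_channel_of: dict = field(default_factory=dict)  # task tag -> channel

    def add_data_connection(self) -> int:
        """Add a connection after login; Data Channels are numbered 1, 2, 3, ..."""
        number = len(self.connections)
        self.connections.append(number)
        return number

    def channel_for_command(self) -> int:
        """All SCSI commands and task management messages use the Control Channel."""
        return CONTROL

    def bind_data_channel(self, task_tag: int, channel: int) -> int:
        """Record the channel a command's data uses; later data must stay on it."""
        if channel not in self.connections:
            raise ValueError("no such connection in this session")
        bound = self.data_channel_of.setdefault(task_tag, channel)
        if bound != channel:
            raise ValueError("data for a command must stay on one channel")
        return bound
```

A session with a single connection is still valid in this sketch: commands and data then both use channel 0, matching the single-channel case that the proposal permits.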
On RTT, the target confirms on which channel it expects the data transfer. An RTT request is sent over the same channel as the expected data transfer (as specified by the initiator). If the target decides (for whatever reason) that it wants to receive the data transfer on another channel, it sends the RTT over the Control Channel with an indication of which Data Channel it wants to use. It is understood that this may entail a performance cost on the initiator's side, which must now move the data transfer to another Data Channel (possibly on another NIC, thus requiring DMA to be set up all over again). A target will usually change the connection for a data transfer only when it has some problem with the originally specified connection (an unresponsive connection, an inability to handle a large amount of data on that connection, etc).

Commands may be sent with immediate data (in the Control Channel) if the immediate data is small (say less than 8K), thereby avoiding the need to later match up the data with the corresponding command. A bit in the iSCSI command header indicates that there is immediate data.

An initiator may also send unsolicited data (no RTT) over the Data Channels, if the initiator and target have agreed (during login on the Control Channel) not to use RTT. The initiator and target may renegotiate the use (or non-use) of RTT between commands, using an iSCSI Text command. The initiator sends the request to the target and does not send any other commands to the target until the target has responded. The change in the use of RTT takes effect with the command following the target's response.

The status of a READ command is sent with the last data packet, thus allowing hardware implementations to raise a single interrupt when the entire data transfer has completed. Similarly, a flag in a data packet sent from initiator to target indicates the last data buffer in an unsolicited WRITE operation.
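As a rough illustration of the rules above, the following Python sketch picks a transfer mode for a WRITE and decides where an RTT travels. The function names and the string labels are hypothetical, the 8K limit is only the "say less than 8K" figure from the text, and the sketch always uses immediate data for small writes even though the proposal merely permits it.

```python
CONTROL = 0                       # the Control Channel is connection 0
IMMEDIATE_DATA_LIMIT = 8 * 1024   # "say less than 8K"; not a fixed protocol constant


def write_transfer_mode(length: int, rtt_enabled: bool) -> str:
    """Pick how the data of a WRITE of `length` bytes is sent (illustrative policy)."""
    if length < IMMEDIATE_DATA_LIMIT:
        return "immediate"        # travels with the command on the Control Channel
    # Larger writes go over a Data Channel, solicited by RTT unless both
    # sides agreed during login (or via a Text command) not to use RTT.
    return "solicited" if rtt_enabled else "unsolicited"


def rtt_route(requested_channel: int, target_choice=None):
    """Return (channel the RTT is sent on, channel the data should use)."""
    if target_choice is None or target_choice == requested_channel:
        # Target accepts the initiator's choice: RTT goes on that same channel.
        return requested_channel, requested_channel
    # Target wants a different Data Channel: RTT goes on the Control Channel
    # and names the channel the target wants instead.
    return CONTROL, target_choice
```

For example, `rtt_route(2)` keeps everything on channel 2, while `rtt_route(2, 3)` sends the RTT on the Control Channel and redirects the data to channel 3.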
If the initiator sends unsolicited data for a WRITE operation (i.e. without an RTT) over one of the Data Channels, it is possible that the data will arrive before the command arrives on the Control Channel. It is also possible that the target will not have enough buffers to receive the unsolicited data. The target has the option of placing the unsolicited data in reserve buffers or of discarding the data completely. If the target discards the data, it will later issue an RTT to instruct the initiator to resend the data.

Multiple iSCSI NICs:

One argument in support of the symmetric model was that it allows identical iSCSI NICs to handle all iSCSI connections. In the symmetric model, since all channels look alike, all of the (identical) NICs can be fully utilized. We argue that even in the model with one Control Connection and many Data Connections, we can still utilize the NICs to their maximum. The main operations to be implemented by iSCSI NICs will be to send data packets and RTTs. Data Channels can be spread across these iSCSI NICs. The less frequent iSCSI operations (and especially recovery) can be performed in software in a device driver. Note also that a Control Channel and a Data Channel can go over the same wire (NIC) even though they are different TCP connections.

In order to handle additional iSCSI operations in hardware, vendors can introduce fancier NICs that also handle some other iSCSI operations. A target may use one NIC to handle the Control Channel from one initiator, and another NIC to handle the Control Channel from another initiator. Thus, even if all NICs can handle the entire iSCSI set, they can still be utilized to the maximum by using each NIC for the Control Channel of a different session. Similarly, if an initiator has devices on several targets, it can use each NIC to handle the Control Connection of a different session.
An initiator can also open multiple sessions with the same target, using a different NIC for the Control Channel of each session.

Recovery:

An initiator must hold on to data it has sent via a WRITE operation until it has received the status for the corresponding command. Even if the initiator sends immediate data (in the Control Channel) or unsolicited data (in one of the Data Channels), the target may discard the data if it does not have the resources to handle the data at that instant. The target may then request, with an RTT, that the data be resent.

A target need not keep a copy of the data buffers it has sent, if such data can be regenerated from the storage device. However, the target must keep the status information until it has been acknowledged by the initiator. The initiator sends Status Ack info (a new iSCSI message type) over the Control Channel.

If strict ordering between commands is needed (such as reading and writing of the same device), then the application must perform the proper synchronization by not issuing the second command until it has received the status of the first command (as in linked commands).

If it seems that a connection has stopped functioning, then either the initiator or the target may issue an iSCSI Ping command to determine whether the connection is still alive. (A bit in the Ping header indicates which side initiated the ping operation.) If the Ping operation times out, then it may be assumed that the connection is not functioning properly. When a Ping operation fails, the connection should be closed immediately.

Note: It is not required to support iSCSI level recovery. It is sufficient for the initiator to report failure for the commands that did not complete and let the upper layer protocol handle the recovery. In this case, all channels of the session should be closed, all data structures should be cleaned up, and a new session may be established between the initiator and target.
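The two retention rules in the Recovery section above (the initiator keeps WRITE data until it sees status; the target keeps status until it sees a Status Ack) can be sketched as follows. The class and method names are hypothetical, and the sketch models only the bookkeeping, not the messages themselves.

```python
class InitiatorWrites:
    """Initiator side: hold WRITE data until the command's status arrives."""

    def __init__(self):
        self.pending = {}  # task tag -> data buffer still held

    def send_write(self, tag: int, data: bytes) -> None:
        # Keep the buffer even for immediate or unsolicited data,
        # because the target is allowed to discard what it received.
        self.pending[tag] = data

    def on_rtt(self, tag: int) -> bytes:
        # The target asked for the data (again): resend from the held buffer.
        return self.pending[tag]

    def on_status(self, tag: int) -> int:
        # Status received: the buffer can be released. The returned tag is
        # what a Status Ack on the Control Channel would refer to.
        self.pending.pop(tag, None)
        return tag


class TargetStatus:
    """Target side: hold status until the initiator acknowledges it."""

    def __init__(self):
        self.unacked = {}  # task tag -> status not yet acknowledged

    def complete(self, tag: int, status: str) -> None:
        self.unacked[tag] = status

    def on_status_ack(self, tag: int) -> None:
        self.unacked.pop(tag, None)
```

Keeping unacknowledged status on the target is what later makes it possible to answer questions about commands whose status was lost in transit.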
There is an advanced recovery mechanism that MAY be implemented by the initiator and target, as described below.

Data that was sent over a failed Data Connection will have to be resent over another Data Connection. On a WRITE operation, the target will eventually issue an RTT over the Control Connection to inform the initiator which other Data Connection to use. (Is it OK to wait for the target to figure out that the connection is down? Can the initiator somehow bring this to the target's attention?) On a READ operation, the initiator will indicate to the target which data it wants resent from the failed data transfer. This is done using an RTT (sent from the initiator to the target over the Control Channel) to resend the data from some previous READ operation. (This is the only time an RTT is sent from the initiator to a target.) (Since the status of that READ operation did not arrive at the initiator and was never acknowledged, the target will have kept the relevant information about the corresponding command.)

If the Control Channel stops working properly (again, determined by a timeout on an iSCSI Ping operation), then the initiator must find out which commands made it to the target and were not lost, which commands were completed but whose status got lost, and which commands never made it to the target. Upon setting up a new session (by establishing a Control Channel), the initiator may specify whether it is in fact starting a new session or taking over an existing session. When taking over an existing session, the initiator must specify the identifiers of the session to be taken over. The target then stops transmitting on the old Control Channel, and transfers all of the old session resources to the new Control Channel. The target returns to the initiator the Initiator Task Tag of the last command it received on the old Control Channel.
For each command that was sent to the target before the specified Initiator Task Tag, the initiator queries the target (using a new iSCSI Query command) about the status of that command. (The information about the status of those commands will not have been discarded, since the target never received an Ack for them from the initiator.) Some of the commands may have had incomplete data transfers (indicated by a special iSCSI status code), and the target and initiator will re-issue RTTs to recover the data from those commands. Once the initiator has received (and acknowledged) the status of all pending commands, the initiator sends an iSCSI Sync message to the target to inform it that they are back in sync, and that all commands before the specified Initiator Task Tag have been satisfactorily accounted for.
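The Control Channel takeover described above amounts to splitting the initiator's outstanding commands around the Initiator Task Tag that the target reports. The Python sketch below shows one way to do that split. It makes several assumptions that the proposal does not state: the initiator keeps a log of task tags in send order, the command carrying the reported tag is itself queried, and `None` stands for a target that received nothing. All names are hypothetical.

```python
def plan_recovery(sent_order, acked, last_received_tag):
    """Split commands sent on the old Control Channel after a session takeover.

    sent_order: task tags in the order they were sent (initiator's own log)
    acked: set of tags whose status the initiator already acknowledged
    last_received_tag: tag the target reports as last received, or None
    Returns (to_query, never_arrived).
    """
    if last_received_tag is None:
        cut = 0
    else:
        # Commands travel in order on one TCP connection, so everything up to
        # the reported tag reached the target and everything after it did not.
        cut = sent_order.index(last_received_tag) + 1
    to_query = [tag for tag in sent_order[:cut] if tag not in acked]
    never_arrived = list(sent_order[cut:])
    return to_query, never_arrived


def can_send_sync(to_query, acknowledged):
    """Sync may be sent once every queried command's status has been acknowledged."""
    return all(tag in acknowledged for tag in to_query)
```

The split relies on send order rather than on comparing tag values, since nothing in the proposal says that Initiator Task Tags increase monotonically.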