a working version of the draft

Dear colleagues,

Except for the material that will be coming from Paul and Luciano, I now
have (attached) a working version of the draft. It is supposed to
contain, in broad terms, the things we worked on in Haifa:

- an ordering scheme for the session for command AND status delivery
- a recovery scheme
- a component structure easier to parse

But bear with me! I think I will spend the better part of my time until
next Friday polishing it before delivery. I just wanted you all to have
it for tomorrow's call of the design team. Tomorrow we will probably
want to discuss both the requirement document and the draft in detail.

An administrative issue - write only to one of the lists (preferably the
ips or iSCSI?). With David Nagle's help it is up and collecting mail
from both ips and scsi-tcp (many of our mail items appear twice).

And let's all of us wish good luck to the new chairmen.

Julo

---------------------------

Internet-Draft                                                 J. Satran
<draft-satran-iSCSI-03.txt>                                     D. Smith
Expires December 28, 2000                                        K. Meth
                                                                     IBM
                                                          C. Sapuntzakis
                                                           Cisco Systems
                                                           Randy Haagens
                                                     Hewlett-Packard Co.
                                                            Efri Zeidner
                                                                 SANGate
                                                       Paul Von Stamwitz
                                                                 Adaptec
                                                       Luciano Dalle Ore
                                                                 Quantum
                                                           June 28, 2000

                         iSCSI (Internet SCSI)

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

0.0.1. Acknowledgements

A large group of people contributed to the creation of this document
through their review, comments and valuable insights - too many to
mention them all. Nevertheless, we are grateful to all of them. We are
especially grateful to those who found the time and patience to
participate in our weekly phone conferences and intermediate meetings
in Almaden and Haifa and thus helped shape this document: Matt Wakeley
(Agilent), Jim Hafner, John Hufferd, Prasenjit Sarkar, Meir Toledano,
John Dowdy, Steve Legg, Alain Azagury (IBM), Dave Nagle (CMU), David
Black (EMC), John Matze (Veritas), Mark Bakke, Steve DeGroote, Mark
Shrandt (NuSpeed), Gabi Hecht (Gadzoox), Robert Snively (Brocade),
Nelson Nachum (StorAge)

Satran, Smith, Sapuntzakis, Meth                                [Page 1]
iSCSI                                                          June 2000

Table of Contents

1. Abstract
2. Overview
   2.1. SCSI Concepts
   2.2. iSCSI Concepts & Functional Overview
   2.3. iSCSI Login
   2.4. iSCSI Full Feature Phase
   2.5. iSCSI Connection Termination
   2.6. Naming
3. Message Formats
   3.1. Template Header
   3.2. SCSI Command
   3.3. SCSI Response
   3.4. Asynchronous Event
   3.5. SCSI Task Management Message
   3.6. SCSI Task Management Response
   3.7. Ready To Transfer (RTT)
   3.8. SCSI Data
   3.9. Text Command
   3.10. Text Response
   3.11. Login Command
   3.12. Login Response
   3.13. Ping Command
   3.14. Ping Response
   3.15. Third Party Commands
   3.16. Opcode Not Understood
4. Error Handling iSCSI
5. Notes to Implementors
   5.1. Small TCP Segments
   5.2. Multiple Network Adapters
   5.3. Autosense
   5.4. TCP RDMA option
   5.5. Data Connections Options
6. Security Considerations
   6.1. Data Integrity
   6.2. Login Process
   6.3. IANA Considerations
7. Authors' Addresses
8. References and Bibliography
9. Appendix A - Examples
   9.1. Read operation example
   9.2. Write operation example
10. Appendix B - Login/Text keys

1. Abstract

The Small Computer Systems Interface (SCSI) is a popular family of
protocols for communicating with I/O devices, especially storage
devices. This memo describes a transport protocol for SCSI that
operates on top of TCP. The iSCSI protocol aims to be fully compliant
with the requirements laid out in the SCSI Architecture Model - 2
[SAM2] document.

2. Overview

2.1. SCSI Concepts

The endpoint of most SCSI commands is a "logical unit" (LUN). Examples
of logical units include hard drives, tape drives, CD and DVD drives,
printers and processors. Within the logical unit, the abstract entity
that executes the SCSI commands is named the device-server.

A "target" is a collection of logical units, in general of the same
kind, and is directly addressable on the network. In large
installations a target is also known as a "control unit". The target
corresponds to the server in the abstract SAM client-server model.

An "initiator" creates and sends SCSI commands to the target. The
initiator corresponds to the client in the abstract SAM client-server
model.

A "task" is a linked set of SCSI commands. Some LUNs support multiple
pending (queued) tasks. The target uses a "task tag" to distinguish
between tasks. Only one command in a task can be outstanding at any
given time.

A SCSI command results in an optional data phase and a response phase.
In the data phase, information travels either from the initiator to the
target, as in a WRITE command, or from target to initiator, as in a
READ command. In the response phase, the target returns the final
status of the operation, including any errors. A response terminates a
SCSI command.

2.2. iSCSI Concepts & Functional Overview

The following conceptual layering model is used in this document to
specify initiator and target actions and how those relate to
transmitted and received Protocol Data Units:

- the SCSI layer builds/receives SCSI CDBs (Command Descriptor Blocks)
  and relays/receives them with the remaining command execute
  parameters (cf. SAM-2) to/from the
- iSCSI layer, which builds/receives iSCSI PDUs and relays/receives
  them to/from
- one or more TCP connections that form an initiator-target "session"

Communication between initiator and target occurs over one or more TCP
connections. The TCP connections are used for sending control messages,
SCSI commands, parameters and data within iSCSI protocol data units
(iSCSI PDU). The group of TCP connections linking an initiator with a
target forms a session (loosely equivalent to a SCSI nexus); a session
is defined by a session ID (composed of an initiator part and a target
part). TCP connections can be added to and removed from a session.

iSCSI supports ordered command delivery within a session and limited
command and data recovery. All SCSI commands presented to iSCSI get a
"command reference number" and this number can be used by a receiving
target for ordered delivery.

A sliding window mechanism is used to limit the number of outstanding
commands. For descriptive purposes it is assumed that the iSCSI layer
implements the sliding window mechanism.

2.3. iSCSI Login

The purpose of iSCSI login is to enable a TCP connection for iSCSI use,
authenticate the parties, authorize the initiator to send SCSI commands
and mark the connection as belonging to an iSCSI session. A session is
used to identify to a target all the connections with a given
initiator.

The targets listen on a well-known TCP port for incoming connections.
The initiator begins the login process by connecting to that well-known
TCP port.
As part of the login process, the initiator and target MAY wish to
authenticate each other. This can occur in many different ways. For
example, the endpoints may wish to check the IP address of the other
party. If the TCP connection uses transport layer security [TLS],
certificates may be used to identify the endpoints. Also, iSCSI
includes commands for identifying the initiator and passing an
authenticator to the target (see Appendix B).

Once suitable authentication has occurred, the target MAY authorize the
initiator to send SCSI commands. How the target chooses to authorize an
initiator is beyond the scope of this document. The target indicates a
successful authentication and authorization by sending a login response
with "accept login".

The login message includes a session ID, composed of an initiator part
(ISID) and a target part (TSID). For a new session the TSID is null. As
part of the response the target will generate a TSID. Session-specific
parameters (e.g., the maximum number of connections that can be used
for this session) can be specified only for the first login of a
session (TSID null). Connection-specific parameters (if any) can be
specified for any login. Thus a session is operational once it has at
least one connection, and a pending login can't affect a whole session.

After authentication and authorization, other parameters may be
negotiated using the highly extensible Text Command message that allows
arbitrary key:value pairs to be passed.

Any message sent on a TCP connection before this connection gets into
full feature phase at the initiator should be rejected by the
initiator. A message reaching a target on a TCP connection before the
full feature phase will be rejected with an iSCSI check condition bit.

2.4. iSCSI Full Feature Phase

Once the initiator is authorized to do so, the iSCSI session is in
iSCSI full feature phase.
The initiator may send SCSI commands and data to the various LUNs on
the target by mapping them into iSCSI messages that go over the
established iSCSI session.

For SCSI commands that require data and/or parameter transfer, the
(optional) data and the status for a command must be sent over the same
TCP connection that was used to deliver the SCSI command (connection
allegiance). Thus if an initiator issues a READ command, the target
must send the requested data followed by the status to the initiator
over the same TCP connection that was used to deliver the SCSI command.
If an initiator issues a WRITE command, the initiator must send the
data for that command and the target must return the status over the
same TCP connection that was used to deliver the SCSI command.

During iSCSI Full Feature Phase, the initiator and target may
interleave unrelated SCSI commands, their SCSI Data and responses, over
the session.

Outgoing SCSI data (initiator to target - user data or command
parameters) is sent as either unsolicited data or solicited data.
Unsolicited data can be part of an iSCSI command PDU ("immediate data")
or an iSCSI data PDU. Solicited data are sent in response to Ready To
Transfer PDUs.

Targets operate in either solicited (RTT) data mode or unsolicited (non
RTT) data mode. An initiator must always honor an RTT data request. It
is considered an error for an initiator to send unsolicited data PDUs
to a target operating in RTT mode (only solicited data). By default,
immediate data is limited to 64 Kbytes, and an initiator is allowed to
send immediate data (subject to limitations specified elsewhere in this
document) even to targets working in RTT mode. An initiator may
request, at login, to send immediate data of any size, and a target may
indicate in its response the size of immediate data blocks it is ready
to accept. A target is allowed to silently discard data and request
retransmission through RTT.
Initiators will not perform any scoreboarding for data, and the
residual count calculation is to be performed by the targets.

Incoming data is always solicited. However, an initiator will be able
to request retransmission of all or part of the target data.

SCSI Data packets are matched to their corresponding SCSI commands by
using Tags that are specified in the protocol. Initiator tags for
pending commands are unique initiator-wide for a session. Target tags
for pending commands are unique target-wide for the session.

Although the above mechanisms are designed to accomplish efficient data
delivery and a large degree of control over the data flow, it is
recognized that some specific sequences involving ordered execution and
a mix of solicited and immediate data can result in deadlocks. It is
for this reason that discarding data by a target is considered a
legitimate action. Examples of such sequences are presented in appendix
C together with recovery scenarios.

Outgoing commands are numbered by iSCSI (CmdRN), and response PDUs
(target to initiator) will continuously update the initiator about the
maximum command number that can be sent (sliding window).

Each iSCSI session to a target is treated as if it originated from a
different initiator.

2.5. iSCSI Connection Termination

Connection termination is assumed to be an exceptional event. Graceful
TCP connection shutdowns are done by sending TCP FINs. Graceful
connection shutdowns MUST only occur when there are no outstanding
tasks that have allegiance to the connection. A target SHOULD respond
rapidly to a FIN from the initiator by closing its half of the
connection as soon as it has finished all outstanding tasks that have
allegiance to the connection. Closing a connection that has outstanding
tasks may require recovery actions, which will be described elsewhere
in this document.

2.6. Naming

Targets are named using a URL-type name of the format:

   scsi://<domain-name>[/modifier]

The name used to connect will be optionally included in the login in
order to enable the target to present different views. This is the
Target Acquired Name (TAN). We will not attempt to define which
components of the name will participate in the name resolution process
and which ones will be used only for "view" definition. The syntactic
sugar included might be used to introduce structure for management
purposes but has no specific significance for this standard.

Example:

   scsi://diskfarm1.acme.com
   scsi://computingcenter.acme.com/peripherals/diskfarm1

When a target has to act as an initiator for a third party command, it
will use the TAN during login as required by the authentication
mechanism.

A domain name that contains exactly four numbers separated by dots (.),
where each number is in the range 0 through 255, will be interpreted as
an IPv4 address.

Examples:

   10.0.0.1/tapefarm1
   10.0.0.2

3. Message Formats

All multi-byte integers specified in formats defined in this document
are to be represented in network byte order (i.e., big endian).

3.1. Template Header and Opcodes

All iSCSI messages and responses have a header of the same length (40
bytes). Additional data may be added, as necessary, beginning with byte
40. The fields of Opcode and Length appear in all message and response
headers. The other most commonly used fields are Initiator Task Tag,
Logical Unit Number, and Flags, which, when used, always appear in the
same location of the header.
Byte /     0       |       1       |       2       |       3       |
    /              |               |               |               |
   |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
   +---------------+---------------+---------------+---------------+
  0| Opcode        | Opcode-specific fields                        |
   +---------------+---------------+---------------+---------------+
  4| Length of Data (after 40 byte Header)                         |
   +---------------+---------------+---------------+---------------+
  8| LUN or Opcode-specific fields                                 |
   +                                                               +
 12|                                                               |
   +---------------+---------------+---------------+---------------+
 16| Initiator Task Tag                                            |
   +---------------+---------------+---------------+---------------+
 20/ Opcode-specific fields                                        /
  +/                                                               /
   +---------------+---------------+---------------+---------------+
 40

3.1.1. Opcode

The Opcode indicates which iSCSI type of message or response is
encapsulated by the header.

Valid opcodes for messages (sent by initiator to target) are:

   0x00  Ping Command (from initiator to target)
   0x01  SCSI Command (encapsulates a SCSI Command Descriptor Block)
   0x02  SCSI Task Management Message
   0x03  Login Command
   0x04  Text Command
   0x05  SCSI Data (for WRITE operation)

Valid opcodes for responses (sent by target to initiator) are:

   0x80  Ping Response (from target to initiator)
   0x81  SCSI Response (contains SCSI status and possibly sense
         information or other response information)
   0x82  SCSI Task Management Response
   0x83  Login Response
   0x84  Text Response
   0x85  SCSI Data (for READ operation)
   0x86  Ready To Transfer (RTT - sent by target to initiator when it
         is ready to receive data from initiator)
   0x87  Asynchronous Event (sent by target to initiator to indicate
         certain special conditions)
   0x88  Opcode Not Understood
   0x89  Open Data Connections Response (optional)

3.1.2. Length

The Length field indicates the number of bytes, beyond the first 40
bytes, that are being sent together with this message header.
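
As an illustration only, the 40-byte template header of section 3.1 can
be packed and parsed with Python's standard struct module. The names
used here (TEMPLATE, pack_header, unpack_header, specific_a, specific_b)
are hypothetical and not part of the draft; the field widths are read
off the template diagram, and "!" selects the network byte order that
section 3 requires.

```python
import struct

HEADER_LEN = 40  # fixed header length given in section 3.1

# Assumed field widths, read off the template diagram:
#   B   opcode                       (byte 0)
#   3s  opcode-specific bytes        (bytes 1-3)
#   I   length of data after header  (bytes 4-7)
#   Q   LUN or opcode-specific       (bytes 8-15)
#   I   initiator task tag           (bytes 16-19)
#   20s opcode-specific bytes        (bytes 20-39)
# "!" means network byte order (big endian) with no padding.
TEMPLATE = struct.Struct("!B3sIQI20s")
assert TEMPLATE.size == HEADER_LEN


def pack_header(opcode, length, lun=0, task_tag=0,
                specific_a=b"", specific_b=b""):
    """Build a 40-byte header; struct zero-pads short "s" fields."""
    return TEMPLATE.pack(opcode, specific_a, length, lun, task_tag,
                         specific_b)


def unpack_header(buf):
    """Split a PDU into its header fields and the data that follows."""
    opcode, spec_a, length, lun, tag, spec_b = TEMPLATE.unpack(
        buf[:HEADER_LEN])
    return {
        "opcode": opcode,
        "length": length,
        "lun": lun,
        "task_tag": tag,
        "specific_a": spec_a,
        "specific_b": spec_b,
        "data": buf[HEADER_LEN:HEADER_LEN + length],
    }


# Usage: a SCSI Command (opcode 0x01) header announcing 4 bytes of data.
pdu = pack_header(0x01, 4, lun=1, task_tag=0x1234) + b"abcd"
fields = unpack_header(pdu)
```

Keeping the opcode-specific areas as opaque byte strings lets one
parser handle every message type before dispatching on the opcode.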
It is anticipated that most iSCSI messages and responses (not counting
data transfer messages) will not need more than the 40 byte header, and
hence the Length field will contain the value 0. It is expected that
CDBs larger than 16 bytes and parameter data will follow the header.

3.1.3. LUN

The LUN specifies the Logical Unit for which the command is targeted.
If the command does not relate to a Logical Unit, this field is either
ignored or may be used for some other purpose. According to [SAM2], a
Logical Unit Number can take up to a 64-bit field that identifies the
Logical Unit within a target device. The exact format of this field can
be found in the [SAM2] document.

3.1.4. Initiator Task Tag

The initiator assigns a Task Id (or tag) to each SCSI task that it
issues. (Recall that a task is a linked set of SCSI commands.) This Tag
is an initiator-wide unique identifier that can be used to uniquely
identify the Task.

3.1.5. Opcode-specific fields

These fields have different meanings for different messages.

3.2. SCSI Command

Byte /     0       |       1       |       2       |       3       |
    /              |               |               |               |
   |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
   +---------------+---------------+---------------+---------------+
  0| Opcode (0x01) |I|R|A|Rsv|ATTR | CmdRN                         |
   +---------------+---------------+---------------+---------------+
  4| Length                                                        |
   +---------------+---------------+---------------+---------------+
  8| Logical Unit Number (LUN)                                     |
   +                                                               +
 12|                                                               |
   +---------------+---------------+---------------+---------------+
 16| Initiator Task Tag                                            |
   +---------------+---------------+---------------+---------------+
 20| Expected Data Transfer Length                                 |
   +---------------+---------------+---------------+---------------+
 24| SCSI Command Descriptor Block (CDB)                           |
   +                                                               +
 28|                                                               |
   +                                                               +
 32|                                                               |
   +                                                               +
 36|                                                               |
   +---------------+---------------+---------------+---------------+
 40/ Additional Data (Command Dependent)                           /
  +/                                                               /
   +---------------+---------------+---------------+---------------+

3.2.1. Flags

The Flags field for a SCSI Command consists of one byte.

Byte 2

   b0   (I) Immediate Data from initiator to target (write/control).
   b1   (R) set when data is expected to flow from target to initiator
        (read).
   b2   (A) set to turn off Autosense for this command (see [SAM2]).
   b3-4 Reserved (should be 0)
   b5-7 used to indicate Task Attributes.

Autosense refers to the automatic return of sense data to the initiator
in case a command did not complete successfully. If autosense is turned
off, the initiator must explicitly request that sense data be sent to
it after some command has completed with a CHECK CONDITION status.

3.2.2. Task Attributes

The Task Attribute field (ATTR) can have one of the following integer
values (see [SAM2] for details):

   0  Untagged
   1  Simple
   2  Ordered
   3  Head of Queue
   4  ACA

3.2.3. Command Reference Number (CmdRN)

The Command Reference Number (CmdRN) is provided by the initiator to
assist in performing ordered delivery for iSCSI commands.
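
For illustration, the SCSI Command layout shown in the diagram of
section 3.2 can be packed as sketched below. SCSI_COMMAND and
pack_scsi_command are hypothetical names. Two points are assumptions of
this sketch rather than statements of the draft: the flags byte is
passed in already encoded, so that no particular bit numbering is
implied, and a CDB shorter than 16 bytes is zero-padded on the right.

```python
import struct

OP_SCSI_COMMAND = 0x01

# Assumed layout, read off the SCSI Command diagram:
#   B opcode, B flags, H CmdRN, I Length, Q LUN,
#   I Initiator Task Tag, I Expected Data Transfer Length, 16s CDB
SCSI_COMMAND = struct.Struct("!BBHIQII16s")
assert SCSI_COMMAND.size == 40


def pack_scsi_command(flags, cmd_rn, lun, task_tag, expected_len, cdb,
                      extra=b""):
    """Return a SCSI Command PDU; `extra` is any data after byte 40."""
    if len(cdb) > 16:
        raise ValueError("CDBs over 16 bytes are outside this sketch")
    header = SCSI_COMMAND.pack(
        OP_SCSI_COMMAND,
        flags & 0xFF,
        cmd_rn & 0xFFFF,   # CmdRN occupies 16 bits in the header
        len(extra),        # Length counts only bytes beyond the header
        lun,
        task_tag,
        expected_len,
        cdb,               # struct zero-pads a short CDB to 16 bytes
    )
    return header + extra


# Usage: TEST UNIT READY (a 6-byte all-zero CDB) moves no data, so the
# Expected Data Transfer Length is 0 and nothing follows the header.
pdu = pack_scsi_command(flags=0, cmd_rn=7, lun=0, task_tag=1,
                        expected_len=0, cdb=bytes(6))
```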
CmdRN reflects the value of a 16-bit counter maintained by the
initiator and increased by 1 for every command received by the iSCSI
delivery mechanism. The counter is set to an initial value at session
initiation (default is 0) and is set to 0 when a target reset is sent.

3.2.4. Expected Data Transfer Length

The Expected Data Transfer Length field states the number of bytes of
data that the initiator expects will be sent for this (READ or WRITE)
SCSI operation in SCSI Data packets. For a WRITE operation, the
initiator uses this field to specify the number of bytes of data it
expects to transfer for this operation (not counting data headers). For
a READ operation, the initiator uses this field to specify the number
of bytes of data it expects the target to transfer to the initiator
(not counting data headers).

If no data will be transferred in SCSI Data packets for this SCSI
operation, this field should be set to 0.

Upon completion of a data transfer, the target will inform the
initiator of how many bytes were actually processed (sent or received)
by the target.

3.2.5. SCSI Command Descriptor Block (CDB)

There are 16 bytes in the CDB field, designed to accommodate the
largest currently defined CDB. If, in the future, larger CDBs are
allowed, the spill-over of the CDB may extend beyond the 40-byte
header.

3.2.6. Command-Data

Some SCSI commands require additional parameter data to accompany the
SCSI command. This data may be placed beyond the 40-byte boundary of
the iSCSI header. Alternatively, user data can be placed in the same
PDU (in both cases we talk about immediate data). The Length field is
set to the length of this data beyond the 40-byte header (i.e. it
includes the CDB extension if present). The CDB length is:

   Length + 16 - I*ExpectedDataTransferLength

3.3. SCSI Response

Byte /     0       |       1       |       2       |       3       |
    /              |               |               |               |
   |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
   +---------------+---------------+---------------+---------------+
  0| Opcode (0x81) | Rsvd(0)   |O|U| MaxCmdRN                      |
   +---------------+---------------+---------------+---------------+
  4| Length                                                        |
   +---------------+---------------+---------------+---------------+
  8| Reserved (0)                                                  |
   +                                                               +
 12|                                                               |
   +---------------+---------------+---------------+---------------+
 16| Initiator Task Tag                                            |
   +---------------+---------------+---------------+---------------+
 20| Residual Count                                                |
   +---------------+---------------+---------------+---------------+
 24| Command Status|iSCSI Status   | StatRN                        |
   +---------------+---------------+---------------+---------------+
 28/ Reserved (0)                                                  /
  +/                                                               /
   +---------------+---------------+---------------+---------------+
 40/ Response or Sense Data (optional)                             /
  +/                                                               /
   +---------------+---------------+---------------+---------------+

3.3.1. Flags

The SCSI Response has its own set of flags, which differs from the
flags for a SCSI Command.

Byte 2

   b0   (U) set for Residual Underflow. In this case, the Residual
        Count indicates how many bytes were not transferred out of
        those expected to be transferred.
   b1   (O) set for Residual Overflow. In this case, the Residual Count
        indicates how many bytes could not be transferred because the
        initiator's Expected Data Transfer Length was too small.
   b2-7 not used (should be set to 0).

Bits 0 and 1 are mutually exclusive.

3.3.2. MaxCmdRN

Indicates the maximum CmdRN the initiator should send. It will set an
internal limit register. The initiator will refrain from sending
commands numbered past MaxCmdRN (considering also wrap-around).
Attention should be paid to the fact that response PDUs can arrive in
"wrong order". The internal limit register can only be advanced by
incoming responses (considering also wraparounds).
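
One possible way to implement the limit register is sketched below.
The draft does not say how "considering also wrap-around" is to be
evaluated, so the half-range comparison used here (in the style of
serial number arithmetic) is an assumption of this sketch, as are the
names serial_lte and CommandWindow. The comparison is only meaningful
while the window never spans more than half of the 16-bit number space.

```python
MOD = 1 << 16    # CmdRN is a 16-bit counter (section 3.2.3)
HALF = MOD // 2  # assumed comparison horizon, not taken from the draft


def serial_lte(a, b):
    """True if a is at or before b in 16-bit wrap-around order."""
    return ((b - a) % MOD) < HALF


class CommandWindow:
    """Initiator-side view of the sliding command window."""

    def __init__(self, max_cmd_rn=0):
        self.limit = max_cmd_rn % MOD  # the "internal limit register"

    def on_response(self, max_cmd_rn):
        # The register only advances: a response that arrives late and
        # carries an older MaxCmdRN leaves it unchanged.
        if serial_lte(self.limit, max_cmd_rn):
            self.limit = max_cmd_rn

    def may_send(self, cmd_rn):
        # Commands numbered past the limit have to wait.
        return serial_lte(cmd_rn, self.limit)


# Usage: the window wraps from 65530 around to 5.
window = CommandWindow(65530)
window.on_response(5)      # advances across the wrap
window.on_response(65533)  # stale value, ignored
```

With this comparison, window.may_send(65535) and window.may_send(3) are
both true after the calls above, while window.may_send(6) is false.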
It is assumed that any target will accept fewer than 64K outstanding
commands.

3.3.3. Residual Count

The Residual Count field is valid only in case either the Residual
Underflow bit or Residual Overflow bit is set. If neither bit is set,
the Residual Count field should be 0. If the Residual Underflow bit is
set, the Residual Count indicates how many bytes were not transferred
out of those expected to be transferred. If the Residual Overflow bit
is set, the Residual Count indicates how many bytes could not be
transferred because the initiator's Expected Data Transfer Length was
too small.

3.3.4. Command Status

The Command Status field is used to report the SCSI status of the
command (as specified in [SAM2]).

3.3.5. iSCSI Status

The iSCSI Status field is used to report the status of the command
before it was sent by the target to the LUN. The values are given
below.

   0  Good status
   1  iSCSI check

If the iSCSI Status field is not 0, the command status will indicate
CHECK CONDITION.

3.3.6. Response or Sense Data

If Autosense was not disabled in the originating CDB and the Command
Status was CHECK CONDITION (0x02), then the Response Data field will
contain sense data for the failed command. Some sense codes will relate
to iSCSI check conditions (e.g. excessive number of outstanding
commands, immediate data blocks too large etc.).

If the Command Status is Good (0x00), then the Response Data field will
contain data from the data phase of the CDB. The Length parameter
specifies the number of bytes in this field. If no error occurred, and
no data is needed for the response to the SCSI Command, the Length
field is 0.

Note that if the Command Status was CHECK CONDITION but Autosense was
disabled, then sense data must be explicitly requested by the initiator
with a new SCSI command.

3.3.7. StatRN - Status Reference Number

StatRN is a reference number that the target iSCSI layer generates
whenever it issues a response by incrementing an internal counter. A
gap in StatRN indicates a lost status (possibly due to connection
failure); the status can be recovered by reissuing the outstanding
command with the original TaskID and CmdRN.

3.4. Asynchronous Event

An Asynchronous Event may be sent from the target to the initiator
without corresponding to a particular command. The target specifies the
status for the event and sense data.

Byte /     0       |       1       |       2       |       3       |
    /              |               |               |               |
   |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
   +---------------+---------------+---------------+---------------+
  0| Opcode (0x87) | Reserved (0)  | MaxCmdRN                      |
   +---------------+---------------+---------------+---------------+
  4| Length                                                        |
   +---------------+---------------+---------------+---------------+
  8| Logical Unit Number (LUN)                                     |
   +                                                               +
 12|                                                               |
   +---------------+---------------+---------------+---------------+
 16| Reserved (0)                                                  |
   +                                                               +
 20|                                                               |
   +---------------+---------------+---------------+---------------+
 24| Command Status|iSCSI Status   | Reserved (0)                  |
   +---------------+---------------+---------------+---------------+
 28|Event Indicator| Reserved (0)                                  |
   +---------------+---------------+---------------+---------------+
 32| Reserved (0)                                                  |
   +                                                               +
 36|                                                               |
   +---------------+---------------+---------------+---------------+
 40/ Sense Data                                                    /
  +/                                                               /
   +---------------+---------------+---------------+---------------+

3.4.1. iSCSI Status

Some Asynchronous Events are strictly related to iSCSI while others are
related to SAM-2. The codes returned for iSCSI Asynchronous Events are:

   2  Target is being reset.

3.4.2. Event Indicator

The following values are defined. (See [SAM2] for details.)

   1  An error condition was encountered after command completion.
   2  A newly initialized device is available.
   3  Some other type of unit attention condition has occurred.
   4  An asynchronous event has occurred.

Sense Data accompanying the report identifies the condition. The Length
parameter is set to the length of the Sense Data.

3.4.3. MaxCmdRN - informs other initiators of this value after a target
Reset

3.5. SCSI Task Management Message

Byte /     0       |       1       |       2       |       3       |
    /              |               |               |               |
   |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
   +---------------+---------------+---------------+---------------+
  0| Opcode (0x02) | Msg indicator | Reserved (0)                  |
   +---------------+---------------+---------------+---------------+
  4| Length                                                        |
   +---------------+---------------+---------------+---------------+
  8| Logical Unit Number (LUN)                                     |
   +                                                               +
 12|                                                               |
   +---------------+---------------+---------------+---------------+
 16| Initiator Task Tag                                            |
   +---------------+---------------+---------------+---------------+
 20/ Reserved (0)                                                  /
  +/                                                               /
   +---------------+---------------+---------------+---------------+
 40

3.5.1. Msg Indicator

The Task Management functions provide an initiator with a way to
explicitly control the execution of one or more Tasks. The Task
Management functions are summarized as follows (for a more detailed
description see the [SAM2] document):

   1  Abort Task - aborts the task identified by the Task Tag field.
   2  Abort Task Set - aborts all Tasks issued by this initiator on the
      Logical Unit.
   3  Clear ACA - clears the Auto Contingent Allegiance condition.
   4  Clear Task Set - aborts all Tasks (from all initiators) for the
      Logical Unit.
   5  Logical Unit Reset.
   6  Target Reset.

For the functions above except <Target Reset>, a SCSI Task Management
Response is returned, using the Initiator Task Tag to identify the
operation for which it is responding.

For the <Target Reset> function, the target cancels all pending
operations.
The target may send an Asynchronous Event to all attached initiators
notifying them that the target is being reset. The target then closes
all of its TCP connections to all initiators (all sessions are
terminated).

3.6. SCSI Task Management Response

Byte /     0       |       1       |       2       |       3       |
    /              |               |               |               |
   |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
   +---------------+---------------+---------------+---------------+
  0| Opcode (0x82) | Msg indicator | Reserved (0)                  |
   +---------------+---------------+---------------+---------------+
  4| Length                                                        |
   +---------------+---------------+---------------+---------------+
  8| Logical Unit Number (LUN)                                     |
   +                                                               +
 12|                                                               |
   +---------------+---------------+---------------+---------------+
 16| Initiator Task Tag                                            |
   +---------------+---------------+---------------+---------------+
 20| Reserved (0)                                                  |
   +---------------+---------------+---------------+---------------+
 24| Response      | Reserved (0)                                  |
   +---------------+---------------+---------------+---------------+
 28/ Reserved (0)                                                  /
  +/                                                               /
   +---------------+---------------+---------------+---------------+
 40

For the functions <Abort Task, Abort Task Set, Clear ACA, Clear Task
Set, Logical Unit reset>, the target performs the requested Task
Management function and sends a SCSI Task Management Response back to
the initiator. The target includes all of the information the initiator
provided in the SCSI Task Management Message, so the initiator can know
exactly which SCSI Task Management Message was serviced. In addition,
the target provides a Response indication which may take on the
following values:

   0  Function Complete
   1  Function Rejected

For the <Target Reset> function, the target cancels all pending
operations. The target may send an Asynchronous Event to all attached
initiators notifying them that the target is being reset. The target
then closes all of its TCP connections to all initiators (terminates
all sessions).

3.6.1. MaxCmdRN - maximum CmdRN the target will accept

3.7. Ready To Transfer (RTT)

When an initiator has submitted a SCSI Command with data passing from
the initiator to the target (WRITE), the target may specify which
blocks of data it is ready to receive. In general, the target may
request that the data blocks be delivered in whatever order is
convenient for the target at that particular instant. This information
is passed from the target to the initiator in the Ready To Transfer
(RTT) message.

In order to allow write operations without RTT, the initiator and
target must have agreed to do so by both sending the AllowNoRTT:yes
key:value attribute to each other (either during Login or through the
Text Command/Response mechanism).

Byte /     0       |       1       |       2       |       3       |
    /              |               |               |               |
   |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
   +---------------+---------------+---------------+---------------+
  0| Opcode (0x86) | Reserved (0)  | MaxCmdRN                      |
   +---------------+---------------+---------------+---------------+
  4| Length                                                        |
   +---------------+---------------+---------------+---------------+
  8| Reserved (0)                                                  |
   +                                                               +
 12|                                                               |
   +---------------+---------------+---------------+---------------+
 16| Initiator Task Tag                                            |
   +---------------+---------------+---------------+---------------+
 20| Desired Data Transfer Length                                  |
   +---------------+---------------+---------------+---------------+
 24| Data Offset                                                   |
   +---------------+---------------+---------------+---------------+
 28| Target Transfer Tag                                           |
   +---------------+---------------+---------------+---------------+
 32| Reserved (0)                                                  |
   +                                                               +
 36|                                                               |
   +---------------+---------------+---------------+---------------+
 40

3.7.1. MaxCmdRN - maximum CmdRN the target will accept

3.7.2. Desired Data Transfer Length and Data Offset

The target specifies how many bytes it wants the initiator to send as a
result of this RTT message.
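
As a sketch only, the RTT layout in the diagram of section 3.7 maps
onto a fixed 40-byte structure. RTT, pack_rtt and unpack_rtt are
hypothetical names, and setting Length to 0 reflects the assumption of
this sketch that an RTT carries no data after the header.

```python
import struct

OP_RTT = 0x86

# Assumed layout, read off the RTT diagram ("x" marks reserved bytes,
# which struct fills with zeros when packing):
#   B opcode, x reserved, H MaxCmdRN, I Length, 8x reserved,
#   I Initiator Task Tag, I Desired Data Transfer Length,
#   I Data Offset, I Target Transfer Tag, 8x reserved
RTT = struct.Struct("!BxHI8xIIII8x")
assert RTT.size == 40


def pack_rtt(max_cmd_rn, task_tag, desired_len, data_offset,
             transfer_tag):
    """Build an RTT asking for desired_len bytes at data_offset."""
    return RTT.pack(OP_RTT, max_cmd_rn, 0, task_tag,
                    desired_len, data_offset, transfer_tag)


def unpack_rtt(buf):
    """Parse an RTT header into a dict of its named fields."""
    (opcode, max_cmd_rn, _length, task_tag,
     desired_len, data_offset, transfer_tag) = RTT.unpack(buf[:40])
    if opcode != OP_RTT:
        raise ValueError("not an RTT PDU")
    return {
        "max_cmd_rn": max_cmd_rn,
        "task_tag": task_tag,
        "desired_len": desired_len,
        "data_offset": data_offset,
        "transfer_tag": transfer_tag,
    }


# Usage: request 8192 bytes starting 4096 bytes into the transfer.
rtt = pack_rtt(max_cmd_rn=100, task_tag=0x11, desired_len=8192,
               data_offset=4096, transfer_tag=0x22)
request = unpack_rtt(rtt)
```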
The target may request the data from the initiator in several chunks,
not necessarily in the original order of the data. The target,
therefore, also specifies a Data Offset indicating the point at which
the data transfer should begin, relative to the beginning of the
total data transfer.

3.7.3. Target Transfer Tag

The target assigns its own tag to each RTT request that it sends to
the initiator. This can be used by the target to easily identify data
it receives, and can also be used as an RDMA tag [RDMA].

3.8. SCSI Data

The typical data transfer specifies the length of the data payload,
the Transfer Tag provided by the receiver for this data transfer, and
a buffer offset. The typical SCSI Data packet for WRITE (from
initiator to target) has the following format:

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0| Opcode (0x05) | Reserved (0)                                  |
  +---------------+---------------+---------------+---------------+
 4| Length                                                        |
  +---------------+---------------+---------------+---------------+
 8| Transfer Tag                                                  |
  +---------------+---------------+---------------+---------------+
12| Data Offset                                                   |
  +---------------+---------------+---------------+---------------+
16| Initiator Task Tag                                            |
  +---------------+---------------+---------------+---------------+
20/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
40/ Payload                                                       /
 +/                                                               /
  +---------------+---------------+---------------+---------------+

The typical SCSI Data packet for READ (from target to initiator) has
the following format:

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0| Opcode (0x85) | (0)     |S|O|U| MaxCmdRN or (0)               |
  +---------------+---------------+---------------+---------------+
 4| Length                                                        |
  +---------------+---------------+---------------+---------------+
 8| Transfer Tag                                                  |
  +---------------+---------------+---------------+---------------+
12| Data Offset                                                   |
  +---------------+---------------+---------------+---------------+
16| Initiator Task Tag                                            |
  +---------------+---------------+---------------+---------------+
20| Residual Count                                                |
  +---------------+---------------+---------------+---------------+
24| Command Status|iSCSI Status   | StatRN                        |
  +---------------+---------------+---------------+---------------+
28/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
40/ Payload                                                       /
 +/                                                               /
  +---------------+---------------+---------------+---------------+

3.8.1. Length

The length field specifies the total number of bytes in the following
payload.

3.8.2. Transfer Tag

The Transfer Tag identifies the operation to which this data transfer
belongs. When the transfer is from the target to the initiator, the
Transfer Tag is the Initiator Task Tag that was sent with the SCSI
command. When the transfer is from the initiator to the target, the
Transfer Tag is the Target Transfer Tag when RTT is enabled, or the
Initiator Task Tag when RTT is disabled.

3.8.3. Buffer Offset

The Buffer Offset field (shown as Data Offset in the formats above)
contains the offset of the following data within the complete data
transfer. The sum of the buffer offset and length should not exceed
the expected transfer length for the command.

3.8.4. Flags

The last SCSI Data packet sent from a target to an initiator for a
particular SCSI command that completed successfully may optionally
also contain the Command Status for the data transfer. In this case
Sense Data cannot be sent together with the Command Status.
If the command completed with an error, then the response and sense
data must be sent in a SCSI Response packet and must not be sent in a
SCSI Data packet.

Byte 2
   b0-1  as in an ordinary SCSI Response
   b2    (S) set to indicate that the Command Status field contains
         status
   b3-7  not used (should be set to 0)

If the (S) bit is set, then the extra fields in the SCSI Data packet
(MaxCmdRN, Command Status, Residual Count, StatRN) are meaningful.

3.9. Text Command

The Text Command is provided to allow the exchange of information and
for future extensions. It permits the initiator to inform a target of
its capabilities or to request some special operations.

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0| Opcode (0x04) | Reserved (0)                                  |
  +---------------+---------------+---------------+---------------+
 4| Length                                                        |
  +---------------+---------------+---------------+---------------+
 8/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
16| Initiator Task Tag                                            |
  +---------------+---------------+---------------+---------------+
20/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
40/ Text                                                          /
 +/                                                               /
  +---------------+---------------+---------------+---------------+

3.9.1. Length

The length, in bytes, of the Text field.

3.9.2. Initiator Task Tag

The initiator assigned identifier for this Text Command.

3.9.3. Text

The initiator sends the target a set of key:value pairs in UTF-8
Unicode format. The key and value are separated by a ':' (0x3A)
delimiter. Many key:value pairs can be included in the Text block by
separating them with null (0x00) delimiters. Some basic key:value
pairs are described in Appendix B. The target responds by sending its
response back to the initiator.
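
As an illustration of the Text format described in 3.9.3, the
following Python sketch encodes and decodes a Text field. The
function names encode_text and decode_text are hypothetical and not
part of this draft. The sketch also assumes that the first ':' in an
item separates the key from the value, and that an empty item (for
example after a trailing null delimiter) can be skipped; the draft
states neither.

```python
def encode_text(pairs):
    """Encode (key, value) pairs as UTF-8 key:value items joined by NUL (0x00)."""
    items = []
    for key, value in pairs:
        # A key containing ':' or any NUL would make the field ambiguous.
        if ":" in key or "\x00" in key or "\x00" in value:
            raise ValueError("key or value contains a delimiter")
        items.append(key + ":" + value)
    return "\x00".join(items).encode("utf-8")


def decode_text(data):
    """Decode a Text field into a list of (key, value) pairs."""
    pairs = []
    for item in data.decode("utf-8").split("\x00"):
        if not item:
            continue  # assumption: tolerate empty items such as a trailing NUL
        key, sep, value = item.partition(":")  # assumption: split on first ':'
        if not sep:
            raise ValueError("item without ':' delimiter: %r" % item)
        pairs.append((key, value))
    return pairs


if __name__ == "__main__":
    field = encode_text([("Initiator", "sample.foobar.org"),
                         ("AllowNoRTT", "yes")])
    print(field)
    print(decode_text(field))
```

Returning a list of pairs rather than a dictionary keeps the order in
which the keys were sent, which the draft does not say may be
discarded.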
The target and initiator can then perform some advanced operations
based on their common capabilities. Manufacturers may introduce new
keys by prefixing them with their (reversed) domain name, for
example,

   com.foo.bar.do_something:0000000000000003

Any key that the target does not understand may be ignored without
affecting basic function. Once the target has processed all the
key:value pairs, it responds with the Text Response command, listing
the parameters that it supports. It is recommended that Text
operations that will take a long time should be placed in their own
Text command. If the Text Response does not contain a key that was
requested, the initiator must assume that the key was not understood
by the target.

3.10. Text Response

The Text Response message contains the responses of the target to the
initiator's Text Command. The format of the Text field matches that
of the Text Command.

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0| Opcode (0x84) | Reserved (0)                                  |
  +---------------+---------------+---------------+---------------+
 4| Length                                                        |
  +---------------+---------------+---------------+---------------+
 8/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
16| Initiator Task Tag                                            |
  +---------------+---------------+---------------+---------------+
20/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
40/ Text Response                                                 /
 +/                                                               /
  +---------------+---------------+---------------+---------------+

3.10.1. Length

The length, in bytes, of the Text Response field.

3.10.2. Initiator Task Tag

The Initiator Task Tag matches the tag used in the initial Text
Command and is used by the initiator to relate the Text Commands with
the appropriate Text Responses.

3.10.3. Text Response

The Text Response field contains responses in the same key:value
format as the Text Command. Appendix B lists some basic Text Commands
and their Responses. If the Text Response does not contain a key that
was requested, the initiator must assume that the key was not
understood by the target.

3.11. Login Command

After establishing a TCP connection between an initiator and a
target, the initiator should issue a Login Command to gain further
access to the target's resources.

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0| Opcode (0x03) | Reserved (0)  | CmdRN or Reserved (0)         |
  +---------------+---------------+---------------+---------------+
 4| Length                                                        |
  +---------------+---------------+---------------+---------------+
 8| CID                           | RecoverCID or 0               |
  +---------------+---------------+---------------+---------------+
12| Reserved (0)                                                  |
  +---------------+---------------+---------------+---------------+
16| ISID                          |TSID                           |
  +---------------+---------------+---------------+---------------+
20/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
40/ Login Parameters in Text Command Format                       /
 +/                                                               /
  +---------------+---------------+---------------+---------------+

3.11.1. CID - a unique id for this connection within the session

3.11.2. RecoverCID

For a connection used to recover a lost TCP connection (recovery is
described later, under Error Handling), the initiator provides the
CID of the failed connection. A simple target may reject recovery. In
this case the initiator will terminate all outstanding commands with
a CHECK CONDITION and reset the target.

The initiator may provide some basic parameters in order to enable
the target to determine if the initiator may in fact use the target's
resources. The format of the parameters is as specified for the Text
Command.
Targets may require keys to indicate the Domain Name of the initiator
and the target, and perhaps also an Authenticator key. The initiator
may also provide additional parameters to the target in Text Command
format, if the initiator so desires. Keys and their explanations are
listed in Appendix B.

Whenever desired an initiator will identify its view of the target as
in:

   Target:<domain-name>[/modifier][:port]

implying that the target is known as:

   scsi://<domain-name>[/modifier]

and it should be connected through port "port" (the default well
known port has an IANA defined value of xx).

Initiators can use the same type of naming, implying a machine and
optionally a principal (e.g. operating system image), as in:

   Initiator:<domain-name>[/modifier]

implying that the initiator is known as:

   iSCSI://<domain-name>[/modifier]

Thus the parameters passed for a plain-text password authentication
are:

   Initiator:<domain-name>[/modifier]
   Target:<domain-name>[/modifier]
   Authenticator:open-sesame

The modifier iSCSI-SYS is reserved for administrative functions.

ISID and TSID form collectively the SSID (session id). A TSID of 0
indicates a leading connection. Only a leading connection login can
carry session specific parameters, e.g. max-connections-requested,
the maximum immediate data length requested, etc.

CmdRN is significant only if TSID is 0 and indicates the starting
Command reference number for this session; it should be 0 for all
other instances.

3.12. Login Response

The target responds to the Login Command with a Login Response. It is
sufficient for the target to respond with a Status indicating that
the Login is accepted. In fact, the target may completely ignore the
parameters that were sent to it and may provide service to any
initiator that connects to it. The target may also return parameters
using the format of the Text Response opcode, if it so desires.
In particular, the target may want to provide its Authenticator key,
so that the initiator can be sure that it is in fact talking with the
correct target. The initiator can request that the target provide the
Authenticator parameter by specifying the SendAuthenticator:yes
key:value pair.

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0| Opcode (0x83) | Reserved (0)  | MaxCmdRN or Reserved (0)      |
  +---------------+---------------+---------------+---------------+
 4| Length                                                        |
  +---------------+---------------+---------------+---------------+
 8/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
16| ISID                          | TSID                          |
  +---------------+---------------+---------------+---------------+
20| Reserved (0)                                                  |
  +---------------+---------------+---------------+---------------+
24| Status        | Reserved (0)                                  |
  +---------------+---------------+---------------+---------------+
28/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
40/ Login Parameters in Text Command Format                       /
 +/                                                               /
  +---------------+---------------+---------------+---------------+

The format of the Login Response is the same as the Text Response,
with the addition of one field.

3.12.1. Status

The Status returned in a Login Response is one of the following:

   0 accept login (will now accept SCSI commands)
   1 reject login
   2 additional authentication required
   3 reject recovery

In the case that the Status is "accept login" the initiator may
proceed to issue SCSI commands. In the case that the Status is
"reject login" the initiator should immediately close down its end of
the TCP connection, thus freeing up the target's port for some other
connection. The target also has the option of immediately closing
down its end of the TCP connection.
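
A minimal Python sketch of how an initiator might map the Status
values of 3.12.1 to its next step. The names LoginStatus and
next_initiator_action are hypothetical and the returned strings are
only labels for illustration. The actions for status 2 and status 3
paraphrase the rest of 3.12.1 and the recovery field description in
3.11; they are an interpretation, not a normative table.

```python
from enum import IntEnum


class LoginStatus(IntEnum):
    """Status values listed in 3.12.1 (member names are hypothetical)."""
    ACCEPT = 0
    REJECT = 1
    ADDITIONAL_AUTH_REQUIRED = 2
    REJECT_RECOVERY = 3


def next_initiator_action(status):
    """Return a label for what the initiator does after a Login Response."""
    status = LoginStatus(status)  # raises ValueError for undefined values
    if status is LoginStatus.ACCEPT:
        return "issue SCSI commands"
    if status is LoginStatus.REJECT:
        return "close TCP connection"
    if status is LoginStatus.ADDITIONAL_AUTH_REQUIRED:
        # SCSI commands stay blocked until an "accept login" arrives.
        return "send Text Command with authentication keys"
    # REJECT_RECOVERY: the target refused to recover the failed connection.
    return "terminate outstanding commands and reset target"


if __name__ == "__main__":
    for code in range(4):
        print(code, next_initiator_action(code))
```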
In the case that the Status is "additional authentication required"
the initiator must provide additional authentication information by
issuing the Text Command with the appropriate key:value pairs. (This
may be required if the authentication method is based on a
challenge/response algorithm.) Upon receipt of the necessary
authentication, the target will issue a Login Response with the
"accept login" Status. SCSI Commands will not be accepted until the
target provides a Login Response with the "accept login" Status.

The TSID is an initiator identifying tag set by the target. A 0 in
the returned TSID indicates that either the target supports only a
single connection or that the ISID has already been used as a leading
ISID. In both cases the target is rejecting the login.

MaxCmdRN indicates the maximum CmdRN the initiator should send. When
reaching this number (considering number wrap-around) the initiator
should refrain from sending further commands until it receives a new
MaxCmdRN that has advanced past the old value.

3.13. Ping Command

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0| Opcode (0x0)  | Reserved (0)  | MaxStatRN                     |
  +---------------+---------------+---------------+---------------+
 4| Length                                                        |
  +---------------+---------------+---------------+---------------+
 8/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
16| Initiator Task Tag                                            |
  +---------------+---------------+---------------+---------------+
20/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
40/ Ping Data (optional)                                          /
 +/                                                               /
  +---------------+---------------+---------------+---------------+

The Ping Command can be used to verify that a connection is still
active.
It may be useful in the case where an initiator has been waiting a
long time for the response to some command, and the initiator
suspects that there is some problem with the connection. When a
target receives the Ping Command, it should respond with a Ping
Response, duplicating as much of the data as possible that was
provided in the Ping Command (if such data was present). If the
initiator does not receive the Ping Response within some period of
time (determined by the initiator), or if the data returned by the
Ping Response is different from the data that was in the Ping
Command, the initiator may conclude that there is a problem with the
connection. The initiator will then close the connection and may try
to establish a new connection.

3.13.1. MaxStatRN - the next StatRN expected

3.13.2. Length

The length of the optional Ping Data.

3.13.3. Initiator Task Tag

An initiator assigned identifier for the operation.

3.13.4. Ping Data

Binary data that will be reflected in the Ping Response.

3.14. Ping Response

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0| Opcode (0x80) | Reserved (0)  | MaxCmdRN                      |
  +---------------+---------------+---------------+---------------+
 4| Length                                                        |
  +---------------+---------------+---------------+---------------+
 8/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
16| Initiator Task Tag                                            |
  +---------------+---------------+---------------+---------------+
20/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
40/ Return Ping Data                                              /
 +/                                                               /
  +---------------+---------------+---------------+---------------+

When a target receives the Ping Command, it should respond with a
Ping Response, duplicating the data and Initiator Task Tag that was
provided in the Ping Command, if present.

3.15. Third Party Commands -INCOMPLETE!

There are some third-party SCSI commands, such as COPY and EXTENDED
COPY, that require one target (Target A) to act as an initiator to
other targets (e.g., Target B). Some such commands can be extended in
a straightforward way to accommodate new forms of addressing, and
this should be done to address targets using iSCSI. These extensions
are not straightforward for all commands, and they may not be able to
encompass the full name space and authentication information needed
for iSCSI in some contexts. Thus iSCSI also provides a facility for
assigning local short-form aliases to full addressing/authorization
information for targets, and the aliases can be used in the SCSI
commands and parameter data. The alias information is specified as
Text following the header of the SCSI command specifying the
third-party command.
The header will thus appear as follows:

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0| Opcode (0x01) |1|0|A|Rsv|ATTR | CmdRN                         |
  +---------------+---------------+---------------+---------------+
 4| Length (!= 0)                                                 |
  +---------------+---------------+---------------+---------------+
 8| Logical Unit Number (LUN)                                     |
  +                                                               +
12|                                                               |
  +---------------+---------------+---------------+---------------+
16| Initiator Task Tag                                            |
  +---------------+---------------+---------------+---------------+
20| Expected Data Transfer Length                                 |
  +---------------+---------------+---------------+---------------+
24| SCSI Command Descriptor Block (CDB)                           |
  +                                                               +
28|                                                               |
  +                                                               +
32|                                                               |
  +                                                               +
36|                                                               |
  +---------------+---------------+---------------+---------------+
40/ Extended CDB if any                                           /
 +/ Parameters needed for Target B                                /
 /                                                                /
  +---------------+---------------+---------------+---------------+

The Length field will not be zero. Rather, it will contain the length
of the extended CDB, if any, and the length of the alias information,
which may include the name of Target B and an Authentication key in
Text Command format. An example of the data for this command might
be:

   LocalName:TargetB
   FullName:sj.foo.com/controller1
   OriginalAuthenticator:open-sesame

Upon receiving a third-party command, Target A will perform login
operations with the identified targets. In effect, Target A will
become an initiator to Target B. Among the parameters provided to
Target B, Target A may specify the authentication information from
the initiator.
The Text provided by Target A when it performs the Login command to
Target B may contain the keys Target (referring to Target B) and
Initiator (referring to Target A), and it may also contain the keys
Authenticator (of Target A), OriginalInitiator and
OriginalAuthenticator (referring to the authenticator of the original
initiator).

3.16. Opcode Not Understood

Byte /    0       |       1       |       2       |       3       |
   /              |               |               |               |
  |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
  +---------------+---------------+---------------+---------------+
 0| Opcode (0x88) | Reserved (0)                                  |
  +---------------+---------------+---------------+---------------+
 4| Length                                                        |
  +---------------+---------------+---------------+---------------+
 8/ Reserved (0)                                                  /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
40/ Header of Bad Message                                         /
 +/                                                               /
  +---------------+---------------+---------------+---------------+
80

It may happen that a target receives a message with an Opcode that it
doesn't recognize. This may occur because of a new version of the
protocol that defines a new Opcode, or because of some corruption of
a message header. The target returns the header of the message with
the unrecognized opcode as the data of the response.

4. iSCSI Error Handling

4.1. Communications Errors

For any outstanding SCSI command it is assumed that iSCSI, in
conjunction with SCSI at the initiator, is able to keep enough
information to rebuild the command PDU, that outgoing data is
available (in host memory) for retransmission while the command is
outstanding, and that at the target iSCSI and specialized TCP
implementations are able to recover unacknowledged data packets from
a closing connection or, alternatively, the target has means to
re-read data from a device-server.
It is further assumed that a target will keep the "status & sense"
for commands it has executed while the total number of outstanding
commands and executed commands does not exceed its limit. A target
will sequentially number the delivered responses and thus enable
initiators to tell when a response is missing and which response is
missing.

Under those conditions iSCSI will be able to keep a session in
operation provided that it is able to keep or establish at least one
TCP connection between the initiator and target in a timely fashion.
Unfortunately the maximum admissible recovery time is a function of
the target, and for some devices and communications networks recovery
may be complex and may percolate to upper software layers. It is
assumed that targets and/or initiators will recognize a failing
connection by either transport level means (TCP), or by a gap in the
command stream that does not get filled for a long time, or by a
failing iSCSI ping (the latter should be used periodically by highly
reliable implementations).

The recovery involves the following steps:

   - abort offending TCP connection(s) (target & initiator) and
     recover at target all unacknowledged read-data

   - create one or more new TCP connections (within the same session)
     and associate all outstanding commands from the failed
     connection(s) to the FIRST new connection at both initiator and
     target

   - the initiator will reissue all outstanding commands with their
     original CmdRN and TaskID

   - upon receiving the new/restarting commands the target will
     resume command execution; for write commands it means requesting
     data retransmission through RTT, for reads retransmitting
     recovered data and for "terminated" commands retransmitting the
     status & sense while retaining the original StatRN.
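
The sequential numbering of responses mentioned in 4.1 lets an
initiator notice missing responses. The following Python sketch shows
one way this might be done. The class name StatRNTracker is
hypothetical. The 16-bit width of StatRN, the starting value, and the
rule that a number more than half the number space "behind" is an old
duplicate are all assumptions of this sketch; the draft does not
specify them.

```python
STATRN_BITS = 16               # assumption: StatRN is a 16-bit counter
STATRN_MOD = 1 << STATRN_BITS


class StatRNTracker:
    """Track StatRN values seen on a connection and report gaps."""

    def __init__(self, first_expected=0):
        self.expected = first_expected % STATRN_MOD  # next StatRN in sequence
        self.missing = set()                         # numbers skipped so far

    def receive(self, stat_rn):
        """Record one response; return the StatRNs newly found missing."""
        stat_rn %= STATRN_MOD
        if stat_rn in self.missing:
            # A retransmitted response (original StatRN retained) fills a gap.
            self.missing.discard(stat_rn)
            return []
        ahead = (stat_rn - self.expected) % STATRN_MOD
        if ahead >= STATRN_MOD // 2:
            return []  # assumption: treat as an old duplicate and ignore it
        newly_missing = [(self.expected + i) % STATRN_MOD for i in range(ahead)]
        self.missing.update(newly_missing)
        self.expected = (stat_rn + 1) % STATRN_MOD
        return newly_missing


if __name__ == "__main__":
    tracker = StatRNTracker(first_expected=0xFFFE)
    for rn in (0xFFFE, 0x0001, 0xFFFF):
        print(hex(rn), tracker.receive(rn), sorted(tracker.missing))
```

The modular subtraction is what handles wrap-around: the gap between
0xFFFE and 0x0001 is computed as two missing numbers, not as a jump
backwards.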
If data recovery is not possible the target will either provide data
from the media or redo the operation (if the operation is not
idempotent the device server may fail the operation).

4.2. Protocol Errors

The authors recognize that mapping framed messages over a "stream"
connection (like TCP) makes the proposed mechanisms vulnerable to
simple software framing errors, and that introducing framing
mechanisms may be onerous for performance and bandwidth. Command
reference numbers and the above mechanisms for connection drop and
reestablishment will help handle this type of mapping error.

4.3. Session Errors

If all the connections of a session fail and can't be reestablished
in a short time, or if initiators detect protocol errors repeatedly,
an initiator may choose to terminate a session and establish a new
session (indicating old session termination?). It will terminate all
outstanding requests with an iSCSI error indication before initiating
a new session. A target that detects one of the above errors will
take the following actions:

   - Reset the TCP connections (close the session).

   - Abort all Tasks in the task set for the corresponding initiator.

5. Notes to Implementers

This section notes some of the performance and reliability
considerations of the iSCSI protocol. This protocol was designed to
take advantage of a generic Remote DMA TCP option [RDMA], although it
can still operate effectively without this TCP extension.

5.1. Small TCP Segments

It is recommended that TCP segments be limited in size to no more
than 8K bytes. One reason is to ensure that segments won't get broken
into smaller packets, thereby possibly breaking the assumptions for
RDMA and the information in the RDMA header. Another reason we
recommend small segments is to allow a stronger type of checksum,
possibly utilizing CRC, which is practical only for smaller segments.

5.2. Multiple Network Adapters

The iSCSI protocol assumes that the Task Tags will also serve as RDMA
tags. The iSCSI protocol allows multiple connections, not all of
which need go over the same network adapter. If multiple network
connections are to be utilized with RDMA, the iSCSI protocol's
command-data-status allegiance to one TCP connection ensures that
there is no need to replicate information across network adapters or
otherwise require them to cooperate.

5.3. Autosense

Autosense refers to the automatic return of sense data to the
initiator in case a command did not complete successfully. If
autosense is turned off, the initiator must explicitly request that
sense data be sent to it after some command has completed with a
CHECK CONDITION status. The default for iSCSI is to work with
Autosense enabled.

Note that even if a SCSI target/LUN does not support Autosense, it
may still be possible for iSCSI to work with Autosense. This can be
accomplished as follows. Whenever a CHECK CONDITION status is about
to be returned, the iSCSI component on the target immediately queries
the target/LUN for the sense data. iSCSI can then return the sense
data to the initiator together with the CHECK CONDITION status. It is
not necessary for iSCSI to wait for the initiator to explicitly
request the sense data; the target iSCSI code can perform this
operation automatically, even for devices/LUNs that do not ordinarily
provide automatic sense data.

5.4. TCP RDMA option

The TCP RDMA option [RDMA] is an annotation on individual TCP
segments that can reduce the number of copies necessary at the
receiver. The RDMA option succinctly describes the portion of a TCP
payload that holds bulk data.

5.5. TCP Connection Options

Some targets may want to inform (or negotiate with) an initiator
concerning some parameters related to bandwidth, Quality of Service,
or some other available features on its various network connections.
These are exchanged between the initiator and the target using Text Commands and Responses. Satran, Smith, Sapuntzakis, Meth [Page 41] iSCSI June 2000 6. Security Considerations 6.1. Data Integrity We assume that end-to-end data integrity can be assured by TCP, by adding a more powerful checksum option when- ever this is considered important, or replacing the checksum by a weaker one (or even "nullifying it") for applications in which data integrity is not important and recovery from data errors could be harmful (e.g., audio or video distribution streams). 6.2. Login Process In some environments, a target will not be interested in authenticating the initiator. In this case, the target can simply ignore some or all of the parameters sent in a Login Command, and the target can simply reply with a basic Login Response indicating a successful login. Some targets may want to perform some kind of authentication. The Authenticator key is defined for this purpose. Vari- ous authentication schemes can be used, including encrypted passwords and trusted certificate authorities. Once the initiator and target are confident of the iden- tity of the attached party, the established channel is considered secure. It is anticipated that most target devices will not bother with all of the possible checks, but the protocol provides sufficient means to perform the checks, if required by the target. 6.3. IANA Considerations There will be a well known port for iSCSI connections. These well known ports will have to be registered with IANA. A checksum type will also have to be registered with IANA. Satran, Smith, Sapuntzakis, Meth [Page 42] iSCSI June 2000 7. Authors' Addresses Julian Satran Kalman Meth IBM, Haifa Research Lab MATAM - Advanced Technology Center Haifa 31905, Israel Phone +972 4 829 6211 Email: Julian_Satran@vnet.ibm.com meth@il.ibm.com Daniel F. 
Smith
   IBM Almaden Research Center
   650 Harry Road
   San Jose, CA 95120-6099, USA
   Phone: +1 408 927 2072
   Email: dfsmith@almaden.ibm.com

   Costa Sapuntzakis
   Cisco Systems, Inc.
   170 W. Tasman Drive
   San Jose, CA 95134, USA
   Phone: +1 408 525 5497
   Email: csapuntz@cisco.com

   Efri Zeidner
   SANGate
   Israel
   efri@sangate.com

   Comments may be sent to Julian Satran, Daniel Smith, Costa
   Sapuntzakis, or Kalman Meth.

8. References and Bibliography

   [RDMA]    Internet Draft: TCP RDMA option (work in progress)
   [SAM2]    ANSI X3.270-1998, SCSI-3 Architecture Model (SAM-2)
   [TLS]     The TLS Protocol, RFC 2246, T. Dierks et al.
   [ALTC]    Internet Draft: Alternative checksums (work in progress)
   [CAM]     ANSI X3.232-199X, Common Access Method-3 (CAM-3)
   [CRC]     ISO 3309, High-Level Data Link Control (CRC 32)
   [RFC793]  Transmission Control Protocol, RFC 793
   [RFC1122] Requirements for Internet Hosts-Communication Layer,
             RFC 1122, R. Braden (editor)
   [SBC]     ANSI X3.306-199X, SCSI-3 Block Commands (SBC)
   [SCSI2]   ANSI X3.131-1994, SCSI-2
   [SPC]     ANSI X3.301-199X, SCSI-3 Primary Commands (SPC)

9. Appendix A - Examples

9.1. Read operation example

+------------------+-----------------------+----------------------+
|Initiator Function|     Message Type      |   Target Function    |
+------------------+-----------------------+----------------------+
| Command request  |SCSI Command (READ)>>> |                      |
| (read)           |                       |                      |
+------------------+-----------------------+----------------------+
|                  |                       | Prepare Data Transfer|
+------------------+-----------------------+----------------------+
| Receive Data     | <<< SCSI Data         | Send Data            |
+------------------+-----------------------+----------------------+
| Receive Data     | <<< SCSI Data         | Send Data            |
+------------------+-----------------------+----------------------+
| Receive Data     | <<< SCSI Data         | Send Data            |
+------------------+-----------------------+----------------------+
|                  | <<< SCSI Response     |Send Status and Sense |
+------------------+-----------------------+----------------------+
| Command Complete |                       |                      |
+------------------+-----------------------+----------------------+

9.2. Write operation example

+------------------+-----------------------+---------------------+
|Initiator Function|     Message Type      |   Target Function   |
+------------------+-----------------------+---------------------+
| Command request  |SCSI Command (WRITE)>>>| Receive command     |
| (write)          |                       | and queue it        |
+------------------+-----------------------+---------------------+
|                  |                       | Process old commands|
+------------------+-----------------------+---------------------+
|                  |                       | Ready to process    |
|                  | <<< RTT               | WRITE command       |
+------------------+-----------------------+---------------------+
| Send Data        | SCSI Data >>>         | Receive Data        |
+------------------+-----------------------+---------------------+
| Send Data        | SCSI Data >>>         | Receive Data        |
+------------------+-----------------------+---------------------+
|                  | <<< RTT               |                     |
+------------------+-----------------------+---------------------+
| Send Data        | SCSI Data >>>         | Receive Data        |
+------------------+-----------------------+---------------------+
|                  | <<< SCSI Response     |Send Status and Sense|
+------------------+-----------------------+---------------------+
| Command Complete |                       |                     |
+------------------+-----------------------+---------------------+

10. Appendix B - Login/Text keys

10.1. Target

   Target:domainname[/modifier]

   Examples:

      Target:disk-array.sj-bldg-h.cisco.com
      Target:disk-array.sj-bldg-h.cisco.com/disk3

   This key is provided by the initiator of the TCP connection to the
   remote endpoint. The Target key specifies the domain name of the
   target, since that information is not available from the TCP layer.
   The target is not required to support this key. The initiator
   should send this key in the first login message. The Target key
   might be used by the target to learn the intended initiator view of
   the target.

10.2. Initiator

   Initiator:[domainname[/modifier]]

   Examples:

      Initiator:sample.foobar.org
      Initiator:cluster.foobar.org/machine1
      Initiator:

   The Initiator key enables the initiator to identify itself to the
   remote endpoint. The domain name should be that of the initiator.
   A zero-length domain name is interpreted as "other side of TCP
   connection". The target may silently ignore this key if it does not
   support it.

   For more security, a certificate-based protocol [TLS] may be used
   on the channel and take precedence over this protocol.

10.3. Authenticator

   Authenticator:<UTF8-String>

   Examples:

      Authenticator:open-sesame

   The authenticator is a secret that the initiator uses to gain
   access to the target's LUNs.

10.4. SendAuthenticator

   SendAuthenticator:yes
   Response: Authenticator:<UTF8-String>

   Examples:

      SendAuthenticator:yes -> Authenticator:alakazam

   The SendAuthenticator key is used to request from the party on the
   other side of the TCP connection to send its Authenticator. iSCSI
   devices may refuse to grant access until proper authentication has
   been performed by the parties involved.

10.5. AllowNoRTT

   AllowNoRTT:<yes|no>
   Response: AllowNoRTT:<yes|no>

   Examples:

      AllowNoRTT:yes -> AllowNoRTT:yes

   The AllowNoRTT key is used to allow an initiator to send data to a
   target without the target having sent an RTT to the initiator. The
   default action is that RTT is required, unless both the initiator
   and the target send this key-pair attribute specifying
   AllowNoRTT:yes. Once AllowNoRTT has been set to 'yes', it cannot be
   set back to 'no'.

10.6. OriginalInitiator

   OriginalInitiator:[domainname[/modifier]]

   Examples:

      OriginalInitiator:sample.foobar.org

   The OriginalInitiator key is used to perform a proxy login from one
   target to another target in order to perform a third-party
   operation (like COPY) for some initiator. The first target acts as
   the initiator for the second target, but it must provide the
   authorization information of the original initiator.

10.7. Target2

   Target2:domainname[/modifier]

   Examples:

      Target2:sample.foobar.org
      Target2:sample.foobar.org/raid2

   The Target2 key is used in a third-party SCSI command (like COPY)
   between targets that do not lie on the same SCSI fabric. The
   initiator must specify the name of the distant target to the
   original target, so that the original target can Login to the
   distant target and then perform the third-party command.

Expires 15 December 2000
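
As an illustration of the Login/Text key syntax in Appendix B, the following Python sketch splits a key:value pair, separates a domainname[/modifier] value, and tracks the AllowNoRTT rule. It is a sketch under stated assumptions, not part of the draft: the names parse_key, parse_name and RttState are hypothetical, and it assumes each pair arrives as one already-delimited string, since this appendix does not describe how pairs are framed on the wire.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def parse_key(line: str) -> Tuple[str, str]:
    """Split one 'Key:value' pair at the first colon (assumed framing)."""
    key, sep, value = line.partition(":")
    if not sep or not key:
        raise ValueError(f"not a key:value pair: {line!r}")
    return key, value


def parse_name(value: str) -> Tuple[str, Optional[str]]:
    """Split 'domainname[/modifier]' into (domainname, modifier or None)."""
    domain, sep, modifier = value.partition("/")
    return domain, (modifier if sep else None)


@dataclass
class RttState:
    """Hypothetical per-session tracker for the AllowNoRTT key."""
    allow_no_rtt: bool = False  # default: RTT is required

    def negotiate(self, initiator: str, target: str) -> bool:
        # RTT may be skipped only if both sides sent AllowNoRTT:yes.
        both_yes = initiator == "yes" and target == "yes"
        # Once set to 'yes', the setting cannot be set back to 'no'.
        self.allow_no_rtt = self.allow_no_rtt or both_yes
        return self.allow_no_rtt


if __name__ == "__main__":
    key, value = parse_key("Target:disk-array.sj-bldg-h.cisco.com/disk3")
    print(key, parse_name(value))
    # A zero-length Initiator name means "other side of TCP connection".
    print(parse_key("Initiator:"))
    state = RttState()
    print(state.negotiate("yes", "no"), state.negotiate("yes", "yes"))
```

Splitting at the first colon only is a deliberate choice in this sketch, so that a value such as an Authenticator secret may itself contain a colon.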