Re: iSCSI: comments on draft

To: ips@ece.cmu.edu
Subject: Re: iSCSI: comments on draft
From: julian_satran@il.ibm.com
Date: Mon, 13 Nov 2000 08:38:19 +0200
Sender: owner-ips@ece.cmu.edu



Jim,

Thanks for your careful reading.


On most of the editorial changes I will not comment here - I will correct
as much as I can by the end of next week.

Some non-editorial comments:

   timers - I am aware that they are not a proper protocol issue as they
   don't appear on the wire. They are somewhat similar to TCP timers.  The
   intent was to help gateways to FCP do some stateless recovery.  I think
   we should consider them carefully and decide how many of them we want,
   if the values have to be agreed by initiator and target (I had a private
   mail suggesting that) or that, in extremis, we can live without them at
   all (as Matt suggested).  If we include them we have to make their
   implementation mandatory -   if gateway recovery can't assume they are
   implemented then they are worthless.
   At connection termination the finish was meant to convey "wait until
   they finish" - I did not  mean abort.  I will try a formulation that
   clearly expresses this intent.
   On Bidi, as I said in a previous note, I am still considering how to fit
   it in.
   On the AE for abort - it was optional and I heard a strong voice for
   making it mandatory (of the sort we call in this group consensus) so I
   did.  I personally consider that a check condition and ACA are required
   anyhow and AE will give you the "management trigger" for initiators
   otherwise idle.
   Logging Resets - the place IP devices log today is a MIB that can be
   retrieved through SNMP.  I will try to clarify this - but I will not
   make any specific reference to a MIB field as this group has not yet
   worked on the proposed MIB.
   LUN or Reserved in data out.    The LUN is required for solicited data
   as the target task tag is not required to be unique target-wide.  I will
   attempt to spell it out.
   The 8 byte descriptors apply to the unmap function; I will try to spell
   it out.
   The purpose of map and unmap is to enable target addressing in all IP
   formats, in binary as well as DNS-style formats. Once T10 has it we
   can make the commands obsolete. Map implementation is entirely up to
   the target. However, values mapped are assumed to be persistent until
   unmapped.    In the text I am working on I state that no third party
   address can be used if it was not mapped, and that an appropriate error
   will be returned if this rule is violated or the target has lost the maps
   due to a power down or reset. As for proxy tokens - how are they covered
   in third party commands? I will clean up the wording a bit and will add a
   statement about table reset. However, the mechanisms used by the target
   to implement the mapping and to keep it are implementation specific. The
   clarification we have to add is that SRAs are guaranteed valid only on a
   given session (you are not supposed to hand them from initiator to
   initiator) but the target is not required to check the SRA ownership
   validity.
   The IP in IPsec might lead you to think that it provides only link level
   security but IPsec provides (depending on the policy) end-to-end
   security
   For errors in format we will introduce a specific sense - iSCSI format
   error
   I value your insight into the workings of SCSI and your helping us out
   on this - and it was from this that the team realized during the drafting
   meeting in Haifa that if an OS+CPU already has a signature, the AccessID,
   we could as well use it and not invent a new one.  I don't see exactly what
   layering principles are violated here - an ID is an ID and it does not
   make sense to have 10 of them.  The principals involved are the same.  I
   can hardly understand the rest of your comment. Isn't the AccessID
   identifying the initiator to the target?  Why should we maintain
   different IDs for the same principals?  (Layering is an abstraction
   technique that hardly applies here.)
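
A rough sketch, in Python, of the map table behaviour described in the
map/unmap item above. The class name, the method names and the
counter-based SRA allocation are all hypothetical; the mechanism is left to
the target. The sketch only illustrates that mapped values persist until
unmapped, that a reset loses them, that an unmapped SRA produces an error,
and that SRA ownership is not checked.

```python
import itertools


class MapTable:
    """Hypothetical per-target table mapping 8-byte SRAs to addresses."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._entries = {}  # SRA (8 bytes) -> (session_id, address)

    def map(self, session_id, address):
        # Allocate a fresh 8-byte SRA. The allocation scheme is an assumption.
        sra = next(self._counter).to_bytes(8, "big")
        self._entries[sra] = (session_id, address)
        return sra

    def unmap(self, sra):
        # Mapped values persist until explicitly unmapped.
        self._entries.pop(sra, None)

    def reset(self):
        # Models a power down or reset in which the target loses its maps.
        self._entries.clear()

    def resolve(self, sra):
        # The owning session is deliberately not compared with the caller:
        # the target is not required to check SRA ownership.
        try:
            return self._entries[sra][1]
        except KeyError:
            raise LookupError("third party address was not mapped") from None
```

An initiator-side caller would treat the LookupError as the "appropriate
error" mentioned above and issue the map again.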

Thanks again,
Julo

Please respond to "Jim Hafner/Almaden/IBM" <hafner@almaden.ibm.com>

To:   ips@ece.cmu.edu
cc:
Subject:  iSCSI: comments on draft



Folks,

Here's some rather extensive comments on the draft. Some are editorial,
some are technical (minor and major) and some are questions. My apologies
for the length.  I tried to <snip> as much as possible and still leave
enough context.

<JIM>
The naming and discovery team (NDT) is moving in the direction that
targets may also listen on other TCP ports, so if that is adopted,
this will need to be reworded (as will other parts of this document
when NDT is done).
</JIM>


1.1 <snip>
   A "SCSI transport" maps the client-server SCSI protocol to a specific
   interconnect. Initiators are one endpoint of a SCSI transport. The
   "target" is the other endpoint. A "target" can have multiple LUs
   behind it. Each logical unit has a number called a LUN.

<JIM>
I would rephrase this as "Each logical unit has an address within a
target called a LUN".
</JIM>

   A SCSI task is a SCSI command or possibly a linked set of SCSI
   commands. Some LUNs support multiple pending (queued) tasks. The
   queue of tasks is managed by the target, though. The target uses an
   initiator provided "task tag" to distinguish between tasks. Only one
   command in a task can be outstanding at any given time.

<JIM>
I would also be careful throughout to not use the term LUN for logical
unit; I would replace "LUNs" in the second sentence above by "logical
units".  [There is one other occurrence noted below of this colloquial
misuse of the term LUN.]
</JIM>

<snip> (still 1.1)

   Each SCSI command results in an optional data phase and a response
   phase. In the data phase, information can travel from the initiator
   to target (e.g. WRITE), target to initiator (e.g. READ), or in both
   directions. In the response phase, the target returns the final
   status of the operation, including any errors. A response terminates
   a SCSI command.

<JIM>
The first sentence can be interpreted that the response phase is
optional. Is this intended or do we want something like "involves an
optional data phase and a required response phase".  [Note: is the
word "results" the right one here?]
</JIM>

   Command Data Blocks (CDB) are the data structures used to contain the
   command parameters to be handed by an initiator to a target. The CDB
   content and structure is defined by [SAM] and device class specific
   SCSI standards.

<JIM>
"device-type specific" to use the SAM words.
</JIM>


1.2.3 Timers and timeouts
<snip>

<JIM>
Why are the initiator timers mandatory?  Isn't it up to the
implementation to decide if there are timing requirements?  There is
no target requirement here, so how do you even know this is working?
</JIM>

<snip>

1.2.6 iSCSI Full Feature Phase

   Once the initiator is authorized to do so, the iSCSI session is in
   iSCSI full feature phase. The initiator may send SCSI commands and
   data to the various LUNs on the target by wrapping them in iSCSI
<JIM> LUNs->logical units </JIM>

<snip>
                                     An initiator MAY request, at login,
   to send immediate data blocks of any size. If the initiator requests
   a specific block size the target MUST indicate the size of immediate
   data blocks it is ready to accept in its response.  Beside iSCSI,
   SCSI also imposes a limit on the amount of unsolicited data a target
   is willing to accept. The iSCSI immediate data limit MUST not exceed
   the SCSI limit.
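
A minimal sketch, in Python, of the rule in the paragraph above: the size
the target reports at login is capped by the SCSI-level limit on
unsolicited data. The function and parameter names are hypothetical, all
sizes are taken to be in bytes, and treating None as "no SCSI limit" is an
assumption of the sketch.

```python
def accepted_immediate_size(requested, target_max, scsi_limit=None):
    """Return the immediate data block size the target reports at login.

    requested   -- block size asked for by the initiator, in bytes
    target_max  -- largest block the target is prepared to accept, in bytes
    scsi_limit  -- SCSI-level unsolicited data limit in bytes, or None
    """
    size = min(requested, target_max)
    if scsi_limit is not None:
        # The iSCSI immediate data limit must not exceed the SCSI limit.
        size = min(size, scsi_limit)
    return size
```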

<JIM>
We should give a reference to where this limit is defined and
specified in the SCSI world (Mode Page 02h, disconnect/reconnect page,
First Burst Size) in SPC-2.
</JIM>

1.2.7 iSCSI Connection Termination

   Connection termination is assumed an exceptional event.
   Graceful TCP connection shutdowns are done by sending TCP FINs.
   Graceful connection shutdowns MUST only occur when there are no
   outstanding tasks that have allegiance to the connection.  A target
   SHOULD respond rapidly to a FIN from the initiator by closing it's
   half of the connection as soon as it has finished all outstanding
   tasks that have allegiance to the connection.  Connection termination
   with outstanding tasks may require recovery actions.

<JIM>
Can/should this have some definition of what "finish all outstanding
tasks" means?  E.g., Abort tasks -- if you don't abort, where are you
going to send status?
</JIM>

2.2 SCSI Command

<snip>

<JIM>
Is there a strong reason to put the Bidi stuff AFTER all the other
stuff and not within the context of the main header (similar question
as here in the context of the response PDU) or to not have it like
FCP-2 where the DL field is *after* the CDB and the proposed FCP-2 has
the Bidi-READ DL field after that normal DL field?

It looks kludgy (spelling?) to have a separation of the two DL fields
by other stuff.
</JIM>


2.2.6 CDB - SCSI Command Descriptor Block

   There are 16 bytes in the CDB field to accommodate the largest
   currently defined CDB.  Whenever larger CDBs are used, the CDB
   spillover MAY extend beyond the 48-byte header.

<JIM>
CDBs larger than 16 bytes are already defined (see SPC-2 and SBC-2).
Perhaps a better phrasing is "to accommodate the size of the most
commonly used CDBs".
</JIM>

2.3.1 Byte 1 - Flags

<snip>
      b4-6 not used (SHOULD be set to 0)
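
For illustration only, a Python sketch of what checking these bits on
receipt might look like. The quoted text does not require such a check. The
mask assumes bit 0 is the least significant bit of the flags byte, and
FormatError is a hypothetical stand-in for whatever error path a target
would actually use.

```python
RESERVED_MASK = 0x70  # bits 4-6 of the flags byte, assuming bit 0 is the LSB


class FormatError(Exception):
    """Hypothetical error raised for a malformed header field."""


def check_reserved_bits(flags):
    """Return flags unchanged, or raise if any reserved bit is set."""
    bad = flags & RESERVED_MASK
    if bad:
        raise FormatError("reserved flag bits set: 0x%02x" % bad)
    return flags
```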

<JIM>
Should we use the T10 style "reserved" model for unused bits and
require that the target check that these are zero, so that in the
future, if a definition is given to them, we won't have to worry about
bad initiators that didn't initialize these bits correctly?

Related, what's the error path for the target if something is wrong
(like "o" and "u" both set)? (See next item.)

Note: initiators don't need to care about bad fields because there's
nothing they can do about it!
</JIM>

<snip> (this next is on Task Management)
2.6.1 Function

<snip>
   For the <Clear Task Set> the target MUST send an Asynchronous Event
   to all other attached initiators to inform them that all pending
   tasks are cancelled and then enter the ACA state for any initiator
   for which it had pending tasks.

<JIM>
So we are requiring AE?  All other protocols don't (AFAIK).

Also, we should be aware of the new SCSI status that has been approved
for SAM-2, called TASK ABORTED. This is used (under certain conditions
to deal with legacy hosts) to inform an initiator that its tasks were
aborted by the actions of another initiator.  I think that an
initiator that has requested the TASK ABORTED status (via Control Mode
Page) should NOT be given the AE and it should be handled by this new
status.

I'd like to hear from other people (particularly others who are closer
to SAM-2) on the need for AE in this case as well.
</JIM>

<snip>
   In addition, for the <Target Cold Reset> the target then MUST
   terminate all of its TCP connections to all initiators (all sessions
   are terminated). However, if the target finds that it cannot send the
   required response or AE it MUST continue the reset operation and it
   SHOULD log the condition for later retrieval.

<JIM>
Are we spec'ing specifics on the content of this "log" and on the
methods for retrieving that log?  And who can ask for the log?
</JIM>

2.8 SCSI Data

<snip>

   Byte /    0       |       1       |       2       |       3       |
      /              |               |               |               |
     |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
     +---------------+---------------+---------------+---------------+
    0|F| 0x05        |1|0| Reserved (0)                              |
     +---------------+---------------+---------------+---------------+
    4| Length                                                        |
     +---------------+---------------+---------------+---------------+
    8| LUN or Reserved (0)                                           |
   12|                                                               |
     +---------------+---------------+---------------+---------------+
   16| Initiator Task Tag                                            |
     +---------------+---------------+---------------+---------------+
   20| Target Task Tag (solicited) or Reserved (0) (unsolicited)     |
     +---------------+---------------+---------------+---------------+
   24| Reserved (0)                                                  |
     +---------------+---------------+---------------+---------------+
   28| ExpStatRN                                                     |
     +---------------+---------------+---------------+---------------+
   32/ Reserved (0)                                                  /
     /                                                               /
     +---------------+---------------+---------------+---------------+
   40| Buffer Offset                                                 |
     +---------------+---------------+---------------+---------------+
   44| Reserved (0)                                                  |
     +---------------+---------------+---------------+---------------+
   48/ Payload                                                       /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
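
A minimal Python sketch of reading the 48-byte header laid out above. The
field offsets come from the diagram; big-endian byte order, the Python
names, and treating F as the top bit of byte 0 are assumptions of the
sketch. The task_key helper is hypothetical and shows one reason the LUN
may matter for solicited data: if the target task tag is not unique
target-wide, the tag has to be qualified by the LUN.

```python
import struct
from collections import namedtuple

# Offsets follow the diagram: bytes 2-3, 24-27, 32-39 and 44-47 are reserved.
HEADER = struct.Struct(">BB2xI8sII4xI8xI4x")  # 48 bytes in total

DataHeader = namedtuple(
    "DataHeader",
    "final opcode length lun itt ttt exp_stat_rn buffer_offset",
)


def parse_data_header(pdu):
    """Unpack the fixed header from the start of a SCSI Data PDU."""
    b0, _flags, length, lun, itt, ttt, exp_stat_rn, offset = HEADER.unpack_from(pdu)
    return DataHeader(
        final=bool(b0 & 0x80),   # F bit, assumed to be bit 7 of byte 0
        opcode=b0 & 0x7F,        # 0x05 for SCSI Data
        length=length,
        lun=lun,                 # 8 raw bytes; zero when reserved
        itt=itt,
        ttt=ttt,
        exp_stat_rn=exp_stat_rn,
        buffer_offset=offset,
    )


def task_key(hdr):
    """Key for looking up the task that solicited this data."""
    return (hdr.lun, hdr.ttt)
```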


<JIM>
This specifies that bytes 8-15 are LUN or Reserved (0).  Which is it?
Under what conditions is the LUN required?  What happens if the LUN
doesn't match the one that the task tag implies (or is that not a
problem)?
</JIM>

<snip>
2.15 Map Command


   Byte /    0       |       1       |       2       |       3       |
      /              |               |               |               |
     |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
     +---------------+---------------+---------------+---------------+
    0|F| 0x07        |1|0| Function  | Reserved (0)                  |
     +---------------+---------------+---------------+---------------+
    4| Length                                                        |
     +---------------+---------------+---------------+---------------+
    8| Reserved (0)                                                  |
     +                                                               +
   12|                                                               |
     +---------------+---------------+---------------+---------------+
   16| Initiator Task Tag                                            |
     +---------------+---------------+---------------+---------------+
   20| Reserved (0)                                                  |
     +---------------+---------------+---------------+---------------+
   24| CmdRN                                                         |
     +---------------+---------------+---------------+---------------+
   28| ExpStatRN                                                     |
     +---------------+---------------+---------------+---------------+
   32/ Reserved (0)                                                  /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
   48| Descriptor Type               | Descriptor Length             |
     +---------------+---------------+---------------+---------------+
   52/ Descriptor                                                    /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
   ---------------------------------------------------------------------
     +---------------+---------------+---------------+---------------+
     | Descriptor Type               | Descriptor Length             |
     +---------------+---------------+---------------+---------------+
     / Descriptor                                                    /
    +/                                                               /
     +---------------+---------------+---------------+---------------+


        or


     +---------------+---------------+---------------+---------------+
   48| 8 byte Descriptor                                             |
    +|                                                               |
     +---------------+---------------+---------------+---------------+
   ---------------------------------------------------------------------
     +---------------+---------------+---------------+---------------+
   N | 8 byte Descriptor                                             |
    +|                                                               |
     +---------------+---------------+---------------+---------------+

<JIM>
I don't understand the format of this command, especially in the "or"
case.  This looks like just a list of 8 byte descriptors.  To what do
they MAP?
</JIM>

<snip>
2.15.1 Function

<snip>
   Address/access control descriptors follow the header.  For the map
   function the following descriptor types are defined:

      0    Binary IP Version 4 TCP address (IP+Port) followed by a
      selector string; length should be 6+the selector length+1
      1    Binary IP Version 6 TCP address (IP+Port) followed by a
      selector string; length should be 18+the selector length+1
      2    iSCSI URL (domain name terminated with null followed by a
      selector followed by null)
      3    FC address & port - in case access control is based on
      transport ID
      4    access proxy token

   Details for 3 & 4 have to be coordinated with T10

   For the unmap function the descriptors are standard 8 byte SRAs (SCSI
   Reference Address)
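
As a sketch of how a type 0 descriptor from the list above could be
assembled, in Python. The 16-bit type and length fields follow the Map
Command diagram. Big-endian byte order, reading the trailing "+1" as a null
terminator, and a length that excludes the 4-byte type/length prefix are
all assumptions, and the function name is hypothetical.

```python
import ipaddress
import struct

DESCRIPTOR_IPV4 = 0  # "Binary IP Version 4 TCP address (IP+Port)"


def encode_ipv4_descriptor(ip, port, selector):
    """Build a type 0 map descriptor from an IPv4 address, port and selector.

    selector is a bytes object; its meaning is not defined in this sketch.
    """
    body = (
        ipaddress.IPv4Address(ip).packed   # 4 bytes of binary IPv4 address
        + struct.pack(">H", port)          # 2 bytes of TCP port
        + selector
        + b"\x00"                          # assumed null terminator (the "+1")
    )
    # Length is 6 + the selector length + 1, as stated in the list above.
    return struct.pack(">HH", DESCRIPTOR_IPV4, len(body)) + body
```

For a 5-byte selector this yields a length field of 12 and a 16-byte
descriptor including the prefix.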


<JIM>
Where are the SRA's specified in this format?  Am I really missing
something?

There is only one reason for this Map command (AFAIK).  Namely, to map
long IP-style addressing mechanisms (e.g., IPv6) to a smaller 8-byte
alias for the purposes of third party addressing in SCSI commands like
EXTENDED COPY and some of the XOR commands.  The limitation of the
current spec (SPC-2) on those commands is that the Target identifier
is only 8 bytes long (and cannot be extended).  Specifically, the only
need is for name or address resolution of the Target Device and NOT
for the logical unit (that is already handled by other 8 byte fields
in the target descriptors).

SPC-2 and SPC-3 already have descriptors for FC address & port (SPC-2)
and IPv4 (just approved for SPC-3) so there is no need for that here.
Note also that the IPv4 target descriptor included a 2-byte field for
"Protocol" (that is, UDP, TCP, etc.).

I firmly believe that T10 will approve a SCSI version of this command
(very soon; there is definite movement in this direction as there is
need for this both in iSCSI and in SRP (formerly known as SVP)) so
that this is NOT needed at all; I personally recommend removing this
so as not to create confusion later.

I may be uninformed, but what is a "selector string"?

Are maps initiator specific or global for the target?  Are they
volatile?  Under what conditions can the target clean out its map
table? Can I blow away someone else's map values?  Can I query the
mapping table, either for the entire table or for the specific mapping
of a particular SRA?

I don't know what access control information is relevant here. In
particular:

-- What does FC address & port mean in this context for access
controls based on TransportID (note the spelling as well)?

-- There is *no need* or value in access proxy token in this context.
That is handled *completely* in the SCSI Access Controls as approved
by T10.  Proxy Tokens are indirect handles to identify a logical unit
and NOT for identification of a Target Device.
</JIM>

<snip>
2.21 Third Party Commands

   There are some third-party SCSI commands, such as (EXTENDED) COPY and
   COMPARE that involve more than one target. In it's most general form
   those commands involve the "original target" called the COPY-Manager
   and a (variable) number of other machines called source and
   destination. The whole operation is described by one "master CDB"
   delivered to the Copy manager and a series of descriptor blocks; each
   descriptor block addresses a source and destination target and LU and
   a description of the work to be done in terms of blocks or bytes as
   required by the device types. The relevant SCSI standards do not
   require full support of the (EXTENDED) COPY or COMPARE nor do they
   provide a detailed execution model.  We will assume, in the spirit of
   [SPC-2], that a COPY manager will read data from a source and write
   them to a destination.

   To address them an iSCSI COPY manager will use information provided
   to it through map commands and the SRAs and flags provided in the
   descriptors - allowing for iSCSI and FC sources and destinations.

   Enabling a FC COPY manager to support iSCSI sources and destinations
   is subject to coordination with T10.

<JIM>
Note that COPY and COMPARE have now been made Obsolete in SPC-2 so
there probably isn't a reason to mention them here.

The language of the first paragraph reads to me like an editorial
comment on SPC-2.  I would suggest wording for this section that more
closely resembles that of FCP-2.
</JIM>

<snip>

6.2.2.4 Encryption

   This mode provides for the end-to-end encryption (e.g. IPsec). In
   addition to authenticating the client, it provides end-to-end data
   integrity and protects against man-in-the-middle attacks,
   eavesdropping, message insertion, deletion, and modification.
<JIM>
I thought IPsec provides only link-level security and NOT end-to-end.
Am I wrong?
</JIM>

<snip>
Appendix A
02 Authentication

<snip>

   The authentication methods to be used are public key, user/password
   or challenge/response.
<JIM>
We don't allow for Kerberos in here (or something like that)?
</JIM>

   If public key is selected then each party MUST use:

      authenticate:<user-id>,<blob>

   where user-id is the SCSI access-id of the host-OS for the initiator
   or the World-Wide-Name for the target and blob is the public-key
   blob.

<JIM>
As the author of the SCSI access controls, I am personally not
comfortable with use of the AccessID (note the spelling) in this
context for authentication.  Note that AccessIDs are NOT used for
security in SCSI, only for identification.  For layering reasons and
others, I feel that iSCSI security should be based on independent
principles between the iSCSI entities and NOT on SCSI related
concepts.

Additionally, AccessIDs are NOT required for initiators in SCSI and so
cannot (should not?) be required here.

Finally, let me mention that a (weak) reason to use AccessIDs in this
context is that a given target may decide to reject a login for a
particular AccessID if that initiator has no accessible logical units.
On the other hand, the initiator may have two other reasons (in the
context of SCSI access controls) for connecting to a target *even if*
that initiator has no accessible logical units.  These include:
1) the initiator needs to deliver a SCSI ACCESS CONTROL IN/OUT command
to that target to query or change access controls (weak authentication
of the CDB itself is done with a "key" embedded in the command or the
parameter data).
2) the initiator holds a proxy token that (indirectly) references a
logical unit on that target (this token is also embedded in the CDB or
parameter data).
So, either we include these additional tags in parallel with AccessID
for the purposes of authentication OR we don't use any of them.

I vote for not using any of them for this purpose.  I don't object to
including a key:value for these three things as additional data in
login (especially after the security context has been established) but
I think they need to be divorced from the authentication procedure
completely.
</JIM>



Jim Hafner
IBM Research

