Draft San Diego minutes

To: ips@ece.cmu.edu
Subject: Draft San Diego minutes
From: Black_David@emc.com
Date: Mon, 15 Jan 2001 15:54:02 -0500
Here's the initial draft of the San Diego minutes.
Comments need to be made in the next few days, as
the final version for the proceedings will probably
be sent in on Friday.

Thanks,
--David
---------- IPS Meeting Minutes, Monday December 11, 2000

EMC will be sending out an IPR notice regarding a patent related to iSCSI
and FCIP.  David  will be sending the information to the IPS mailing list
this week.

An interim meeting is being scheduled for the week of January 15, to coincide
with the T10 meeting in Orlando (Grosvenor Resort).

-- Framework document - Mark Carlson.
	Describes environments for IP Storage.  Includes terms and background
	on various protocols.  This is a living document.
	Currently more of a survey.
	This document will coordinate with Naming and Discovery.
	Looking for more co-authors; please contact Mark if you are interested.

-- Framing discussion -- Randy Haagens and Allyn Romanow
- Allyn and Randy were asked to compose this presentation by the ADs.
	The purpose was to clarify the problem and present a range of
	solutions.
- Framing is a common challenge for iSCSI and FCIP as well as for non-IPS
	documents.  While framing is not explicitly required, a solution for a
	more effective iSCSI specification is highly desirable.  The focus of
	the presentation was understanding the requirements of framing (i.e.
	the problem).  Reaching consensus on a solution was not one of the
	goals of the presentation.  Allyn started the presentation by pointing
	out that this topic will also be discussed on Monday night in the TSVWG.
- The problem: TCP reassembly can be costly, and in some instances not
	feasible.  Also, host memory and host bus bandwidth are limited, so we
	want to avoid manipulating the data more than once.  Best would be one
	use of the bus and memory - zero copy.  Note: this is not the same as
	TCP zero copy.  In TCP, the receiver typically waits for all the data
	to arrive, then copies the data to the host.
- In the outbound direction, data can be transferred directly from memory
	to the protocol controller and out onto the wire.  In the inbound
	direction, data received out of order must be put in a reassembly
	buffer until all of the data is received.
- One solution: Direct Memory Placement (payload steering; data steering;
	RDMA) -- In order to conserve host memory bandwidth and CPU cycles and
	to reduce on-board memory requirements, it is desirable to deliver
	iSCSI data directly to host buffers, avoiding the overhead of TCP
	reassembly buffers.  The TCP reassembly buffer can be 250MB for a
	10Gbps link with a 200ms round-trip time.  At 1Gbps, reassembly is
	possible but very costly; at 10Gbps speeds or above, reassembly is no
	longer feasible.  So, the goal is to get rid of a separate TCP
	reassembly buffer.  The adapter can decode ULP (iSCSI) headers and
	place the payload directly in host memory without intermediate
	buffers.  This would not be a conventional NIC; instead it would be
	very iSCSI aware, but it would not necessarily process the iSCSI
	headers, just use them to determine where to place the data.  As in
	TCP, the iSCSI stream is presented to the iSCSI protocol processor in
	order.
- In this solution, loss of ULP sync must be addressed: when a segment
	containing a ULP header is dropped or delayed, ULP sync is lost.
	Direct data placement cannot continue; data must be diverted to a
	reassembly buffer.  The goal is to recover ULP sync at the next ULP
	header.  There are both TCP-aware and TCP-unaware solutions to
	recovering ULP sync.
- TCP-unaware approaches:
	a) SCTP - issues include that it is not widely deployed
	b) Special characters - requires byte-by-byte processing
	c) Fixed-length ULP messages - inefficient for short ULP messages
	d) Periodic marker - best solution for this class of approaches.
		A sublayer of a framing protocol.  Manageable; relatively easy
		to implement in hardware.  The marker is a 4-byte field holding
		the number of ULP bytes remaining in the current PDU.  Markers
		are inserted and removed by the framing protocol, e.g. iSCSI.
		After loss of sync, locate the next marker and use it to locate
		the next ULP PDU.  Markers are transmitted twice in a row; this
		ensures that a marker cannot be split by stream
		fragmentation/segmentation.
- TCP-aware approaches:
	a) Urgent (URG) pointer - disallowed
	b) PSH bit - disallowed
- Another TCP-aware approach can be considered by the TSV working group.
	Allyn Romanow presented what the TSV working group works on.  The
	working group works on small items in the transport area that do not
	need a full working group, as well as TCP/UDP transport issues.
- Allyn Romanow presented a technique for demarcating message boundaries
	using a TCP option.  This consists of using one of the reserved bits
	in the TCP header to extend TCP to support this type of framing; up to
	40 bytes can then be added before the TCP payload.  The problem is
	that these reserved bits are a scarce resource; the need for the
	change must be evaluated.  Also, any time a change to TCP is proposed,
	there is tension between the need to update TCP and the stability of
	TCP.
- The procedure for standardizing a TCP option consists of:
	a) The IESG has to approve new work items for the TSV WG.
	b) Ask the Transport Services (TSV) working group to adopt this as a
		WG item.
	c) Pros and cons will be discussed on the TSV WG mailing list.  If it
		is supported, hopefully the spec will be wrapped up at the next
		IETF (roughly a 3-month time frame).  If there is no support,
		it's dead.  The advantage of the TSV WG is that transport experts
		will be able to contribute feedback.
	d) If supported, it will be adopted at the next IETF meeting.
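
The periodic-marker recovery in item d) above can be sketched in Python. This is a minimal sketch under stated assumptions, not a proposed wire format: the 4-byte big-endian length header on each ULP PDU, the 64-byte marker interval, and a marker pair in front of every interval (including offset 0) are all hypothetical choices. Each marker holds the number of ULP bytes remaining before the next PDU header, and it is written twice in a row as the minutes describe.

```python
import bisect
import struct

# Hypothetical parameters; the minutes do not fix a marker interval or header layout.
MARKER_INTERVAL = 64                 # ULP bytes between markers
MARKER = struct.Struct("!I")         # 4-byte "bytes remaining in current PDU" field
HEADER = struct.Struct("!I")         # hypothetical ULP header: 4-byte payload length
STRIDE = 2 * MARKER.size + MARKER_INTERVAL  # wire bytes per marker pair plus ULP chunk


def frame_pdus(payloads):
    """Build a ULP byte stream; return it with the offset of every PDU header."""
    stream, starts = bytearray(), []
    for payload in payloads:
        starts.append(len(stream))
        stream += HEADER.pack(len(payload)) + payload
    return bytes(stream), starts


def bytes_to_next_header(ulp_offset, starts, total):
    """Marker value: ULP bytes left before the next header (0 if one starts here)."""
    i = bisect.bisect_left(starts, ulp_offset)
    if i < len(starts) and starts[i] == ulp_offset:
        return 0
    next_start = starts[i] if i < len(starts) else total
    return next_start - ulp_offset


def insert_markers(ulp, starts):
    """Sender side: put a doubled marker in front of every MARKER_INTERVAL ULP bytes."""
    wire = bytearray()
    for offset in range(0, len(ulp), MARKER_INTERVAL):
        marker = MARKER.pack(bytes_to_next_header(offset, starts, len(ulp)))
        wire += marker + marker      # written twice: any single split leaves one copy whole
        wire += ulp[offset:offset + MARKER_INTERVAL]
    return bytes(wire)


def resync(wire, lost_at):
    """Receiver side: after losing sync at wire offset lost_at, return the ULP
    offset of the next PDU header, or None if no marker follows."""
    k = -(-lost_at // STRIDE)        # index of the first marker pair at or after lost_at
    pos = k * STRIDE
    if pos + 2 * MARKER.size > len(wire):
        return None
    (first,) = MARKER.unpack_from(wire, pos)
    (second,) = MARKER.unpack_from(wire, pos + MARKER.size)
    if first != second:
        raise ValueError("marker copies disagree")
    return k * MARKER_INTERVAL + first
```

Because markers sit at fixed positions in the stream, the receiver can find the next one without parsing anything it lost; the marker then points it at the next ULP header.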

The advantage is that people who are experts in transport will be able to
contribute, and that this will not be an iSCSI-specific solution.  IPS should
follow this process and contribute, making sure that the solution (since it
is not iSCSI specific) meets the needs of this group.

This is a very common problem that is worthy of consideration at the
transport layer; it addresses areas beyond IPS.

Allison pointed out that a TCP option is not the only approach.  TCP header
bits could potentially be used for framing.

- Message Boundary Option
	Two approaches.  Not drafts, very introductory.
Flag approach -- Costa has written it up and will post it as a draft.
	The flag approach may send many packets that are smaller than the MSS.
	This is potentially a risky change to TCP.  The ULP header is aligned
	with the first byte of the TCP payload.
Offset approach -- 4 bytes.  A 2-byte offset indicates the offset into the
	TCP payload of the first ULP header in the segment.  Write-up
	forthcoming.
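
The offset approach can be illustrated with a short Python sketch. Two details are assumptions not stated in the minutes: that the 4-byte option uses the usual TCP option layout of a 1-byte kind, a 1-byte length and the 2-byte offset, and the kind value itself, which is only a placeholder. The helper also assumes the sender knows the stream offsets at which ULP PDU headers start.

```python
import bisect
import struct

PLACEHOLDER_KIND = 253               # placeholder option kind, not an assigned value
OPTION = struct.Struct("!BBH")       # kind, length (= 4), 2-byte offset


def first_header_offset(seg_start, seg_len, pdu_starts):
    """Offset into this segment's payload of the first ULP header, or None."""
    i = bisect.bisect_left(pdu_starts, seg_start)
    if i < len(pdu_starts) and pdu_starts[i] < seg_start + seg_len:
        return pdu_starts[i] - seg_start
    return None                      # no header starts in this segment: omit the option


def pack_option(offset):
    """Encode the 4-byte option for one segment."""
    return OPTION.pack(PLACEHOLDER_KIND, OPTION.size, offset)


def parse_option(raw):
    """Decode the option and return the offset of the first ULP header."""
    kind, length, offset = OPTION.unpack(raw)
    if kind != PLACEHOLDER_KIND or length != OPTION.size:
        raise ValueError("not the framing option")
    return offset
```

For example, with headers at stream offsets 0, 1500 and 2900, a 1460-byte segment starting at offset 1460 would carry an offset of 40.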

Discussion - led by Steve Bellovin
	Steve requested that the group concentrate on requirements.

Somesh Gupta (HP) proposed another option -- periodic alignment instead of a
periodic marker.  There could be a requirement in iSCSI that an upper-layer
header appear every n kbytes in the TCP stream.  Padding could be used to
make sure this happens.  This requires an API change to the TCP/IP stack.
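
Somesh's periodic-alignment idea can be sketched as a sender-side layout rule. In this hypothetical Python sketch, a PDU that would straddle the next multiple of n bytes is pushed to that boundary and the gap is filled with padding, so every boundary in the stream is the start of a PDU. It assumes no PDU is longer than n and does not address how the receiver recognizes padding, which the minutes leave open.

```python
def align_pdus(pdu_lengths, n):
    """Return (start offsets, total padding) so that no PDU crosses a multiple of n."""
    pos, starts, padding = 0, [], 0
    for length in pdu_lengths:
        if length > n:
            raise ValueError("PDU longer than the alignment interval")
        next_boundary = (pos // n + 1) * n
        if pos % n and pos + length > next_boundary:
            padding += next_boundary - pos   # pad up to the boundary
            pos = next_boundary
        starts.append(pos)
        pos += length
    return starts, padding
```

For example, with n = 4096 and PDUs of 3000, 3000 and 1000 bytes, the second PDU is moved to offset 4096 at a cost of 1096 padding bytes, and the third starts at 7096.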
		
Ed Cox indicated that this issue has come up multiple times in the past.  It
needs to be handled as a general case, not an IPS-specific one.  Randy/Allyn
argued that this is the general case, in that ULPs usually have their own
PDU size information.

Somebody thought that you would need to encode multiple message boundaries
in any TCP option, or otherwise use just one upper-layer PDU per TCP
segment.  This would imply either small packets or lots of overhead.  The
reply was that we don't need to identify every boundary.  We can use the
length fields in the ULP frames to find the next ULP header in the packet.
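
The reply's point, that one boundary per segment is enough, can be sketched as follows. Assuming a hypothetical ULP header that begins with a 4-byte payload length, the receiver starts at the first header (for example, one located by an offset option) and hops from header to header using the length fields.

```python
import struct

HEADER = struct.Struct("!I")         # hypothetical ULP header: 4-byte payload length


def header_offsets(segment, first):
    """Offsets of every complete ULP header in this segment's payload.

    A header cut off by the end of the segment is not reported; it would be
    completed from the next segment.
    """
    offsets, pos = [], first
    while pos + HEADER.size <= len(segment):
        offsets.append(pos)
        (payload_len,) = HEADER.unpack_from(segment, pos)
        pos += HEADER.size + payload_len
    return offsets
```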

Steph Bailey (Genroco) asked how you handle losing the first part of the
packet; don't you get into the same situation you are trying to avoid?
Steph also pointed out that the message boundary proposals don't address the
issue of the message being very long.  We can't have unbounded ULP PDUs,
since if we do and lose the ULP header, we have to be prepared to buffer an
unlimited amount of data in anonymous buffers.  With a maximum ULP PDU
length, we must buffer up to that maximum length, and this must be a
reasonable size.
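
The buffer sizes in this discussion can be checked with a little arithmetic. The Python sketch below computes the bandwidth-delay product behind the 250MB reassembly figure quoted earlier, and a bound on anonymous buffering when ULP PDUs have a maximum length. The bound (lost headers times the maximum PDU length, capped at the receive window) illustrates the reasoning in the minutes and is not a figure from them; the 64KB maximum PDU in the example is hypothetical.

```python
def bandwidth_delay_bytes(bits_per_second, rtt_ms):
    """Bytes in flight on a link: bandwidth x round-trip time."""
    return bits_per_second * rtt_ms // (8 * 1000)


def anonymous_buffer_bound(lost_headers, max_pdu_bytes, window_bytes):
    """Worst-case data that cannot be placed directly after losing ULP headers.

    Each lost header strands at most one maximum-length PDU, and the receiver
    can never hold more than one window of out-of-order data.
    """
    return min(lost_headers * max_pdu_bytes, window_bytes)


window_10g = bandwidth_delay_bytes(10_000_000_000, 200)   # 250,000,000 bytes
window_1g = bandwidth_delay_bytes(1_000_000_000, 200)     # 25,000,000 bytes
bound = anonymous_buffer_bound(2, 64 * 1024, window_10g)  # 131,072 bytes
```

With a bounded PDU size, losing two headers on the 10Gbps link strands about 128KB instead of requiring a full 250MB reassembly buffer.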
	
Julian Satran said that the mechanism should be generic enough that other
ULPs can use it easily.  Venkat seconded this suggestion, saying that we
should also treat RDMA, VI, etc. in the proposal.  We want something with
wider application, so as to reduce hardware development costs.

Someone from Sun asked when the TCP option is examined.  Answer: only when
there is a loss on the receive side, in order to recover.

Question asked: how does this relate to RDMA?  Allyn responded that RDMA is
different but related.  An RDMA proposal either implicitly or explicitly
addresses framing.  It may make sense to do a generalized RDMA protocol that
would make use of this framing mechanism.

There was consensus at the end of the discussion that framing is best done
at the transport layer and should be done generically.

Modifying TCP vs. not: Luciano Dalle Ore pointed out that deploying a
general solution will take much longer.  It will not be mandatory, and it
may run into interoperability problems.  Options are truly optional: some
implementations will be able to support them; others will not.  If done
in-band, we can specify it now and possibly require it.

Question asked: based on past experience with TCP options, how long will
this take to propagate?  Allyn responded that, based on previous
experience, getting the procedure defined should take a couple of months
(by March 2001), working over the mailing list.  It will not hold up the
iSCSI development effort.  Deployment: 1-2 years.

Allison pointed out that there is motivation to get this done and deployed,
so deployment could occur much more quickly.  When SACK was done, there was
no motivation to get it adopted.

Question asked whether this framing would be mandatory.  David indicated it
would probably not be.

Asked about the necessity of attending tonight's meeting, Allyn responded
that the proposal will be discussed tonight, but there will be much more
discussion on the reflector.  Allyn recommended that IPS participants sign
up for the TSV reflector.

Somebody asked: what if we're unlucky and lose multiple iSCSI headers in a
TCP window?  Answer: you have to buffer in proportion to the number of
headers that you lose.  Also, the sending rate decreases quite a bit.

There was some discussion of whether this enhanced framing would force TCP
to deliver data out of order.  The answer is no: this architecture does
enhanced data placement.  TCP semantics must be observed by any
implementation.  This is the difference between data placement and data
delivery.  Data delivery is still done in order, according to the rules of
TCP.  The ULP is not aware that out-of-order data has arrived.  Correct
implementations will not deliver data out of order.  Note: the memory the
NIC is placing the data in is owned by the NIC.
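
The placement/delivery distinction can be made concrete with a small Python model. It is a sketch under simplifying assumptions: each segment's target location is given directly as a stream offset (in the architecture described, it would come from the ULP header via the packet classifier), and segments never overlap. Payload is written into the buffer as soon as it arrives, in any order, but the delivered point that the ULP sees advances only over contiguous data.

```python
class PlacementReceiver:
    """Places data out of order, delivers it in order."""

    def __init__(self, size):
        self.buffer = bytearray(size)    # memory the adapter places data into
        self.pending = {}                # offset -> length of placed, undelivered data
        self.delivered = 0               # everything below this offset is delivered

    def place(self, offset, data):
        """Write a segment directly to its final location; return the delivered point."""
        self.buffer[offset:offset + len(data)] = data
        self.pending[offset] = len(data)
        # Delivery follows TCP order: advance only while the next bytes are present.
        while self.delivered in self.pending:
            self.delivered += self.pending.pop(self.delivered)
        return self.delivered
```

Placing b"def" at 3 and b"ghi" at 6 leaves the delivered point at 0; once b"abc" arrives at 0, all nine bytes are delivered at once, and none of them was copied a second time.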

Mark Bakke said that this would be a good time to treat data integrity as
well as framing.  The protocols that want data integrity are CIFS and NFS;
these are the same ones that want greater reliability from a CRC.  He
recommended having a SCSI data-level CRC; customers will be looking for this
at the file level as well.  There may be an opportunity to put in a
hardware-implemented CRC.  David Black suggested that Mark Bakke send out a
draft on this matter.
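
Separate header and data CRCs of the kind discussed here (and which Mark Bakke asks to keep during the iSCSI document review) can be sketched in Python. The use of zlib's CRC-32 and a 4-byte digest placed after the header and after the data are assumptions for illustration; the minutes do not name a polynomial or a layout.

```python
import struct
import zlib

DIGEST = struct.Struct("!I")     # 4-byte CRC-32 digest (assumed layout)


def add_digests(header, data):
    """Append a header digest after the header and a data digest after the data."""
    return (header + DIGEST.pack(zlib.crc32(header))
            + data + DIGEST.pack(zlib.crc32(data)))


def check_digests(pdu, header_len):
    """Return (header_ok, data_ok) so a bad payload can be told apart from a bad header."""
    header = pdu[:header_len]
    (header_digest,) = DIGEST.unpack_from(pdu, header_len)
    data = pdu[header_len + DIGEST.size:len(pdu) - DIGEST.size]
    (data_digest,) = DIGEST.unpack_from(pdu, len(pdu) - DIGEST.size)
    return zlib.crc32(header) == header_digest, zlib.crc32(data) == data_digest
```

Keeping the two digests separate means a receiver can still trust a header (and, for example, place or discard its payload correctly) when only the data was corrupted.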

There was some confusion as to why the TCP header is not sufficient.  It was
pointed out that multiple simultaneous SCSI transactions are placed on a
single TCP connection, so headers and data are mixed on that connection and
sequence numbers do not a priori indicate what is data and what is header.

Buffer offset question: the iSCSI protocol packet classifier (or filter)
places the data, not TCP.

Steve Bellovin asked for a hum of the room on whether to solve the "framing
problem" in an iSCSI-specific way or to pursue a mechanism to add to TCP.
The hum in the room was to do it in TCP.


-- iSCSI document review - presented by Julian Satran.

- Rough consensus has been reached on the session model - symmetric with
	optional multiple connections.
- Login Session context - good understanding.
- Login Security context - more work needed.
- Commands, messages, tasks, and tags almost complete.  Open items: coding,
	some layout.
- Response numbering scheme is well understood; complete.
- The data numbering scheme has not reached consensus.  It may be removed.
	Julian's personal opinion is that it's optional and low cost, with
	advantages.
- For recovery, command restart and status are well understood.  No
	consensus on data recovery.  Digest not well understood; needs to be
	readdressed.
- Text commands - negotiation mechanisms done.
- Mapping moved to T10 (aliasing).  Dropped from iSCSI.
- RDMA/Sync, Security/Authentication - all are still open issues.
- Authentication - the login phase must provide authentication.  This was
	the consensus at the last meeting.  Every iSCSI PDU must provide data
	integrity and authentication.
- A mechanism should enable optional end-to-end data
	protection/authentication.  Would like to use TCP recovery in the
	presence of errors.  Digests can be activated at a higher level.  Need
	a mechanism that can be activated on demand, ideally at login.
- The current digest scheme needs to be changed.  Julian suggested using
	IPSec for data integrity: since all the above mechanisms are provided
	by IPSec, it is the best fit for what is needed, and very cheap if only
	what is needed is used.  Can insert own policies, including policies
	that verify integrity versus provide security but use the same
	mechanisms.  Policies will be addressed in the next two weeks.
- David: IPSec does negotiation securely.  What is currently in the draft
	is most likely vulnerable to a man-in-the-middle attack.
- Steve Bellovin indicated that the IPSec WG would be extremely opposed to
	any insecure non-cryptographic algorithm being defined for IPSec.
	Silicon must support SHA-1 or MD5 in order to do key negotiation.
	There are active discussions/proposals on how to do high-speed
	encryption/negotiation.  Early in the process; the drafts are not yet
	standards, but they are worth looking at.
- Mark Bakke really wants to maintain the separate iSCSI header/iSCSI
	payload digests.  This separation is lost by moving to IPSec.  The
	data integrity gained is only as good as what the group is willing to
	pay for.  Good integration with encryption.
- Can use IPSec in transport mode, which will provide end-to-end
	protection.  Integrity is required end-to-end, but security may not
	be.  Security may need to be removed at the firewall/gateway, but
	integrity must still be verifiable at the endpoints.  Can have
	multiple layers of IPSec if needed.  Comment from audience - not
	recommended.
- David Peterson of Cisco asked whether ACA will be mandated by the draft.
	The consensus, after the discussion, is that iSCSI must support ACA
	but that a device need not support ACA (Ralph Weber pointed out that
	few initiators use ACA today).  There was some grumbling because ACA
	is needed for reliable pipelining of ordered commands in the face of
	errors.

- There was a question on whether asynchronous event notification (AEN)
	was mandatory to implement in iSCSI.  Again, iSCSI transports must
	support asynchronous events, but iSCSI devices need not.  Somebody
	pointed out that SCSI mode pages can be used to regulate whether a
	device generates AENs.

- Ralph Weber of T10 praised iSCSI for trying to advance the state of the
	art in SCSI.

-- iSCSI requirements - presented by Marjorie Krueger

Doug Otis asked whether the T10 work on authorization was going to be
integrated into iSCSI.  David Black said that the documents won't be
integrated into a single text.  SCSI provides authorization; we try to leave
that to T10.  Randy Haagens pointed out that SCSI/T10 is not quite there on
privacy, authorization and authentication, so we have to do our own
mechanisms.  Also, since iSCSI introduces the authentication problem (by
running SCSI over IP networks), iSCSI is the appropriate place to fix it.
T10 work will be referenced where applicable.

It was noted that the point of iSCSI authentication and authorization was to
control who was able to get to a target.

-- Bootstrapping -- presented by Prasenjit Sarkar

This document contains guidelines for how iSCSI boot clients connect to an
iSCSI boot server.  It includes a description of how to use existing
techniques.  iSCSI boot clients need an IP address, the iSCSI boot server
service delivery port name, a LUN (default LUN = 0), and iSCSI initiator
software.

Boot process steps:
	Client software stage
		Use PXE or a related bootp/tftp protocol to get the iSCSI
			initiator software
	DHCP stage
		Use DHCP to configure the client IP address
		Use a new DHCP option to configure the iSCSI boot server
			service delivery port name
	Discovery server stage
		Use the "to be defined" iSCSI delivery service to get iSCSI
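
The DHCP stage could carry the boot server's service delivery port name in an ordinary DHCP option. In this hypothetical Python sketch the option uses the standard DHCP code/length/value layout, with pad (0) and end (255) codes; the option code and the name string are placeholders, since the minutes only say that a new option is needed.

```python
PAD, END = 0, 255
ISCSI_BOOT_OPTION = 250              # placeholder code for the proposed new option


def encode_option(code, value):
    """One DHCP option: 1-byte code, 1-byte length, then the value."""
    if len(value) > 255:
        raise ValueError("DHCP option value too long")
    return bytes([code, len(value)]) + value


def find_option(options, code):
    """Return the value of the first option with this code, or None."""
    i = 0
    while i < len(options):
        current = options[i]
        if current == PAD:           # pad is a single byte with no length field
            i += 1
            continue
        if current == END:
            break
        length = options[i + 1]
        if current == code:
            return options[i + 2:i + 2 + length]
        i += 2 + length
    return None
```

The client would read the port name from the DHCP reply this way and then hand it to the iSCSI initiator software loaded in the first stage.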

There was a question on whether the boot client had to have IPsec, in light
of the integrity proposal by Julian and the security proposals by others.
Prasenjit answered that it was not required; you just need bootp.

Mark Carlson noted that he didn't see any requirements for security in the
boot process.  He pointed out that booting from disk is a security-critical
operation in many environments.  Prasenjit countered that the boot
mechanism doesn't disallow security.

There was some question on what to do with the iSCSI session once a
bootstrap program was done with it.  It was noted that it was probably
simplest to close it and have the loaded program establish a new iSCSI
session.

-- MIB presentation - Mark Bakke

A group is forming to work on an iSCSI MIB.  An initial stab, via SNMP, has
been taken.  It manages the iSCSI portion only, not SCSI 'stuff'.  A
separate SCSI MIB, if needed and if one does not already exist, needs to be
addressed separately.

The original MIB structure is not adequate and is being redone.  It also
reflects an older version of the iSCSI draft.
	
Kevin (Nishan) asked whether the MIB group has looked into zoned environment
support, similar to FC.  Mark indicated that he had not looked at this.
Where does zoning fit into the iSCSI architecture, if at all?

Where is the MIB running?  It could be on anything running iSCSI, including
an initiator, target, or gateway.

The FC HBA API available from SNIA might be of interest to this group.  It
has a complete list of things management tools want to be able to see out
of an initiator.


----- Tuesday, December 12, 2000

-- Naming and Discovery Requirements - Mark Bakke, Cisco

Mark said that naming and discovery would specify target discovery but
would leave LUN discovery to SCSI mechanisms, such as REPORT LUNS.  There
was a bit of debate on this: why not go all the way and support LUN
discovery in the naming system?  Some people countered with a layering
argument: "Leave unto SCSI what is SCSI's".

Scaling requirements include both small and large environments.
Find targets by querying SNS.  Small environments do not require SNS.
Hierarchical format, with Naming Authority.

World Wide Unique Identifier (WWUI).
Address composed of IP address + TCP port + target name, URL-like.
Plan to apply for a well-known TCP port.  An address without a TCP port
specified would then default to this well-known port.
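
A URL-like address of this form, with the well-known-port default, might be parsed as in the Python sketch below. The "host[:port]/target-name" syntax, the example target names, and the port constant are assumptions for illustration: the minutes give no syntax, and the constant stands in for the port the group planned to request (3260 is the port later registered for iSCSI).

```python
from dataclasses import dataclass

WELL_KNOWN_PORT = 3260   # placeholder for the well-known port to be requested


@dataclass
class IscsiAddress:
    host: str
    port: int
    target_name: str


def parse_address(text):
    """Parse a hypothetical "host[:port]/target-name" address (IPv4 or hostname)."""
    host_port, slash, target_name = text.partition("/")
    if not slash or not target_name:
        raise ValueError("address must include a target name")
    host, colon, port = host_port.partition(":")
    return IscsiAddress(host, int(port) if colon else WELL_KNOWN_PORT, target_name)
```

For example, "10.1.2.3:5003/com.example.disk1" yields port 5003, while "10.1.2.3/com.example.disk1" falls back to the well-known port.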

Format includes info on the naming authority, including support for a
'local' naming authority.

Character set to be allowed?  Unicode?
Recommend UI schemes for naming authority.
Need to look at security issues.

T10 issues - reservations, reset, LUN naming.
Target reset discussion.  Noted that T10 is thinking of making target reset
optional.

Is breaking a connection in iSCSI equivalent to a target reset?  Consensus
is no: the end of a session is equivalent to a target reset and would also
cause any persistent reservations to be released.

The naming scheme will allow multiple-port and multiple-initiator/target
discovery.  It will give a list of targets plus all paths to each target.

The draft is currently an individual submission.  Consensus (hum) taken for
it to be adopted as a working group document.  No opposing hums.

-- iSNS document presented by Josh Tseng, Nishan

iSNS describes a scalable information facility for registration, discovery
and management of networked facilities.

iSNS follows a client/server architecture.  A client that registers with
the name server allows itself to be managed by the name server.

Why needed?  It simplifies storage management implementations, allows
greater scalability than broadcast/multicast discovery methods, and
supports zoning.

Next step - incorporate requirements/suggestions from IPS working group.
Extend document for FCIP

Access control - what is the name server's role?  Targets upload public
keys to the name server.  Enforced at the end node/target.  Supports both
soft and hard zoning.
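
The registration and zone-filtered discovery described for iSNS can be illustrated with a toy in-memory name server. This Python sketch is hypothetical and does not reflect iSNS message formats: nodes register a role and a set of zones, and a discovery query returns only the targets that share a zone with the querying initiator. The name server here only filters what is discovered (soft zoning); enforcement at the end node/target, as described above, is not modelled.

```python
from collections import defaultdict


class NameServer:
    """Toy registry: nodes register a role and zones; queries are zone-filtered."""

    def __init__(self):
        self.roles = {}                      # node name -> "initiator" or "target"
        self.zones = defaultdict(set)        # node name -> zone names

    def register(self, name, role, zones):
        self.roles[name] = role
        self.zones[name].update(zones)

    def query_targets(self, initiator):
        """Targets visible to this initiator: those sharing at least one zone."""
        mine = self.zones.get(initiator, set())
        return sorted(name for name, role in self.roles.items()
                      if role == "target" and self.zones[name] & mine)
```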

How does it fit into discovery?  The naming and discovery team will look at
this to see how well it fits.  Should this be maintained as a separate
document or incorporated into the naming and discovery team's work?

The draft relies on WWNs.  What do you do about devices that do not have
WWNs or don't want them?  This work was done prior to the N&D requirements
document; the draft would need to be redone to support the WWUI of the N&D
requirements.

Is this a direction the naming and discovery team approves of?  Yes, close.

Is there working group consensus on this as a base document, working with
the N&D team to produce a revised document aligned with N&D, which would
then be adopted as an official WG document?  Rough consensus: the next
revised version of the document will become an official working group
document.  Not unanimous.

-- FCIP - Status and progress of FCIP. - Raj Bhagwat

Current status - differences from the previous presentation:
Solution for bridging remote FC SAN islands.  From the FC point of view, it
appears to be entirely an FC network.  Initially it did not have congestion
management (last presentation).  The draft was overhauled to incorporate
TCP as the transport in order to address congestion management and recovery
mechanisms.  In rev -00, the PSH flag was incorporated.  Based on feedback
from the mailing list, this was eliminated, and in -01 a new frame boundary
mechanism was introduced.  Topics under discussion -- QoS, security,
MTU/MSS, framing/synchronization, order of delivery, discovery, error
recovery.

Alignment with a new project in T11 - FC-BB2.  FC-BB2 is focused on issues
outside the scope of the IETF, including link-level issues.  Target date for
completion - June 2001.

David Robinson complained that much FC/IP work is done on conference calls.
He asked that these conference calls be made public so as to allow broader
participation.  The conference calls are design team calls open to design
team members and authors.  Public review takes place on the mailing lists.

What is an FCIP device?  A gateway between an FC SAN and an IP network.
Discovery: how do FCIP gateways (devices) discover each other?  The spec
currently uses static configuration.  Dynamic configuration support is
envisioned, perhaps using iSNS.

David to work with authors offline on QoS text.

-- iFCP - presented by Charles Monia, Nishan

What is the difference between iFCP and FCIP?

FCIP is a tunneling model between FC SANs: a conduit for FC frames to flow
over an IP backbone, transparently to the FC network.

The iFCP network model extends up to the FC storage device itself and uses
a session model.  It consolidates FC storage switching and routing functions
in the IP fabric, to reduce total cost of ownership, unify network and
storage management domains, and exploit IP technology investment.  It
extends the SAN over LAN/MAN/WAN distances.

Next step -- complete the N_Port session model.  Encapsulation changes for
additional end-to-end error detection.

The authors of iFCP would like to see it considered for adoption as a
working group item.  Adoption of iFCP as a working group item requires
modification of the WG charter.  David requested that input on this be sent
to the WG chairs.  Revising the charter requires consultation with the area
directors and working group chairs.

After the presentation, Julian Satran suggested that iFCP and FC/IP should
merge since they are so similar.  Others agreed with Julian.  Charles Monia
countered that they would be difficult to merge because they take different
approaches.  iFCP works by intercepting FC logins (connection requests) and
modifying FC frames.  In addition, it doesn't run FC routing protocols
between FC SANs.

Clarification of FCIP and iFCP: the latter is for the FCP protocol mapping
only, whereas FCIP can transport any FC upper-level protocol.

FC/IP works at a lower level than iFCP. It doesn't modify FC frames.

FC/IP requires running FC routing/switching protocols between FC domains.

Some thought that iFCP was a superset of FC/IP.

Somebody was concerned that the iFCP gateway would need to run IP routing
protocols.  It was eventually decided that the iFCP gateway was just an IP
host and didn't have to run IP routing protocols.

	Other comments need to be sent to the mailing list or to the chairs
	directly.

-- Adaptation Layer presentation -- Randall Stewart, Cisco

Randall Stewart's presentation introduced how the IPS protocols could be
architected with an adaptation layer independent of the underlying
transport (i.e. supporting at least both SCTP and TCP).

To do this, a uniform API boundary between the ULP and the transport would
need to be defined.  This would require many changes to all existing
drafts.  The APIs would need to be a message-oriented type of mechanism.
The critical path would need to be designed so that it is protocol
agnostic.

The transport interface would need to provide methods for passing buffers
to/from the control of the transport, e.g. for zero copy.

	The adaptation layer would need to worry about:
		Framing
		Zero copy
		Parallel paths
		Message retrieval
		Notifications
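
A uniform, message-oriented boundary between the ULP and the transport might look like the Python sketch below. The interface and class names are hypothetical. Only a byte-stream (TCP-style) binding is shown; it supplies the framing itself with a 4-byte length prefix, whereas an SCTP binding could map each call onto SCTP's native message boundaries. Zero copy, parallel paths and notifications from the list above are not modelled.

```python
import abc
import socket
import struct


class MessageTransport(abc.ABC):
    """Hypothetical message-oriented boundary between a ULP and its transport."""

    @abc.abstractmethod
    def send_message(self, message: bytes) -> None: ...

    @abc.abstractmethod
    def recv_message(self) -> bytes: ...


class StreamFramedTransport(MessageTransport):
    """Adapts a byte-stream socket (e.g. TCP) by adding a 4-byte length prefix."""

    LENGTH = struct.Struct("!I")

    def __init__(self, sock: socket.socket):
        self.sock = sock

    def send_message(self, message: bytes) -> None:
        self.sock.sendall(self.LENGTH.pack(len(message)) + message)

    def recv_message(self) -> bytes:
        (length,) = self.LENGTH.unpack(self._recv_exact(self.LENGTH.size))
        return self._recv_exact(length)

    def _recv_exact(self, n: int) -> bytes:
        # A byte stream may return partial reads; loop until n bytes arrive.
        buf = bytearray()
        while len(buf) < n:
            chunk = self.sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("stream closed mid-message")
            buf += chunk
        return bytes(buf)
```

The ULP only ever calls send_message and recv_message, so swapping the byte-stream binding for a message-oriented one would not change ULP code, which is the property the adaptation layer is after.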
	
We must be very careful that this API does not make assumptions about the
transport being used.  In the adaptation model, we would need to figure out
how to overcome these issues.

Julian Satran thought this was a good way to proceed and would like to see
Randall write up a section on this for the draft.  Randall would be more
than glad to help by contributing advice and/or drafts.

Randy Haagens thought that the adaptation layer would add too many layers
between iSCSI and TCP, and that a separate protocol should be done for SCTP.

Steph Bailey suggested that the CAM may be an inspiration for the adaptation
layer.  Others responded that the CAM is at the wrong layer, above iSCSI.