Draft San Diego minutes

Here's the initial draft of the San Diego minutes. Comments need to be made in the next few days, as the final version for the proceedings will probably be sent in on Friday.

Thanks,
--David

----------

IPS Meeting Minutes, Monday December 11, 2000

EMC will be sending out an IPR notice regarding a patent related to iSCSI and FCIP. David will be sending the information to the IPS mailing list this week.

An interim meeting is being scheduled for the week of January 15, to coincide with T10 in Orlando at the Grosvenor resort.

-- Framework document - Mark Carlson

Describes environments for IP Storage. Includes terms and background on various protocols. This is a living document, currently more of a survey. It will be coordinated with Naming and Discovery. Looking for more co-authors; please contact Mark if you are interested.

-- Framing discussion -- Randy Haagens and Allyn Romanow

- Allyn and Randy were asked by the ADs to compose this presentation. The purpose was to clarify the problem and present a range of solutions.

- Framing is a common challenge for iSCSI and FCIP as well as for non-IPS documents. While framing is not explicitly required, a solution for a more effective iSCSI specification is highly desirable. The focus of the presentation was understanding the requirements of framing (i.e. the problem). Reaching consensus on a solution was not one of the goals of the presentation. Allyn started the presentation by pointing out that this topic will also be discussed on Monday night in the TSVWG.

- The problem: TCP reassembly can be costly, and in some instances not feasible. Also, host memory and host bus bandwidth are limited, so we want to avoid manipulating the data more than once. Best would be one use of the bus and memory: zero copy. Note: this is not the same as TCP zero copy. In TCP, one typically waits for all the data to arrive, then copies the data to the host.
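For scale, the buffer a receiver needs in order to hold one full window of in-flight data is roughly the bandwidth-delay product. The following is a minimal Python sketch of that arithmetic, using the 10Gbps link speed and 200ms round-trip time that the presenters quote for their reassembly buffer estimate. The function name is illustrative and not taken from any draft.

```python
def reassembly_buffer_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product in bytes: data in flight during one round trip."""
    return link_bps * rtt_seconds / 8  # divide by 8 to convert bits to bytes


for gbps in (1, 10):
    size = reassembly_buffer_bytes(gbps * 1e9, 0.200)
    print(f"{gbps:>2} Gbps, 200 ms RTT -> {size / 1e6:.0f} MB")
```

With these inputs the 10Gbps case works out to 250 MB, which matches the figure in the minutes, and the 1Gbps case to 25 MB.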
- In the outbound direction, data can be transferred directly from memory to the protocol controller and out onto the wire. In the inbound direction, data received out of order has to be put in a reassembly buffer until all data is received.

- One solution: Direct Memory Placement (payload steering; data steering; RDMA). In order to conserve host memory bandwidth and CPU cycles and to reduce on-board memory requirements, it is desirable to deliver iSCSI data directly to host buffers, avoiding the overhead of TCP reassembly buffers. The TCP reassembly buffer can be 250MB for a 10Gbps link with 200ms round-trip time. At 1Gbps, reassembly is possible but very costly. At 10Gbps speeds or above, reassembly is no longer feasible. So the goal is to get rid of a separate TCP reassembly buffer. An adapter can decode ULP (iSCSI) headers and place payload directly in host memory without intermediate buffers. This would not be a conventional NIC; instead it would be very iSCSI aware. It would not necessarily process the iSCSI headers, but just use them to determine where to place the data. As in TCP, the iSCSI stream is presented to the iSCSI protocol processor in order.

- This solution must address loss of ULP sync. When a segment containing a ULP header is dropped or delayed, ULP sync is lost. Direct data placement cannot continue; data must be diverted to a reassembly buffer. The goal is to recover ULP sync at the next ULP header. There are both TCP-aware and TCP-unaware solutions to recovering ULP sync.

- TCP-unaware approaches:
  a) SCTP - issues include that it is not widely deployed
  b) Special characters - requires byte-by-byte processing
  c) Fixed-length ULP messages - inefficient for short ULP messages
  d) Periodic marker - best solution for this class of approaches
     Sublayer of a framing protocol.
     Manageable; relatively easy to implement in hardware.
     The marker is a 4-byte field giving the number of ULP bytes remaining in the current PDU.
     The marker is inserted and removed by the framing protocol, e.g. iSCSI.
     After loss of sync, locate the next marker and use it to locate the next ULP PDU.
     Markers are transmitted twice in a row; this ensures markers cannot be split by stream fragmentation/segmentation.

- TCP-aware approaches:
  a) URGent pointer - disallowed
  b) PSH bit - disallowed

- Another TCP-aware approach can be considered by the TSV working group. Allyn Romanow described what the TSV working group works on: small items in the transport area that do not need a full working group, as well as TCP/UDP transport issues.

- Allyn Romanow presented a technique for demarcating message boundaries using a TCP option. This consists of using one of the reserved bits in the TCP header to extend TCP to support this type of framing. Up to 40 bytes can then be added before the TCP payload. The problem is that these reserved bits are a scarce resource; the need for the change has to be evaluated. Also, any time a change to TCP is proposed there is tension between the need to update TCP and the stability of TCP.

- The procedure for standardizing a TCP option consists of:
  a) The IESG has to approve new work items for the TSV wg.
  b) Ask the Transport Services (TSV) working group to adopt this as a WG item.
  c) Pros and cons will be discussed on the TSV wg mailing list. If it is supported, hopefully the spec will be wrapped up at the next IETF (roughly a 3 month time frame). If there is no support, it's dead. The advantage of the TSV wg is that transport experts will be able to contribute feedback.
  d) If supported, it will be adopted at the next IETF meeting.

The advantage is that people who are experts in transport will be able to contribute, and that this will not be an iSCSI-specific solution. IPS should follow this process and contribute, making sure that the solution (since it is not iSCSI specific) meets the needs of this group. This is a very common problem that is worthy of consideration at the transport layer. It addresses areas beyond IPS.

Allison pointed out that a TCP option is not the only approach.
TCP header bits could potentially be used for framing. The flag approach may send many packets that are smaller than the MSS. This is potentially a risky change to TCP.

- Message Boundary Option: two approaches. Not drafts yet; very introductory.
  Flag approach -- Costa has written it up and will post it as a draft. The ULP header is aligned with the first byte of the TCP payload.
  Offset approach -- 4 bytes. A 2-byte offset indicates the offset into the TCP payload of the first ULP header in the segment. Write-up forthcoming.

Discussion - Led by Steve Bellovin

Steve requested that the group concentrate on requirements.

Somesh Gupta (HP) proposed another option: periodic alignment instead of a periodic marker. There could be a requirement in iSCSI that an upper-layer header appear every n kbytes in the TCP stream. Padding could be used to make sure this happens. This requires an API change to the TCP/IP stack.

Ed Cox indicated that this issue has come up multiple times in the past. It needs to be solved for the general case, not be IPS specific. Randy and Allyn argued that this is the general case, in that ULPs usually have their own PDU size info.

Somebody thought that you would need to encode multiple message boundaries in any TCP option, or otherwise use just one upper-layer PDU per TCP segment. This would imply either small packets or lots of overhead. The reply was that we don't need to identify every boundary. We can use the length fields in the ULP frames to find the next ULP header in the packet.

Steph Bailey (Genroco) asked how you handle losing the first part of the packet; don't you get into the same situation you are trying to avoid?

It was pointed out that the message boundary proposals don't address the issue of the message being very long. We can't have unbounded ULP messages, since if we do and lose the ULP header, we have to be prepared to buffer an unlimited amount of data in anonymous buffers.
With a maximum ULP message length, one must buffer up to that maximum, and it must be a reasonable size.

Julian Satran said that the mechanism should be generic enough that other ULPs can use it easily. Venkat seconded this suggestion, saying that we should also treat RDMA, VI, etc. in the proposal. We want something with wider application, so as to reduce HW development costs.

Someone from Sun asked when the TCP option is examined - only when there is a loss on the receive side, in order to recover.

A question was asked: how does this relate to RDMA? Allyn's response: RDMA is different but related. The RDMA proposal either implicitly or explicitly addresses framing. It may make sense to do a generalized RDMA protocol that would make use of this framing mechanism.

There was consensus at the end of the discussion that framing is best done at the transport layer and should be done generically.

Modifying TCP vs. not: Luciano Dalle Ore pointed out that deploying a general solution will take much longer. It will not be mandatory. It may run into interoperability problems. Options are truly options: some will be able to support them; others will not. If in-band, we can spec it now and possibly require it.

A question was asked: based on past experience with options in TCP, how long will this take to propagate? Allyn responded that, based on previous experience, getting the procedure defined will take a couple of months (by March 2001), with the work done over the mailing list. This will not hold up the iSCSI development effort. Deployment will take 1-2 years. Allison pointed out that there is motivation to get this done and deployed, so deployment could occur much quicker. When SACK was done, there was no motivation to get it adopted.

A question was asked whether this framing would be mandatory. David indicated it would probably not be.

Someone asked about the necessity of attending tonight's meeting. Allyn responded that the proposal will be discussed tonight, but there will be much more discussion on the reflector. It was recommended that IPS participants sign up for the TSV reflector.
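The reply recorded in this discussion, that a receiver needs only one known boundary per segment and can walk the ULP length fields from there, can be sketched in Python. This is a minimal illustration under stated assumptions: the ULP header is a hypothetical 4-byte big-endian payload length (not the iSCSI header), and `first_header_offset` stands in for the 2-byte offset that the proposed offset option would carry.

```python
import struct

HEADER_LEN = 4  # hypothetical ULP header: a 4-byte big-endian payload length


def make_pdu(payload: bytes) -> bytes:
    """Build a toy ULP PDU: length header followed by the payload."""
    return struct.pack(">I", len(payload)) + payload


def header_offsets(segment: bytes, first_header_offset: int) -> list[int]:
    """Return the offsets of all complete ULP headers in one TCP payload.

    Only the first boundary is supplied; every later boundary is derived
    from the length field of the header before it.
    """
    offsets = []
    pos = first_header_offset
    while pos + HEADER_LEN <= len(segment):
        offsets.append(pos)
        (payload_len,) = struct.unpack_from(">I", segment, pos)
        pos += HEADER_LEN + payload_len
    return offsets


# Three PDUs back to back; the segment begins 4 bytes into the stream,
# so it starts with the 5 payload bytes left over from the first PDU.
stream = make_pdu(b"x" * 5) + make_pdu(b"y" * 3) + make_pdu(b"z" * 10)
segment = stream[4:]
print(header_offsets(segment, first_header_offset=5))  # prints [5, 12]
```

The sketch also shows the limit raised in the discussion: if a PDU is longer than the segment, no header falls inside the segment and the receiver has nothing to resynchronize on until a later segment arrives.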
Somebody asked: what if we're unlucky and lose multiple iSCSI headers in a TCP window? Well, you have to buffer in proportion to the number of headers that you lose. Also, the sending rate decreases quite a bit.

There was some discussion of whether this enhanced framing would force TCP to deliver out of order. The answer is no: this architecture does enhanced data placement. TCP semantics need to be observed by any implementation. This is the difference between data placement and data delivery. Data delivery is still done in order, according to the rules of TCP. The ULP is not aware that out-of-order data has arrived. Correct implementations will not deliver data out of order. Note: the memory the NIC is placing the data in is owned by the NIC.

Mark Bakke said that this would be a good time to treat data integrity as well as framing. The protocols that want data integrity are CIFS and NFS; these are the same ones that want greater reliability from a CRC. He recommended having a SCSI data-level CRC; customers will be looking for this at the file level as well. There may be an opportunity to put in a HW-implemented CRC. David Black suggested that Mark Bakke send out a draft on this matter.

There was some confusion as to why the TCP header is not sufficient. It was pointed out that multiple simultaneous SCSI transactions are placed on a single TCP connection, so headers and data are mixed on that connection and sequence numbers do not a priori indicate what is data and what is header.

Buffer offset question - the iSCSI protocol packet classifier (or filter) is placing the data, not TCP.

Steve Bellovin asked for a hum of the room on whether to solve the "framing problem" in an iSCSI-specific way or whether to pursue a mechanism to add to TCP. The hum in the room was to do it in TCP.

-- iSCSI document review - presented by Julian Satran

- Rough consensus has been reached on the session model: symmetric with optional multiple connections.
- Login Session context - good understanding.
- Login Security context - more work needed.
- Commands, messages, tasks, and tags almost complete. Items open - coding, some layout.
- Response numbering scheme is well understood; complete.
- The data numbering scheme has received no consensus. It may be removed. Julian's personal opinion is that it's optional and low cost with advantages.
- For recovery, command restart and status are well understood. No consensus on data recovery. Digest not well understood; needs to be readdressed.
- Text commands - negotiation mechanisms done.
- Mapping moved to T10 (aliasing). Dropped from iSCSI.
- RDMA/Sync, Security/Authentication - all are still open issues.
- Authentication - the login phase must provide authentication. This was the consensus at the last meeting. Every iSCSI PDU must provide data integrity and authentication.
- A mechanism should enable optional end2end data protection/authentication. Would like to use TCP recovery in the presence of errors. Digests can be activated at a higher level. Need a mechanism that can be activated on demand, ideally at login.
- The current digest scheme needs to be changed. Julian suggested using IPSec for data integrity: since all the above mechanisms are provided by IPSec, it is the best fit for what is needed, and very cheap if one uses only what is needed. One can insert one's own policies, including policies that verify integrity versus provide security but use the same mechanisms. Policies will be addressed in the next two weeks.
- David: IPSec does negotiation securely. What is currently in the draft is most likely vulnerable to a man-in-the-middle attack.
- Steve Bellovin indicated that the IPSec WG would be extremely opposed to any insecure non-cryptographic algorithm being defined for IPSec. Silicon must support SHA-1 or MD5 in order to do key negotiation. There are active discussions/proposals on how to do high-speed encryption/negotiation. It is early in the process (drafts, not yet standards), but this is worth looking at.
- Mark Bakke really wants to maintain the separate iSCSI header and iSCSI payload digests. This separation is lost by moving to IPSec. The data integrity gained is only as good as what the group is willing to pay for. Good integration with encryption.
- Can use IPSec in transport mode, which will provide end2end protection. Integrity is required end2end, but security may not be. Security may need to be removed at the firewall/gateway, but it must still be possible to verify integrity at the endpoints. Can have multiple layers of IPSec if needed. Comment from the audience - not recommended.
- David Peterson of Cisco asked whether ACA will be mandated by the draft. The consensus, after the discussion, is that iSCSI must support ACA but that a device need not support ACA (Ralph Weber pointed out that few initiators use ACA today). There was some grumbling because ACA is needed for reliable pipelining of ordered commands in the face of errors.
- There was a question on whether asynchronous event notification (AEN) was mandatory to implement in iSCSI. Again, iSCSI transports must support asynchronous events but iSCSI devices need not. Somebody pointed out that SCSI mode pages can be used to regulate whether a device generates AENs.
- Ralph Weber of T10 praised iSCSI for trying to advance the state of the art in SCSI.

-- iSCSI requirements -- presented by Marjorie Krueger

Doug Otis asked whether the T10 work on authorization was going to be integrated into iSCSI. David Black said that the documents won't be integrated into a single text. SCSI provides authorization; try to leave it to T10. Randy Haagens pointed out that SCSI/T10 is not quite there on privacy, authorization and authentication, so we have to do our own mechanisms. Also, since iSCSI introduces the authentication problem (by running SCSI over IP networks), iSCSI is the appropriate place to fix it. T10 work will be referenced where applicable.
It was noted that the point of iSCSI authentication and authorization was to control who was able to get to a target.

-- Bootstrapping -- presented by Prasenjit Sarkar

This document contains guidelines for how iSCSI boot clients connect to an iSCSI boot server. It includes a description of how to use existing techniques. iSCSI boot clients need: an IP address; the iSCSI boot server service delivery port name; a LUN (default: LUN = 0); and iSCSI initiator software.

Boot process steps:
  Client software stage
    Use PXE or a related bootp/tftp protocol to get the iSCSI initiator software
  DHCP stage
    Use DHCP to configure the client IP address
    Use a new DHCP option to configure the iSCSI boot server service delivery port name
  Discovery server stage
    Use "to be defined" iSCSI delivery service to get iSCSI

There was a question on whether the boot client had to have IPsec, in light of the integrity proposal by Julian and security proposals by others. Prasenjit answered that it was not required; you just need bootp. Mark Carlson noted that he didn't see any requirements for security in the boot process. He pointed out that booting from disk is a security-critical operation in many environments. Prasenjit countered that the boot stuff doesn't disallow security.

There was some question on what to do with the iSCSI session once a bootstrap program was done with it. It was noted that it was probably simplest to close it and have the loaded program establish a new iSCSI session.

-- MIB presentation - Mark Bakke

A group is forming to work on an iSCSI MIB. An initial stab, via SNMP, has been taken. It manages the iSCSI portion only, not SCSI 'stuff'. If a separate SCSI MIB is needed and does not already exist, that needs to be addressed separately. The original MIB structure was not adequate and is being redone. It also reflects an older version of the iSCSI draft.

Kevin (Nishan) - Has the MIB group looked into zoned environment support, similar to FC? Mark indicated that he had not looked at this. Where does zoning fit into the iSCSI architecture, if at all?
Where is the MIB running? It could be anything running iSCSI, including initiator, target, or gateway.

An FC HBA API is available from SNIA and might be of interest to this group. It has a complete list of things management tools want to be able to see out of an initiator.

-----

Tuesday, December 12, 2000

-- Naming and Discovery Requirements - Mark Bakke, Cisco

Mark said that naming and discovery would specify target discovery but would leave LUN discovery to SCSI mechanisms, such as REPORT LUNS. There was a bit of debate on this; why not go all the way and support LUN discovery in the naming system? Some people countered with a layering argument: "Leave unto SCSI what is SCSI's".

Scaling requirements include both small and large environments. Find targets by querying SNS. Small environments do not require SNS.

Hierarchical format, with Naming Authority. World Wide Unique Identifier (WWUI). Address composed of IP addr + TCP port + Target Name, URL-like. Plan to apply for a well-known port for TCP. In such a case, an address without a TCP port specified would default to this well-known port. Format includes info on the naming authority, including support for a 'local' naming authority. Character set to be allowed? Unicode? Recommend UI schemes for naming authority. Need to look at security issues.

T10 issues - reservations, reset, LUN naming

Target reset discussion. It was noted that T10 is thinking of making target reset optional. Is breaking of a connection in iSCSI equivalent to a target reset? Consensus is no: the end of a session was equivalent to a target reset and would also cause any persistent reservations to be released.

The naming scheme will allow multiple-port and multiple initiator/target discovery. It will give a list of targets plus all paths to each target.

The draft is currently an individual submission. A consensus call (hum) was taken: to be adopted as a working group document. No opposition hums.
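The URL-like address described in this session (IP address, optional TCP port, target name, with a default port when none is given) can be illustrated with a short Python sketch. Everything concrete here is an assumption made for illustration: the minutes do not define a syntax, so the `iscsi://host[:port]/target-name` form is hypothetical, and the default port value is a placeholder because the minutes only say that a well-known port will be applied for.

```python
from typing import NamedTuple
from urllib.parse import urlsplit

# Placeholder value: the minutes only state that a well-known port is planned.
ASSUMED_WELL_KNOWN_PORT = 3260


class TargetAddress(NamedTuple):
    host: str
    port: int
    target_name: str


def parse_target_address(url: str) -> TargetAddress:
    """Parse a hypothetical 'iscsi://host[:port]/target-name' address."""
    parts = urlsplit(url)
    target_name = parts.path.lstrip("/")
    if not parts.hostname or not target_name:
        raise ValueError(f"address needs a host and a target name: {url!r}")
    # An address without a port falls back to the well-known port.
    port = parts.port if parts.port is not None else ASSUMED_WELL_KNOWN_PORT
    return TargetAddress(parts.hostname, port, target_name)


print(parse_target_address("iscsi://10.0.0.5/com.example.array1"))
print(parse_target_address("iscsi://10.0.0.5:5003/com.example.array1"))
```

The target name in the example uses a reversed-domain prefix only to suggest a hierarchical name with a naming authority; the actual name format was still open at the time of the meeting.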
-- iSNS document presented by Josh Tseng, Nishan

iSNS describes a scalable information facility for registration, discovery and management of networked facilities. iSNS follows a client/server architecture. If a client registers with the name server, it allows itself to be managed by the name server.

Why is it needed? It simplifies storage management implementations. It allows greater scalability than broadcast/multicast discovery methods. It supports zoning.

Next step - incorporate requirements/suggestions from the IPS working group. Extend the document for FCIP.

Access control - what is the name server's role? Targets upload a public key to the name server. Enforced at the end node/target. Supports both soft and hard zoning.

How does it fit into discovery? The naming and discovery team will look at this to see how well it fits. Should this be maintained as a separate document or incorporated into the naming/discovery team's work?

In reading the draft, there is a reliance on WWNs. What do you do about devices which do not have WWNs or don't want WWNs? The work was done prior to the n&d requirements document. This draft would need to be redone to support the WWUI of the n&d requirements.

Is the direction one which the naming and discovery team approves of? Yes, close.

Is there working group consensus on this as a base document, working with the NDT group to produce a revised document, aligned with N&D, which would then be adopted as an official wg document? Rough consensus - the next revised version of the document will become an official working group document. Not unanimous.

-- FCIP - Status and progress of FCIP - Raj Bhagwat

Current status - differences from previous presentation

A solution for bridging remote FC SAN islands. From the FC point of view, it appears to be entirely an FC network. Initially it did not have congestion management (last presentation). The draft was overhauled to incorporate TCP as the transport in order to address congestion management and recovery mechanisms. In rev -00, the PSH flag was incorporated.
Based on feedback from the mailing list, this was eliminated, and in -01 a new frame boundary mechanism was introduced.

Topics under discussion -- QoS, security, MTU/MSS, framing/synchronization, order of delivery, discovery, error recovery.

Alignment with the new project in T11 - FC-BB2. FC-BB2 is focused on issues outside the scope of the IETF, including link-level issues. Target date for completion - June 2001.

David Robinson complained that much FC/IP work is done on conference calls. He asked that these conference calls be made public so as to allow broader participation. The conference calls are design team calls open to design team members and authors. Public review takes place on the mailing lists.

What is an FCIP device? A gateway between an FC SAN and an IP network.

Discovery by an FCIP gateway (device) of other FCIP gateways: how do gateways discover each other? Static configuration is currently in the spec. Dynamic configuration support is envisioned, perhaps using iSNS.

David to work with authors offline on QoS text.

-- iFCP - presented by Charles Monia, Nishan

What is the difference between iFCP and FCIP? FCIP is a tunneling model between FC SANs: a conduit for FC frames to flow, transparently to the FC network, over an IP backbone. The iFCP network model extends up to the FC storage device itself. It uses a session model. It consolidates FC storage switching and routing functions in the IP fabric. Reduce total cost of ownership, unify network and storage management domains, and exploit IP technology investment. Extend the SAN over LAN/MAN/WAN distances.

Next step -- complete the N_Port session model. Encapsulation changes for additional end-to-end error detection.

The authors of iFCP would like to see it considered for adoption as a work group item. Adoption of iFCP as a work group item requires modification of the WG charter. David requested that input on this be sent to the WG chairs. Revising the charter requires consultation with the area directors and working group chairs.
After the presentation, Julian Satran suggested that iFCP and FC/IP should merge since they are so similar. Others agreed with Julian. Charles Monia countered that they would be difficult to merge because they take different approaches. iFCP works by intercepting FC logins (connection requests) and modifying FC frames. In addition, it doesn't run FC routing protocols between FC SANs. Clarification of FCIP and iFCP - the latter is for FCP protocol mapping only, whereas FCIP can transport any FC upper-level protocol.

FC/IP works at a lower level than iFCP. It doesn't modify FC frames. FC/IP requires running FC routing/switching protocols between FC domains. Some thought that iFCP was a superset of FC/IP.

Somebody was concerned that the iFCP gateway would need to run IP routing protocols. It was eventually decided that the iFCP gateway was just an IP host and didn't have to run IP routing protocols.

Other comments need to be sent to the mailing list or to the chairs directly.

-- Adaptation Layer presentation -- Randall Stewart, Cisco

Randall Stewart's presentation introduced how the IPS protocols could be architected with an adaptation layer independent of the underlying transport (i.e. at least both SCTP and TCP). To do this, a uniform API boundary between the ULP and the transport would need to be defined. This would require many changes to all existing drafts. The APIs would need to be a message-oriented type of mechanism. The critical path would need to be done so that it is protocol agnostic. The transport interface would need to provide methods for passing buffers to/from control of the transport, e.g. for zero copy.

The adaptation layer would need to worry about:
  Framing
  Zero copy
  Parallel paths
  Message retrieval
  Notifications

One must be very careful that this API does not make assumptions about the transport being used. In the adaptation model, one would need to figure out how to overcome these issues.
Julian Satran thought this was a good way to proceed and would like to see Randall write up a section on this for the draft. Randall would be more than glad to help by contributing both advice and/or drafts. Randy Haagens thought that the adaptation layer would add too many layers between iSCSI and TCP and that a separate protocol should be done for SCTP.

Steph Bailey suggested that the CAM may be an inspiration for the adaptation layer. Others responded that the CAM is at the wrong layer, above iSCSI.
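The idea of a uniform, message-oriented API boundary between the ULP and the transport can be made concrete with a small Python sketch. It is purely illustrative: the class and method names are hypothetical, and the two in-memory adapters merely stand in for a byte-stream transport such as TCP (where the adapter must add its own framing) and a message-preserving transport such as SCTP. Nothing here is taken from the presentation or from any draft.

```python
import struct
from abc import ABC, abstractmethod
from collections import deque


class MessageTransport(ABC):
    """Hypothetical message-oriented boundary seen by the ULP."""

    @abstractmethod
    def send_message(self, message: bytes) -> None: ...

    @abstractmethod
    def receive_message(self) -> bytes: ...


class ByteStreamAdapter(MessageTransport):
    """Stand-in for a TCP-like stream: the adapter frames with a length prefix."""

    def __init__(self) -> None:
        self._stream = bytearray()

    def send_message(self, message: bytes) -> None:
        self._stream += struct.pack(">I", len(message)) + message

    def receive_message(self) -> bytes:
        (length,) = struct.unpack_from(">I", self._stream, 0)
        message = bytes(self._stream[4:4 + length])
        del self._stream[:4 + length]
        return message


class MessageNativeAdapter(MessageTransport):
    """Stand-in for an SCTP-like transport that preserves message boundaries."""

    def __init__(self) -> None:
        self._queue: deque[bytes] = deque()

    def send_message(self, message: bytes) -> None:
        self._queue.append(bytes(message))

    def receive_message(self) -> bytes:
        return self._queue.popleft()


def echo_pdus(transport: MessageTransport, pdus: list[bytes]) -> list[bytes]:
    """ULP-side code: identical regardless of which adapter is underneath."""
    for pdu in pdus:
        transport.send_message(pdu)
    return [transport.receive_message() for _ in pdus]
```

The ULP-side function `echo_pdus` never learns which adapter it is talking to, which is the property the presentation asks of the API; the framing concern lives entirely inside `ByteStreamAdapter`. Zero copy, parallel paths and notifications are not modeled here.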