Internet DRAFT - draft-gibson-pnfs-reqs
Network Working Group G. Gibson
Internet-Draft Panasas Inc. & Carnegie Mellon
Expires: April 18, 2005 B. Welch
Panasas Inc.
G. Goodson
P. Corbett
Network Appliance Inc.
October 18, 2004
Parallel NFS Requirements and Design Considerations
draft-gibson-pnfs-reqs-00.txt
Status of this Memo
This document is an Internet-Draft and is subject to all provisions
of section 3 of RFC 3667. By submitting this Internet-Draft, each
author represents that any applicable patent or other IPR claims of
which he or she is aware have been or will be disclosed, and any of
which he or she become aware will be disclosed, in accordance with
RFC 3668.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 18, 2005.
Copyright Notice
Copyright (C) The Internet Society (2004).
Abstract
This draft specifies the requirements that should be satisfied in the
definition of a parallel NFS protocol and the considerations
recommended for its design. It responds to the scalable bandwidth
Gibson, et al. Expires April 18, 2005 [Page 1]
Internet-Draft pNFS Requirements and Design Considerations October 2004
problem described in the pNFS Problem Statement,
draft-gibson-pnfs-problem-statement-01.txt. In the interest of a
timely adoption of scalable bandwidth file service, parallel NFS is
proposed to be an NFSv4 minor extension for communicating file layout
available through existing and future storage subsystem protocols
such as other NFSv4 file servers (NFS), block-based SCSI subsystems
(SBC), and object-based SCSI (OSD) subsystems.
Table of Contents
1. Introduction
2. NFSv4 Minor Extension
3. Scalability
3.1 Scalable Bandwidth
3.2 Scalable Capacity
4. Interoperability
4.1 NFSv4 Interoperability
4.2 Storage Protocol Interoperability
4.3 Separability of Storage Protocols
5. Concurrent Sharing
5.1 Shared Direct Access to Storage
5.2 Attribute Updates
5.3 Client caching
6. Recovery
7. Security Considerations
7.1 File Storage Access Protocols
7.2 Object Storage Access Protocols
7.3 Block Storage Access Protocols
8. IANA Considerations
9. Acknowledgements
10. References
Authors' Addresses
Intellectual Property and Copyright Statements
1. Introduction
In many application areas, single system servers are rapidly being
replaced by clusters of inexpensive commodity computers. As
clustering technology has improved, the barriers to running
application codes on very large clusters have been lowered. Examples
of application areas that are seeing the rapid adoption of scalable
client clusters are data intensive applications such as genomics,
seismic processing, data mining, content and video distribution, and
high performance computing. The aggregate storage I/O requirements
of a cluster can scale proportionally to the number of computers in
the cluster. It is not unusual for clusters today to make bandwidth
demands that far outstrip the capabilities of traditional file
servers. A natural solution to this problem is to enable file
service to scale as well, by increasing the number of server nodes
that are able to service a single file system to a cluster of
clients.
Scalable bandwidth can be claimed by simply adding multiple
independent servers to the network. Unfortunately, this leaves to
file system users the task of spreading data across these independent
servers. Because the data processed by a given data-intensive
application is usually logically associated, users routinely
co-locate this data in a single file system, directory or even a
single file. The NFSv4 protocol currently requires that all the data
in a single file system be accessible through a single exported
network endpoint, constraining access to be through a single NFS
server.
A better way of increasing the bandwidth to a single file system is
to enable access to be provided through multiple endpoints in a
coordinated or coherent fashion. Separation of control and data
flows provides a straightforward framework to accomplish this, by
allowing transfers of data to proceed in parallel from many clients
to many data storage endpoints. Control and file management
operations, inherently more difficult to parallelize, can remain the
province of a single NFS server, inheriting the simple management of
today's NFS file service, while offloading data transfer operations
allows bandwidth scalability. Data transfer may be done using NFS or
other protocols, such as iSCSI, under the control of an NFSv4 server
with parallel NFS extensions. Such an approach protects the
industry's large investment in NFS, since the bandwidth bottleneck no
longer needs to drive users to adopt a proprietary alternative
solution, and leverages SAN storage infrastructures, all within a
common architectural framework.
This document sets requirements for extensions to the NFSv4 protocol,
the parallel NFS extensions, to enable the extended NFSv4 server to
manage clients that are enabled to directly access storage.
2. NFSv4 Minor Extension
This document includes the definition of the requirements for
protocol extensions to implement Parallel NFS.
It is believed that this extension can fit within the
minor-versioning framework of the NFSv4 protocol presented in RFC
3530. NFSv4's minor-versioning requirement specifies that no changes
are to be made to an existing operation's arguments or results (with
the exception of GETATTR4). Also, new operations may only be added
to the COMPOUND and CB_COMPOUND procedures.
Minor-versioning also requires that the Parallel NFS extension be
compatible with all preceding NFSv4 minor versions. Accordingly,
until a minor extension is accepted, its requirements may be impacted
by the approval of another minor extension, although such an impact
of one minor extension on another should typically be avoided.
3. Scalability
3.1 Scalable Bandwidth
A principal purpose for parallel NFS is to enable clients of an NFS
service to achieve individual and aggregate file and file system
bandwidths that can scale with storage device, storage networking and
client resources. The core point in the parallel NFS problem
statement [1] is that bandwidth scaling is not provided by the
existing NFS approach of forwarding all data through a single network
endpoint associated with the NFS file server.
Parallel NFS must enable high bandwidth access by single clients and
aggregates of clients, especially clusters of clients, into one file
system, into possibly small and arbitrary collections of files, and
into just one file.
Moreover, a parallel NFS solution for scalable bandwidth must enable
an NFS client to directly and in parallel access a file, a possibly
small and arbitrary collection of files, or a file system that is
spread over multiple distinct network endpoints. That is, it must be
possible for single files and collections of related files to be
"striped" over physically different storage subsystems each with its
own network endpoint.
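As an illustration of the striping requirement above, the following is
a minimal sketch of how a client might map a byte range of a file onto
several network endpoints. The round-robin placement, the
StripedLayout type, and the names stripe_unit, locate and
split_request are assumptions made for this example only; this
document does not define a layout format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripedLayout:
    """Hypothetical round-robin layout; not a format defined by this draft."""
    stripe_unit: int   # bytes placed on one endpoint before moving to the next
    endpoints: tuple   # one network endpoint per storage subsystem


def locate(layout, offset):
    """Map a file byte offset to (endpoint, offset within that endpoint)."""
    count = len(layout.endpoints)
    stripe_number = offset // layout.stripe_unit
    endpoint = layout.endpoints[stripe_number % count]
    # Each full pass over all endpoints adds one stripe unit per endpoint.
    component_offset = ((stripe_number // count) * layout.stripe_unit
                        + offset % layout.stripe_unit)
    return endpoint, component_offset


def split_request(layout, offset, length):
    """Break a byte range into per-endpoint pieces that can run in parallel."""
    pieces = []
    end = offset + length
    while offset < end:
        endpoint, component_offset = locate(layout, offset)
        # A single piece never crosses a stripe-unit boundary.
        room = layout.stripe_unit - offset % layout.stripe_unit
        size = min(room, end - offset)
        pieces.append((endpoint, component_offset, size))
        offset += size
    return pieces
```

Each tuple returned by split_request names a distinct transfer, so a
client is free to issue them concurrently to different endpoints.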
3.2 Scalable Capacity
Parallel NFS must enable the capacity of a single file, a possibly
small and arbitrary collection of files and a single file system to
grow in proportion to the available storage resources.
This reflects a recognition that when bandwidth scales, the size of
the file(s) accessed should be expected to grow proportionately, and
that striping over network endpoints is not required to be effective
with arbitrarily small amounts of data residing at a single network
endpoint.
This requirement does not supersede file and file system limitations
on the size of an individual file or file system.
4. Interoperability
4.1 NFSv4 Interoperability
Parallel NFS is an optional minor extension of NFSv4. Accordingly,
any client capable of using the parallel NFS extensions must also be
able to interoperate with an NFSv4 server that is not capable of
using the parallel NFS extensions, and any NFSv4 server that is
capable of using the parallel NFS extensions must also be able to
provide full service for an NFSv4 client that is not capable of using
the parallel NFS extensions.
4.2 Storage Protocol Interoperability
The protocols used by parallel NFS capable clients to directly access
storage must be well defined, standards-based storage protocols.
In the interest of wider applicability of parallel NFS, the
extensions to NFSv4 that enable and manage a client's opportunity to
directly access storage subsystems must be agnostic to the actual
storage protocol employed, and it must be possible for new storage
protocols to be added to the set that a parallel NFS server supports.
It is anticipated that parallel NFS storage protocols will be defined
using (possibly) non-parallel NFSv4 as a storage protocol, using
block-based SCSI (SBC) as a storage protocol and using object-based
SCSI (OSD) as a storage protocol. SBC and OSD SCSI storage
protocols, in at least some implementations, are anticipated to
employ an iSCSI storage transport protocol.
4.3 Separability of Storage Protocols
The interpretation of a layout, the bits a parallel NFS server gives
to a parallel NFS client to enable the client to know how and where
to directly access a file or file system striped over multiple
storage network endpoints, is not needed for correct execution of the
parallel NFS extension operations.
At least one instance of a parallel NFS layout format and storage
access protocol must be fully specified and multiply implemented.
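One way to picture this separability is a client that treats the
layout as opaque bytes and hands it to whichever storage protocol
module is registered under the layout's protocol name. The sketch
below is illustrative only: LayoutDriverRegistry, parse_file_layout,
the name "example-file" and the comma-separated encoding are all
hypothetical and are not defined by this document.

```python
class LayoutDriverRegistry:
    """Maps a storage protocol name to client code that understands its layouts."""

    def __init__(self):
        self._drivers = {}

    def register(self, protocol_name, driver):
        # New storage protocols are added without touching the generic code.
        self._drivers[protocol_name] = driver

    def open_layout(self, protocol_name, opaque_layout):
        """Return the driver's view of the layout, or None if no driver exists."""
        driver = self._drivers.get(protocol_name)
        if driver is None:
            # The caller can fall back to ordinary NFSv4 I/O through the server.
            return None
        return driver(opaque_layout)


def parse_file_layout(opaque_layout):
    """Hypothetical decoder: a comma-separated list of data server names."""
    return opaque_layout.decode("ascii").split(",")
```

Only the registered driver inspects the layout bytes; the registry
itself never interprets them.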
5. Concurrent Sharing
5.1 Shared Direct Access to Storage
The parallel NFS extension should support shared access to storage by
many clients. This includes access to the same storage devices by
multiple clients, as well as access to the same files stored on one
or more storage devices. The result extends the basic shared file
system abstraction provided by NFS, giving clients direct access to
storage devices under the overall control of an NFS server
responsible for authorizing such direct access and delimiting its
scope and duration.
The parallel NFS extension should allow clients to specify points in
time at which updates must be made visible to other clients. Compared
with making every update visible immediately, this requirement is more
conducive to optimizations that can lead to high performance. It also
complements the programming model used by parallel applications.
In this model, individual clients compute independently, generate
results, and then synchronize with the overall computation. When
storing results to shared storage, it may be necessary to communicate
with the NFS server to ensure that updates are visible to other
clients. When making these updates visible, it is important for
efficiency to limit the need for separate interactions with the
server to those points that are truly required by the demands of the
application.
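The model described above can be sketched with a toy server and
client. The classes and methods here (MetadataServer, DirectClient,
publish, make_visible) are hypothetical stand-ins, not operations
defined by this document; the point is only that many direct writes
can be followed by a single interaction with the server at an
application-chosen synchronization point.

```python
class MetadataServer:
    """Toy stand-in for the NFS server: tracks what other clients may observe."""

    def __init__(self):
        self.visible_size = 0
        self.round_trips = 0

    def publish(self, new_size):
        # One interaction makes all updates up to new_size visible.
        self.round_trips += 1
        self.visible_size = max(self.visible_size, new_size)


class DirectClient:
    """Toy client that writes to storage directly and publishes on demand."""

    def __init__(self, server):
        self.server = server
        self.local_size = 0

    def write_direct(self, offset, data):
        # Data goes straight to storage; the server is not contacted.
        self.local_size = max(self.local_size, offset + len(data))

    def make_visible(self):
        # Called at a synchronization point chosen by the application.
        self.server.publish(self.local_size)
```

For example, a client that performs four direct writes and then calls
make_visible once contacts the server a single time, and other
clients are only guaranteed to see the new file size after that call.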
5.2 Attribute Updates
File updates include changes to associated attributes that include
the file size (i.e., end-of-file position), file modify time, file
access time, and file change time. The parallel NFS extension allows
updates to these attributes to follow the same model as data
updates, where updates are only guaranteed to be visible to other
clients in response to explicit operations performed by the modifying
client. The values of these attributes at other times may not be
strictly defined.
The parallel NFS extension acknowledges that some implementations may
provide looser semantics for file access time. As well, the
extension does not mandate strict implementation of the file access
time attribute.
5.3 Client caching
The parallel NFS extension does not address issues around client
caching and the coherency of data stored in different client caches.
The extension assumes that the existing mechanisms that NFS clients
use to manage their cached data apply equally when they use parallel
NFS. Likewise, this extension should not prevent the
implementation of a richer/stronger set of caching and coherency
semantics.
6. Recovery
Error recovery is often the most difficult aspect of a protocol in
which to achieve interoperability. For this reason these requirements
place the most stringent demands on parallel NFS servers. But in the
interests of performance and scalability, these requirements leave it
open for client implementations to more fully participate in error
recovery.
Specifically, it should be possible for client implementations using
parallel NFS extensions to have very simple recovery actions, albeit
probably with lowered performance, when coping with errors on the storage
access protocols.
Simple clients are envisioned to respond to storage access protocol
errors by immediately notifying the managing parallel NFS server of
the error. Upon completion of the NFS server's recovery, simple clients
should be able to complete the action causing the error by
re-execution. To make this especially simple, it must be possible
for a simple parallel NFS client to re-execute using only NFSv4
operations.
As a consequence of this recovery model, an operation, composed of
one or more component actions, applied by parallel NFS clients
directly on storage must be idempotent at the client level. This is
not a requirement for atomicity or transactions of the storage access
protocol, only that it be possible to re-execute the client-level
operation that experienced the error, possibly using different
component operations directly on storage or through the parallel NFS
server, and achieve the same transformation on stored information.
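The simple-client recovery path can be sketched as follows. The
function and exception names (simple_client_write, StorageError) and
the three callables passed in are assumptions for illustration; they
do not correspond to operations defined by this document. The write
is expressed as an absolute (offset, data) pair, which is what makes
re-execution through a different path yield the same stored bytes.

```python
class StorageError(Exception):
    """Raised by the (hypothetical) direct storage access path."""


def simple_client_write(direct_write, report_error, nfs_write, offset, data):
    """Try the direct path; on error, notify the server and redo via NFSv4.

    Returns the name of the path that completed the operation.
    """
    try:
        direct_write(offset, data)
        return "direct"
    except StorageError as err:
        # Notify the managing server, which performs its own recovery.
        report_error(err)
        # Re-execute the whole client-level operation using only NFSv4.
        nfs_write(offset, data)
        return "nfsv4"
```

Because the retry rewrites the full range, a partially applied direct
write does not change the final result.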
7. Security Considerations
The parallel NFS extension must provide a level of security that is
comparable to that defined in the NFSv4 specification. NFSv4
mandates end to end mutual authentication. All existing NFSv4
security mechanisms apply to the operations introduced by the
parallel NFS extension. In all cases, this extension allows use of
the direct NFSv4 path of sending both metadata and data requests
through the metadata server.
The security model provided by all specified parallel NFS storage
access protocols must be well documented. Various storage access
protocols will have different security mechanisms that protect
against different types of attacks. Access protocols that rely on
trusted environments should not be foreclosed. However, protocols
that provide strong security guarantees will be available.
7.1 File Storage Access Protocols
A file storage access protocol may have the same security mechanism
between the client and metadata server as between the client and data
server. ACLs set at the metadata server are effective at the data
servers and need not be visible (via getattr) at the data servers.
7.2 Object Storage Access Protocols
An object storage access protocol may rely on a cryptographically
secure capability to control accesses at the data servers. These
capabilities can be generated by the metadata server after it checks
access control for a client. They are returned to the client and
passed to the object storage device, which verifies that the
capability allows the requested operation.
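The capability flow described above can be illustrated with a keyed
hash. This is a sketch under stated assumptions, not the OSD
capability format: it assumes a secret shared between the metadata
server and the storage device, uses HMAC-SHA256 from the Python
standard library, and invents the field names object_id, rights and
tag.

```python
import hashlib
import hmac


def issue_capability(secret, object_id, rights):
    """Metadata server side: sign (object_id, rights) after its access check."""
    message = f"{object_id}:{rights}".encode("utf-8")
    tag = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return {"object_id": object_id, "rights": rights, "tag": tag}


def verify_capability(secret, capability, object_id, operation):
    """Storage device side: recompute the tag, then check the operation."""
    message = f"{capability['object_id']}:{capability['rights']}".encode("utf-8")
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, capability["tag"])
            and capability["object_id"] == object_id
            and operation in capability["rights"])
```

Here rights is a short string such as "r" or "rw", and operation is a
single letter. A client that alters the rights field cannot produce a
matching tag without the shared secret, so the device rejects the
request.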
7.3 Block Storage Access Protocols
A block storage access protocol would rely on SAN-based security, and
the trust that clients will only access the blocks they have been
directed to use. There are LUN masking/unmapping and zone-based
security schemes that can be manipulated to fence clients from each
other's data. Block storage access protocols may provide no
guarantee of data integrity, since any client can modify any data
block to which it has physical access.
8. IANA Considerations
The parallel NFS protocol extension provides for the naming of the
specific storage access protocol. The storage access protocol's name
is used by the client to interpret the layout information it receives
from the metadata server. As well, the name specifies the storage
access protocol to be used for accessing the data servers.
The namespace is separated into (at least) three ranges. First, a
range of names reserved for future standards-based storage protocol
specifications (e.g., a block, file, and object storage protocol
standard). Second, a range of names reserved for vendor proprietary
protocols. Third, a range of names that are reserved for
non-approved protocols (e.g., custom in-house protocols or for
testing).
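As a purely illustrative sketch of such a partitioned namespace, the
following assumes that names are represented as 16-bit integers. Both
that representation and the boundary values are invented for this
example; this document defines neither.

```python
# Hypothetical boundaries, chosen only for illustration; the draft defines none.
RANGES = (
    (0x0000, 0x7FFF, "standards-based"),
    (0x8000, 0xBFFF, "vendor proprietary"),
    (0xC000, 0xFFFF, "non-approved"),
)


def classify(protocol_id):
    """Return which of the three ranges a storage protocol identifier is in."""
    for low, high, label in RANGES:
        if low <= protocol_id <= high:
            return label
    raise ValueError(f"identifier {protocol_id:#x} is outside the namespace")
```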
Similar to NFSv4 named attributes, the parallel NFS protocol does not
define the specific assignment of names to storage access protocols
(nor does it define any specific storage access protocols). However,
an IANA registry should be created for the registration of names in
order to prevent collisions within the namespace. Along with the
name, the format of the data layout and the storage access protocol
should be well defined. The goal is to promote the interoperability
of parallel NFS clients and servers.
9. Acknowledgements
Many members of the pNFS informal working group have helped
considerably. The authors would like to thank Andy Adamson, David
Black, Gary Grider, Benny Halevy, Dean Hildebrand, Peter Honeyman,
Dave Noveck, Julian Satran, and Tom Talpey.
10. References
[1] Gibson, G., et al., "pNFS Problem Statement", July 2004,
<ftp://www.ietf.org/internet-drafts/
draft-gibson-pnfs-problem-statement-01.txt>.
Authors' Addresses
Garth Gibson
Panasas Inc. & Carnegie Mellon
1501 Reedsdale Street
Pittsburgh, PA 15233
USA
Phone: +1 412 323 3500
EMail: ggibson@panasas.com
Brent Welch
Panasas Inc.
6520 Kaiser Drive
Fremont, CA 94555
USA
Phone: +1 510 608 7770
EMail: welch@panasas.com
Garth Goodson
Network Appliance Inc.
495 East Java Drive
Sunnyvale, CA 94089
USA
Phone: +1 408 822 6847
EMail: goodson@netapp.com
Peter Corbett
Network Appliance Inc.
375 Totten Pond Road
Waltham, MA 02451
USA
Phone: +1 781 768 5343
EMail: peter@pcorbett.net
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2004). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.