Figure 1 illustrates the main alternatives for storage architecture. The simplest organization, the local filesystem (1), aggregates an application, file management (naming, directories, access control, concurrency control) and low-level storage management in a single machine. Disk data makes one trip over a simple peripheral area network such as SCSI or Fibre Channel, and disks offer a fixed-size block abstraction. Stand-alone computer systems use this organization.
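To make the fixed-size block abstraction concrete, the following minimal sketch shows how a host-side storage layer might translate a byte range into the block numbers it must request from the disk. The 512-byte block size and the function name `blocks_for_range` are assumptions for illustration; the text does not specify either.

```python
BLOCK_SIZE = 512  # assumed block size in bytes; not specified in the text


def blocks_for_range(offset: int, length: int, block_size: int = BLOCK_SIZE) -> range:
    """Return the block numbers covering bytes [offset, offset + length)."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if length == 0:
        return range(0)  # an empty request touches no blocks
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)
```

Even a 20-byte read that straddles a block boundary requires two whole blocks, which is why layout decisions in this organization rest with the host rather than the disk.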
To share data more effectively among many computers, an intermediate server machine is introduced (2). If the server offers a simple file access interface to clients, the organization is known as a distributed filesystem. If the server processes data on behalf of the clients, this organization is a distributed database. In this organization, data makes a second network trip to the client and the server machine can become a bottleneck, particularly since it usually serves large numbers of disks.
To transparently improve storage bandwidth and reliability, many systems interpose another computer, such as a RAID controller. This organization (3) adds another peripheral network transfer and store-and-forward stage for data to traverse.
Provided that the distributed filesystem is reorganized to logically “DMA” data rather than copy it through its server, a fourth organization (4) reduces the number of network transits for data to two. This organization also applies where clients are trusted to maintain filesystem metadata integrity and to implement disk striping and redundancy. In this case, client caching of metadata can reduce the number of network transfers for control messages and data to two. Moreover, disks can be attached to client machines, which are presumed to be paid for independently and to be generally idle. If the clients are indeed idle, this eliminates the additional store-and-forward cost without eliminating the copy itself.
In (5), the NASD architecture embeds the disk management functions into the device and offers a variable-length object storage interface, while file managers enable repeated client accesses to specific storage objects by granting a cacheable capability. Therefore, all data and most control messages travel across the network once, and there is no expensive store-and-forward computer. Using an object interface in storage rather than a fixed-block interface shifts data layout management to the disk. Also, NASD partitions are variable-sized groupings of objects, not physical regions of disk media, so the total partition space can be managed easily, in a manner similar to virtual volumes or virtual disks. We also believe that specific implementations can exploit NASD’s uninterpreted filesystem-specific attribute fields to respond to higher-level capacity planning and reservation systems such as HP’s attribute-managed storage.
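The capability flow described above can be sketched as follows. This is an illustration under stated assumptions, not the NASD protocol itself: the text says only that the file manager grants a cacheable capability, so the keyed-digest (HMAC) scheme, the key shared between file manager and drive, and the names `Capability`, `FileManager`, and `Drive` are all hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    object_id: int
    rights: str  # "r" or "rw" in this sketch
    tag: bytes   # keyed digest over (object_id, rights); assumed scheme


def _digest(key: bytes, object_id: int, rights: str) -> bytes:
    msg = f"{object_id}:{rights}".encode()
    return hmac.new(key, msg, hashlib.sha256).digest()


class FileManager:
    """Applies access-control policy and issues capabilities (hypothetical)."""

    def __init__(self, shared_key: bytes):
        self._key = shared_key

    def grant(self, object_id: int, rights: str) -> Capability:
        return Capability(object_id, rights, _digest(self._key, object_id, rights))


class Drive:
    """Object-storage device that checks capabilities without the file manager."""

    def __init__(self, shared_key: bytes):
        self._key = shared_key
        self._objects = {}  # object_id -> bytearray

    def create(self, object_id: int, data: bytes = b"") -> None:
        self._objects[object_id] = bytearray(data)

    def _check(self, cap: Capability, needed: str) -> None:
        expected = _digest(self._key, cap.object_id, cap.rights)
        if not hmac.compare_digest(expected, cap.tag):
            raise PermissionError("forged or corrupted capability")
        if needed not in cap.rights:
            raise PermissionError("capability lacks the required right")

    def read(self, cap: Capability, offset: int, length: int) -> bytes:
        self._check(cap, "r")
        return bytes(self._objects[cap.object_id][offset:offset + length])

    def write(self, cap: Capability, offset: int, data: bytes) -> None:
        self._check(cap, "w")
        obj = self._objects[cap.object_id]
        if offset > len(obj):
            obj.extend(bytes(offset - len(obj)))  # zero-fill any gap
        obj[offset:offset + len(data)] = data
```

A client would call `FileManager.grant` once, cache the returned capability, and then issue repeated `Drive.read` or `Drive.write` calls directly, so the file manager stays off the data path.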
To offer disk striping and redundancy for NASD, we layer the NASD interface. In this organization (6), a storage manager replaces the file manager’s capability with a set of capabilities for the objects that actually make up the high-level striped object. This costs an additional control message, but once equipped with these capabilities, clients again access storage objects directly. Redundancy and striping are done within the objects accessible with the client’s set of capabilities, not over physical disk addresses.
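As an illustration of striping over objects rather than physical disk addresses, the sketch below maps a byte range of the high-level striped object onto extents within its component objects. It assumes simple round-robin striping with no redundancy; the 64 KB stripe unit and the names `Extent` and `map_striped_range` are hypothetical and do not come from the text.

```python
from dataclasses import dataclass
from typing import List

STRIPE_UNIT = 64 * 1024  # assumed stripe unit in bytes


@dataclass(frozen=True)
class Extent:
    component: int  # index into the client's set of component capabilities
    offset: int     # byte offset within that component object
    length: int     # number of bytes in this extent


def map_striped_range(offset: int, length: int, n_components: int,
                      stripe_unit: int = STRIPE_UNIT) -> List[Extent]:
    """Split [offset, offset + length) of the striped object into extents."""
    if n_components < 1 or stripe_unit < 1:
        raise ValueError("n_components and stripe_unit must be positive")
    extents = []
    end = offset + length
    while offset < end:
        unit = offset // stripe_unit            # which stripe unit overall
        within = offset % stripe_unit           # position inside that unit
        chunk = min(stripe_unit - within, end - offset)
        extents.append(Extent(
            component=unit % n_components,      # round-robin placement
            offset=(unit // n_components) * stripe_unit + within,
            length=chunk,
        ))
        offset += chunk
    return extents
```

Each resulting extent names a component object and an offset inside it, so the client can issue the requests directly to the drives holding those objects, using the matching capability from its set.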