Aggregating services onto shared infrastructures, rather than using separate physical resources for each, is a longstanding approach to reducing hardware and administration costs. It reduces the number of distinct systems that must be managed and allows excess resources to be shared among bursty services. Combined with virtualization, such aggregation strengthens the case for service outsourcing and utility computing.
When multiple services use the same server, each obviously gets only a fraction of the server's resources and, if continuously busy, achieves a fraction of its peak throughput. But, each service should be able to use its fraction of resources with the same efficiency as when run alone; that is, there should be minimal interference. The Argon storage server is designed to reduce interference between workloads, explicitly managing its resources to bound the inefficiency arising from inter-service disk and cache interference in traditional systems. In many cases, fairness or weighted fair sharing between workloads is also desired. The goal is that, within its share of the server, each service achieves at least a configured fraction (e.g., 0.9) of the throughput it achieves when it has the storage server to itself: a service allocated 1/nth of a server should get nearly 1/nth (or more) of the throughput it would get alone. This fraction is called the R-value, drawing on the analogy of the thermal resistance measure in building insulation. With an R-value of 1.0, sharing affects the portion of server time dedicated to a service, but not the service's efficiency within that portion. Additionally, insulation increases the predictability of service performance in the face of sharing.
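As a concrete reading of this guarantee, the following sketch checks whether a shared service meets its R-value; the function and the numbers are illustrative, not part of Argon:

    def meets_r_value(observed_tput, standalone_tput, share, r_value=0.9):
        """Insulation check: a service allocated fraction `share` of the
        server should see at least r_value * share times its standalone
        throughput. (Illustrative; names are ours, not Argon's.)"""
        return observed_tput >= r_value * share * standalone_tput

    # Example: a service that gets 40 MB/s alone and holds a 1/4 share
    # should see at least 0.9 * 0.25 * 40 = 9 MB/s when sharing.
    assert meets_r_value(observed_tput=9.5, standalone_tput=40.0, share=0.25)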
Argon focuses on the two primary storage server resources, disk and cache, in insulating a service's efficiency. Disk efficiency is the fraction of a request's service time spent actually transferring data to or from the disk media, and cache efficiency can be viewed as the fraction of requests absorbed by the cache. Argon assumes that network bandwidth and CPU time will not be bottleneck resources. Given that assumption, a service's share of server time maps to the share of disk time that it receives. Within that share of server time, a service's efficiency is determined by the fraction of its requests absorbed by the cache and by the disk efficiency of those that are not. Argon's fairness focus is on providing explicit shares of server time. To accomplish the complementary goals of insulation and fairness, the Argon storage server combines three techniques: aggressive amortization, cache partitioning, and quanta-based scheduling.
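Disk efficiency under this definition can be computed directly from transfer and positioning times; a minimal sketch with hypothetical disk parameters:

    def disk_efficiency(transfer_s, positioning_s):
        """Fraction of a request's service time spent actually moving data
        to or from the media, rather than seeking and rotating."""
        return transfer_s / (transfer_s + positioning_s)

    # Example: a 64 KB access on a 50 MB/s disk takes ~1.25 ms to transfer;
    # with a 5 ms average positioning cost, efficiency is only ~20%.
    t_xfer = (64 * 2**10) / (50 * 2**20)   # seconds spent transferring
    print(disk_efficiency(t_xfer, 0.005))  # ~0.20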
Amortization refers to performing large disk accesses for streaming workloads and is necessary in order to approach the disk's streaming efficiency when sharing the disk with other workloads. Detecting sequential streams and using sufficiently large prefetching/write-back ranges amortizes positioning costs to achieve the configured R-value of streaming bandwidth. However, there is a trade-off between efficiency and responsiveness. Performing very large accesses for streaming workloads will achieve the disk's streaming bandwidth, but at the cost of larger variance in response time. Because the disk is being used more efficiently, the average response time actually improves. But, because blocking occurs while large prefetch or coalesced requests are processed, the maximum response time and the variance in response times increase significantly. Thus, the prefetch size should be only as large as necessary to achieve the specified R-value. Aggressive prefetching ensures that quanta are used effectively for streaming reads without requiring a queue of actual client requests long enough to fill the time; to ensure that streaming writes use their quanta efficiently, Argon coalesces them aggressively in write-back cache space.
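The necessary access size follows from the efficiency definition above: an access of S bytes at streaming bandwidth B with positioning cost T_pos has efficiency (S/B) / (S/B + T_pos), so reaching efficiency R requires S >= B * T_pos * R / (1 - R). This derivation and the disk parameters below are ours, for illustration:

    def min_prefetch_size(bandwidth_Bps, positioning_s, r_value):
        """Smallest access size (bytes) whose efficiency
        (S/B) / (S/B + T_pos) reaches the configured R-value;
        solving the inequality gives S >= B * T_pos * R / (1 - R)."""
        return bandwidth_Bps * positioning_s * r_value / (1.0 - r_value)

    # Example: 50 MB/s streaming bandwidth and a 5 ms positioning cost
    # call for ~2.25 MB accesses at R = 0.9; R = 0.95 roughly doubles that.
    print(min_prefetch_size(50 * 2**20, 0.005, 0.9) / 2**20)  # ~2.25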
Cache partitioning refers to explicitly dividing a server's cache among multiple services and prevents any one service from squeezing out the others. But, it is often not appropriate to simply split the cache into equal-sized partitions. Workloads that depend on achieving a high cache absorption rate may require more than 1/nth of the cache space to achieve the R-value of their standalone efficiency in their time quantum. Conversely, large streaming workloads require only a small amount of cache space to buffer prefetched data or dirty write-back data. Therefore, knowledge of the relationship between a workload's performance and its cache size is necessary in order to correctly assign it sufficient cache space to achieve the R-value of its standalone efficiency.
Argon provides weighted fair sharing by explicitly allocating disk time and by providing appropriately-sized cache partitions to each workload. Each workload’s cache efficiency is insulated by sizing its cache partition to provide the specified R-value of the absorption rate it would get from using the entire cache. To maximize the value of available cache space, the space allocated to each service is set to the minimum amount required to achieve the configured R-value of its standalone efficiency. For example, a service that streams large files and exhibits no reuse hits only requires enough cache space to buffer its prefetched data. On-line cache simulation is used to determine the required cache space.
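One standard single-pass way to obtain the hit rate at every candidate cache size is LRU stack-distance simulation (Mattson et al.); the sketch below is our illustration of the idea, not necessarily Argon's exact implementation:

    def lru_hit_rate_curve(trace, max_size):
        """For each reference, its depth in the LRU stack is the smallest
        cache size (in blocks) that would have yielded a hit. Returns a
        list mapping cache size s (0..max_size) to hit rate."""
        stack, hits = [], [0] * (max_size + 1)
        for block in trace:
            if block in stack:
                depth = stack.index(block) + 1   # stack distance
                if depth <= max_size:
                    hits[depth] += 1
                stack.remove(block)
            stack.insert(0, block)               # block is now MRU
        for s in range(1, max_size + 1):         # cumulative: size s catches
            hits[s] += hits[s - 1]               # all distances <= s
        return [h / len(trace) for h in hits]

    # Example: hit rate grows with cache size, flattening once the
    # working set fits.
    print(lru_hit_rate_curve([1, 2, 1, 3, 1, 2, 4, 1], max_size=4))
    # [0.0, 0.0, 0.25, 0.5, 0.5]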
Argon uses a three-step process to discover the required cache partition size for each workload. First, a workload's request pattern is traced; this lets Argon deduce the relationship between a workload's cache space and its I/O absorption rate (i.e., the fraction of requests that do not go to disk). Second, a system model predicts the workload's throughput as a function of the I/O absorption rate. Third, Argon uses the specified R-value to compute the required I/O absorption rate (using the relationship from step 2), which is then used to select the required cache partition size (using the relationship from step 1).
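Put together, the three steps might look like the following sketch; the hit-rate curve (as produced above) and the toy throughput model are assumptions standing in for Argon's traced relationship and system model:

    def required_partition_size(hit_curve, throughput_model, r_value):
        """Pick the smallest cache size whose predicted throughput is at
        least r_value times the throughput predicted with the entire
        cache (the workload's standalone efficiency)."""
        standalone = throughput_model(hit_curve[-1])
        for size, h in enumerate(hit_curve):
            if throughput_model(h) >= r_value * standalone:
                return size
        return len(hit_curve) - 1

    # Toy model: an absorbed request costs 0.1 ms, a miss costs a 5 ms
    # disk access (hypothetical numbers); throughput in requests/second.
    model = lambda h: 1.0 / (h * 0.0001 + (1 - h) * 0.005)
    print(required_partition_size([0.0, 0.0, 0.25, 0.5, 0.5], model, 0.9))  # 3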
Scheduling in Argon refers to controlling when each workload's requests are sent to the disk firmware and is necessary for three reasons. First, it ensures that a workload receives exclusive disk access, as required for amortization. Second, it ensures that disk time is appropriately divided among workloads. Third, it ensures that a workload achieves the R-value of its standalone efficiency within its quantum, by making the quantum large enough.
Argon performs simple round-robin time quantum scheduling, with each workload receiving a scheduling quantum. Requests from a particular workload are queued until that workload's time quantum begins. Then, queued requests from that workload are issued, and incoming requests from that workload are passed through to the disk, until the workload has submitted the maximum number of requests the scheduler has computed it can issue in the time quantum, or until the quantum expires.
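A sketch of one quantum of this round-robin policy follows; the workload objects and the issue callback are our illustration, and a real implementation would also pass through requests that arrive mid-quantum rather than ending early when the queue drains:

    import time
    from collections import deque

    def run_quantum(queue, issue, quantum_s, max_requests):
        """Issue a workload's queued requests until it has submitted the
        scheduler's estimate of how many fit in the quantum, or until
        the quantum expires."""
        deadline = time.monotonic() + quantum_s
        issued = 0
        while queue and issued < max_requests and time.monotonic() < deadline:
            issue(queue.popleft())   # pass request through to disk firmware
            issued += 1
        return issued

    def scheduler_loop(workloads, issue):
        while True:                  # simple round-robin over workloads
            for w in workloads:      # each has .queue, .quantum_s, .max_requests
                run_quantum(w.queue, issue, w.quantum_s, w.max_requests)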
The scheduler must estimate how many requests can be performed in the time quantum for a given workload, since average service times may vary between workloads. The length of each quantum is determined by Argon to achieve the configured R-value, and average response time is kept low because the server operates more efficiently overall. Each workload's disk efficiency is insulated by allotting disk time to clients in quanta large enough that the majority of the time is spent handling client requests, with comparatively minimal time spent at the beginning of a quantum seeking to the workload's first request.
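Under a simple model in which each quantum pays one initial seek of T_seek and then serves requests back to back, requiring the seek to cost at most a (1 - R) fraction of the quantum gives Q >= T_seek / (1 - R), and the request estimate follows from the average service time. Both formulas are our reading of the scheme, with hypothetical numbers:

    def quantum_length(first_seek_s, r_value):
        """Smallest quantum Q with (Q - T_seek) / Q >= R,
        i.e. Q >= T_seek / (1 - R)."""
        return first_seek_s / (1.0 - r_value)

    def max_requests(quantum_s, first_seek_s, avg_service_s):
        """Per-workload estimate of how many requests fit in a quantum
        after the initial seek (assumed model, not Argon's exact one)."""
        return int((quantum_s - first_seek_s) // avg_service_s)

    # Example: a 5 ms initial seek and R = 0.9 call for quanta of at least
    # 50 ms; with 5 ms average service times, ~9 requests fit per quantum.
    q = quantum_length(0.005, 0.9)
    print(q, max_requests(q, 0.005, 0.005))  # 0.05 9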