Storage systems continue to be built around decades-old imperative interfaces, like read/write and get/put. Although this low-level interface can be used by any framework or application, it can lead to significant IO inefficiency, especially for data maintenance tasks (e.g., compaction, integrity checks, and rebalancing) for which caches tend to be least effective. Although this is not a new observation, IO efficiency is reaching emergency status: the IOPS/TB (or bandwidth/TB) available from each storage device in large-scale cluster storage drops with each increase in device capacity, so new approaches are needed to use the available IOPS/TB more efficiently.
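To make the trend concrete, here is a back-of-the-envelope illustration (the IOPS figure is an assumed ballpark for a 7200rpm HDD, not a measurement): because a disk's random-IOPS budget stays roughly flat as its capacity grows, IOPS/TB falls in inverse proportion to capacity.

```python
# Illustrative arithmetic only: a 7200rpm HDD sustains roughly the same
# random-IOPS budget (~150, an assumed ballpark) regardless of capacity,
# so IOPS/TB shrinks as capacity grows.
RANDOM_IOPS = 150
for capacity_tb in (4, 10, 20):
    print(f"{capacity_tb:>2} TB drive: {RANDOM_IOPS / capacity_tb:.1f} IOPS/TB")
# A 20 TB drive offers ~5x less IOPS/TB than a 4 TB drive (7.5 vs. 37.5).
```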
It's time to augment cluster storage with declarative interfaces, whereby data maintenance tasks and data management applications register their need for sets of data items and allow the storage system to orchestrate the corresponding IO. Rather than being converted into an arbitrary ordering of "do this now" imperative IO, order-flexible and time-flexible needs can be exposed to and exploited by the storage system. With this flexibility come significant opportunities to eliminate redundant IO (e.g., data read for an integrity check could also be used for rebalancing), smooth IO bursts, and coalesce IOs.
*Figure: Basic Declarative IO Architecture*
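As a concrete (hypothetical) illustration of the interface shift, the sketch below models how maintenance tasks might register time- and order-flexible needs and let the store orchestrate the IO. Every name here (`DeclarativeStore`, `Need`, `register_need`, `run_io_pass`) is invented for illustration; the real interface is part of the research question.

```python
# A minimal sketch of declarative IO registration. All class and method
# names are hypothetical, invented for illustration only.
from dataclasses import dataclass
from typing import Callable, List, Set
import time

@dataclass
class Need:
    """A registered, time- and order-flexible need for a set of data items."""
    task: str                          # e.g., "integrity-check", "rebalance"
    items: Set[str]                    # data items the task must eventually see
    deadline: float                    # latest completion time (epoch seconds)
    deliver: Callable[[str, str], None]  # callback invoked as items are read

class DeclarativeStore:
    """Toy orchestrator: rather than executing imperative reads immediately,
    it batches registered needs and serves overlapping items with one IO."""
    def __init__(self) -> None:
        self.needs: List[Need] = []

    def register_need(self, need: Need) -> None:
        self.needs.append(need)

    def run_io_pass(self) -> None:
        # Coalesce: read each distinct item once, then fan it out to every
        # task that declared a need for it. A real orchestrator would also
        # use the deadlines to smooth bursts and schedule IO in idle periods.
        pending = sorted({i for n in self.needs for i in n.items})
        for item in pending:           # order chosen by the store, not the task
            data = self._read(item)    # single physical IO per item
            for n in self.needs:
                if item in n.items:
                    n.deliver(item, data)
        self.needs.clear()

    def _read(self, item: str) -> str:
        return f"<contents of {item}>"  # stand-in for a device read

# Usage: two maintenance tasks with overlapping item sets share one IO pass.
store = DeclarativeStore()
store.register_need(Need("integrity-check", {"a", "b", "c"},
                         time.time() + 3600, lambda i, d: print("check", i)))
store.register_need(Need("rebalance", {"b", "c", "d"},
                         time.time() + 7200, lambda i, d: print("move", i)))
store.run_io_pass()   # items b and c are read once but serve both tasks
```

The key design point is that the store sees all outstanding needs at once, so items `b` and `c` are read from the device once yet delivered to both tasks, which is exactly the redundant-IO elimination described above.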
FACULTY
George Amvrosiadis
Nathan Beckmann
Greg Ganger
Rashmi Vinayak
GRAD STUDENTS
Sanjith Athlur
Theo Gregersen
Timothy Kim
Sara McAllister
Sarvesh Tandon
Lucy Wang
Yiwei Zhao
COLLABORATORS
Ben Berg (University of North Carolina)
Daniel Berger (Microsoft)
Saurabh Kadekodi (Google)
Arif Merchant (Google)
PUBLICATIONS
coming soon
We thank the members and companies of the PDL Consortium: Amazon, Bloomberg LP, Datadog, Google, Honda, Intel Corporation, Jane Street, LayerZero Research, Meta, Microsoft Research, Oracle Corporation, Oracle Cloud Infrastructure, Pure Storage, Salesforce, Samsung Semiconductor Inc., and Western Digital for their interest, insights, feedback, and support.