The PDL Packet - Fall 2024 Newsletter
H4H: Hybrid Convolution-Transformer Architecture Search for NPU-CIM Heterogeneous Systems for AR/VR Applications
Yiwei Zhao, Jinhui Chen, Sai Qian Zhang, Syed Shakib Sarwar, Kleber Hugo Stangherlin, Jorge Tomas Gomez, Jae-Sun Seo, Barbara De Salvo, Chiao Liu, Phillip B. Gibbons, Ziyun Li
30th Asia and South Pacific Design Automation Conference (ASPDAC ’25), January 20–23, 2025, Tokyo, Japan.
Low-latency and low-power edge AI is crucial for Augmented/Virtual Reality applications. Recent advances demonstrate that hybrid models, combining convolution layers (CNN) and transformers (ViT), often achieve a superior accuracy/performance tradeoff on various computer vision and machine learning (ML) tasks. However, hybrid ML models can present system challenges for latency and energy efficiency due to their diverse nature in dataflow and memory access patterns. In this work, we leverage architecture heterogeneity from Neural Processing Units (NPU) and Compute-In-Memory (CIM) and explore diverse execution schemas for efficient hybrid model executions. [...more]
The Key to Effective UDF Optimization: Before Inlining, First Perform Outlining
Samuel Arch, Yuchen Liu, Todd Mowry, Jignesh Patel, Andrew Pavlo
Proceedings of the VLDB Endowment, Vol. 18, No. 1, December 2024.
Although user-defined functions (UDFs) are a popular way to augment SQL’s declarative approach with procedural code, the mismatch between programming paradigms creates a fundamental optimization challenge. UDF inlining automatically removes all UDF calls by replacing them with equivalent SQL subqueries. Although inlining leaves queries entirely in SQL (resulting in large performance gains), we observe that inlining the entire UDF often leads to sub-optimal performance. A better approach is to analyze the UDF, deconstruct it into smaller pieces, and inline only the pieces that help query optimization. [...more]
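As a rough illustration of what inlining means, the sketch below runs the same query twice with Python's built-in `sqlite3` module: once calling a procedural UDF, and once with the UDF body rewritten by hand as a plain SQL expression. The `orders` table, the `discount_tier` function, and the thresholds are hypothetical and do not come from the paper. The hand-written `CASE` expression stands in for the subquery an automatic inliner would generate, and the sketch does not attempt the outlining step the paper proposes.

```python
import sqlite3


def discount_tier(total):
    """Hypothetical procedural UDF: branch on an order total."""
    if total >= 1000:
        return "gold"
    if total >= 100:
        return "silver"
    return "basic"


def run_both():
    """Return (rows using the UDF, rows using the hand-inlined SQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany(
        "INSERT INTO orders (id, total) VALUES (?, ?)",
        [(1, 50.0), (2, 250.0), (3, 1500.0)],
    )

    # Version 1: the query calls the UDF, which is opaque to the SQL engine.
    conn.create_function("discount_tier", 1, discount_tier)
    with_udf = conn.execute(
        "SELECT id, discount_tier(total) FROM orders ORDER BY id"
    ).fetchall()

    # Version 2: the UDF body is rewritten by hand as pure SQL, so the
    # whole query is visible to the query optimizer.
    inlined = conn.execute(
        """
        SELECT id,
               CASE WHEN total >= 1000 THEN 'gold'
                    WHEN total >= 100  THEN 'silver'
                    ELSE 'basic'
               END
        FROM orders
        ORDER BY id
        """
    ).fetchall()

    conn.close()
    return with_udf, inlined


if __name__ == "__main__":
    udf_rows, inlined_rows = run_both()
    print(udf_rows == inlined_rows)
```

Both queries return the same rows; the difference is only in how much of the logic the database can see and optimize.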
Morph: Efficient File-Lifetime Redundancy Management for Cluster File Systems
Timothy Kim, Sanjith Athlur, Saurabh Kadekodi, Francisco Maturana, Dax Delvira, Arif Merchant, Gregory R. Ganger, K. V. Rashmi
SOSP ’24, November 4–6, 2024, Austin, TX, USA.
Many data services tune and change redundancy configurations of files over their lifetimes to address changes in data temperature and latency requirements. Unfortunately, changing a file's redundancy configuration (transcoding) is IO-intensive. The Morph cluster file system introduces new transcode-efficient redundancy schemes to minimize overheads as files progress through lifetime phases. [...more]
Reducing Cross-Cloud/Region Costs with the Auto-Configuring MACARON Cache
Hojin Park, Ziyue Qiu, Gregory R. Ganger, George Amvrosiadis
SOSP ’24, November 4–6, 2024, Austin, TX, USA.
An increasing demand for cross-cloud and cross-region data access is bringing forth challenges related to high data transfer costs and latency. In response, we introduce Macaron, an auto-configuring cache system designed to minimize cost for remote data access. A key insight behind Macaron is that cloud cache size is tied to cost, not hardware limits, shifting the way we think about cache design and eviction policies. [...more]
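One way to read the "cache size is tied to cost" insight is as a small optimization: a larger cache costs more to rent but avoids more remote transfers. The sketch below is a toy model under that assumption, not Macaron's actual algorithm. The function names, the prices, and the miss-ratio curve are all hypothetical placeholders chosen only to make the trade-off visible.

```python
def total_cost(cache_gb, miss_ratio, requested_gb,
               storage_price_per_gb, egress_price_per_gb):
    """Toy cost model: rented cache capacity plus transfer cost of misses."""
    capacity_cost = cache_gb * storage_price_per_gb
    egress_cost = miss_ratio * requested_gb * egress_price_per_gb
    return capacity_cost + egress_cost


def cheapest_cache_size(miss_ratio_curve, requested_gb,
                        storage_price_per_gb, egress_price_per_gb):
    """Pick the candidate cache size (GB) with the lowest total cost.

    miss_ratio_curve maps a candidate cache size in GB to the fraction
    of requested bytes that would miss at that size.
    """
    return min(
        miss_ratio_curve,
        key=lambda size: total_cost(
            size, miss_ratio_curve[size], requested_gb,
            storage_price_per_gb, egress_price_per_gb,
        ),
    )


if __name__ == "__main__":
    # Hypothetical miss-ratio curve and prices, for illustration only.
    curve = {0: 1.0, 100: 0.5, 200: 0.3, 400: 0.25}
    best = cheapest_cache_size(
        curve,
        requested_gb=1000,
        storage_price_per_gb=0.05,
        egress_price_per_gb=0.09,
    )
    print(best)
```

With these made-up numbers the largest cache is not the cheapest choice: past a certain size, the extra capacity costs more than the transfers it saves.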
Congratulations to Sophia on winning the ACM Student Research Competition at SOSP this year. Her research on "Possum: A Tail of Dynamic Flash Capacity for Sustainability" investigates managing flash storage density for improved performance and device endurance...
Congratulations to Sara (Ph.D. student, CSD) on becoming a Siebel Scholar! Her work on computer systems focuses on distributed, caching, and storage systems, leveraging hardware-software co-design and grounding system design to enable more efficient and sustainable systems ...
Congratulations to Dimitrios (Assistant Professor, SCS), who has been named an Intel Rising Star. His research bridges hardware and OSes and delves into the core challenges of datacenter computing, addressing fundamental questions about scalability limitations, security vulnerabilities, and energy efficiency ...