This workshop aims to bring together the systems, HPC, and machine learning communities to explore the growing role of sparsity as a foundational tool for scaling efficiency across modern computing workloads, from scientific computing and HPC to LLMs. By fostering collaboration among researchers, practitioners, and industry experts, the workshop will focus on (a) developing novel architectures and system techniques that exploit sparsity at multiple levels and (b) deploying sparsity-aware models effectively in real-world scientific and AI applications. Topics of interest include unstructured sparsity, quantization, MoE architectures, and other innovations that drive computational efficiency, scalability, and sustainability across the full system stack.
Keynote Talk

Einsums, Fibertrees and Dataflow: Architecture for the Post-Moore Era
by Joel Emer
Bio
Joel Emer is a Professor of the Practice at MIT's Electrical Engineering and Computer Science Department (EECS) and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). He also works part-time as a Senior Distinguished Research Scientist at Nvidia in Westford, MA, where he is responsible for exploration of future architectures as well as modeling and analysis methodologies. Prior to joining Nvidia, he worked at Intel where he was an Intel Fellow and Director of Microarchitecture Research. Previously he worked at Compaq and Digital Equipment Corporation (DEC).
For nearly 50 years, Dr. Emer has held various research and advanced development positions investigating processor micro-architecture and developing performance modeling and evaluation techniques. He has made architectural contributions to a number of VAX, Alpha and X86 processors and is recognized as one of the developers of the widely employed quantitative approach to processor performance evaluation. He has also been recognized for his contributions in the advancement of deep learning accelerator design, spatial and parallel architectures, processor reliability analysis, memory dependence prediction, pipeline and cache organization, performance modeling methodologies and simultaneous multithreading. He earned a doctorate in electrical engineering from the University of Illinois in 1979. He received a bachelor's degree with highest honors in electrical engineering in 1974, and his master's degree in 1975 -- both from Purdue University. Among his honors, he is a Fellow of both the ACM and IEEE, and a member of the NAE. He also received both the Eckert-Mauchly Award and the B. Ramakrishna Rau Award for lifetime contributions in computer architecture.
Abstract
Over the past few years, efforts to address the challenges of the end of Moore's Law have led to a significant rise in domain-specific accelerators. Many of these accelerators target tensor algebraic computations and, even more specifically, computations on sparse tensors. To exploit that sparsity, these accelerators employ a wide variety of novel solutions to achieve good performance. At the same time, prior work on sparse accelerators does not systematically express this full range of design features, making it difficult to understand the impact of each design choice and to compare or extend the state-of-the-art.
In an analogous fashion to our prior work that categorized DNN dataflows into patterns like weight stationary and output stationary, this talk will try to provide a systematic approach to characterizing the range of sparse tensor accelerators. Thus, rather than presenting a single specific combination of a dataflow and a concrete data representation, I will present a generalized framework for describing computations, dataflows, the manipulation of sparse (and dense) tensor operands, and data representation options. This separation of concerns is intended to make designs easier to understand and to facilitate exploration of the wide design space of tensor accelerators. Within this framework, I will present a description of computations using an extension of Einstein summation notation (Einsums) and a format-agnostic abstraction for sparse tensors, called fibertrees. Using the fibertree abstraction, one can express a wide variety of concrete data representations, each with its own advantages and disadvantages. Furthermore, by adding a set of operators for activities like traversal and merging of tensors, the fibertree notation can be used to express dataflows independent of the concrete data representation used for the tensor operands. Thus, using this common language, I will show how to describe a variety of sparse tensor accelerator designs and, ultimately, our state-of-the-art transformer accelerator.
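To make the abstractions above slightly more concrete, here is a rough, illustrative sketch in Python (not the speaker's actual notation or tooling): the Einsum Z[m,n] = A[m,k] * B[k,n] evaluated over a fibertree-like nested representation of sparse matrices, where each fiber maps coordinates either to child fibers or to leaf payloads. The function name `matmul_fibertree` and the dict-of-dicts encoding are assumptions made purely for illustration.

```python
# Illustrative only: dicts of dicts stand in for rank-2 fibertrees,
# mapping a coordinate either to a child fiber or to a leaf payload.
from collections import defaultdict

def matmul_fibertree(A, B):
    """Evaluate the Einsum Z[m, n] = A[m, k] * B[k, n] on sparse operands.
    A is {m: {k: value}}, B is {k: {n: value}}; Z uses the same nesting."""
    Z = defaultdict(dict)
    for m, a_fiber in A.items():           # traverse the top-level m fiber of A
        for k, a_val in a_fiber.items():   # traverse the k fiber of A[m]
            b_fiber = B.get(k)             # intersect with the k fiber of B
            if b_fiber is None:
                continue                   # no matching nonzeros in B
            for n, b_val in b_fiber.items():
                Z[m][n] = Z[m].get(n, 0.0) + a_val * b_val
    return dict(Z)

# Two small sparse matrices in nested (coordinate, payload) form.
A = {0: {1: 2.0}, 2: {0: 3.0}}
B = {0: {2: 4.0}, 1: {0: 5.0}}
print(matmul_fibertree(A, B))  # {0: {0: 10.0}, 2: {2: 12.0}}
```

The same traversal could, in principle, be replayed over CSR, COO, or bitmask encodings; that independence from the concrete format is the point of the fibertree abstraction described in the abstract.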
Invited Talks

Dataflow Optimizations in Sparse Accelerators: Loop Reordering and Loop Tiling
by Mingyu Gao
Bio
Mingyu Gao is an associate professor of computer science in the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University in Beijing, China. He received his PhD in Electrical Engineering from Stanford University. His research interests lie in the fields of computer architecture and systems, including efficient memory architectures, scalable data processing, and hardware system security, with a special emphasis on data-intensive applications like artificial intelligence and big data analytics. He has published in top-tier conferences including ISCA, ASPLOS, MICRO, HPCA, OSDI, SIGMOD, and VLDB. He also regularly serves on the program committees of ISCA, MICRO, ASPLOS, HPCA, and other top conferences.
Abstract
Dataflow techniques, such as loop reordering and loop tiling, have been widely used in domain-specific accelerators for machine learning and dense algebra to improve on-chip data reuse and reduce off-chip data accesses, boosting performance and energy efficiency. However, when it comes to sparse tensor accelerators, special care must be taken to address the new challenges of irregular and diverse input data patterns. Different sparsity patterns prefer drastically different dataflow schemes. In this talk, we present two of our recent papers, from ASPLOS 2023 and ISCA 2025, which propose flexible dataflow schemes that efficiently adapt to various sparse data patterns, focusing on loop reordering and loop tiling, respectively. Such efficient and adaptive designs require comprehensive co-design across hardware architecture and software scheduling.
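As background, the following minimal sketch (illustrative only, not the dataflow schemes from the papers above) shows the two knobs on a plain matrix multiplication: the loop order determines which operand is reused in the innermost loops, and tiling keeps a block of B resident, conceptually in an on-chip buffer, while it is reused; the zero check stands in for skipping ineffectual work on a sparse operand. The function `matmul_tiled` and the tile sizes are assumptions for illustration.

```python
def matmul_tiled(A, B, M, K, N, Tk=32, Tn=32):
    """Compute Z[m][n] += A[m][k] * B[k][n] with tiled, reordered loops.
    Illustrative sketch only; real accelerators pick loop orders and tile
    sizes based on the sparsity pattern and the available buffer capacity."""
    Z = [[0.0] * N for _ in range(M)]
    for k0 in range(0, K, Tk):                 # tile the k loop
        for n0 in range(0, N, Tn):             # tile the n loop
            # Reordering: reuse the (k0, n0) tile of B across every row m
            # before moving on to the next tile.
            for m in range(M):
                for k in range(k0, min(k0 + Tk, K)):
                    a = A[m][k]
                    if a == 0.0:               # skip ineffectual work if A is sparse
                        continue
                    for n in range(n0, min(n0 + Tn, N)):
                        Z[m][n] += a * B[k][n]
    return Z
```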

The Compression Trinity: Navigating Sparsity, Quantization, and Low-Rank Approaches in Transformers
by Amir Yazdanbakhsh
Bio
Amir Yazdanbakhsh is a Research Scientist at Google DeepMind, working at the intersection of machine learning and computer architecture. His primary focus is on applying machine learning to design efficient and sustainable computing systems, from leading the development of large-scale distributed training systems on TPUs to shaping the next generation of Google's ML accelerators. His work has been recognized by the ISCA Hall of Fame. Notably, his research on using AI to solve performance challenges in hyperscale systems received an IEEE Micro Top Picks award, and his work on a new system for AI won the IEEE Computer Society Best Paper Award. Amir received his Ph.D. from the Georgia Institute of Technology, where he was a recipient of the Microsoft and Qualcomm fellowships.
Abstract
Sparsity, quantization, and low-rank adaptation each offer powerful levers to tame the compute and memory demands of modern transformers, yet their joint use often reveals hidden frictions—in gradient propagation, tensor fidelity, and latency budgets. Which ordering preserves pruning integrity under aggressive quantization? Can late-stage low-rank adapters bridge the expressivity gap of ultra-sparse models? Is there a single scaling law that links sparse and dense training regimes? In this talk I’ll draw on our recent projects to frame these tensions and invite you to explore the open problems at the intersection of algorithmic compression and hardware-aware design.
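As a toy illustration of one such interaction (a sketch under assumed choices, not the speaker's method), the snippet below magnitude-prunes a weight matrix and then applies symmetric uniform quantization to the surviving weights; reversing the order, or adding a low-rank correction afterwards, generally yields different reconstruction error, which is exactly the kind of ordering question posed above. The helper `prune_then_quantize` is hypothetical.

```python
import numpy as np

def prune_then_quantize(W, sparsity=0.5, n_bits=8):
    """Hypothetical helper: unstructured magnitude pruning to a target
    sparsity, followed by symmetric uniform quantization of the survivors."""
    # 1) Zero out the smallest-magnitude weights.
    k = int(sparsity * W.size)
    if k > 0:
        threshold = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
        W = np.where(np.abs(W) <= threshold, 0.0, W)
    # 2) Quantize what remains to signed n_bits-wide integers with one scale.
    qmax = 2 ** (n_bits - 1) - 1
    scale = float(np.max(np.abs(W))) / qmax or 1.0   # avoid divide-by-zero
    W_q = np.round(W / scale).astype(np.int8)
    return W_q, scale                                # dequantize as W_q * scale

W = np.random.randn(64, 64).astype(np.float32)
W_q, scale = prune_then_quantize(W, sparsity=0.5, n_bits=8)
print("zero fraction:", float((W_q == 0).mean()), "scale:", scale)
```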
Schedule
Time | Title | Speaker |
---|---|---|
8:00AM – 8:10AM | Opening Remarks | |
8:10AM – 9:10AM | Keynote: Einsums, Fibertrees and Dataflow: Architecture for the Post-Moore Era | Joel Emer |
Session 1 | Session Chair: TBD | |
9:10AM – 9:40AM | Invited Talk: Dataflow Optimizations in Sparse Accelerators: Loop Reordering and Loop Tiling | Mingyu Gao |
9:40AM – 9:50AM | Taming the Tail: Sparsity-Aware NoI Topology Synthesis for Mixed DL Workloads on Chiplet-Based Accelerators | Arnav Shukla |
9:50AM – 10:00AM | MPI-over-CXL: Enhancing Communication Efficiency in Distributed HPC Systems | Miryeong Kwon |
10:00AM – 10:30AM | Break | |
Session 2 | Session Chair: TBD | |
10:30AM – 11:00AM | Invited Talk: The Compression Trinity: Navigating Sparsity, Quantization, and Low-Rank Approaches in Transformers | Amir Yazdanbakhsh |
11:00AM – 11:10AM | FuseFlow: A Fusion-Centric Compilation Framework for Sparse Deep Learning on Streaming Dataflow | Rubens Lacouture |
11:10AM – 11:20AM | Fullex: A Full-dense Approach to Matrix Multiplication in Sparse Structures | Danial Farsi |
11:20AM – 11:30AM | Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums | Jaeyeon Won |
11:30AM – 11:40AM | Zero-Overhead Sparsity Prediction for Dynamic Algorithm Selection in Deep Learning Models | Seungjun Lee |
11:40AM – 11:50AM | Position-Adaptive Temporal Sparsity for KV Caches in Long-Context LLMs | Mahmoud Abumandour |
11:50AM – 12:00PM | Closing Remarks | |
Call For Papers
Sparsity has become a defining feature in modern computing workloads, from scientific simulations on HPC platforms to inference and training in cutting-edge LLMs. Sparsity appears across all layers of the stack: bit-level computations, sparse data structures, irregular memory access patterns, and high-level architectural design such as MoEs and dynamic routing. Although sparsity offers enormous potential to improve computing efficiency, reduce energy consumption, and enable scalability, its integration into modern systems introduces significant architectural, algorithmic, and programming challenges.
We invite submissions that address any aspect of sparsity in computing systems. Topics of interest include, but are not limited to:
- Sparsity in scientific and HPC applications.
- Sparse inference and training techniques in LLMs and foundation models.
- Architectural support for unstructured and structured sparsity.
- MoE and dynamic routing models: performance, systems, and deployment.
- Quantization, pruning, and compression methods that induce or leverage sparsity.
- Compiler, runtime, and scheduling frameworks for sparse workloads.
- Benchmarks, metrics, and tools to evaluate sparse systems.
- Hardware/software co-design for sparsity-aware execution.
- Sparse acceleration in near-memory, neuromorphic, and analog computing.
- Sparsity in edge and low-power AI systems.
- Programming models and abstractions for sparse computing.
- Case studies of sparse systems deployed at scale.
- Combinatorial algorithms and graph computations over irregular or loosely structured data (beyond matrices and tensors).
- Techniques for handling the interaction between dense and sparse data structures (e.g., in property graphs combining tables and graphs).
We welcome complete papers, early-stage work, and position papers that inspire discussion and foster community building. We target a soft limit of 4 pages, formatted in double-column style, similar to the main MICRO submission. If you have any questions, please feel free to reach out to Bahar Asgari [bahar at umd dot edu] or Ramyad Hadidi [rhadidi at d-matrix dot ai].
Important Info:
- Submission Deadline: August 31, 2025, 11:59pm PST → September 2, 2025, 11:59pm PST
- Author Notification: September 9, 2025, 11:59pm PST (before MICRO's early registration deadline).
- Submission Link: hotcrp
FAQ:
1. How strict is the 4-page limit?
The 4-page limit is a soft guideline. Your text (excluding references) may slightly exceed 4 pages (e.g., 4.25–4.5 pages). The exact length will not affect the decision on your paper.
2. Do early-stage or position papers need to include results?
Yes. Even early-stage or position papers should include some preliminary results to support their claims. We understand these papers may not yet have a complete set of evaluation results.
3. Should I list the authors in my submission?
No. In line with the main MICRO submission guidelines, please do not include author names in your submission.
4. Can I also submit my work elsewhere?
Yes. Papers submitted to SPICE will not be published in the proceedings. You are free to publish the complete version of your work elsewhere, and you may also submit preliminary or ongoing work to SPICE.