This workshop aims to bring together the systems, HPC, and machine learning communities to explore the growing role of sparsity as a foundational tool for scaling efficiency across modern computing workloads, from scientific computing and HPC to LLMs. By fostering collaboration among researchers, practitioners, and industry experts, the workshop will focus on (a) developing novel architectures and system techniques that exploit sparsity at multiple levels and (b) deploying sparsity-aware models effectively in real-world scientific and AI applications. Topics of interest include unstructured sparsity, quantization, MoE architectures, and other innovations that drive computational efficiency, scalability, and sustainability across the full system stack.
Keynote Talk

Einsums, Fibertrees and Dataflow: Architecture for the Post-Moore Era
by Joel Emer
Bio
Joel Emer is a Professor of the Practice at MIT's Electrical Engineering and Computer Science Department (EECS) and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). He also works part-time as a Senior Distinguished Research Scientist at Nvidia in Westford, MA, where he is responsible for exploration of future architectures as well as modeling and analysis methodologies. Prior to joining Nvidia, he worked at Intel where he was an Intel Fellow and Director of Microarchitecture Research. Previously he worked at Compaq and Digital Equipment Corporation (DEC).
For nearly 50 years, Dr. Emer has held various research and advanced development positions investigating processor micro-architecture and developing performance modeling and evaluation techniques. He has made architectural contributions to a number of VAX, Alpha and X86 processors and is recognized as one of the developers of the widely employed quantitative approach to processor performance evaluation. He has also been recognized for his contributions in the advancement of deep learning accelerator design, spatial and parallel architectures, processor reliability analysis, memory dependence prediction, pipeline and cache organization, performance modeling methodologies and simultaneous multithreading. He earned a doctorate in electrical engineering from the University of Illinois in 1979. He received a bachelor's degree with highest honors in electrical engineering in 1974, and his master's degree in 1975 -- both from Purdue University. Among his honors, he is a Fellow of both the ACM and IEEE, and a member of the NAE. He also received both the Eckert-Mauchly Award and the B. Ramakrishna Rau Award for lifetime contributions in computer architecture.
Abstract
Over the past few years, efforts to address the challenges of the end of Moore's Law have led to a significant rise in domain-specific accelerators. Many of these accelerators target tensor algebraic computations and, even more specifically, computations on sparse tensors. To exploit that sparsity, these accelerators employ a wide variety of novel solutions to achieve good performance. At the same time, prior work on sparse accelerators does not systematically express this full range of design features, making it difficult to understand the impact of each design choice and to compare or extend the state-of-the-art.
In an analogous fashion to our prior work that categorized DNN dataflows into patterns like weight stationary and output stationary, this talk will try to provide a systematic approach to characterizing the range of sparse tensor accelerators. Thus, rather than presenting a single specific combination of a dataflow and a concrete data representation, I will present a generalized framework for describing computations, dataflows, the manipulation of sparse (and dense) tensor operands, and data representation options. This separation of concerns is intended to make designs easier to understand and to facilitate exploration of the wide design space of tensor accelerators. Within this framework, I will present a description of computations using an extension of Einstein summation notation (Einsums) and a format-agnostic abstraction for sparse tensors, called fibertrees. Using the fibertree abstraction, one can express a wide variety of concrete data representations, each with its own advantages and disadvantages. Furthermore, by adding a set of operators for activities like traversal and merging of tensors, the fibertree notation can be used to express dataflows independent of the concrete data representation used for the tensor operands. Thus, using this common language, I will show how to describe a variety of sparse tensor accelerator designs and, ultimately, our state-of-the-art transformer accelerator.
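To make the abstractions above slightly more concrete, here is a rough, illustrative sketch in Python (not the speaker's actual notation or tooling): the Einsum Z[m,n] = A[m,k] * B[k,n] evaluated over a fibertree-like nested representation of sparse matrices, where each fiber maps coordinates either to child fibers or to leaf payloads. The function name `matmul_fibertree` and the dict-of-dicts encoding are assumptions made purely for illustration.

```python
# Illustrative only: dicts of dicts stand in for rank-2 fibertrees,
# mapping a coordinate either to a child fiber or to a leaf payload.
from collections import defaultdict

def matmul_fibertree(A, B):
    """Evaluate the Einsum Z[m, n] = A[m, k] * B[k, n] on sparse operands.
    A is {m: {k: value}}, B is {k: {n: value}}; Z uses the same nesting."""
    Z = defaultdict(dict)
    for m, a_fiber in A.items():           # traverse the top-level m fiber of A
        for k, a_val in a_fiber.items():   # traverse the k fiber of A[m]
            b_fiber = B.get(k)             # intersect with the k fiber of B
            if b_fiber is None:
                continue                   # no matching nonzeros in B
            for n, b_val in b_fiber.items():
                Z[m][n] = Z[m].get(n, 0.0) + a_val * b_val
    return dict(Z)

# Two small sparse matrices in nested (coordinate, payload) form.
A = {0: {1: 2.0}, 2: {0: 3.0}}
B = {0: {2: 4.0}, 1: {0: 5.0}}
print(matmul_fibertree(A, B))  # {0: {0: 10.0}, 2: {2: 12.0}}
```

The same traversal could, in principle, be replayed over CSR, COO, or bitmask encodings; that independence from the concrete format is the point of the fibertree abstraction described in the abstract.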
Invited Talks

Dataflow Optimizations in Sparse Accelerators: Loop Reordering and Loop Tiling
by Mingyu Gao
Bio
Mingyu Gao is an associate professor of computer science in the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University in Beijing, China. He received his PhD in Electrical Engineering from Stanford University. His research interests lie in the fields of computer architecture and systems, including efficient memory architectures, scalable data processing, and hardware system security, with a special emphasis on data-intensive applications like artificial intelligence and big data analytics. He has published in top-tier conferences including ISCA, ASPLOS, MICRO, HPCA, OSDI, SIGMOD, and VLDB. He also regularly serves on the program committees of ISCA, MICRO, ASPLOS, HPCA, and other top conferences.
Abstract
Dataflow techniques, such as loop reordering and loop tiling, have been widely used in domain-specific accelerators for machine learning and dense algebra to improve on-chip data reuse and reduce off-chip data accesses, boosting performance and energy efficiency. However, when it comes to sparse tensor accelerators, special care must be taken to address the new challenges of irregular and diverse input data patterns. Different sparsity patterns prefer drastically different dataflow schemes. In this talk, we present two of our recent papers, from ASPLOS 2023 and ISCA 2025, which propose flexible dataflow schemes that efficiently adapt to various sparse data patterns, focusing on loop reordering and loop tiling, respectively. Such efficient and adaptive designs require comprehensive co-design across hardware architecture and software scheduling.
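As background, the following minimal sketch (illustrative only, not the dataflow schemes from the papers above) shows the two knobs on a plain matrix multiplication: the loop order determines which operand is reused in the innermost loops, and tiling keeps a block of B resident, conceptually in an on-chip buffer, while it is reused; the zero check stands in for skipping ineffectual work on a sparse operand. The function `matmul_tiled` and the tile sizes are assumptions for illustration.

```python
def matmul_tiled(A, B, M, K, N, Tk=32, Tn=32):
    """Compute Z[m][n] += A[m][k] * B[k][n] with tiled, reordered loops.
    Illustrative sketch only; real accelerators pick loop orders and tile
    sizes based on the sparsity pattern and the available buffer capacity."""
    Z = [[0.0] * N for _ in range(M)]
    for k0 in range(0, K, Tk):                 # tile the k loop
        for n0 in range(0, N, Tn):             # tile the n loop
            # Reordering: reuse the (k0, n0) tile of B across every row m
            # before moving on to the next tile.
            for m in range(M):
                for k in range(k0, min(k0 + Tk, K)):
                    a = A[m][k]
                    if a == 0.0:               # skip ineffectual work if A is sparse
                        continue
                    for n in range(n0, min(n0 + Tn, N)):
                        Z[m][n] += a * B[k][n]
    return Z
```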

The Compression Trinity: Navigating Sparsity, Quantization, and Low-Rank Approaches in Transformers
by Amir Yazdanbakhsh
Bio
Amir Yazdanbakhsh is a Research Scientist at Google DeepMind, working at the intersection of machine learning and computer architecture. His primary focus is on applying machine learning to design efficient and sustainable computing systems, from leading the development of large-scale distributed training systems on TPUs to shaping the next generation of Google's ML accelerators. His work has been recognized by the ISCA Hall of Fame. Notably, his research on using AI to solve performance challenges in hyperscale systems received an IEEE Micro Top Picks award, and his work on a new system for AI won the IEEE Computer Society Best Paper Award. Amir received his Ph.D. from the Georgia Institute of Technology, where he was a recipient of the Microsoft and Qualcomm fellowships.
Abstract
Sparsity, quantization, and low-rank adaptation each offer powerful levers to tame the compute and memory demands of modern transformers, yet their joint use often reveals hidden frictions—in gradient propagation, tensor fidelity, and latency budgets. Which ordering preserves pruning integrity under aggressive quantization? Can late-stage low-rank adapters bridge the expressivity gap of ultra-sparse models? Is there a single scaling law that links sparse and dense training regimes? In this talk I’ll draw on our recent projects to frame these tensions and invite you to explore the open problems at the intersection of algorithmic compression and hardware-aware design.
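As a toy illustration of one such interaction (a sketch under assumed choices, not the speaker's method), the snippet below magnitude-prunes a weight matrix and then applies symmetric uniform quantization to the surviving weights; reversing the order, or adding a low-rank correction afterwards, generally yields different reconstruction error, which is exactly the kind of ordering question posed above. The helper `prune_then_quantize` is hypothetical.

```python
import numpy as np

def prune_then_quantize(W, sparsity=0.5, n_bits=8):
    """Hypothetical helper: unstructured magnitude pruning to a target
    sparsity, followed by symmetric uniform quantization of the survivors."""
    # 1) Zero out the smallest-magnitude weights.
    k = int(sparsity * W.size)
    if k > 0:
        threshold = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
        W = np.where(np.abs(W) <= threshold, 0.0, W)
    # 2) Quantize what remains to signed n_bits-wide integers with one scale.
    qmax = 2 ** (n_bits - 1) - 1
    scale = float(np.max(np.abs(W))) / qmax or 1.0   # avoid divide-by-zero
    W_q = np.round(W / scale).astype(np.int8)
    return W_q, scale                                # dequantize as W_q * scale

W = np.random.randn(64, 64).astype(np.float32)
W_q, scale = prune_then_quantize(W, sparsity=0.5, n_bits=8)
print("zero fraction:", float((W_q == 0).mean()), "scale:", scale)
```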
Schedule
Time | Title | Speaker |
---|---|---|
8:00AM – 8:10AM | Opening Remarks | |
8:10AM – 9:10AM | Keynote: Einsums, Fibertrees and Dataflow: Architecture for the Post-Moore Era | Joel Emer |
Session 1 | Session Chair: TBD | |
9:10AM – 9:40AM | Invited Talk: Dataflow Optimizations in Sparse Accelerators: Loop Reordering and Loop Tiling | Mingyu Gao |
9:40AM – 9:50AM | Taming the Tail: Sparsity-Aware NoI Topology Synthesis for Mixed DL Workloads on Chiplet-Based Accelerators | Arnav Shukla |
9:50AM – 10:00AM | MPI-over-CXL: Enhancing Communication Efficiency in Distributed HPC Systems | Miryeong Kwon |
10:00AM – 10:30AM | Break | |
Session 2 | Session Chair: TBD | |
10:30AM – 11:00AM | Invited Talk: The Compression Trinity: Navigating Sparsity, Quantization, and Low-Rank Approaches in Transformers | Amir Yazdanbakhsh |
11:00AM – 11:10AM | FuseFlow: A Fusion-Centric Compilation Framework for Sparse Deep Learning on Streaming Dataflow | Rubens Lacouture |
11:10AM – 11:20AM | Fullex: A Full-dense Approach to Matrix Multiplication in Sparse Structures | Danial Farsi |
11:20AM – 11:30AM | Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums | Jaeyeon Won |
11:30AM – 11:40AM | Zero-Overhead Sparsity Prediction for Dynamic Algorithm Selection in Deep Learning Models | Seungjun Lee |
11:40AM – 11:50AM | Position-Adaptive Temporal Sparsity for KV Caches in Long-Context LLMs | Mahmoud Abumandour |
11:50AM – 12:00PM | Closing Remarks | |
Call For Papers
Sparsity has become a defining feature in modern computing workloads, from scientific simulations on HPC platforms to inference and training in cutting-edge LLMs. Sparsity appears across all layers of the stack: bit-level computations, sparse data structures, irregular memory access patterns, and high-level architectural design such as MoEs and dynamic routing. Although sparsity offers enormous potential to improve computing efficiency, reduce energy consumption, and enable scalability, its integration into modern systems introduces significant architectural, algorithmic, and programming challenges.
We invite submissions that address any aspect of sparsity in computing systems. Topics of interest include, but are not limited to:
- Sparsity in scientific and HPC applications.
- Sparse inference and training techniques in LLMs and foundation models.
- Architectural support for unstructured and structured sparsity.
- MoE and dynamic routing models: performance, systems, and deployment.
- Quantization, pruning, and compression methods that induce or leverage sparsity.
- Compiler, runtime, and scheduling frameworks for sparse workloads.
- Benchmarks, metrics, and tools to evaluate sparse systems.
- Hardware/software co-design for sparsity-aware execution.
- Sparse acceleration in near-memory, neuromorphic, and analog computing.
- Sparsity in edge and low-power AI systems.
- Programming models and abstractions for sparse computing.
- Case studies of sparse systems deployed at scale.
- Combinatorial algorithms and graph computations over irregular or loosely structured data (beyond matrices and tensors).
- Techniques for handling the interaction between dense and sparse data structures (e.g., in property graphs combining tables and graphs).
We welcome complete papers, early-stage work, and position papers that inspire discussion and foster community building. We target a soft limit of 4 pages, formatted in double-column style, similar to the main MICRO submission. If you have any questions, please feel free to reach out to Bahar Asgari [bahar at umd dot edu] or Ramyad Hadidi [rhadidi at d-matrix dot ai].
Important Info:
- Submission Deadline: August 31, 2025, 11:59pm PST → September 2, 2025, 11:59pm PST
- Author Notification: September 9, 2025, 11:59pm PST (before MICRO's early registration deadline).
- Submission Link: hotcrp
FAQ:
1. How strict is the 4-page limit?
The 4-page limit is a soft guideline. Your text (excluding references) may slightly exceed 4 pages (e.g., 4.25–4.5 pages). The exact length will not affect the decision on your paper.
2. Do early-stage or position papers need to include results?
Yes. Even early-stage or position papers should include some preliminary results to support their claims. We understand these papers may not yet have a complete set of evaluation results.
3. Should I list the authors in my submission?
No. In line with the main MICRO submission guidelines, please do not include author names in your submission.
4. Can I also submit my work elsewhere?
Yes. Papers submitted to SPICE will not be published in the proceedings. You are free to publish the complete version of your work elsewhere, and you may also submit preliminary or ongoing work to SPICE.