This workshop aims to bring together the systems, HPC, and machine learning communities to explore the growing role of sparsity as a foundational tool for scaling efficiency across modern computing workloads, from scientific computing and HPC to large language models (LLMs). By fostering collaboration among researchers, practitioners, and industry experts, the workshop will focus on (a) developing novel architectures and system techniques that exploit sparsity at multiple levels and (b) deploying sparsity-aware models effectively in real-world scientific and AI applications. Topics of interest include unstructured sparsity, quantization, Mixture-of-Experts (MoE) architectures, and other innovations that drive computational efficiency, scalability, and sustainability across the full system stack.
Keynote Talk
Einsums, Fibertrees and Dataflow: Architecture for the Post-Moore Era
Bio: Joel Emer is a Professor of the Practice at MIT's Electrical Engineering and Computer Science Department (EECS) and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). He also works part-time as a Senior Distinguished Research Scientist at Nvidia in Westford, MA, where he is responsible for exploration of future architectures as well as modeling and analysis methodologies. Prior to joining Nvidia, he worked at Intel, where he was an Intel Fellow and Director of Microarchitecture Research. Previously he worked at Compaq and Digital Equipment Corporation (DEC). For nearly 50 years, Dr. Emer has held various research and advanced development positions investigating processor microarchitecture and developing performance modeling and evaluation techniques. He has made architectural contributions to a number of VAX, Alpha, and x86 processors and is recognized as one of the developers of the widely employed quantitative approach to processor performance evaluation. He has also been recognized for his contributions to the advancement of deep learning accelerator design, spatial and parallel architectures, processor reliability analysis, memory dependence prediction, pipeline and cache organization, performance modeling methodologies, and simultaneous multithreading. He earned a doctorate in electrical engineering from the University of Illinois in 1979. He received a bachelor's degree with highest honors in electrical engineering in 1974, and his master's degree in 1975, both from Purdue University. Among his honors, he is a Fellow of both the ACM and IEEE and a member of the NAE. He also received both the Eckert-Mauchly Award and the B. Ramakrishna Rau Award for lifetime contributions in computer architecture.
Abstract: Over the past few years, efforts to address the challenges of the end of Moore's Law have led to a significant rise in domain-specific accelerators. Many of these accelerators target tensor algebraic computations, and more specifically computations on sparse tensors. To exploit that sparsity, these accelerators employ a wide variety of novel solutions to achieve good performance. At the same time, prior work on sparse accelerators does not systematically express this full range of design features, making it difficult to understand the impact of each design choice and to compare or extend the state of the art. In a fashion analogous to our prior work that categorized DNN dataflows into patterns like weight stationary and output stationary, this talk will provide a systematic approach to characterizing the range of sparse tensor accelerators. Thus, rather than presenting a single specific combination of a dataflow and a concrete data representation, I will present a generalized framework for describing computations, dataflows, the manipulation of sparse (and dense) tensor operands, and data representation options. This separation of concerns is intended to make designs easier to understand and to facilitate exploration of the wide design space of tensor accelerators. Within this framework, I will present a description of computations using an extension of Einstein summation notation (Einsums) and a format-agnostic abstraction for sparse tensors, called fibertrees. Using the fibertree abstraction, one can express a wide variety of concrete data representations, each with its own advantages and disadvantages. Furthermore, by adding a set of operators for activities like traversal and merging of tensors, the fibertree notation can be used to express dataflows independently of the concrete data representation used for the tensor operands. Using this common language, I will show how to describe a variety of sparse tensor accelerator designs and, ultimately, our state-of-the-art transformer accelerator.
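To give a flavor of the abstraction, the sketch below is a minimal, illustrative example in plain Python (not the speaker's actual fibertree toolset; the Fiber layout, coordinate/payload structure, and function names are assumptions made for illustration). It represents a sparse matrix as a fiber over row coordinates whose payloads are fibers of (column, value) pairs, and uses coordinate intersection to evaluate the Einsum Z_m = Σ_k A_{m,k} · B_k.

```python
# Minimal fibertree-style sketch (illustrative only).
# A "fiber" is a list of (coordinate, payload) pairs sorted by coordinate;
# payloads are either scalars (leaf rank) or fibers for the next rank.

# Sparse 4x4 matrix A as a fiber over rows (rank M) of fibers over columns (rank K).
A = [
    (0, [(1, 2.0), (3, 4.0)]),   # row 0: nonzeros in columns 1 and 3
    (2, [(0, 5.0), (3, 1.0)]),   # row 2: nonzeros in columns 0 and 3
]

# Sparse vector B over rank K.
B = [(0, 3.0), (1, 7.0), (3, 0.5)]

def intersect(fiber_a, fiber_b):
    """Coiterate two fibers, yielding (coord, payload_a, payload_b) for shared coordinates."""
    ia, ib = 0, 0
    while ia < len(fiber_a) and ib < len(fiber_b):
        ca, pa = fiber_a[ia]
        cb, pb = fiber_b[ib]
        if ca == cb:
            yield ca, pa, pb
            ia += 1
            ib += 1
        elif ca < cb:
            ia += 1
        else:
            ib += 1

# Einsum Z_m = sum_k A_{m,k} * B_k as a dataflow over the fibertrees:
# traverse the M rank of A; for each row, intersect its K fiber with B and reduce.
Z = []
for m, a_row in A:
    total = sum(a_val * b_val for _, a_val, b_val in intersect(a_row, B))
    if total != 0.0:
        Z.append((m, total))

print(Z)   # [(0, 16.0), (2, 15.5)]
```

The same traverse-and-intersect structure would describe the dataflow whether A is concretely stored as compressed rows, coordinate lists, or bitmaps, which is the separation between dataflow and data representation that the abstract describes.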
Call For Papers
Sparsity has become a defining feature in modern computing workloads, from scientific simulations on HPC platforms to inference and training in cutting-edge LLMs. Sparsity appears across all layers of the stack: bit-level computations, sparse data structures, irregular memory access patterns, and high-level architectural designs such as MoEs and dynamic routing. Although sparsity offers enormous potential to improve computing efficiency, reduce energy consumption, and enable scalability, its integration into modern systems introduces significant architectural, algorithmic, and programming challenges.
We invite submissions that address any aspect of sparsity in computing systems. Topics of interest include, but are not limited to:
- Sparsity in scientific and HPC applications.
- Sparse inference and training techniques in LLMs and foundation models.
- Architectural support for unstructured and structured sparsity.
- MoE and dynamic routing models: performance, systems, and deployment.
- Quantization, pruning, and compression methods that induce or leverage sparsity.
- Compiler, runtime, and scheduling frameworks for sparse workloads.
- Benchmarks, metrics, and tools to evaluate sparse systems.
- Hardware/software co-design for sparsity-aware execution.
- Sparse acceleration in near-memory, neuromorphic, and analog computing.
- Sparsity in edge and low-power AI systems.
- Programming models and abstractions for sparse computing.
- Case studies of sparse systems deployed at scale.
- Combinatorial algorithms and graph computations over irregular or loosely structured data (beyond matrices and tensors).
- Techniques for handling the interaction between dense and sparse data structures (e.g., in property graphs combining tables and graphs).
We welcome complete papers, early-stage work, and position papers that inspire discussion and foster community building. We target a soft limit of 4 pages, formatted in double-column style, similar to the main MICRO submission. If you have any questions, please feel free to reach out to Bahar Asgari [bahar at umd dot edu] or Ramyad Hadidi [rhadidi at d-matrix dot ai].
Important Info:
- Submission Deadline: September 2, 2025, 11:59pm PST (extended from August 31, 2025)
- Author Notification: September 9, 2025, 11:59pm PST (before MICRO's early registration deadline).
- Submission Link: hotcrp
FAQ:
1. How strict is the 4-page limit? The 4-page limit is a soft guideline. Your text (excluding references) may slightly exceed 4 pages (e.g., 4.25–4.5 pages). The exact length will not affect the decision on your paper.
2. Do early-stage or position papers need to include results? Yes. Even early-stage or position papers should include some preliminary results to support their claims. We understand these papers may not yet have a complete set of evaluation results.
3. Should I list the authors in my submission? No. In line with the main MICRO submission guidelines, please do not include author names in your submission.
4. Can I also submit my work elsewhere? Yes. Papers submitted to SPICE will not be published in the proceedings. You are free to publish the complete version of your work elsewhere, and you may also submit preliminary or ongoing work to SPICE.
Organizers:
Co-Chairs
- Bahar Asgari | Assistant Professor | Department of Computer Science at the University of Maryland, College Park (UMD)
- Ramyad Hadidi | Senior Staff Engineer | d-Matrix
Organizing Committee (Alphabetical Order)
- Ben Feinberg | Senior Member of Technical Staff | Scalable Computer Architecture Group at Sandia National Laboratories
- Christina Giannoula | Incoming Faculty | Max Planck Institute for Software Systems (MPI-SWS)
- Olivia Hsu | Incoming Assistant Professor | Department of Electrical and Computer Engineering at Carnegie Mellon University (CMU)
- Prashant J. Nair | Assistant Professor and Senior Principal Engineer | University of British Columbia (UBC) and d-Matrix
- Antonino Tumeo | Chief Scientist | Future Computing Technologies Group at Pacific Northwest National Laboratory (PNNL)
- Farzaneh Zokaee | System-on-Chip Architect | Ampere Computing
Student Volunteers
- Ubaid Bakhtiar | PhD Student | Department of Electrical and Computer Engineering at the University of Maryland, College Park (UMD)
- Donghyeon Joo | PhD Student | Department of Computer Science at the University of Maryland, College Park (UMD)
- Sanjali Yadav | PhD Student | Department of Computer Science at the University of Maryland, College Park (UMD)
- Amirmahdi Namjoo | PhD Student | Department of Computer Science at the University of Maryland, College Park (UMD)
- Jeonghyun Woo | PhD Student | Department of Electrical and Computer Engineering at the University of British Columbia (UBC)
- Junsu Kim | PhD Student | Department of Electrical and Computer Engineering at the University of British Columbia (UBC)