Sparsity, the Key Ingredient from HPC to Efficient LLMs

A Workshop Co-Located with MICRO 2025, Seoul, Korea

October 18, 2025 | 8:00 AM to 12:00 PM KST | Location: Bell-vue

This workshop aims to bring together the systems, HPC, and machine learning communities to explore the growing role of sparsity as a foundational tool for scaling efficiency across modern computing workloads, from scientific computing and HPC to LLMs. By fostering collaboration among researchers, practitioners, and industry experts, the workshop will focus on (a) developing novel architectures and system techniques that exploit sparsity at multiple levels and (b) deploying sparsity-aware models effectively in real-world scientific and AI applications. Topics of interest include unstructured sparsity, quantization, MoE architectures, and other innovations that drive computational efficiency, scalability, and sustainability across the full system stack.

Keynote Talk

Einsums, Fibertrees and Dataflow: Architecture for the Post-Moore Era

Bio: Joel Emer is a Professor of the Practice in MIT's Electrical Engineering and Computer Science Department (EECS) and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). He also works part-time as a Senior Distinguished Research Scientist at Nvidia in Westford, MA, where he is responsible for exploration of future architectures as well as modeling and analysis methodologies. Prior to joining Nvidia, he worked at Intel, where he was an Intel Fellow and Director of Microarchitecture Research. Previously he worked at Compaq and Digital Equipment Corporation (DEC). For nearly 50 years, Dr. Emer has held various research and advanced development positions investigating processor microarchitecture and developing performance modeling and evaluation techniques. He has made architectural contributions to a number of VAX, Alpha, and x86 processors and is recognized as one of the developers of the widely employed quantitative approach to processor performance evaluation. He has also been recognized for his contributions to the advancement of deep learning accelerator design, spatial and parallel architectures, processor reliability analysis, memory dependence prediction, pipeline and cache organization, performance modeling methodologies, and simultaneous multithreading. He earned a doctorate in electrical engineering from the University of Illinois in 1979. He received a bachelor's degree with highest honors in electrical engineering in 1974, and his master's degree in 1975 -- both from Purdue University. Among his honors, he is a Fellow of both the ACM and IEEE and a member of the NAE. He has also received both the Eckert-Mauchly Award and the B. Ramakrishna Rau Award for lifetime contributions in computer architecture.

Abstract: Over the past few years, efforts to address the challenges of the end of Moore's Law have led to a significant rise in domain-specific accelerators. Many of these accelerators target tensor algebraic computations, and even more specifically computations on sparse tensors. To exploit that sparsity, these accelerators employ a wide variety of novel solutions to achieve good performance. At the same time, prior work on sparse accelerators does not systematically express this full range of design features, making it difficult to understand the impact of each design choice and to compare or extend the state of the art. In a fashion analogous to our prior work that categorized DNN dataflows into patterns such as weight stationary and output stationary, this talk will provide a systematic approach to characterizing the range of sparse tensor accelerators. Rather than presenting a single specific combination of a dataflow and a concrete data representation, I will present a generalized framework for describing computations, dataflows, the manipulation of sparse (and dense) tensor operands, and data representation options. This separation of concerns is intended to make designs easier to understand and to facilitate exploration of the wide design space of tensor accelerators. Within this framework, I will present a description of computations using an extension of Einstein summation notation (Einsums) and a format-agnostic abstraction for sparse tensors called fibertrees. Using the fibertree abstraction, one can express a wide variety of concrete data representations, each with its own advantages and disadvantages. Furthermore, by adding a set of operators for activities like traversal and merging of tensors, fibertree notation can be used to express dataflows independent of the concrete data representation used for the tensor operands. Using this common language, I will show how to describe a variety of sparse tensor accelerator designs and, ultimately, our state-of-the-art transformer accelerator.
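For readers less familiar with these abstractions, the short Python sketch below gives a rough flavor of the ideas. It is our own simplified illustration, not the speaker's notation or tooling: an Einsum such as Z_m = sum_k A_{m,k} * B_k says what to compute, while a toy fiber-of-fibers structure of (coordinate, payload) pairs shows one format-agnostic way to hold a sparse operand and traverse it.

# Illustrative sketch only: a "fiber" as a list of (coordinate, payload) pairs,
# where a payload is either a value (leaf fiber) or another fiber. This is a
# simplified reading of the fibertree idea, not the talk's actual framework.
import numpy as np

# Dense Einsum for reference: Z_m = sum_k A_{m,k} * B_k
A = np.array([[0.0, 2.0, 0.0],
              [1.0, 0.0, 3.0]])
B = np.array([4.0, 5.0, 6.0])
Z_dense = np.einsum("mk,k->m", A, B)

# The same matrix A viewed as a fiber of fibers: the outer fiber is indexed by
# m, and each payload is an inner fiber indexed by k holding only the nonzeros.
A_fibertree = [
    (0, [(1, 2.0)]),            # row 0: single nonzero at k = 1
    (1, [(0, 1.0), (2, 3.0)]),  # row 1: nonzeros at k = 0 and k = 2
]
B_fiber = [(0, 4.0), (1, 5.0), (2, 6.0)]

def spmv(a_fibers, b_fiber):
    """Traverse the A fibertree and intersect coordinates with the B fiber."""
    b = dict(b_fiber)
    z = {}
    for m, row in a_fibers:          # traverse the outer (m) fiber
        acc = 0.0
        for k, a_val in row:         # traverse the inner (k) fiber
            if k in b:               # coordinate intersection with B
                acc += a_val * b[k]
        z[m] = acc
    return z

assert np.allclose([spmv(A_fibertree, B_fiber)[m] for m in range(2)], Z_dense)

The same traversal-and-intersection loop would work unchanged if A were stored in, say, CSR or a hash-based format behind the fiber interface, which is the sense in which a dataflow expressed over fibers is independent of the concrete data representation.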

Call For Papers

Sparsity has become a defining feature of modern computing workloads, from scientific simulations on HPC platforms to inference and training in cutting-edge LLMs. Sparsity appears across all layers of the stack: bit-level computations, sparse data structures, irregular memory access patterns, and high-level architectural designs such as MoEs and dynamic routing. Although sparsity offers enormous potential to improve computing efficiency, reduce energy consumption, and enable scalability, its integration into modern systems introduces significant architectural, algorithmic, and programming challenges.

We invite submissions that address any aspect of sparsity in computing systems. Topics of interest include, but are not limited to:

We welcome complete papers, early-stage work, and position papers that inspire discussion and foster community building. We target a soft limit of 4 pages, formatted in double-column style, similar to the main MICRO submission. If you have any questions, please feel free to reach out to Bahar Asgari [bahar at umd dot edu] or Ramyad Hadidi [rhadidi at d-matrix dot ai].

Important Info:

FAQ:

1. How strict is the 4-page limit?

The 4-page limit is a soft guideline. Your text (excluding references) may slightly exceed 4 pages (e.g., 4.25–4.5 pages). The exact length will not affect the decision on your paper.

2. Do early-stage or position papers need to include results?

Yes. Even early-stage or position papers should include some preliminary results to support their claims. We understand these papers may not yet have a complete set of evaluation results.

3. Should I list the authors in my submission?

No. In line with the main MICRO submission guidelines, please do not include author names in your submission.

4. Can I also submit my work elsewhere?

Yes. Papers submitted to SPICE will not be published in the proceedings. You are free to publish the complete version of your work elsewhere, and you may also submit preliminary or ongoing work to SPICE.

Organizers: