
Sudeep Pasricha
Colorado State University, USA
Stochastic In-DRAM Acceleration of LLMs
Abstract
Data movement remains the biggest energy and performance bottleneck for accelerating AI workloads such as large language models (LLMs) on modern computing platforms. In this talk, I will present ARTEMIS, a new type of mixed analog-stochastic in-DRAM accelerator for inference of the transformer models at the heart of modern LLMs. This in-memory accelerator requires minimal changes to conventional DRAM arrays and efficiently alleviates the costs of transformer model execution by supporting stochastic computing for multiplications and temporal analog accumulation using a novel in-DRAM metal-on-metal capacitor. To further address the intra-memory data movement bottleneck, we develop an optimized token-based dataflow tailored to the stochastic-analog computational flow. Memory resources are assigned to computations across different layers based on the input tokens, so each memory bank processes and stores the intermediate results for a specific set of tokens, thereby reducing the amount of data transferred between layers. Lightweight intra- and inter-bank microarchitectural enhancements are devised to aggressively reduce data movement latency and energy overheads. I will present promising initial results showing that ARTEMIS outperforms GPU, TPU, and CPU platforms as well as several state-of-the-art processing-in-memory LLM accelerators.
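Stochastic computing, which the abstract names as the basis for ARTEMIS's multiplications, is a general technique in which a value in [0, 1] is encoded as the probability of a 1 in a random bitstream; a single AND gate then multiplies two independent streams. The sketch below illustrates only that generic principle and is not the ARTEMIS design. The function names `to_bitstream`, `sc_multiply`, and `sc_dot` are hypothetical, the encoding is assumed to be unipolar, and the digital count of ones in `sc_dot` merely stands in for the in-DRAM analog charge accumulation described above.

```python
import random


def to_bitstream(p: float, length: int, rng: random.Random) -> list[int]:
    """Encode p in [0, 1] as a unipolar stochastic bitstream (P(bit = 1) = p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("unipolar encoding requires 0 <= p <= 1")
    return [1 if rng.random() < p else 0 for _ in range(length)]


def sc_multiply(a: float, b: float, length: int = 4096, seed: int = 0) -> float:
    """Multiply two values by ANDing their independent bitstreams."""
    rng = random.Random(seed)
    sa = to_bitstream(a, length, rng)
    sb = to_bitstream(b, length, rng)
    # For independent streams, P(x AND y = 1) = P(x = 1) * P(y = 1) = a * b.
    product = [x & y for x, y in zip(sa, sb)]
    # Decode by taking the fraction of ones in the product stream.
    return sum(product) / length


def sc_dot(xs: list[float], ws: list[float], length: int = 4096, seed: int = 0) -> float:
    """Approximate a dot product: AND each pair, then accumulate the ones.

    Hypothetical stand-in: the running count of ones plays the role that
    charge accumulation on a capacitor would play in an analog design.
    """
    rng = random.Random(seed)
    ones = 0
    for x, w in zip(xs, ws):
        sx = to_bitstream(x, length, rng)
        sw = to_bitstream(w, length, rng)
        ones += sum(a & b for a, b in zip(sx, sw))
    return ones / length


if __name__ == "__main__":
    print(sc_multiply(0.5, 0.5, length=8192))   # close to 0.25
    print(sc_dot([0.5, 0.25], [0.5, 1.0]))      # close to 0.5
```

The trade-off this illustrates is that each multiplier is trivially cheap (one AND per bit), while precision depends on stream length: the standard error of the decoded value shrinks roughly as one over the square root of the number of bits.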
Biography
Sudeep Pasricha is the Aram and Helga Budak Endowed Professor in the Department of Electrical and Computer Engineering, the Department of Computer Science, and the Department of Systems Engineering at Colorado State University. He is Director of the Embedded, High Performance, and Intelligent Computing (EPIC) Laboratory and Chair of Computer Engineering. He received the B.E. degree in Electronics and Communication Engineering from Delhi Institute of Technology, India, in 2000, and his Ph.D. in Computer Science from the University of California, Irvine, in 2008. He joined Colorado State University (CSU) in 2008. Prior to joining CSU, he spent several years working at STMicroelectronics and Conexant Inc. His research focuses on the design and application of innovative software algorithms (particularly AI and machine learning), hardware architectures, and hardware-software co-design techniques for energy-efficient, fault-tolerant, real-time, and secure computing. He has co-authored seven books, holds multiple patents, and has published more than 350 research articles in peer-reviewed journals, conferences, and workshops. His research has been funded by various sponsors including NSF, SRC, AFOSR, DOE, ORNL, DoD, Fiat-Chrysler, HPE, and NASA. He has served as General Chair and Program Committee Chair for multiple IEEE and ACM conferences and on the editorial boards of multiple IEEE and ACM journals. He has received 17 Best Paper Awards and nominations at various IEEE and ACM conferences. Other honors and awards include the 2025 CSU Board of Governors Excellence in Graduate Teaching Award, the 2025 Aram and Helga Budak Professorship, the 2024 ECE Excellence in Teaching Award, the 2019 George T. Abell Outstanding Research Faculty Award, the 2016-2018 University Distinguished Monfort Professorship, the 2016-2019 Walter Scott Jr. College of Engineering Rockwell-Anderson Professorship, the 2018 IEEE-CS/TCVLSI Mid-Career Research Achievement Award, the 2015 IEEE/TCSC Award for Excellence for a Mid-Career Researcher, the 2014 George T. Abell Outstanding Mid-Career Faculty Award, and the 2013 AFOSR Young Investigator Award. For professional service, he has received the 2019 ACM SIGDA Distinguished Service Award, the 2015 ACM SIGDA Service Award, and the 2012 ACM SIGDA Technical Leadership Award. He is a Fellow of the IEEE, a Fellow of AAIA, a Fellow of AIIA, a Distinguished Member of the ACM, an IEEE CEDA Distinguished Lecturer, and an ACM Distinguished Speaker.