Yaswanth Raparti
AMD
Exploiting Structured Sparsity in NPUs for Efficient Inference
Abstract
Deep learning inference is rapidly being adopted on edge and client devices for a wide variety of applications, ranging from image processing and translation to navigation and text/image generation. These inference tasks are offloaded to specialized processors called neural processing units (NPUs). NPUs are more efficient than GPUs for ML workloads because they are designed to churn through linear algebra operations such as matrix multiplications with low control overhead. However, NPUs also face limitations: limited DRAM bandwidth, modest caches, and strict power budgets. In this talk, I will focus on techniques that exploit structured sparsity in deep learning models to improve the performance and energy efficiency of low-power NPUs.
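As a concrete illustration of the kind of structured sparsity the abstract refers to, the sketch below implements 2:4 fine-grained structured pruning (keep the 2 largest-magnitude weights in every group of 4), a pattern hardware matrix engines can exploit. This is a minimal, illustrative example, not the speaker's method; the function name and shapes are assumptions.

```python
import numpy as np

def prune_2_to_4(w):
    """Illustrative 2:4 structured pruning: in every group of 4
    consecutive weights along the flattened array, keep the 2
    largest-magnitude values and zero the other 2. Assumes the
    total number of elements is divisible by 4."""
    groups = w.reshape(-1, 4)
    # Indices of the two smallest-magnitude entries in each group
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (groups * mask).reshape(w.shape)

w = np.array([[0.9, -0.1, 0.05, 0.7],
              [0.2, -0.8, 0.3, 0.01]])
pruned = prune_2_to_4(w)
# Each group of 4 now has exactly 2 nonzeros, so the NPU can skip
# half of the multiply-accumulates with a fixed, predictable pattern.
```

Because the sparsity pattern is fixed (2 nonzeros per 4-element group), the hardware needs only small per-group metadata to skip zero operands, which is what makes structured sparsity cheaper to exploit than unstructured pruning.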
Biography
Yaswanth Raparti is a design engineering manager in the Ryzen AI group at AMD. He received his PhD in Electrical and Computer Engineering from Colorado State University in 2019. His research primarily focuses on cross-layer frameworks to improve the performance, utilization, reliability, and security of manycore accelerators in the AI/ML era. He received a best paper award for his research on network-on-chip and memory scheduling for GPGPUs. He has authored several papers in peer-reviewed journals and conferences as well as a book chapter, and holds two patents. Prior to his current role, he worked on designing high-speed SSD and memory controllers for bleeding-edge datacenters at Micron and Samsung.
Download
yaswanth-raparti.pdf