17th INTERNATIONAL FORUM ON MPSoC
for software-defined hardware
Speaker's Profile
Song Yao
DeePhi Tech, China
Bandwidth-Centric Deep Learning Processing through Software-Hardware Co-Design
Abstract
Bandwidth matters. The performance of a deep learning computing platform depends largely on its memory system and available bandwidth. Sparsity and low precision are necessary to achieve high energy efficiency for deep learning inference. However, these cannot be achieved through hardware alone.
In this talk, we will introduce a software-hardware co-design methodology for accelerating deep learning algorithms, which consists of compression, compilation, and hardware acceleration. A full-stack software development kit called DNNDK is proposed for developing compressed sparse neural networks. Two customized architectures, Aristotle and Descartes, are also proposed to accelerate compressed neural networks. With the proposed methodology, it is possible, even on an FPGA, to achieve more than 10x the energy efficiency of the latest GPU products.
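The bandwidth argument in the abstract can be illustrated with some rough arithmetic. The sketch below is not taken from the talk or from DNNDK. It assumes a hypothetical 4096 x 4096 fully connected layer, a pruning density of 10%, 8-bit weights, and a made-up sparse format that stores 4 index bits per kept weight. It only estimates how many bytes must be fetched to read the weights once.

```python
def weight_bytes(num_weights, bits_per_weight, density=1.0, index_bits=0):
    """Estimate the bytes needed to read a layer's weights once.

    density    : fraction of weights kept after pruning (1.0 means dense)
    index_bits : extra bits stored per kept weight to record its position
                 (a hypothetical sparse encoding, assumed for illustration)
    """
    kept = int(num_weights * density)
    return kept * (bits_per_weight + index_bits) // 8


# Hypothetical layer size, chosen only for illustration.
n = 4096 * 4096

dense_fp32 = weight_bytes(n, 32)                             # dense, 32-bit floats
sparse_int8 = weight_bytes(n, 8, density=0.1, index_bits=4)  # pruned, 8-bit weights

ratio = dense_fp32 / sparse_int8
print(f"dense FP32 : {dense_fp32 / 2**20:.1f} MiB")
print(f"sparse INT8: {sparse_int8 / 2**20:.1f} MiB")
print(f"traffic reduction: about {ratio:.0f}x")
```

Under these assumed numbers, weight traffic drops by more than an order of magnitude. The index overhead also shows why the sparse format and the hardware that decodes it have to be designed together, which is the co-design point the abstract makes.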
Biography