Norbert Wehn
University of Kaiserslautern, Germany
A Novel Adaptive Quantization Methodology for 8-bit Floating-Point DNN Training
Abstract
Training Deep Neural Networks incurs a high energy cost, and off-chip memory access contributes a major portion of the overall energy consumption. The number of off-chip memory transactions can be reduced by quantizing data words to a low bit-width; however, low-bit-width data formats suffer from a limited dynamic range, which degrades accuracy. In this talk we present a novel DNN training methodology based on a quantized 8-bit floating-point (FP8) data format that adapts to the required dynamic range on the fly. Our approach is compatible with any DNN compute core and requires no major modifications to the architecture. Results show that DRAM access energy is reduced by 3.07× when the 8-bit data format is used instead of 32-bit. The accuracy loss of the proposed methodology with 8-bit quantized training is ≈ 1% across various networks on image and natural language processing datasets.
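The abstract does not detail the adaptation mechanism, so the following is only a rough, generic illustration of the idea of range-adaptive FP8 quantization, not the talk's actual method. The sketch assumes a 1-4-3 (sign/exponent/mantissa) FP8 layout and picks a per-tensor exponent bias from the observed maximum magnitude, so the representable range tracks the data; the format split, bias rule, and function name are all assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch only: the talk abstract does not specify the FP8 format
# or the adaptation rule. We assume a 1-4-3 layout and a per-tensor exponent
# bias derived from the largest magnitude in the tensor.

E_BITS, M_BITS = 4, 3  # assumed FP8 exponent/mantissa split


def adaptive_fp8_quantize(x: np.ndarray):
    """Quantize a float32 tensor onto an FP8-like grid with an adaptive bias."""
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return x.astype(np.float32), 0

    # Adaptive step: choose the bias so the largest magnitude lands near the
    # top of the representable FP8 range (hypothetical rule for illustration).
    e_max_unbiased = int(np.floor(np.log2(max_abs)))
    bias = (2 ** E_BITS - 2) - e_max_unbiased

    # Scale into the biased range, round the mantissa, clip, and rescale.
    scaled = x * (2.0 ** bias)
    exp = np.floor(np.log2(np.maximum(np.abs(scaled), 2.0 ** -126)))
    exp = np.clip(exp, 0, 2 ** E_BITS - 2)      # stay within normal exponents
    step = 2.0 ** (exp - M_BITS)                # mantissa quantization step
    q = np.round(scaled / step) * step
    limit = 2.0 ** (2 ** E_BITS - 2) * (2.0 - 2.0 ** -M_BITS)
    q = np.clip(q, -limit, limit)
    return (q * 2.0 ** -bias).astype(np.float32), bias


# Example: activations with a small dynamic range still use the full FP8 grid,
# because the bias shifts the representable range toward the data.
acts = np.random.randn(4, 4).astype(np.float32) * 1e-3
q_acts, used_bias = adaptive_fp8_quantize(acts)
print("chosen bias:", used_bias, "max error:", np.max(np.abs(acts - q_acts)))
```

In such a scheme only the 8-bit payloads and one bias per tensor would need to travel to off-chip memory, which is where the reduction in DRAM access energy would come from; the compute cores themselves can keep operating on the dequantized values.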
Biography
Norbert holds the chair for Microelectronic System Design in the Department of Electrical Engineering and Information Technology at the University of Kaiserslautern-Landau. He has more than 500 publications in various fields of microelectronic system design and holds several patents. His special research interests are VLSI architectures for mobile communication, forward error correction techniques, low-power techniques, advanced SoC and memory architectures, post-quantum cryptography, reliability challenges in SoCs, machine learning, IoT, and smart learning environments.
Download
norbert-wehn.pdf