
Takahide Yoshikawa
Fujitsu Ltd., Japan
Building an LLM on the Supercomputer Fugaku toward AI for Science
Abstract
Recently, there has been growing demand for AI in science and engineering analysis, for example, AI surrogate models that can replace detailed simulations and foundation models that support precise analysis and reasoning based on extensive domain-specific knowledge. These technologies should be combined with HPC workloads, such as detailed numerical simulations, and it is therefore desirable that they be developed and operated on large-scale supercomputer systems. In line with this trend, we have collaborated with RIKEN and Science Tokyo to develop an AI framework that supports the generalization and operation of AI models for science on Fugaku. Using this framework, we have successfully trained a large language model with 13 billion parameters specialized for the Japanese context. In this talk, I will first introduce basic schemes for parallelizing AI training, such as data, tensor, and pipeline parallelism, and then present our approach to optimizing the training process on Fugaku through the partitioning of tasks within nodes and the optimization of inter-node communication.
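To give a rough sense of how these three parallelism axes compose, the following minimal sketch (plain Python) maps each rank of a training job to its data/pipeline/tensor coordinates, in the style popularized by Megatron-LM. The group sizes and the rank-ordering convention here are illustrative assumptions for exposition, not the actual scheme used in the Fugaku framework.

# Minimal sketch of 3D-parallel rank grouping (data x pipeline x tensor).
# Sizes and ordering are illustrative assumptions, not the Fugaku setup.

def build_parallel_groups(world_size: int, tp_size: int, pp_size: int):
    """Map each rank to (data, pipeline, tensor) coordinates.

    Ranks sharing a coordinate along one axis form the communication
    group for that flavor of parallelism.
    """
    assert world_size % (tp_size * pp_size) == 0, "sizes must divide world_size"
    dp_size = world_size // (tp_size * pp_size)

    groups = {}
    for rank in range(world_size):
        tp_rank = rank % tp_size               # fastest-varying axis: tensor
        pp_rank = (rank // tp_size) % pp_size  # then pipeline stage
        dp_rank = rank // (tp_size * pp_size)  # slowest axis: data replica
        groups[rank] = (dp_rank, pp_rank, tp_rank)
    return dp_size, groups

if __name__ == "__main__":
    # 48 ranks split 4-way tensor and 3-way pipeline -> 4-way data parallel.
    dp_size, groups = build_parallel_groups(world_size=48, tp_size=4, pp_size=3)
    print(f"data-parallel degree: {dp_size}")
    for rank in (0, 1, 4, 12, 47):
        print(rank, "->", groups[rank])

Ordering the axes this way keeps tensor-parallel peers on adjacent ranks, which typically places them within one node where communication is cheapest; the inter-node traffic is then dominated by the pipeline and data-parallel groups, which is the part the talk's optimization targets.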
Biography
Takahide Yoshikawa is Project Director of the Next Architecture Project, Fujitsu Research, at Fujitsu Ltd. He received his B.E., M.E., and Ph.D. degrees from the University of Tokyo in 1994, 1996, and 2002, respectively, and he is a Senior Member of IEEE. He has been involved in various server systems projects, such as the K computer and Fugaku. In the K computer project, he proposed and implemented the entire verification, validation, and test system for its interconnect, Tofu. In the Fugaku project, he led the verification and validation of the CPU. He is currently conducting research on the architecture of future high-performance computing systems.