MPSoC 2024

Takahide Yoshikawa

Fujitsu Ltd., Japan

How do we debug and verify errors on 100,000-node systems? Insights from our supercomputer system projects

Abstract

Fujitsu has developed the world’s fastest supercomputer systems, including the K computer in 2012 and the supercomputer Fugaku in 2020. Such systems are very large and complex, with about 100,000 nodes containing over 7 million cores. Once a hardware or software bug causes an error after such a large-scale system has been put into operation, identifying and correcting the root cause can be very costly and time-consuming. For example, in a system with 100,000 nodes, just 1 MB of trace data per node adds up to about 100 GB in total, which is difficult even to analyze. To prevent this, it is essential to guarantee the operation of a large system within smaller verification environments (such as a single module, a single node, a few nodes, and a few shelves) using various verification technologies (such as verification support functions in the CPU, firmware, and OS, as well as dedicated test equipment for post-silicon validation). In this presentation, I will introduce various technologies for guaranteeing the stable operation of large-scale systems, based on our experience with the Fugaku project.

Biography

Takahide Yoshikawa is Project Director of the Next Architecture Project, Fujitsu Research, at Fujitsu Ltd. He received his B.E., M.E., and Ph.D. degrees from the University of Tokyo in 1994, 1996, and 2002, respectively, and is a Senior Member of the IEEE. He has been involved in various server system projects, such as the K computer and Fugaku. In the K computer project, he proposed and implemented the entire verification, validation, and test system for its interconnect, Tofu. In Fugaku, he led the verification and validation of the CPU. He is currently researching the architecture of future high-performance computing systems.

Download

takahide-yoshikawa.pdf

If you wish to modify any information or update your photo, please contact Web Chair Hiroki Matsutani.

Contact

Please address any issue to General Chair Masaaki Kondo.

© Copyright - MPSoC 2024