Hongrui Zhan

Computer Science
Hefei, Anhui

Summary

Skilled in C++ and Python, I build ML solutions in Megatron-LM that improve MoE model efficiency and training speed. I am introducing expert parallelism into the vLLM framework and adapting the corresponding CUDA kernels. I have a strong passion for learning in the fields of AI and HPC and can quickly master new techniques and the principles behind them.

Overview

1 year of professional experience

Work History

Adding Expert Parallelism in vLLM

02.2024 - Current
  • Developing tensor parallelism and expert parallelism for the MoE layers in vLLM, decoupled from the parallelism of the non-MoE parts (a simplified dispatch sketch follows this list).
  • Modifying the CUDA kernels behind vLLM's MoE operators to support flexible expert parallelism.
  • Identifying performance optimization opportunities for MoE model inference.
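
To make the expert-parallel dispatch above concrete, here is a minimal single-process sketch of grouping tokens by the rank that owns their chosen experts. It assumes an even sharding of experts across ranks and top-k routing; the function name plan_expert_parallel_dispatch and all other names are hypothetical, and this is not vLLM's actual code.

```python
# Illustrative sketch (not vLLM's implementation): group (token, expert)
# pairs by the expert-parallel rank that owns each expert, assuming rank r
# owns the contiguous expert slice [r*E/ep, (r+1)*E/ep).
import torch

def plan_expert_parallel_dispatch(router_logits: torch.Tensor,
                                  num_experts: int,
                                  ep_size: int,
                                  top_k: int = 2):
    """router_logits: [num_tokens, num_experts] gating scores.
    Returns, per EP rank, the local token indices and expert ids it would
    receive; in a real system these groups become the all-to-all buffers."""
    assert num_experts % ep_size == 0
    experts_per_rank = num_experts // ep_size

    # Usual MoE routing step: top-k expert choice per token.
    topk_ids = router_logits.topk(top_k, dim=-1).indices      # [T, k]
    dest_rank = topk_ids // experts_per_rank                   # [T, k]

    token_ids = torch.arange(router_logits.shape[0]).unsqueeze(-1).expand_as(topk_ids)
    plan = []
    for rank in range(ep_size):
        mask = dest_rank == rank
        plan.append({"tokens": token_ids[mask], "experts": topk_ids[mask]})
    return plan

if __name__ == "__main__":
    # Tiny usage example: 8 tokens, 8 experts sharded across 4 EP ranks.
    logits = torch.randn(8, 8)
    for r, p in enumerate(plan_expert_parallel_dispatch(logits, 8, 4)):
        print(f"rank {r}: tokens {p['tokens'].tolist()} -> experts {p['experts'].tolist()}")
```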

LazyCopy for MoE All-to-All Communication

10.2024 - 12.2024
  • Discovered redundancy in MoE model training, then designed and implemented a redundancy-free All-to-All method in the Megatron-LM framework (the deduplication idea is sketched after this list).
  • Developed simple CUDA kernels as a PyTorch extension to speed up tensor indexing operations.
  • Evaluated the training speedup of the LazyCopy All-to-All method.
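
The sketch below illustrates the general deduplication idea behind removing redundant copies from MoE dispatch: each token is placed in a destination rank's send buffer at most once, and the per-expert copies are rebuilt only after the (here simulated) All-to-All. It is a simplified single-process illustration under my own assumptions, with hypothetical names, not the LazyCopy implementation in Megatron-LM.

```python
# Single-process sketch of redundancy removal in MoE dispatch: send each
# token at most once per destination rank, and expand to per-(token, expert)
# copies only on the receiving side. Not the LazyCopy code itself.
import torch

def dedup_send_buffer(hidden: torch.Tensor, topk_ids: torch.Tensor,
                      experts_per_rank: int):
    """hidden: [T, H] token states; topk_ids: [T, k] chosen experts.
    Returns, per destination rank, the unique token rows to send plus the
    bookkeeping needed to rebuild per-expert copies after arrival."""
    dest_rank = topk_ids // experts_per_rank          # [T, k]
    num_ranks = int(dest_rank.max().item()) + 1
    token_ids = torch.arange(hidden.shape[0]).unsqueeze(-1).expand_as(topk_ids)
    out = []
    for rank in range(num_ranks):
        mask = dest_rank == rank
        toks = token_ids[mask]                        # may contain duplicates
        uniq, inverse = toks.unique(return_inverse=True)
        out.append({
            "payload": hidden[uniq],                  # each token sent once
            "expand_index": inverse,                  # rebuilds duplicates remotely
            "experts": topk_ids[mask],
        })
    return out

def expand_on_receiver(pkt):
    """What the receiving rank would do: recreate per-expert copies lazily."""
    return pkt["payload"][pkt["expand_index"]], pkt["experts"]

if __name__ == "__main__":
    hidden = torch.randn(6, 4)
    topk_ids = torch.tensor([[0, 1], [2, 3], [0, 2], [1, 3], [0, 3], [2, 1]])
    for r, pkt in enumerate(dedup_send_buffer(hidden, topk_ids, experts_per_rank=2)):
        expanded, experts = expand_on_receiver(pkt)
        print(f"rank {r}: send {pkt['payload'].shape[0]} rows, "
              f"expand to {expanded.shape[0]} (experts {experts.tolist()})")
```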

Communication-Friendly MoE Structure

05.2024 - 08.2024
  • Designed a communication-efficient MoE structure (called BigMac) and implemented it in the Megatron-LM framework.
  • Developed tensor parallelism for the BigMac structure in Megatron-LM.
  • Evaluated the training speedup of the new model and confirmed that its accuracy loss relative to the baseline is acceptable (a back-of-envelope traffic estimate is sketched after this list).
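
When comparing MoE structures for communication efficiency, a back-of-envelope byte count is useful before running full benchmarks. The helper below is a hypothetical estimator of per-layer All-to-All traffic from token count, hidden size, and top-k; it says nothing about BigMac's internal design.

```python
# Hypothetical back-of-envelope estimator of per-rank All-to-All traffic
# for one MoE layer: each token is dispatched to top_k experts and the
# expert outputs are combined back (hence the factor of 2). A rough
# comparison tool only, not a description of BigMac.
def moe_all_to_all_bytes(tokens: int, hidden_dim: int, top_k: int,
                         bytes_per_elem: int = 2) -> int:
    return 2 * tokens * top_k * hidden_dim * bytes_per_elem

if __name__ == "__main__":
    # Example: 8192 tokens per rank, hidden size 4096, top-2 routing, bf16.
    baseline = moe_all_to_all_bytes(8192, 4096, 2)
    # A structure that communicates a 4x smaller representation would move:
    reduced = moe_all_to_all_bytes(8192, 4096 // 4, 2)
    print(f"baseline: {baseline / 2**20:.0f} MiB, reduced: {reduced / 2**20:.0f} MiB")
```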

Education

Master of Science - Computer Science

University of Science and Technology of China
Hefei, Anhui
04.2001 -

Bachelor of Science - Computer Science

Shandong University
Qingdao
04.2001 -

Skills

Proficient in C++ and Python development for ML training and inference frameworks
