Hongrui Zhan

Computer Science
Hefei, AH

Summary

Skilled in C++ and Python, I have built ML solutions in Megatron-LM that improve MoE model efficiency and training speed. I am working on introducing expert parallelism into the vLLM framework and have made the corresponding CUDA kernel adaptations. I have a strong passion for learning in the fields of AI and HPC, and can quickly master the techniques and principles involved.

Overview

1 year of professional experience

Work History

Adding Expert Parallelism in vLLM

02.2024 - Current
  • Develop tensor parallelism and expert parallelism for the MoE layers in vLLM, independent of the parallelism used by the non-MoE parts.
  • Modify CUDA kernels for MoE operators in vLLM to support flexible expert parallelism.
  • Identify performance optimization opportunities for MoE model inference.

LazyCopy for MoE All-to-All Communication

10.2024 - 12.2024
  • Discover redundancy in MoE model training; design and implement a redundancy-free All-to-All method in the Megatron-LM framework.
  • Develop simple CUDA kernels as a torch extension to speed up tensor indexing operations.
  • Evaluate the training speedup of the LazyCopy All-to-All method.

Communication-Friendly MoE Structure

05.2024 - 08.2024
  • Design a communication-efficient MoE structure (called BigMac) and implement it in the Megatron-LM framework.
  • Develop tensor parallelism for the BigMac structure in the Megatron-LM framework.
  • Evaluate the training speedup of the new model and confirm that its accuracy loss relative to the baseline is acceptable.

Education

Master of Science - Computer Science

University of Science and Technology of China
Hefei, Anhui
04.2001 -

Bachelor of Science - Computer Science

Shandong University
Qingdao
04.2001 -

Skills

Proficient in C++ and Python development for ML training and inference frameworks

Familiar with CUDA development

Experience in distributed deployment of LLM training and inference tasks

Able to quickly master new technologies in the LLM field, with a strong interest in AI infrastructure

Timeline

LazyCopy for MoE All-to-All Communication

10.2024 - 12.2024

Communication-Friendly MoE Structure

05.2024 - 08.2024

Adding Expert Parallelism in vLLM

02.2024 - Current

Master of Science - Computer Science

University of Science and Technology of China
04.2001 -

Bachelor of Science - Computer Science

Shandong University
04.2001 -