[NVIDIA] Add stage2 NCCL kernel overlap #7092

Merged
2 commits merged into PaddlePaddle:develop from gh_add_stage2_comm_overlap on Sep 22, 2023

Conversation

Tom-Zheng (Contributor)

PR types

Performance optimization

PR changes

Others

Description

This PR adds an NCCL kernel overlap feature to stage2 FSDP training. It brings a 1.5% end-to-end (E2E) speedup in GPT training.
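Conceptually, stage2 overlap pipelines each gradient bucket's NCCL reduction with the backward computation of the next bucket, instead of serializing compute and communication kernels. A framework-free toy sketch of that pipelining (all names illustrative, not from this PR):

```python
# Toy model of compute/communication overlap: a single worker thread
# plays the role of the NCCL communication stream.
from concurrent.futures import ThreadPoolExecutor

def backward_compute(bucket):
    # Stand-in for the backward kernels that produce this bucket's grads.
    return f"grads[{bucket}]"

def nccl_reduce(grads):
    # Stand-in for the NCCL reduce kernel on the communication stream.
    return f"reduced({grads})"

with ThreadPoolExecutor(max_workers=1) as comm_stream:
    pending = None
    for bucket in range(4):
        grads = backward_compute(bucket)   # compute bucket N...
        if pending is not None:
            pending.result()               # ...while bucket N-1's reduce runs
        pending = comm_stream.submit(nccl_reduce, grads)
    pending.result()                       # drain the last reduce
```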

paddle-bot bot commented Sep 20, 2023

Thanks for your contribution!

Tom-Zheng (Contributor, Author)

Adding @jeng1220 for visibility.

codecov bot commented Sep 20, 2023

Codecov Report

Merging #7092 (d4397a6) into develop (a99cc55) will decrease coverage by 0.01%.
The diff coverage is 23.07%.

@@             Coverage Diff             @@
##           develop    #7092      +/-   ##
===========================================
- Coverage    59.75%   59.75%   -0.01%     
===========================================
  Files          559      559              
  Lines        82347    82359      +12     
===========================================
+ Hits         49210    49213       +3     
- Misses       33137    33146       +9     
Files Changed                          Coverage             Δ
paddlenlp/trainer/training_args.py     52.35% <0.00%>       (-0.33%) ⬇️
paddlenlp/trainer/trainer.py           54.85% <33.33%>      (-0.16%) ⬇️

@Tom-Zheng Tom-Zheng force-pushed the gh_add_stage2_comm_overlap branch from 3bed524 to f470b71 Compare September 21, 2023 03:37
@ZHUI ZHUI requested a review from FeixLiu September 21, 2023 08:49
@@ -1576,6 +1585,9 @@ def get_expected_keys(inputs, keys):
             offload=cpu_offload,
             **extra_kwargs,
         )
+        if level == "os_g":
+            model._set_reduce_overlap(True)
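For context, a hedged sketch of how this call site is typically reached: Paddle's `group_sharded_parallel` wraps the model for stage2 (`level="os_g"`) sharding, and the added line flips the overlap switch on the wrapped model. Treat the surrounding code as illustrative, not verbatim from this PR.

```python
# Illustrative only: the group_sharded_parallel call mirrors Paddle's
# public sharding API; the PR's actual argument handling differs.
import paddle
from paddle.distributed.sharding import group_sharded_parallel

def wrap_stage2(model, optimizer, scaler=None):
    # level="os_g" shards optimizer state and gradients (stage2).
    model, optimizer, scaler = group_sharded_parallel(
        model, optimizer, level="os_g", scaler=scaler
    )
    # The switch added in this PR's diff: overlap gradient-reduce NCCL
    # kernels with backward computation.
    model._set_reduce_overlap(True)
    return model, optimizer, scaler
```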
FeixLiu (Contributor) commented Sep 21, 2023
Add a flag to control whether the overlap is used? There are some constraints on the overlap: for broadcast overlap, logging_step must be greater than 1, and no other synchronization may be called during training.
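A minimal sketch of the gate being asked for, assuming a hypothetical flag name (`enable_stage2_comm_overlap` is illustrative; the option the PR finally adds may be named differently):

```python
# Hypothetical flag; the PR's actual training_args option may differ.
if level == "os_g" and getattr(training_args, "enable_stage2_comm_overlap", False):
    # Constraints noted in the review: broadcast overlap needs
    # logging_steps > 1 and no extra synchronization during training.
    assert training_args.logging_steps > 1, "overlap requires logging_steps > 1"
    model._set_reduce_overlap(True)
```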

Tom-Zheng (Contributor, Author) replied:
Done

@Tom-Zheng Tom-Zheng force-pushed the gh_add_stage2_comm_overlap branch from f470b71 to d4397a6 Compare September 22, 2023 05:52
Tom-Zheng (Contributor, Author)

@FeixLiu Would you please take another look?

FeixLiu (Contributor) left a comment

LGTM

@wawltor wawltor merged commit 060fbf2 into PaddlePaddle:develop Sep 22, 2023