
Remove delay_scale_loss and release_grads for llama-2 13B's benchmark. #8623

Merged
merged 1 commit into PaddlePaddle:develop on Jun 19, 2024

Conversation

Xreki (Contributor) commented Jun 19, 2024

PR types

Others

PR changes

Others

Description

Model        Training strategy            Branch                     Training throughput   Max memory reserved (from logs)
Llama-2 13B  pp4sharding8-vpp5-mbs1-acc4  develop                    1991.236              48.738
Llama-2 13B  pp4sharding8-vpp5-mbs1-acc4  release_grads removed      2037.899 (+2.34%)     53.602
Llama-2 13B  pp4sharding8-vpp5-mbs1-acc4  delay_scale_loss removed   2051.128 (+0.65%)     53.602

(Throughput gains are relative to the row above.)

Why the Llama-2 13B performance improves:

  • The release_grads strategy reduces peak memory usage, but it frees the memory held by the gradients at the end of every training step and re-allocates and re-initializes it in the next step, which introduces some overhead. The Llama-2 13B model does not come close to saturating GPU memory, so this option can be removed.
  • The delay_scale_loss strategy was intended to improve convergence. However, the competing framework used for comparison does not use it, and the strategy introduces a device synchronization that degrades the effectiveness of the sharding allgather overlap, as the trainer excerpt below shows:
    if self.args.gradient_accumulation_steps > 1 and self._enable_delay_scale_loss():
        paddle.device.synchronize()
        for p in model._layers.parameters():
            with paddle.no_grad():
                if hasattr(p, "main_grad") and p.main_grad is not None:
                    assert p.grad is None
                    p.main_grad.scale_(1.0 / self.args.gradient_accumulation_steps)
                elif p.grad is not None:
                    p.grad.scale_(1.0 / self.args.gradient_accumulation_steps)
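
For contrast, here is a minimal sketch of the non-delayed path that applies once delay_scale_loss is removed: each micro-batch loss is divided by the number of accumulation steps before backward(), so no device-wide synchronize() or end-of-window gradient rescaling is needed. The model, optimizer, loss_fn, micro_batches, and acc_steps names are hypothetical placeholders for illustration, not PaddleNLP trainer code.

    import paddle

    def accumulate_without_delay_scale_loss(model, optimizer, loss_fn, micro_batches, acc_steps=4):
        # Scale every micro-batch loss up front instead of rescaling the
        # accumulated gradients after the whole window, so the extra
        # paddle.device.synchronize() in the excerpt above is unnecessary.
        for step, (x, y) in enumerate(micro_batches):
            loss = loss_fn(model(x), y) / acc_steps
            loss.backward()  # gradients accumulate across micro-batches
            if (step + 1) % acc_steps == 0:
                optimizer.step()
                optimizer.clear_grad()

Scaling the loss only touches one scalar per micro-batch, so it avoids the synchronization that interferes with the sharding allgather overlap.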

paddle-bot bot commented Jun 19, 2024

Thanks for your contribution!

codecov bot commented Jun 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 54.18%. Comparing base (cd2a70e) to head (d98e9e7).
Report is 241 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #8623   +/-   ##
========================================
  Coverage    54.18%   54.18%           
========================================
  Files          625      625           
  Lines        98947    98947           
========================================
  Hits         53618    53618           
  Misses       45329    45329           


@ZHUI ZHUI merged commit 970b868 into PaddlePaddle:develop Jun 19, 2024
9 of 11 checks passed
@Xreki Xreki deleted the opt_llama2_benchmark branch June 19, 2024 06:50