Update set_seed in trainer_utils.py #7528

AnnaTrainingG · 2023-11-27T10:52:04Z

PR types

Others

PR changes

Others

Description

Fix incorrect seed settings across multiple cards under the PP (pipeline parallel) distributed strategy.
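The following minimal, pure-Python sketch shows the kind of per-rank seed derivation this fix concerns under hybrid parallelism. It is not the code from this PR. The helper name derive_rank_seeds, the split into a "global" seed shared within a tensor (model) parallel group and a "local" seed unique per rank, and the offset scheme are all assumptions made for illustration.

```python
from itertools import product


def derive_rank_seeds(base_seed, mp_rank, mp_size, pp_rank, pp_size,
                      dp_rank, dp_size, sharding_rank=0, sharding_size=1):
    """Hypothetical helper: return (global_seed, local_seed) for one rank.

    global_seed ignores mp_rank, so every rank in one tensor-parallel group
    shares it. local_seed adds mp_rank, so it is unique per rank.
    """
    world_size = mp_size * pp_size * dp_size * sharding_size
    # Index of this rank's tensor-parallel group. Including pp_rank gives each
    # pipeline stage its own seed instead of reusing stage 0's.
    group_index = (pp_rank * mp_size
                   + dp_rank * mp_size * pp_size
                   + sharding_rank * mp_size * pp_size * dp_size)
    global_seed = base_seed + group_index
    # Shift local seeds past the whole global range so the two sets never collide.
    local_seed = base_seed + world_size + group_index + mp_rank
    return global_seed, local_seed


if __name__ == "__main__":
    mp, pp, dp = 2, 2, 2
    for pp_r, dp_r, mp_r in product(range(pp), range(dp), range(mp)):
        g, l = derive_rank_seeds(1234, mp_r, mp, pp_r, pp, dp_r, dp)
        print(f"pp={pp_r} dp={dp_r} mp={mp_r} -> global={g} local={l}")
```

In this sketch, folding pp_rank into the group index is what separates pipeline stages. If that term were dropped, every stage with the same dp/sharding coordinates would end up with the same seed.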

paddle-bot · 2023-11-27T10:52:10Z

Thanks for your contribution!

ZHUI

LGTM

ZHUI · 2023-11-27T12:28:52Z

For reference, see https://github.com/PaddlePaddle/PaddleNLP/pull/5590/files

codecov · 2023-12-04T03:59:48Z

Codecov Report

Attention: 7 lines in your changes are missing coverage. Please review.

Comparison is base (be318c5) 57.92% compared to head (e122535) 57.86%.
Report is 15 commits behind head on develop.

Files	Patch %	Lines
paddlenlp/trainer/trainer_utils.py	75.00%	7 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #7528      +/-   ##
===========================================
- Coverage    57.92%   57.86%   -0.06%     
===========================================
  Files          579      582       +3     
  Lines        86390    86492     +102     
===========================================
+ Hits         50038    50046       +8     
- Misses       36352    36446      +94


ZHUI · 2023-12-04T07:52:59Z

llm/gpt-3/run_pretrain.py

+        init_dist_env(
+            training_args.tensor_parallel_degree,
+            training_args.sharding_parallel_degree,
+            training_args.pipeline_parallel_degree,
+            training_args.data_parallel_degree,
+            training_args.seed,
+        )


This environment is already initialized in training_args.

Removed.

ZHUI · 2023-12-04T07:53:51Z

paddlenlp/trainer/trainer_utils.py

+    """
+
+    # set control in tensor parallel
+    print("init_dist_env asdfasdfasdf  niuliling")


This is a test/debug flag; please remove it.

ZHUI

LGTM

ZHUI

LGTM

ZHUI · 2023-12-04T11:36:08Z

paddlenlp/trainer/trainer_utils.py

-        if args.use_hybrid_parallel:
-            from paddle.distributed.fleet.meta_parallel import get_rng_state_tracker
+    else:
+        hcg = fleet.get_hybrid_communicate_group() if hasattr(fleet.fleet, "_hcg") else None


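Does the attribute checked by hasattr(fleet.fleet, "_hcg") only exist after distributed initialization? Consider three setups:

1. CPU build of Paddle
2. GPU build of Paddle running on CPU
3. GPU build of Paddle running on GPU

In case 2, is fleet.fleet._hcg None?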

Yes, hcg only exists after init or after init_dist_env.
In case 2 it is not None; it is None only when distributed training has not been initialized.
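The guard discussed above can be sketched without Paddle using stand-in objects. The idea is to read the hybrid communicate group only once it exists, and otherwise fall back to single-process defaults. read_parallel_ranks and ParallelRanks are hypothetical names. The hcg method names follow Paddle's HybridCommunicateGroup as I understand it, so check them against your installed Paddle version. Unlike the diff, which calls fleet.get_hybrid_communicate_group() after the hasattr check, this sketch reads the private _hcg attribute directly for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelRanks:
    """Hypothetical container for this process's hybrid-parallel coordinates."""
    mp_rank: int = 0
    mp_size: int = 1
    pp_rank: int = 0
    pp_size: int = 1
    dp_rank: int = 0
    dp_size: int = 1


def read_parallel_ranks(fleet_obj) -> ParallelRanks:
    """Return hybrid-parallel ranks, or single-process defaults when uninitialized.

    _hcg only appears on the fleet object after init / init_dist_env. Treating a
    missing attribute and an explicit None the same way covers both
    "not initialized" states.
    """
    hcg = getattr(fleet_obj, "_hcg", None)
    if hcg is None:
        # Not distributed (yet): behave like a single rank.
        return ParallelRanks()
    return ParallelRanks(
        mp_rank=hcg.get_model_parallel_rank(),
        mp_size=hcg.get_model_parallel_world_size(),
        pp_rank=hcg.get_stage_id(),
        pp_size=hcg.get_pipe_parallel_world_size(),
        dp_rank=hcg.get_data_parallel_rank(),
        dp_size=hcg.get_data_parallel_world_size(),
    )
```

With this fallback, the seeding code can use one code path whether or not distributed training has been initialized.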

Update trainer_utils.py

f7a5dff

Update trainer_utils.py

e52eba7

ZHUI previously approved these changes Nov 27, 2023

update

d28c5ac

AnnaTrainingG dismissed ZHUI’s stale review via d28c5ac November 28, 2023 06:39

AnnaTrainingG and others added 3 commits November 28, 2023 14:42

Merge branch 'develop' into pp_set_seed

51180a4

Update

59d8a5f

update

bdf75bf

AnnaTrainingG changed the title ~~Update trainer_utils.py~~ Update set_seed in trainer_utils.py Dec 4, 2023

ZHUI reviewed Dec 4, 2023

Update

46ce3fc

ZHUI previously approved these changes Dec 4, 2023

update

e122535

AnnaTrainingG dismissed ZHUI’s stale review via e122535 December 4, 2023 09:20

ZHUI reviewed Dec 4, 2023

ZHUI approved these changes Dec 5, 2023

ZHUI merged commit f3607d5 into PaddlePaddle:develop Dec 5, 2023

ZHUI mentioned this pull request Jan 2, 2024

PaddleNLP 2.7.0 Release Note Candidate #7753

Closed
