Fix synchronized memcpy in GPT #7008
Conversation
Thanks for your contribution!
if loss_mask is None:
    loss_mask = (masked_lm_loss > 0).astype("float32")
loss_mask = loss_mask.reshape([-1])
masked_lm_loss = paddle.sum(masked_lm_loss.reshape([-1]) * loss_mask)
Support for using custom loss mask?
The change here is mainly because the `masked_lm_loss[masked_lm_loss > 0]` pattern triggers a D2H copy. Rewriting it as a multiplication of loss_mask with the LM loss to obtain masked_lm_loss is equivalent, but avoids the D2H copy.
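For illustration, a minimal standalone sketch of the two patterns (not taken from the PR; the random input and the final mean / normalized-sum reduction are assumptions made for the example):

```python
import paddle

masked_lm_loss = paddle.rand([8, 128])  # stand-in for the per-token LM loss

# Original pattern: boolean indexing. The output shape depends on how many
# elements are > 0, which forces a device-to-host sync before the reduction.
loss_old = paddle.mean(masked_lm_loss[masked_lm_loss > 0].astype("float32"))

# PR-style pattern: multiply by a float mask and reduce. All shapes are fixed,
# so the whole computation can stay on the device.
loss_mask = (masked_lm_loss > 0).astype("float32").reshape([-1])
loss_new = paddle.sum(masked_lm_loss.reshape([-1]) * loss_mask) / loss_mask.sum()

# With loss_mask defined as (masked_lm_loss > 0), the two results match.
print(float(loss_old), float(loss_new))
```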
Is this caused by the slice op?
One thing to check here: the dtype of the final masked_lm_loss should be float32.
In the original implementation, `masked_lm_loss = masked_lm_loss[masked_lm_loss > 0].astype("float32")`, the shape of the returned masked_lm_loss depends on how many elements of `masked_lm_loss > 0` are True. That count therefore has to be sent back to the CPU, which requires a DtoH copy; the `masked_lm_loss[masked_lm_loss > 0]` form cannot avoid it.
The change in this PR sidesteps the getitem operation, computes the same result, and avoids the DtoH copy.
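The data-dependent output shape can be seen directly in a toy example (illustrative only, not code from the PR):

```python
import paddle

x = paddle.to_tensor([0.5, 0.0, 1.2, 0.0])

# Boolean indexing: the output shape depends on the data (2 positives here),
# so the count of True values has to be known on the host side first.
print(x[x > 0].shape)    # [2]

# Mask-multiply: the output shape is fixed regardless of the data,
# so the reduction never needs a DtoH copy of the count.
mask = (x > 0).astype("float32")
print((x * mask).shape)  # [4]
```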
> One thing to check here: the dtype of the final masked_lm_loss should be float32.

The current form should already guarantee float32, right?
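A quick dtype check along the lines of the new code (tensor names are made up for the example):

```python
import paddle

loss = paddle.rand([4])                     # float32 by default
mask = (loss > 0).astype("float32")         # bool mask cast to float32
out = paddle.sum(loss.reshape([-1]) * mask)
print(out.dtype)                            # paddle.float32
```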
LGTM
Codecov Report
@@ Coverage Diff @@
## develop #7008 +/- ##
===========================================
- Coverage 60.06% 59.90% -0.17%
===========================================
Files 552 554 +2
Lines 81755 81976 +221
===========================================
Hits 49105 49105
- Misses 32650 32871 +221
PR types
Performance optimization
PR changes
Others
Description
Avoid synchronized memcpy in GPT pretraining