[Feature] Fused Mixtral support #8901
Conversation
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files:

@@            Coverage Diff             @@
##           develop    #8901      +/-   ##
===========================================
- Coverage    54.80%   54.05%   -0.75%
===========================================
  Files          647      650       +3
  Lines       102474   104427    +1953
===========================================
+ Hits         56157    56445     +288
- Misses       46317    47982    +1665
Force-pushed from 5b4384c to 83a2000.
Force-pushed from 6070538 to 2b8afcf.
Force-pushed from 89ff18d to 3e459a5.
        return self.num_experts > 1

    def use_moe(self, i: int) -> bool:
        return self.has_moe() and (self.moe_every2 is False or (self.moe_every2 and i % 2 == 1))
This check is a bit odd: what if someone wants to switch to an MoE layer only every four layers?
That said, since this only targets Mixtral for now, it can stay as is.
Sounds good. If every-4 or every-8 layouts are needed, I think the moe_every parameter could be turned into an enum and the check driven by that enum. I'm working on other support at the moment, so I can submit a follow-up PR to change this later.
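For illustration, a minimal sketch of the enum idea suggested above (MoeInterval and the standalone use_moe helper are hypothetical names, not part of this PR):

from enum import Enum


class MoeInterval(Enum):
    EVERY_LAYER = 1
    EVERY_2_LAYERS = 2
    EVERY_4_LAYERS = 4


def use_moe(num_experts: int, interval: MoeInterval, i: int) -> bool:
    """Return True if transformer layer i should be an MoE layer."""
    has_moe = num_experts > 1
    # EVERY_2_LAYERS reproduces the current behaviour (odd layers are MoE);
    # EVERY_4_LAYERS would make layers 3, 7, 11, ... MoE, and so on.
    return has_moe and (i % interval.value == interval.value - 1)


# EVERY_2_LAYERS: layers 1, 3, 5, ... are MoE layers.
assert use_moe(8, MoeInterval.EVERY_2_LAYERS, 1)
assert not use_moe(8, MoeInterval.EVERY_2_LAYERS, 2)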
Suggest adding related unit tests later to make sure the functionality is correct. @penPenf28 @yuanlehome
@@ -1128,6 +1154,29 @@ def compute_out_linear(self, fmha_out, i):
            weight_dtype=self.weight_dtype,
        )

    def compute_fused_moe(self, tmp_out, i):
        # todo[xinhw]: make bias optional
This bug needs to be fixed as soon as possible.
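As a rough illustration of the optional-bias handling asked for here, a standalone sketch using plain paddle ops (expert_ffn is a hypothetical helper, not the fused MoE kernel in this PR):

import paddle


def expert_ffn(x, w1, w2, b1=None, b2=None):
    """Single-expert FFN where each bias add is skipped when that bias is None."""
    h = paddle.matmul(x, w1)
    if b1 is not None:
        h = h + b1
    h = paddle.nn.functional.silu(h)
    out = paddle.matmul(h, w2)
    if b2 is not None:
        out = out + b2
    return out


# Works with or without biases.
x = paddle.randn([4, 64])
w1, w2 = paddle.randn([64, 256]), paddle.randn([256, 64])
print(expert_ffn(x, w1, w2).shape)  # [4, 64], no bias supplied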
@@ -713,6 +794,29 @@ def compute_ffn_layernorm(self, out_linear_out, residual_input, i):

        return tmp_out, residual_input

    def compute_fused_moe(self, tmp_out, i):
        # todo[xinhw]: make bias optional
This bug needs to be fixed as soon as possible.
LGTM
* [Feature] Fused Mixtral support
* [Refactor] add MoeConfig and fix static graph export problem
* [Bugfix] fix small bug
* [Bugfix] fix moe_config bug
* [Bugfix] fix moe_config bug
* [Refactor] refine code
* [Refactor] refine code
* [Refactor] refine code
* [Refactor] match fused moe api change
* [Feature] wint8 support
PR types
New features
PR changes
Models
Description
Adds support for a high-performance implementation of the Mixtral-8x7B-Instruct-v0.1 model. bfloat16 + wint8 is currently supported, covering both the non-block and block attention variants.
The code still contains some redundant quantization parts; these will be revised in follow-up changes that add the related quantization support.
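For readers unfamiliar with wint8, a conceptual numpy sketch of weight-only int8 quantization (per-output-channel absmax scales, dequantized at matmul time); this is only an illustration of the idea, not the fused PaddleNLP kernel:

import numpy as np


def quantize_wint8(w):
    """Quantize a [in, out] weight to int8 with one absmax scale per output channel."""
    scale = np.abs(w).max(axis=0) / 127.0
    w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return w_int8, scale


def wint8_matmul(x, w_int8, scale):
    # Dequantize then matmul; a fused weight-only kernel does this inside the GEMM.
    return x @ (w_int8.astype(np.float32) * scale)


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)).astype(np.float32)
x = rng.standard_normal((4, 64)).astype(np.float32)
w_q, s = quantize_wint8(w)
print(np.abs(x @ w - wint8_matmul(x, w_q, s)).max())  # small quantization error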