[NPU] support npu llama2-13B export & inference #8442
Conversation
Thanks for your contribution!
Codecov Report

```
@@            Coverage Diff             @@
##           develop    #8442      +/-   ##
===========================================
- Coverage    55.42%   54.29%    -1.14%
===========================================
  Files          617      617
  Lines        96281    96339      +58
===========================================
- Hits         53366    52304    -1062
- Misses       42915    44035    +1120
```

View full report in Codecov by Sentry.
Force-pushed a59da09 to 0ee4655
csrc_npu/README.md (Outdated)

```diff
@@ -0,0 +1,14 @@
+# PaddleNLP Custom OPs
```
Please don't create a separate new csrc-style directory here; many more hardware backends will be integrated later, so it's better to create an npu directory directly under the existing csrc directory.
Done, renamed csrc_npu -> csrc/npu.
csrc_npu/README.md (Outdated)

```diff
+# 1. Install PaddleCustomDevice
+
+Follow the [PaddleCustomDevice NPU installation guide](https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/npu/README_cn.md) to install it.
```
Does CustomDevice currently provide a prebuilt NPU package?
There is currently no prebuilt package for high-performance inference; building it from source is recommended.
```diff
@@ -570,7 +570,7 @@ def compute_layernorm_before_qkv(self, src, i):
         return ln_out

     def compute_qkv_linear(self, ln_out, i):
-        if float(paddle.version.cuda()) < 11.6:
+        if paddle.version.cuda() == "False" or float(paddle.version.cuda()) < 11.6:
```
Does Kunlun (XPU) inference also go through this logic? If so, does it affect Kunlun inference?
No impact. This only affects the paddle-cpu build (NPU uses the paddle-cpu build), making it take the branch above; otherwise float(paddle.version.cuda()) would raise an error, because on the CPU build paddle.version.cuda() returns the string "False".
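The behavior described above can be sketched in isolation. This is a minimal illustration of the guarded check from the diff, with a hypothetical helper name (`needs_fallback_qkv_path` is not in the PR); it assumes, as the comment states, that CPU-only Paddle builds return the string "False" from paddle.version.cuda().

```python
# Minimal sketch of the guarded version check from the diff above.
# On CPU-only Paddle builds, paddle.version.cuda() returns the string
# "False", so calling float() on it without the guard raises ValueError.

def needs_fallback_qkv_path(cuda_version: str) -> bool:
    """Return True when the pre-CUDA-11.6 (or non-CUDA) branch must be taken."""
    # Check the sentinel string first so float("False") is never evaluated.
    return cuda_version == "False" or float(cuda_version) < 11.6

# CPU/NPU builds report "False"; CUDA builds report e.g. "11.2" or "12.0".
assert needs_fallback_qkv_path("False") is True
assert needs_fallback_qkv_path("11.2") is True
assert needs_fallback_qkv_path("12.0") is False
```

Short-circuit evaluation of `or` is what makes this safe: when the left operand is true, the `float(...)` conversion on the right is never executed.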
```python
if predictor_args.device == "npu":
    from llama.npu.export_utils import process_params

    process_params(os.path.join(export_args.output_path, predictor_args.model_prefix))
```
What is the reason for modifying the op attrs of the NPU model here?
For high-performance NPU inference:
- matmul performs better on the NPU when the weight matrix is transposed
- the NPU's dequant scales use a special format
- doing the rewrite here avoids introducing hardware-specific code into the model definition
PR types
New features
PR changes
Others
Description
support npu llama2-13B export & inference