[NPU] support npu llama2-13B export & inference #8442
Conversation
Thanks for your contribution!
Codecov Report

```
@@            Coverage Diff             @@
##           develop    #8442      +/-   ##
===========================================
- Coverage    55.42%   54.29%    -1.14%
===========================================
  Files          617      617
  Lines        96281    96339      +58
===========================================
- Hits         53366    52304    -1062
- Misses       42915    44035    +1120
```

View full report in Codecov by Sentry.
Force-pushed a59da09 to 0ee4655
csrc_npu/README.md (Outdated)

```diff
@@ -0,0 +1,14 @@
+# PaddleNLP Custom OPs
```
Please don't create a separate new csrc-style directory here; many more hardware backends will be integrated later, so it's better to create an npu directory directly under the existing csrc directory.
Done, renamed csrc_npu -> csrc/npu.
csrc_npu/README.md (Outdated)

```diff
+# 1. Install PaddleCustomDevice
+
+Follow the [PaddleCustomDevice NPU installation guide](https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/npu/README_cn.md) to install it.
```
Does CustomDevice currently provide a prebuilt NPU package?
There is currently no prebuilt package for high-performance inference; building it from source is recommended.
```diff
@@ -570,7 +570,7 @@ def compute_layernorm_before_qkv(self, src, i):
         return ln_out

     def compute_qkv_linear(self, ln_out, i):
-        if float(paddle.version.cuda()) < 11.6:
+        if paddle.version.cuda() == "False" or float(paddle.version.cuda()) < 11.6:
```
Does Kunlun (XPU) inference also go through this logic? If so, does it affect Kunlun inference?
No impact. This only affects the paddle-cpu build (NPU uses the paddle-cpu build), making it take the branch above; otherwise float(paddle.version.cuda()) would raise an error, because on the CPU build paddle.version.cuda() returns the string "False".
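The behavior described above can be sketched in isolation. This is a minimal illustration of the guarded check from the diff, with a hypothetical helper name (`needs_fallback_qkv_path` is not in the PR); it assumes, as the comment states, that CPU-only Paddle builds return the string "False" from paddle.version.cuda().

```python
# Minimal sketch of the guarded version check from the diff above.
# On CPU-only Paddle builds, paddle.version.cuda() returns the string
# "False", so calling float() on it without the guard raises ValueError.

def needs_fallback_qkv_path(cuda_version: str) -> bool:
    """Return True when the pre-CUDA-11.6 (or non-CUDA) branch must be taken."""
    # Check the sentinel string first so float("False") is never evaluated.
    return cuda_version == "False" or float(cuda_version) < 11.6

# CPU/NPU builds report "False"; CUDA builds report e.g. "11.2" or "12.0".
assert needs_fallback_qkv_path("False") is True
assert needs_fallback_qkv_path("11.2") is True
assert needs_fallback_qkv_path("12.0") is False
```

Short-circuit evaluation of `or` is what makes this safe: when the left operand is true, the `float(...)` conversion on the right is never executed.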
```python
if predictor_args.device == "npu":
    from llama.npu.export_utils import process_params

    process_params(os.path.join(export_args.output_path, predictor_args.model_prefix))
```
What is the reason for modifying the op attrs of the NPU model here?
For high-performance NPU inference:
- matmul performs better on the NPU when the weight matrix is transposed
- the NPU's dequant scales use a special format
- doing the rewrite here avoids introducing hardware-specific code into the model definition
PR types
New features
PR changes
Others
Description
support npu llama2-13B export & inference