Ernie3.0 python deploy dev #2077
Conversation
@@ -0,0 +1,38 @@
# Ernie-3.0 Python Deployment Instructions
Add a deploy directory level.
Done
- onnxruntime-gpu >= 1.10.0
- paddleinference-trt
- onnx >= 1.10.0
- paddle2onnx develop version
PaddleNLP will bundle paddle2onnx later, and we must make sure the release versions match, so there is no need to add this entry.
Done
### Run Commands
1. CPU non-quantized model
```
python infer.py --model_path tnews/pruned_fp32/float32 --device ‘cpu’
The quotes around cpu here mix full-width and half-width characters:
--device ‘cpu’ -> --device 'cpu'
Done
@@ -0,0 +1,272 @@
# C#opyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Typo in Copyright.
Done
@@ -0,0 +1,148 @@
# C#opyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Typo
Done
tokenizer=tokenizer,
is_test=False)
dev_ds = dev_ds.map(trans_func, lazy=False)
batchify_fn = lambda samples, fn=Tuple(
Discussed with liqi: this needs to be replaced with the non-batchify-function version.
@@ -0,0 +1,69 @@
# Ernie-3.0 Python Deployment Guide
ERNIE 3.0
Without the hyphen.
Done
# Ernie-3.0 Python Deployment Guide

## Installation
Ernie-3.0 deployment covers two cases, cpu and gpu; please install the dependencies matching your deployment environment.
ERNIE 3.0
cpu -> CPU
gpu -> GPU
Keep the capitalization consistent.
Done
### GPU
Before deploying on GPU, make sure the machine has CUDA11.04 and CUDNN8.2+ installed, then install the required dependencies with:
```
pip install -r requirement_gpu.txt
requirements_cpu.txt
requirements_gpu.txt
With the trailing s.
Done
```
pip install -r requirement_gpu.txt
```
On GPU hardware with compute capability above 7.0, such as T4, if you need FP16 or Int8 quantized inference acceleration you also need to install TensorRT and PaddleInference-TRT; for specific hardware and precision support see the [GPU compute capability and supported precision matrix](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-840-ea/support-matrix/index.html#hardware-precision-matrix)
Add the English gloss: 计算能力 (Compute Capability).
Done
|--model_path | Directory path containing the Paddle model|
|--device | Deployment device, ‘cpu’ or ‘gpu’|
|--batch_size | Batch size for testing|
|--enable_fp16 | Whether to use FP16 for acceleration |
use_fp16
Done
|--task_name | Task name, default tnews|
|--model_path | Directory path containing the Paddle model|
|--device | Deployment device, ‘cpu’ or ‘gpu’|
|--batch_size | Batch size for testing|
Keep this consistent with the table above; is there no default value?
Done
|--model_path | Path of the Paddle model used for inference|
|--batch_size | Batch size for testing, default 32|
|--perf | Whether to run the performance test |
|--enable_quantize | Whether to enable ONNX FP32 dynamic quantization for acceleration |
Don't expose ONNX here:
"Whether to enable ONNX FP32 dynamic quantization for acceleration" -> "Whether to use dynamic quantization for acceleration"
Done
Parameter description:
| Parameter | Description |
|----------|--------------|
|--task_name | Task name, default tnews|
What other options are there? Does it also run with the other options?
enable_quantize=args.enable_quantize,
collect_shape=args.collect_shape,
num_threads=args.num_threads)
if args.collect_shape:
Add some comments here.
Done
args.enable_fp16 = False
args.collect_shape = False
if args.device == 'gpu':
    args.num_threads = 10
Set this automatically from the number of cores on the current hardware to get the speedup.
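The suggestion above can be sketched with the standard library; `default_num_threads` is a hypothetical helper name, not part of this PR:

```python
import os

def default_num_threads(requested=None):
    """Fall back to the number of logical cores the OS reports
    (minimum 1) when no explicit thread count is requested."""
    if requested is not None and requested > 0:
        return requested
    # os.cpu_count() can return None on some platforms
    return os.cpu_count() or 1
```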
@@ -0,0 +1,72 @@
# Ernie3.0 Python Deployment Guide
Ernie3.0 -> ERNIE 3.0 Python Deployment
Globally, Ernie 3.0 -> ERNIE 3.0
pip install -r requirements_cpu.txt
```
### GPU
Before deploying on GPU, make sure the machine has CUDA11.04 and CUDNN8.2+ installed, then install the required dependencies with:
CUDA >= 11.04, cuDNN >= 8.2
Mind the standard capitalization of technical terms.
```
python infer_cpu.py --task_name tnews --model_path ./model/infer
```
On CPU machines that support avx512_vnni, such as Intel(R) Xeon(R) Gold 6271C or 11th-generation and newer CPUs, you can turn on the enable_quantize switch to quantize the FP32 model without any calibration data and get a 1-2x speedup; the deployment command is as follows
**Note**: On CPU devices that support the avx512_vnni instruction set or Intel® DL Boost, you can turn on the enable_quantize switch to dynamically quantize the FP32 model for higher inference performance; the resulting speedups are shown in the table below:
|--batch_size | Batch size for testing, default 32|
|--perf | Whether to run the performance test, off by default |
|--enable_quantize | Whether to use dynamic quantization for acceleration, off by default |
|--num_threads | Number of cpu threads, defaults to the cpu's maximum thread count |
Put the Note above at this position, and back it up with a data table.
|--batch_size | Batch size for testing, default 32|
|--use_fp16 | Whether to use FP16 for acceleration, off by default |
|--perf | Whether to run the performance test, off by default |
|--collect_shape | Whether to automatically configure TensorRT dynamic shapes; when turning on enable_fp16 or running int8 quantized inference, first enable this option to collect dynamic shapes, then turn it off again once shapeinfo.txt has been generated; off by default |
Globally unify enable_fp16 as use_fp16 for consistency; don't mix use and enable.
fetch_vars] = fluid.io.load_inference_model(model_dir, exe)
else:
[program, feed_var_names,
fetch_vars] = fluid.io.load_inference_model(
Use the API under paddle.static.

def infer(self, data):
if isinstance(self.predictor,
paddle.fluid.core_avx.PaddleInferPredictor):
Why is this check needed?
Because there are two kinds of predictor here: onnxruntime and paddleinference.
metric = METRIC_CLASSES[args.task_name]()
metric.reset()
for i, batch in enumerate(batches):
    input_ids, segment_ids, label = batchify_fn(batch)
Gradually replace this with the non-batchify version.
@@ -0,0 +1,99 @@
#Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
There should be a space after the hash sign here.
tokenizer=tokenizer,
is_test=False)
dev_ds = dev_ds.map(trans_func, lazy=False)
batchify_fn = lambda samples, fn=Tuple(
This needs to be replaced later with the non-batchify_fn version, using the tokenizer instead.
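For illustration, the collation that `batchify_fn` performs can be reproduced in plain Python; in the tokenizer-based version this padding would instead come from the tokenizer call itself (padding to the longest sequence in the batch). The helper below is a hypothetical sketch, not PaddleNLP API:

```python
def collate_batch(examples, pad_token_id=0):
    """Pad a batch of tokenized examples to the longest sequence,
    mirroring what tokenizer-side batch padding produces.
    `examples` is a list of dicts with 'input_ids' and 'label'."""
    max_len = max(len(ex["input_ids"]) for ex in examples)
    input_ids = [
        ex["input_ids"] + [pad_token_id] * (max_len - len(ex["input_ids"]))
        for ex in examples
    ]
    labels = [ex["label"] for ex in examples]
    return input_ids, labels
```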
```
pip install -r requirements_gpu.txt
```
On GPU hardware with compute capability (Compute Capability) above 7.0, such as T4, if you need FP16 or Int8 quantized inference acceleration you also need to install TensorRT and PaddleInference-TRT; for compute capability (Compute Capability) and precision support see the [GPU compute capability and supported precision matrix](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-840-ea/support-matrix/index.html#hardware-precision-matrix)
Change PaddleInference-TRT to Paddle Inference.
```
On GPU hardware with compute capability (Compute Capability) above 7.0, such as T4, if you need FP16 or Int8 quantized inference acceleration you also need to install TensorRT and PaddleInference-TRT; for compute capability (Compute Capability) and precision support see the [GPU compute capability and supported precision matrix](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-840-ea/support-matrix/index.html#hardware-precision-matrix)
1. For TensorRT installation see the [TensorRT installation guide](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-840-ea/install-guide/index.html#overview); in brief:
(1) Download TensorRT 8.4, file name TensorRT-XXX.tar.gz, [download link](https://developer.nvidia.com/tensorrt)
Paddle Inference has currently only been adapted to TensorRT 8.2.
(4) pip install the matching tensorrt package from TensorRT-XXX/python
2. PaddleInference-TRT installation steps:
(1) Download the matching PaddleInference-TRT build from the [PaddleInference-TRT download page](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python)
(2) pip install the downloaded PaddleInference-TRT package
This only covers Linux; on Windows both the download location and the TensorRT environment-variable setup are different.
(3) Add the lib path to LD_LIBRARY_PATH with export LD_LIBRARY_PATH=TensorRT-XXX/lib:$LD_LIBRARY_PATH
(4) pip install the matching tensorrt package from TensorRT-XXX/python
2. PaddleInference-TRT installation steps:
(1) Download the matching PaddleInference-TRT build from the [PaddleInference-TRT download page](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python)
Download the Paddle Inference library package matching your cuda environment and python version. Note that you must download a package with tensorrt support, e.g. linux-cuda11.2-cudnn8.2-trt8-gcc8.2.
Done
import onnxruntime as ort
import copy
self.predictor_type = "onnxruntime"
float_onnx_file = "model.onnx"
Does this have to be written to a local file? Can't the intermediate conversion result be passed along directly?
Done
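Two ways to address the point above, sketched as assumptions rather than the PR's actual fix: ONNX Runtime's `InferenceSession` also accepts a serialized model as `bytes`, so the intermediate file can sometimes be skipped entirely; and when a file on disk is unavoidable, writing it into a private temporary directory keeps it out of the working directory. The helper name below is hypothetical:

```python
import os
import tempfile

def write_intermediate_model(model_bytes, filename="model.onnx"):
    """Write a serialized model into a fresh temporary directory
    instead of the current working directory; returns the full path."""
    tmp_dir = tempfile.mkdtemp(prefix="ernie_deploy_")
    path = os.path.join(tmp_dir, filename)
    with open(path, "wb") as f:
        f.write(model_bytes)
    return path
```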
dynamic_quantize_onnx_file)
providers = ['CPUExecutionProvider']
sess_options = ort.SessionOptions()
sess_options.optimized_model_filepath = "./optimize_model.onnx"
Same issue here.
Done
pip install -r requirements_cpu.txt
```
### GPU
Before deploying on GPU, make sure the machine has CUDA >= 11.04 and CuDNN >= 8.2 installed, then install the required dependencies with:
Confirm whether this should be 11.04 or 11.4.
11.2 is enough.
The model detects all entities:
entity: 玛雅 label: LOC pos: [2, 3]
entity: 华夏 label: LOC pos: [14, 15]
-----------------------------
Is this printed output different from the demo in the code?
enable_quantize=False,
set_dynamic_shape=False,
num_threads=10):
file_name = model_path.split('/')[-1]
For Windows/Linux compatibility, don't split path strings by hand; use os.path.split.
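A minimal sketch of the suggested change (the helper name is hypothetical):

```python
import os

def model_file_prefix(model_path):
    """Portable replacement for model_path.split('/')[-1]: os.path
    handles both '/' and the Windows '\\' separator."""
    return os.path.basename(os.path.normpath(model_path))
```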

def paddle_quantize_model(self, model_dir, model_file, params_file):
file_name = model_file.split('.')[0]
model = paddle.jit.load(model_dir + "/" + file_name)
Use os.path.join.
All the path-related code has been removed.
It looks like neither the code nor the docs have been tested on Windows. @ZeyuChen let's decide whether Windows support is necessary; if so, it needs real testing.
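Had the path code stayed, the reviewer's suggestion would look roughly like this (a sketch, not the code that was merged):

```python
import os

def jit_load_prefix(model_dir, model_file):
    """Build the prefix passed to paddle.jit.load from a directory and
    a model filename, replacing model_dir + "/" + file_name."""
    file_name = os.path.splitext(model_file)[0]  # strip the extension
    return os.path.join(model_dir, file_name)
```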
help="The directory or name of model.", )
parser.add_argument(
    "--model_path",
    default='tnews_quant_models/mse4/int8',
A parameter like this shouldn't have a default value.
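One common pattern is to make such arguments mandatory instead of defaulting to a developer-specific path; a sketch:

```python
import argparse

def build_parser():
    """Require --model_path explicitly rather than baking in a
    developer-specific default such as 'tnews_quant_models/mse4/int8'."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_path",
        type=str,
        required=True,
        help="Directory path of the Paddle model used for inference.")
    return parser
```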
PR types
Performance optimization
PR changes
Others
Description
Ernie3.0 python deploy script