[Inference] Add a8w8(fp8) a8w8c8(int8) quant_type support #9032
Conversation
1. Add a8w8 (fp8) and a8w8c8 (int8) quant_type support
2. Add Llama 3.1 and Qwen2 PTQ configs
3. Update quantization.md
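For background, "a8w8" means both activations and weights are quantized to 8 bits (here fp8), and "a8w8c8" additionally stores the KV cache in int8. A minimal sketch of the symmetric abs-max int8 scheme that such PTQ flows are typically built on (illustrative only, not this PR's implementation):

```python
import numpy as np

def absmax_quantize_int8(x: np.ndarray):
    # Symmetric abs-max quantization: map [-absmax, absmax] onto [-127, 127].
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.array([0.1, -0.5, 1.27], dtype=np.float32)
q, scale = absmax_quantize_int8(x)
x_hat = dequantize_int8(q, scale)  # close to x, within scale / 2 per element
```

The same scale/round/clip structure applies per tensor, per channel, or per token; real PTQ passes differ mainly in how they observe the statistics used to pick `scale`.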
Thanks for your contribution!
Codecov Report: All modified and coverable lines are covered by tests ✅

```diff
@@            Coverage Diff             @@
##           develop    #9032      +/-   ##
===========================================
+ Coverage    53.81%   54.01%   +0.19%
===========================================
  Files          652      652
  Lines       104356   105208     +852
===========================================
+ Hits         56155    56823     +668
- Misses       48201    48385     +184
```
LGTM
@@ -220,6 +228,12 @@ python run_finetune.py ./config/llama/ptq_argument.json

# GPTQ quantization launch command reference
python run_finetune.py ./config/llama/ptq_argument.json

# W8A8C8 (INT) quantization launch command reference
python run_finetune.py ./config/llama/ptq_c8_argument.json
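The launch command reads its settings from a JSON argument file, like the existing PTQ configs. A hypothetical minimal fragment of what `ptq_c8_argument.json` might look like (every field name except `quant_type` is illustrative; check the actual config shipped with this PR):

```json
{
  "model_name_or_path": "meta-llama/Meta-Llama-3.1-8B",
  "output_dir": "./checkpoints/llama_ptq_c8",
  "do_ptq": true,
  "quant_type": "a8w8c8"
}
```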
There is an extra space here.
@@ -0,0 +1,138 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
2023 -> 2024
# print(
#     f"{index/len(subject_list)} Inference starts at {run_date} on {args.model_name_or_path} with subject of {subject_name}!"
# )
Is it still necessary to keep this debug output?
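If the information is worth keeping, a common alternative to commented-out prints is DEBUG-level logging, which stays silent unless explicitly enabled. A sketch (variable names taken from the snippet above):

```python
import logging

logger = logging.getLogger(__name__)

def log_inference_start(index, total, run_date, model_path, subject_name):
    # Emitted only when the logger is configured at DEBUG level;
    # %-style lazy formatting avoids building the string when disabled.
    logger.debug(
        "%s Inference starts at %s on %s with subject of %s!",
        index / total, run_date, model_path, subject_name,
    )
```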
@@ -0,0 +1,61 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
2023 -> 2024
@@ -0,0 +1,191 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
2023 -> 2024
@@ -0,0 +1,94 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
2023 -> 2024
import numpy as np
import paddle

# from paddleslim.quant.observers.channel_wise import ChannelWiseObserver
Can this be deleted?
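For context, the commented-out `ChannelWiseObserver` import points at per-channel quantization, where each output channel of a weight gets its own scale instead of one scale for the whole tensor. A generic numpy sketch of per-channel abs-max scales (not PaddleSlim's actual API):

```python
import numpy as np

def channelwise_int8_scales(weight: np.ndarray, channel_axis: int = 0) -> np.ndarray:
    # One abs-max scale per channel along channel_axis, for the int8 range [-127, 127].
    reduce_axes = tuple(i for i in range(weight.ndim) if i != channel_axis)
    return np.abs(weight).max(axis=reduce_axes) / 127.0

w = np.array([[0.5, -1.0],
              [2.0,  0.25]], dtype=np.float32)
scales = channelwise_int8_scales(w, channel_axis=0)  # one scale per row
```

Per-channel scales matter for weights because channel magnitudes can differ by orders of magnitude; a single tensor-wide scale would crush the small channels.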
@@ -0,0 +1,105 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
2023 -> 2024
@@ -0,0 +1,55 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
2023 -> 2024
# from paddle.quantization.factory import ObserverFactory
from experimental.layers.cache_kv import CacheKVMatMul

# from paddleslim.quant.observers.mse import MSEObserverLayer
Please delete everything that is not needed, and check the rest of the code for similar cases yourself.
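For context on the `CacheKVMatMul` import above: the "c8" in a8w8c8 refers to storing the attention KV cache in int8. A minimal quantize/dequantize round-trip sketch (illustrative only, not this PR's kernels):

```python
import numpy as np

def quantize_kv(kv: np.ndarray, scale: float) -> np.ndarray:
    # Store the cache as int8: half the memory of fp16, a quarter of fp32.
    return np.clip(np.round(kv / scale), -127, 127).astype(np.int8)

def dequantize_kv(kv_int8: np.ndarray, scale: float) -> np.ndarray:
    return kv_int8.astype(np.float32) * scale

rng = np.random.default_rng(0)
kv = rng.standard_normal((4, 8)).astype(np.float32)
scale = np.abs(kv).max() / 127.0
kv_hat = dequantize_kv(quantize_kv(kv, scale), scale)
```

In a real inference stack the dequantization is fused into the attention matmul rather than materialized, which is the role a layer like `CacheKVMatMul` plays.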
…le#9032)

* 1. Add a8w8(fp8) a8w8c8(int8) quant_type support; 2. add llama3.1 and qwen2 ptq config; 3. update quantization.md
* Fix load_quant_model bug
* Fix load quant bug
* Update ll/README.md
PR types
New features
PR changes
APIs | Docs
Description