
[Inference] Add a8w8(fp8) a8w8c8(int8) quant_type support #9032

Merged — 4 commits merged into PaddlePaddle:develop on Aug 28, 2024

Conversation

@lixcli (Contributor) commented Aug 28, 2024

PR types

New features

PR changes

APIs | Docs

Description

  1. add a8w8(fp8) a8w8c8(int8) quant_type support
  2. add llama3.1 and qwen2 ptq config
  3. update quantization.md
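The quant_type names in this PR encode which tensors are quantized and to what format: a8w8 means 8-bit activations and weights, and the trailing c8 adds an INT8 KV cache. As a rough illustration only — not PaddleNLP's implementation, and with illustrative function names — symmetric per-tensor scales for INT8 and FP8 (E4M3, whose largest finite value is 448) could be derived like this:

```python
# Hedged sketch of symmetric per-tensor quantization scales for the
# INT8 and FP8(E4M3) cases named by this PR's quant types. All names
# here are illustrative, not PaddleNLP APIs.
import numpy as np

INT8_MAX = 127.0
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def symmetric_scale(tensor: np.ndarray, qmax: float) -> float:
    """Per-tensor symmetric scale: map the observed abs-max onto qmax."""
    return float(np.abs(tensor).max()) / qmax


def quant_dequant_int8(tensor: np.ndarray) -> np.ndarray:
    """Round-trip a tensor through INT8 to expose the quantization error."""
    scale = symmetric_scale(tensor, INT8_MAX)
    q = np.clip(np.round(tensor / scale), -INT8_MAX, INT8_MAX).astype(np.int8)
    return q.astype(np.float32) * scale


w = np.array([0.5, -1.27, 0.03, 1.0], dtype=np.float32)
w_hat = quant_dequant_int8(w)  # close to w; error bounded by scale / 2
```

The same `symmetric_scale` with `FP8_E4M3_MAX` gives the activation scale for the fp8 path; the real PTQ flow additionally calibrates these scales over many batches via observers rather than a single abs-max.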


paddle-bot bot commented Aug 28, 2024

Thanks for your contribution!


codecov bot commented Aug 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 54.01%. Comparing base (34a71c8) to head (d21ace7).
Report is 229 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9032      +/-   ##
===========================================
+ Coverage    53.81%   54.01%   +0.19%     
===========================================
  Files          652      652              
  Lines       104356   105208     +852     
===========================================
+ Hits         56155    56823     +668     
- Misses       48201    48385     +184     


@DrownFish19 DrownFish19 changed the title add a8w8(fp8) a8w8c8(int8) quant_type support [Inference] Add a8w8(fp8) a8w8c8(int8) quant_type support Aug 28, 2024
@DrownFish19 (Collaborator) left a comment:

LGTM

@DrownFish19 DrownFish19 merged commit 19927ba into PaddlePaddle:develop Aug 28, 2024
10 of 12 checks passed
@@ -220,6 +228,12 @@ python run_finetune.py ./config/llama/ptq_argument.json

# GPTQ quantization launch command reference
python run_finetune.py ./config/llama/ptq_argument.json

# W8A8C8(INT) quantization launch command reference
python run_finetune.py ./config/llama/ptq_c8_argument.json
Collaborator comment:

There is an extra space here.
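The `./config/llama/ptq_c8_argument.json` referenced in the command above is added by this PR; its contents are not shown in this diff, but a minimal W8A8C8 PTQ config would plausibly look like the following (every key and value here is an assumption for illustration, not the shipped file):

```json
{
  "model_name_or_path": "meta-llama/Meta-Llama-3.1-8B",
  "quant_type": "a8w8c8",
  "do_ptq": true,
  "output_dir": "./checkpoints/llama_ptq_c8_ckpts"
}
```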

@@ -0,0 +1,138 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
Collaborator comment:

2023 -> 2024

Comment on lines +47 to +49
# print(
# f"{index/len(subject_list)} Inference starts at {run_date} on {args.model_name_or_path} with subject of {subject_name}!"
# )
Collaborator comment:

Is it still necessary to keep this debug output?

@@ -0,0 +1,61 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
Collaborator comment:

2023 -> 2024

@@ -0,0 +1,191 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
Collaborator comment:

2023 -> 2024

@@ -0,0 +1,94 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
Collaborator comment:

2023 -> 2024

import numpy as np
import paddle

# from paddleslim.quant.observers.channel_wise import ChannelWiseObserver
Collaborator comment:

Delete this?

@@ -0,0 +1,105 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
Collaborator comment:

2023 -> 2024

@@ -0,0 +1,55 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
Collaborator comment:

2023 -> 2024

# from paddle.quantization.factory import ObserverFactory
from experimental.layers.cache_kv import CacheKVMatMul

# from paddleslim.quant.observers.mse import MSEObserverLayer
Collaborator comment:

Delete everything that is not needed, and check the other places in the code for similar cases yourself.

Mangodadada pushed a commit to Mangodadada/PaddleNLP that referenced this pull request Sep 10, 2024
…le#9032)

* 1. add a8w8(fp8) a8w8c8(int8) quant_type support
2. add llama3.1 and qwen2 ptq config
3. update quantization.md

* fix load_quant_model bug

* fix load quant bug

* update ll/README.md
Labels: None yet
Projects: None yet
3 participants