[Inference] Qwen2 support fp8 inference #8954
Conversation
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           develop     #8954    +/-   ##
===========================================
+ Coverage    53.34%    53.44%    +0.09%
===========================================
  Files          652       652
  Lines       105484    105188      -296
===========================================
- Hits         56270     56214       -56
+ Misses       49214     48974      -240

☔ View full report in Codecov by Sentry.
LGTM
* qwen2 fp8
* fp8 check
* fp8 cutlass
* int8 cachekv
* a8w8c8_fp8
PR types
New features
PR changes
Others
Description
Qwen2 now supports a8w8_fp8 and a8w8c8_fp8 inference.
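For context, a minimal sketch of how this feature might be exercised through PaddleNLP's LLM predictor script. The script path and flag names below (`llm/predict/predictor.py`, `--inference_model`, `--quant_type a8w8_fp8`) are assumptions based on the quant-type names in this PR's description, not a verified invocation from the repository docs.

```python
# Hypothetical invocation of PaddleNLP's LLM predictor with the fp8 quant types
# described in this PR. Script path, flag names, and the Qwen2 checkpoint name
# are assumptions for illustration only.
import subprocess

cmd = [
    "python", "llm/predict/predictor.py",           # assumed predictor entry point
    "--model_name_or_path", "Qwen/Qwen2-7B-Instruct",
    "--inference_model", "true",                     # use the static inference-model path
    "--dtype", "float16",
    "--quant_type", "a8w8_fp8",                      # or "a8w8c8_fp8" to also use int8 cache KV
]
subprocess.run(cmd, check=True)
```

Under this reading, `a8w8_fp8` quantizes activations and weights to fp8, while `a8w8c8_fp8` additionally stores the KV cache in int8, matching the "int8 cachekv" commit in the list above.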