[Inference] FP8 dual gemm auto-tune and support compile parallelization #9151

ckl117 · 2024-09-19T08:17:29Z

PR types

New features

PR changes

Others

Description

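Add an FP8 CUTLASS dual GEMM custom operator.
Optimize compilation of the generated CUTLASS operators: on L20, the compile time for all custom operators drops from 52 minutes to 3 minutes, a 16.3x speedup.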
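
The description does not say how the compile-time reduction is achieved; the title only mentions compile parallelization. One common approach is to compile each generated source file as an independent job and run the jobs concurrently. The following is a minimal sketch of that idea, not the build script from this PR: the function names, the `nvcc` flags, and the `gen/` and `build/` paths are all hypothetical.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_compile_command(source, build_dir, compiler="nvcc", flags=("-O3", "-c")):
    """Return the command that compiles one generated source file to an object file.

    The compiler name and flags are illustrative assumptions, not the PR's settings.
    """
    stem = os.path.splitext(os.path.basename(source))[0]
    obj = os.path.join(build_dir, stem + ".o")
    return [compiler, *flags, source, "-o", obj]


def compile_all(sources, build_dir, jobs=None, runner=subprocess.check_call):
    """Compile every source concurrently and return the object paths in input order."""
    commands = [build_compile_command(src, build_dir) for src in sources]
    # Threads are sufficient: each worker only blocks on an external compiler process.
    with ThreadPoolExecutor(max_workers=jobs or os.cpu_count()) as pool:
        # list() waits for all jobs and re-raises the first failure, if any.
        list(pool.map(runner, commands))
    return [cmd[-1] for cmd in commands]
```

Because the generated kernels are separate translation units, the jobs do not depend on each other, so wall-clock time can shrink roughly with the number of workers until the machine runs out of cores or memory. The `runner` parameter exists only so the sketch can be exercised without a CUDA toolchain installed.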


paddle-bot · 2024-09-19T08:17:34Z

Thanks for your contribution!

CLAassistant · 2024-09-19T08:17:35Z

All committers have signed the CLA.

codecov · 2024-09-19T08:52:47Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.26%. Comparing base (907ad20) to head (5ebdb13).
Report is 242 commits behind head on develop.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #9151   +/-   ##
========================================
  Coverage    53.26%   53.26%           
========================================
  Files          652      652           
  Lines       105615   105615           
========================================
  Hits         56254    56254           
  Misses       49361    49361


yuanlehome · 2024-09-20T04:04:06Z

csrc/generate_code_dual_gemm_fused_kernels.py

@@ -0,0 +1,628 @@
+# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.


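I suggest creating a utils or tools directory under csrc/ and moving csrc/generate_code_dual_gemm_fused_kernels.py into it. Also, the file name generate_code_dual_gemm_fused_kernels is not very accurate; how about renaming it to auto_gen_fp8_fp8_dual_gemm_fused_kernels.py?

While you are at it, please also move csrc/test_tune_cublaslt_gemm.py into the utils or tools directory, and rename test_tune_cublaslt_gemm to tune_cublaslt_int8_gemm.py.

The same applies to csrc/generate_code_gemm_fused_kernels.py.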

yuanlehome

LGTM

ckl117 added 9 commits August 16, 2024 17:42

fp8

62483d3

check

3ebc594

check

7957f84

check

f79baaa

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i…

3c63b7a

…nto llama3-fp8

check

c88f712

cutlass fp8

3f41111

fp8 chech

6fe2613

check

bfc106d

ckl117 force-pushed the sx_fp8_tune branch 2 times, most recently from c35ebb6 to 1934c3f Compare September 19, 2024 09:12

ffn1 tune

83fe653

ckl117 force-pushed the sx_fp8_tune branch from 1934c3f to 83fe653 Compare September 19, 2024 09:18

ckl117 added 3 commits September 19, 2024 09:25

merge

4836310

delete

6feda1c

check

fa91dad

yuanlehome reviewed Sep 20, 2024


yuanlehome mentioned this pull request Sep 20, 2024

[INFER] Add cutlass fp8 gemm auto tune #9020

Closed

ckl117 added 3 commits September 20, 2024 05:18

change file path

9ebca00

merge develop

e312a75

top_p_sampling_reject.cu

5ebdb13

yuanlehome approved these changes Sep 20, 2024


yuanlehome merged commit c4f7acf into PaddlePaddle:develop Sep 20, 2024
9 of 12 checks passed
