[Inference] FP8 dual gemm auto-tune and support compile parallelization #9151

Merged
merged 16 commits into from
Sep 20, 2024

ckl117
Contributor

@ckl117 ckl117 commented Sep 19, 2024

PR types

New features

PR changes

Others

Description

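Add an FP8 CUTLASS dual GEMM custom operator.
Optimize compilation of the generated CUTLASS operators: on an L20, the build time for all custom operators drops from 52 minutes to 3 minutes, a 16.3x speedup.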
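
The description does not say how the build was parallelized. The sketch below shows one common approach under stated assumptions: each generated kernel lives in its own `.cu` file, so the files can be compiled independently and concurrently. The names `object_path`, `nvcc_compile`, `compile_parallel`, the `build` directory, and the bare `nvcc -c` invocation are all hypothetical and are not taken from this PR's build code.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def object_path(source: str, build_dir: str = "build") -> str:
    """Map a generated .cu file to the object file it would compile to."""
    return str(Path(build_dir) / (Path(source).stem + ".o"))


def nvcc_compile(source: str) -> str:
    """Compile one translation unit (placeholder flags, not the PR's)."""
    out = object_path(source)
    Path(out).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(["nvcc", "-c", source, "-o", out], check=True)
    return out


def compile_parallel(sources, compile_one=nvcc_compile, jobs=8):
    """Compile independent sources concurrently, preserving input order."""
    # Threads are enough here: each worker mostly waits on an external
    # compiler process, so the GIL is not the bottleneck.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(compile_one, sources))
```

Passing a different `compile_one` callable (for example `object_path` itself) allows a dry run of the scheduling without a CUDA toolchain installed.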

paddle-bot bot commented Sep 19, 2024

Thanks for your contribution!

@CLAassistant
CLAassistant commented Sep 19, 2024

CLA assistant check
All committers have signed the CLA.

codecov bot commented Sep 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.26%. Comparing base (907ad20) to head (5ebdb13).
Report is 242 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #9151   +/-   ##
========================================
  Coverage    53.26%   53.26%           
========================================
  Files          652      652           
  Lines       105615   105615           
========================================
  Hits         56254    56254           
  Misses       49361    49361           


@ckl117 ckl117 force-pushed the sx_fp8_tune branch 2 times, most recently from c35ebb6 to 1934c3f Compare September 19, 2024 09:12
@@ -0,0 +1,628 @@
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
Collaborator

I suggest creating a utils or tools directory under csrc/ and moving csrc/generate_code_dual_gemm_fused_kernels.py into it. Also, the name generate_code_dual_gemm_fused_kernels is not quite accurate; how about renaming it to auto_gen_fp8_fp8_dual_gemm_fused_kernels.py?

Collaborator

While you are at it, please also move csrc/test_tune_cublaslt_gemm.py into the utils or tools directory, and rename test_tune_cublaslt_gemm to tune_cublaslt_int8_gemm.py.

Collaborator

The same goes for csrc/generate_code_gemm_fused_kernels.py.

Contributor Author

@ckl117 ckl117 Sep 20, 2024

Done.

Collaborator

@yuanlehome yuanlehome left a comment

LGTM

@yuanlehome yuanlehome merged commit c4f7acf into PaddlePaddle:develop Sep 20, 2024
9 of 12 checks passed
3 participants