PaddlePaddle · qingqing01 · Sep 11, 2024 · Sep 9, 2024
diff --git a/.gitignore b/.gitignore
@@ -126,6 +126,6 @@ FETCH_HEAD
 ./ppdiffusers/ppdiffusers/version.py
 
 # third party
-csrc/gpu/cutlass_kernels/cutlass
+csrc/third_party/
 dataset/
 output/
diff --git a/csrc/README.md b/csrc/README.md
@@ -10,6 +10,12 @@ pip install -r requirements.txt
 
 ## Compile the CUDA operators
 
+Generate the FP8 cutlass operators (compiling them takes a long time)
+```shell
+python generate_code_gemm_fused_kernels.py
+```
+
+Compile
 ```shell
 python setup_cuda.py install
 ```
@@ -20,9 +26,14 @@ python setup_cuda.py install
 2. Fetch the code:
     git clone -b v3.5.0 --single-branch https://github.com/NVIDIA/cutlass.git
 
-3. Place the downloaded `cutlass` directory at `csrc/gpu/cutlass_kernels/cutlass`
+3. Place the downloaded `cutlass` directory at `csrc/third_party/cutlass`
 
 4. Recompile the CUDA operators
 ```shell
 python setup_cuda.py install
 ```
+
+### FP8 GEMM auto-tuning
+```shell
+sh tune_fp8_gemm.sh
+```
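
For reviewers, the commands this README change documents can be strung together as in the rough sketch below. It rests on several assumptions not confirmed by the diff: that everything is run from the `csrc/` directory (so `csrc/third_party/cutlass` becomes `third_party/cutlass`), that tuning is run after the install step, and that the manual cutlass clone is only needed when the directory is missing. The `DRY_RUN` variable and the `run` and `build_fp8_ops` helpers are hypothetical names introduced for illustration and are not part of the repository.

```shell
#!/bin/sh
# Sketch only: chains the commands documented in csrc/README.md.
# Assumption: the current directory is csrc/.
set -e

# Hypothetical switch: 1 (default) only prints the commands, 0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Echo the command, then execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

build_fp8_ops() {
    # After this change cutlass is expected under third_party/ rather than
    # gpu/cutlass_kernels/ (paths relative to csrc/, an assumption).
    if [ ! -d third_party/cutlass ]; then
        run git clone -b v3.5.0 --single-branch \
            https://github.com/NVIDIA/cutlass.git third_party/cutlass
    fi
    # Generate the FP8 cutlass kernels (slow to compile per the README).
    run python generate_code_gemm_fused_kernels.py
    # Build and install the custom CUDA operators.
    run python setup_cuda.py install
    # Run the FP8 GEMM auto-tuning script (ordering is an assumption).
    run sh tune_fp8_gemm.sh
}

build_fp8_ops
```

By default the script only prints what it would do; setting `DRY_RUN=0` executes the commands for real.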