Optimize batchnorm1d using 2D kernel #43530
Your PR was submitted successfully. Thank you for contributing to this open-source project!
Sorry to inform you that 938cde3's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
Sorry to inform you that 44ad03e's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
@@ -591,10 +591,12 @@ void BatchNormGradRawKernel(const Context &ctx,
// ctx.GetPlace()),
// epsilon, saved_mean_data, saved_var_data));
#else
// CUDNN PER_ACTIVATION mode only support small batch size
// CUDNN only support small batch size
Is this related to the cuDNN version?
The version currently in use does not support a very large batch_size.
@@ -137,6 +138,398 @@ static __global__ LAUNCH_BOUNDS(BlockDim) void BNForwardTraining(
}
}

template <typename T>
Comments will be added in a follow-up.
PR types
Function optimization
PR changes
OPs
Describe
Research and design document
1 Motivation:
2 Design:
3 Evaluation:
Test environment: NVIDIA V100 GPU
[N, C] inputs achieved a 33x speedup, and [N, C, L] inputs achieved an 8.2x speedup.
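For readers unfamiliar with the [N, C] layout mentioned above, the following is a minimal host-side C++ sketch of what batchnorm1d computes for such an input: one mean and one biased variance per channel, reduced over the N batch rows, followed by normalization. It is not the 2D CUDA kernel from this PR. The names `ChannelStats`, `ComputeStatsNC`, and `BatchNorm1dNC` are hypothetical, a row-major layout is assumed, and scale and bias are omitted for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not from the PR): per-channel statistics.
struct ChannelStats {
  std::vector<float> mean;
  std::vector<float> var;  // biased variance (divides by N)
};

// Reduce over the batch dimension N for each of the C channels.
// Assumes x is stored row-major as [N, C], i.e. x[i * c + j].
ChannelStats ComputeStatsNC(const std::vector<float>& x, std::size_t n,
                            std::size_t c) {
  assert(n > 0 && x.size() == n * c);
  ChannelStats s{std::vector<float>(c, 0.0f), std::vector<float>(c, 0.0f)};
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < c; ++j) {
      s.mean[j] += x[i * c + j];
    }
  }
  for (std::size_t j = 0; j < c; ++j) {
    s.mean[j] /= static_cast<float>(n);
  }
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < c; ++j) {
      const float d = x[i * c + j] - s.mean[j];
      s.var[j] += d * d;
    }
  }
  for (std::size_t j = 0; j < c; ++j) {
    s.var[j] /= static_cast<float>(n);
  }
  return s;
}

// Normalize each element with its channel's statistics:
// y = (x - mean) / sqrt(var + epsilon). Scale and bias are left out.
std::vector<float> BatchNorm1dNC(const std::vector<float>& x, std::size_t n,
                                 std::size_t c, float epsilon) {
  const ChannelStats s = ComputeStatsNC(x, n, c);
  std::vector<float> y(x.size());
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < c; ++j) {
      y[i * c + j] = (x[i * c + j] - s.mean[j]) / std::sqrt(s.var[j] + epsilon);
    }
  }
  return y;
}
```

When N is large and C is small, each per-channel reduction spans many rows, which is the shape of work a GPU kernel would need to parallelize; how this PR's kernel does so is described only in the linked design document.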