Add deepep internode implementations. #71435
base: develop
Your PR was submitted successfully. Thank you for your contribution to the open-source project!
774a5f2 to bc74744
@@ -173,7 +178,7 @@ struct LowLatencyLayout {
// - 2 symmetric odd/even signaling buffers

// Message sizes
-EP_HOST_ASSERT(num_scales * sizeof(float) <= hidden);
+EP_HOST_ASSERT(num_scales * static_cast<int>(sizeof(float)) <= hidden);
Why isn't this static_cast<int64_t>?
num_scales and hidden are both of type int, so I cast to the same data type.
Changed to static_cast<int64_t>.
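To illustrate what the cast type changes in the assertion discussed above, here is a minimal sketch. The helper names `scales_fit` and `scales_fit_unsigned` are hypothetical and not part of the PR; they only mirror the shape of the `EP_HOST_ASSERT` expression. Without a cast, `sizeof(float)` is a `size_t`, so the `int` operands are converted to an unsigned type before the comparison. Casting to `int64_t` keeps the whole expression signed.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the PR): the cast form.
// sizeof(float) becomes a signed 64-bit value, so num_scales and hidden
// are promoted to int64_t and the comparison stays signed.
constexpr bool scales_fit(int num_scales, int hidden) {
  return num_scales * static_cast<std::int64_t>(sizeof(float)) <= hidden;
}

// Hypothetical helper (not from the PR): what the uncast expression does
// implicitly, written out with explicit conversions to size_t.
constexpr bool scales_fit_unsigned(int num_scales, int hidden) {
  return static_cast<std::size_t>(num_scales) * sizeof(float) <=
         static_cast<std::size_t>(hidden);
}
```

With an invalid negative `hidden`, the signed form rejects the input, while the unsigned form wraps `hidden` to a very large value and accepts it. For valid non-negative inputs both forms agree.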
num_channels * num_ranks * sizeof(int) + // Channel start offset
num_channels * num_ranks * sizeof(int) + // Channel end offset
num_channels * num_ranks * sizeof(int) * 2 + // Queue head and tail
num_ranks * num_ranks * static_cast<int>(sizeof(int)) + // prefix matrix
Why isn't this static_cast<int64_t>?
Same for the occurrences below.
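One point that may be worth checking for the size sum above: multiplication associates left to right, so in an expression such as `num_ranks * num_ranks * static_cast<int64_t>(sizeof(int))` the first product is still computed in `int`, and only the final multiplication is widened. A minimal sketch, assuming a hypothetical helper `metadata_bytes` (not a function in the PR) that covers only the four terms quoted here, puts the 64-bit operand first so that every product is evaluated in 64 bits:

```cpp
#include <cstdint>

// Hypothetical helper, not from the PR: sums the four terms quoted in the
// diff, with the 64-bit element size as the leftmost operand of each product
// so that no intermediate result is computed in plain int.
inline std::int64_t metadata_bytes(int num_channels, int num_ranks) {
  const std::int64_t int_size = static_cast<std::int64_t>(sizeof(int));
  return int_size * num_channels * num_ranks +      // channel start offset
         int_size * num_channels * num_ranks +      // channel end offset
         int_size * num_channels * num_ranks * 2 +  // queue head and tail
         int_size * num_ranks * num_ranks;          // prefix matrix
}
```

For the rank counts used in practice the `int` products are unlikely to overflow, so this is a robustness choice rather than a fix for an observed bug.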
b7fe54e to 6531039
PR Category
Communication Library
PR Types
New features
Description
pcard-67164
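Based on #71358, this PR integrates the deepep internode (multi-node) communication implementation. The internode implementation depends on the third-party library nvshmem. This PR uses CMake to download and build nvshmem automatically and to package and install its shared library, controlled by the CMake option WITH_NVSHMEM, which defaults to OFF. Because nvshmem in turn depends on the gdrcopy library, users whose gdrcopy is not installed in a system directory need to specify its install path through GDRCOPY_HOME.
Points for discussion and optimization: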
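With the 90 architecture, the deepep code is compiled. Should it be set so that including the 90 architecture also downloads and builds the nvshmem library and the deepep internode code by default?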