Support coscheduling plugin #1722
Comments
Yes, this is great. We need to remove the direct dependency on Volcano in the code and make the scheduler configurable. Currently, the scheduler is taken as a command-line argument, but the code still has a hard dependency on Volcano. Related:
@johnugeorge Does that mean we stop supporting Volcano? I'm OK either way with removing Volcano support.
No, I meant that it should be dynamic, that is, decoupling the main code from the Volcano implementation. See https://github.com/kubeflow/training-operator/blob/master/pkg/controller.v1/pytorch/pytorchjob_controller.go#L88 In Katib, trial resources can be added via a command-line flag: https://github.com/kubeflow/katib/blob/master/cmd/katib-controller/v1beta1/main.go#L62 I was wondering if we can achieve something like this.
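To illustrate the kind of decoupling discussed above, here is a minimal sketch in Go of a registry that selects a gang-scheduler implementation by name at startup. Everything in it is an assumption made for illustration: the `GangScheduler` interface, the `Register`/`Lookup` helpers, and the `--gang-scheduler-name` flag are hypothetical and are not taken from the training-operator or Katib code linked above.

```go
package main

import (
	"flag"
	"fmt"
	"sort"
)

// GangScheduler is a hypothetical abstraction over gang-scheduling backends.
// It is NOT the training-operator API; a real interface would also need
// methods to create, update, and delete the backend's PodGroup objects.
type GangScheduler interface {
	Name() string
}

// volcano and coscheduling are placeholder implementations.
type volcano struct{}

func (volcano) Name() string { return "volcano" }

type coscheduling struct{}

func (coscheduling) Name() string { return "coscheduling" }

// registry maps a scheduler name to a constructor, so the main controller
// code never refers to a concrete backend directly.
var registry = map[string]func() GangScheduler{}

// Register adds a backend under the given name.
func Register(name string, factory func() GangScheduler) {
	registry[name] = factory
}

// Registered returns the known backend names in sorted order.
func Registered() []string {
	names := make([]string, 0, len(registry))
	for n := range registry {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// Lookup returns the backend registered under name, or an error if none is.
func Lookup(name string) (GangScheduler, error) {
	factory, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown gang scheduler %q (registered: %v)", name, Registered())
	}
	return factory(), nil
}

func init() {
	Register("volcano", func() GangScheduler { return volcano{} })
	Register("coscheduling", func() GangScheduler { return coscheduling{} })
}

func main() {
	// Hypothetical flag name, chosen only for this sketch.
	name := flag.String("gang-scheduler-name", "volcano", "gang scheduler backend to use")
	flag.Parse()

	s, err := Lookup(*name)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using gang scheduler:", s.Name())
}
```

With a layout like this, adding another backend would only require a new `Register` call, and the controller would depend on the interface rather than on Volcano types.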
Ah, I see. That makes sense.
Maybe that solution would work fine.
Similar to #1518
As mentioned by @zw0610 in kubeflow/mpi-operator#500 (comment), I will work on kubeflow/common#185 and #1526.
@tenzen-y Do you want to wait till #1714 (comment) is done?
@johnugeorge If the community agrees to include this feature in the next training-operator release (v1.6.0), I would like to work on this as soon as possible. WDYT?
It would be great if you can make it into this release.
Sounds good. I'm going to work on this ASAP. /assign
/kind feature
Training Operator currently supports all-or-nothing semantics, queuing logic, and other batch-workload features through Volcano.
However, I think the maintenance cost of Volcano is a bit high for users who want only the all-or-nothing semantics.
So I would like to support those semantics via the coscheduling plugin.
With coscheduling plugin support, users could get those semantics without additional components.
@kubeflow/wg-training-leads WDYT?