
Add RayClusterReplicaSet initial implementation #165

Merged: Jeffwan merged 3 commits into main from jiaxin/ray-new-api-implementation on Sep 12, 2024

Jeffwan (Collaborator) commented Sep 12, 2024

Pull Request Description

Initial implementation of the RayClusterReplicaSet (RS) controller. I have not tested it end-to-end yet; I will cover that later.
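At its core, a ReplicaSet-style controller compares the desired replica count with the objects it currently owns and then creates the shortfall or deletes the surplus. The sketch below illustrates that diff step for RayClusters. It is loosely modeled on the upstream Kubernetes ReplicaSet idea of deleting not-ready replicas first; the `RayCluster` struct, `Action`, and `computeDiff` are hypothetical names for illustration only, not the types or functions in this PR.

```go
package main

import "fmt"

// RayCluster is a hypothetical, trimmed-down stand-in for a RayCluster owned
// by a RayClusterReplicaSet; only the fields this sketch needs are modeled.
type RayCluster struct {
	Name  string
	Ready bool
}

// Action is a hypothetical summary of what one reconcile pass would do.
type Action struct {
	ToCreate int      // number of new RayClusters to create
	ToDelete []string // names of RayClusters to delete
}

// computeDiff compares the desired replica count with the owned RayClusters.
// If there are too few it asks for creations; if there are too many it picks
// victims, preferring not-ready clusters so ready capacity is preserved.
func computeDiff(desired int, owned []RayCluster) Action {
	diff := len(owned) - desired
	switch {
	case diff < 0:
		return Action{ToCreate: -diff}
	case diff > 0:
		victims := make([]string, 0, diff)
		// First pass: not-ready clusters.
		for _, rc := range owned {
			if len(victims) == diff {
				break
			}
			if !rc.Ready {
				victims = append(victims, rc.Name)
			}
		}
		// Second pass: ready clusters, only if still short of victims.
		for _, rc := range owned {
			if len(victims) == diff {
				break
			}
			if rc.Ready {
				victims = append(victims, rc.Name)
			}
		}
		return Action{ToDelete: victims}
	default:
		return Action{}
	}
}

func main() {
	owned := []RayCluster{
		{Name: "rc-a", Ready: true},
		{Name: "rc-b", Ready: false},
		{Name: "rc-c", Ready: true},
	}
	fmt.Printf("%+v\n", computeDiff(1, owned)) // scale down
	fmt.Printf("%+v\n", computeDiff(5, owned)) // scale up
}
```

A real controller would additionally cap how many creations or deletions it issues per pass and track in-flight operations; those details are omitted here.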

Related Issues

Resolves: part of #161

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

Ignore prealloc check in linter config: the prealloc linter rejects `var` slice declarations and forces a switch to `make`. Some code we cherry-picked from upstream violates this rule, and I don't think it needs fixing at this moment.
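To illustrate what the prealloc check objects to, the sketch below contrasts a `var`-declared slice grown with `append` (the pattern prealloc reports when the final length is known from a range loop) with the `make`-with-capacity form it asks for. The function names are hypothetical and not taken from the aibrix or upstream code.

```go
package main

import (
	"fmt"
	"strings"
)

// namesVar declares a nil slice with var and grows it with append.
// This is the kind of code prealloc reports, since len(clusters) is known.
func namesVar(clusters []string) []string {
	var out []string
	for _, c := range clusters {
		out = append(out, strings.ToUpper(c))
	}
	return out
}

// namesMake reserves capacity up front, so append never reallocates.
func namesMake(clusters []string) []string {
	out := make([]string, 0, len(clusters))
	for _, c := range clusters {
		out = append(out, strings.ToUpper(c))
	}
	return out
}

func main() {
	in := []string{"rc-a", "rc-b"}
	fmt.Println(namesVar(in), namesMake(in))
	// The two forms differ for empty input: nil vs. a non-nil empty slice.
	fmt.Println(namesVar(nil) == nil, namesMake(nil) == nil)
}
```

The rewrite is not purely cosmetic: for empty input the `var` form returns `nil` while the `make` form returns a non-nil empty slice, which `encoding/json` serializes as `null` versus `[]`. That is one reason converting cherry-picked upstream code mechanically could change behavior.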
Jeffwan (Collaborator, Author) commented Sep 12, 2024

@Yicheng-Lu-llll Please take a look at the RayCluster status part. Since RayCluster doesn't have conditions, some features of the upstream ReplicaSet may not carry over here. For example, we do not know when a RayCluster became ready. Correct me if I am wrong.

Jeffwan force-pushed the jiaxin/ray-new-api-implementation branch from 8ef01eb to 19021e0 on September 12, 2024 01:01

Yicheng-Lu-llll (Contributor) commented Sep 12, 2024


RayCluster now has conditions! The documentation for them was just merged in ray-project/ray#47462. They are available in the latest version of KubeRay.

Yicheng-Lu-llll (Contributor) commented Sep 12, 2024

We can use the RayClusterProvisioned condition to know the first time all Ray Pods are ready. After that, depending on how we interpret "Ready", we can rely on the HeadPodReady condition and the ReadyWorkerReplicas field in the status.
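One possible interpretation of "Ready" built from these signals is sketched below: a RayCluster counts as ready if it has been provisioned once, its head Pod is ready now, and enough workers are ready. The `Condition` and `RayClusterStatus` structs are simplified, hypothetical stand-ins (KubeRay's real status uses `metav1.Condition`, and a real controller would more likely call `meta.IsStatusConditionTrue` from `k8s.io/apimachinery/pkg/api/meta`); the condition type names and `ReadyWorkerReplicas` come from the discussion above, while the readiness rule itself is an assumption.

```go
package main

import "fmt"

// Condition is a minimal stand-in for metav1.Condition (Type and Status only).
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// RayClusterStatus is a hypothetical, trimmed-down view of RayCluster status.
type RayClusterStatus struct {
	Conditions          []Condition
	ReadyWorkerReplicas int32
}

// conditionTrue reports whether the condition of type t exists and is "True".
func conditionTrue(conds []Condition, t string) bool {
	for _, c := range conds {
		if c.Type == t {
			return c.Status == "True"
		}
	}
	return false
}

// isRayClusterReady is one assumed definition of readiness: provisioned at
// least once, head Pod currently ready, and enough ready workers.
func isRayClusterReady(s RayClusterStatus, desiredWorkers int32) bool {
	return conditionTrue(s.Conditions, "RayClusterProvisioned") &&
		conditionTrue(s.Conditions, "HeadPodReady") &&
		s.ReadyWorkerReplicas >= desiredWorkers
}

// countReady is how a replica set could derive a ready-replicas count.
func countReady(statuses []RayClusterStatus, desiredWorkers int32) int {
	n := 0
	for _, s := range statuses {
		if isRayClusterReady(s, desiredWorkers) {
			n++
		}
	}
	return n
}

func main() {
	ready := RayClusterStatus{
		Conditions: []Condition{
			{Type: "RayClusterProvisioned", Status: "True"},
			{Type: "HeadPodReady", Status: "True"},
		},
		ReadyWorkerReplicas: 2,
	}
	fmt.Println(countReady([]RayClusterStatus{ready, {}}, 2))
}
```

Whether a temporarily lost head Pod should make the whole RayCluster count as not ready is exactly the interpretation question left open above.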

Jeffwan (Collaborator, Author) commented Sep 12, 2024

@Yicheng-Lu-llll BTW, is there a stable release with condition support, or is it only on master for now?

Yicheng-Lu-llll (Contributor) commented

The v1.2.1 release has condition support, but its feature gate is disabled by default. We may need to enable it:

helm upgrade kuberay-operator kuberay/kuberay-operator --version 1.2.1 \
  --set featureGates\[0\].name=RayClusterStatusConditions \
  --set featureGates\[0\].enabled=true

Jeffwan (Collaborator, Author) commented Sep 12, 2024

Quoting @Yicheng-Lu-llll: "We can use ray-project/kuberay#2301 to know the first time all Ray pods are ready. After that, depending on how we interpret Ready, we can rely on condition ray-project/kuberay#2261 and ReadyWorkerReplicas in status."

I will leave this for a separate issue, tracked in #171.

Jeffwan merged commit 78ea085 into main on Sep 12, 2024 (3 checks passed)
Jeffwan deleted the jiaxin/ray-new-api-implementation branch on September 12, 2024 20:47
gangmuk pushed a commit that referenced this pull request on Jan 25, 2025:
* Add ray cluster replicaset helpers

* Ignore prealloc check in linter config


* Add RayClusterReplicaSet rough implementation