Model orchestration with heterogeneous hardwares #13

Open
Jeffwan opened this issue Jul 5, 2024 · 4 comments
Assignees
Labels
area/heterogeneous kind/enhancement New feature or request priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@Jeffwan
Collaborator

Jeffwan commented Jul 5, 2024

We have run into a few cases where a single deployment needs to span different chips because of quota limits or resource shortages. In Kubernetes, however, we usually use a Deployment to manage a group of pods that all use one type of GPU. If we remove the GPU type constraint, it becomes hard to control the ratio of pods across GPU types. Technically, we can work around the problem with multiple Deployments, but then rolling upgrades need additional control, and so does HPA. The RoleSet CRD cannot handle such cases either.

  1. We may need other orchestrators for instances using heterogeneous hardware; HPA and rolling upgrades need to be revised as well.
  2. We need more advanced traffic routing solutions to handle such differences.
  3. It also brings many challenges for monitoring at the service level, etc.
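
To make the "control the ratio" problem concrete, here is a minimal sketch, not part of this project, of how a total replica count could be split across GPU types according to target weights. The function name `split_replicas` and the GPU type keys are hypothetical; the rounding uses the largest-remainder method so the per-type counts always add up to the requested total.

```python
from math import floor


def split_replicas(total: int, weights: dict[str, float]) -> dict[str, int]:
    """Split `total` replicas across GPU types in proportion to `weights`.

    Hypothetical helper for illustration only. Largest-remainder rounding
    guarantees that the returned counts sum exactly to `total`.
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")

    # Ideal (fractional) share for each GPU type.
    exact = {gpu: total * w / weight_sum for gpu, w in weights.items()}
    # Start from the rounded-down share.
    counts = {gpu: floor(share) for gpu, share in exact.items()}
    leftover = total - sum(counts.values())

    # Give the remaining replicas to the types with the largest fractional part.
    by_remainder = sorted(exact, key=lambda gpu: exact[gpu] - counts[gpu], reverse=True)
    for gpu in by_remainder[:leftover]:
        counts[gpu] += 1
    return counts


# Example with made-up GPU type keys and a 2:1 target ratio.
plan = split_replicas(10, {"a100": 2, "l4": 1})
```

Each entry in the result could then drive the replica count of one per-GPU-type Deployment; an autoscaler would have to recompute the split whenever the total changes, which is the extra control the issue describes.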
@Jeffwan Jeffwan added this to the v0.1.0-rc.2 milestone Jul 29, 2024
@Jeffwan Jeffwan changed the title Model orchestration with heterogeneous hardwares [RFC] Model orchestration with heterogeneous hardwares Jul 29, 2024
@Jeffwan Jeffwan added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jul 29, 2024
@Jeffwan Jeffwan modified the milestones: v0.1.0-rc.2, v0.1.0 Aug 29, 2024
@Jeffwan Jeffwan self-assigned this Aug 29, 2024
@Jeffwan
Collaborator Author

Jeffwan commented Sep 11, 2024

I am considering building a Model abstraction that hides the deployment details from users. It should span GPU devices, clouds, etc., which will leave us enough room for cost/performance optimization.

@Jeffwan
Collaborator Author

Jeffwan commented Sep 11, 2024

Related paper: https://arxiv.org/abs/2404.14527

@Jeffwan Jeffwan modified the milestones: v0.1.0, v0.2.0 Nov 12, 2024
@Jeffwan
Collaborator Author

Jeffwan commented Nov 13, 2024

We have no plan to change the orchestration part in v0.2.0. Let's first resolve the cost-efficient serving issue by using multiple Deployments with some common labels; that's enough for now. I will change this issue to a feature and make it part of the heterogeneous RFC.
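
As a rough illustration of the "multiple Deployments with common labels" approach, the sketch below builds one standard `apps/v1` Deployment manifest per GPU type as a plain Python dict. Everything specific here is an assumption made for the example: the helper name `build_deployments`, the label keys under `model.example.com/`, the node label `example.com/gpu-type`, and the container name are hypothetical and are not defined by this project.

```python
def build_deployments(model: str, image: str, replicas_by_gpu: dict[str, int]) -> list[dict]:
    """Build one Deployment manifest per GPU type, tied together by a common label.

    Hypothetical helper for illustration only; all label keys are assumptions.
    """
    manifests = []
    for gpu, replicas in replicas_by_gpu.items():
        # Shared by every Deployment serving this model (assumed key).
        common = {"model.example.com/name": model}
        # Adds the GPU type so each Deployment selects only its own pods.
        labels = {**common, "model.example.com/gpu-type": gpu}
        manifests.append({
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {"name": f"{model}-{gpu}", "labels": labels},
            "spec": {
                "replicas": replicas,
                "selector": {"matchLabels": labels},
                "template": {
                    "metadata": {"labels": labels},
                    "spec": {
                        # Assumed node label; real clusters expose their own keys.
                        "nodeSelector": {"example.com/gpu-type": gpu},
                        "containers": [{"name": "server", "image": image}],
                    },
                },
            },
        })
    return manifests


# Example with made-up names: two Deployments for one model.
manifests = build_deployments("demo-model", "example.com/server:latest", {"a100": 2, "l4": 1})
```

A Service or monitoring query that selects only on the common label would then cover the pods of all per-GPU Deployments, while each Deployment still scales and rolls out on its own.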

@Jeffwan Jeffwan changed the title [RFC] Model orchestration with heterogeneous hardwares Model orchestration with heterogeneous hardwares Nov 13, 2024
@Jeffwan
Collaborator Author

Jeffwan commented Nov 26, 2024

This is a sub-story of #425. We may use a loose mechanism such as labels to orchestrate the workload in v0.2.0, and we can orchestrate such workloads better in v0.3.0 with the model API. Postponing to v0.3.0.

@Jeffwan Jeffwan removed this from the v0.2.0 milestone Nov 26, 2024