Model orchestration with heterogeneous hardwares #13

Jeffwan · 2024-07-05T02:28:04Z

We have hit a few cases where a single deployment has to span different chips because of quota limits or resource shortages. In Kubernetes, however, we usually use a Deployment to manage a group of pods on one type of GPU. If we remove the GPU type constraint, it becomes hard to control the ratio of pods across GPU types. Technically, we can work around the problem with multiple Deployments, but rolling upgrades then need additional control, and so does HPA. The RoleSet CRD cannot manage such cases either.
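To make the ratio problem concrete, here is a minimal sketch of how a controller could split one desired replica count across per-GPU Deployments. It is an illustration only, not something that exists in this project: the function name `split_replicas`, the GPU names, and the choice of largest-remainder rounding are all assumptions.

```python
from math import floor


def split_replicas(total: int, weights: dict[str, float]) -> dict[str, int]:
    """Split `total` replicas across GPU types in proportion to `weights`.

    Uses largest-remainder rounding so the counts always sum to `total`.
    Hypothetical helper; the names and the rounding rule are assumptions.
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    if not weights or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and non-negative")
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("at least one weight must be positive")

    # Ideal (fractional) share for each GPU type.
    exact = {gpu: total * w / weight_sum for gpu, w in weights.items()}
    # Start from the rounded-down share.
    counts = {gpu: floor(share) for gpu, share in exact.items()}
    leftover = total - sum(counts.values())

    # Give the leftover replicas to the largest fractional parts.
    # Ties are broken by GPU name so the result is deterministic.
    order = sorted(weights, key=lambda gpu: (-(exact[gpu] - counts[gpu]), gpu))
    for gpu in order[:leftover]:
        counts[gpu] += 1
    return counts
```

An autoscaler that computes a single desired replica count could call this once per reconcile and write each result into the matching Deployment's `spec.replicas`. Rolling upgrades would still have to be coordinated across the Deployments separately.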

We may need other orchestrators for instances running on heterogeneous hardware. HPA and rolling upgrades need to be revised as well.
We need more advanced traffic routing solutions to handle such differences.
It also brings many challenges for monitoring at the service level, etc.

Jeffwan · 2024-09-11T00:21:21Z

I am considering building a Model abstraction to hide the deployment details from users. It should span GPU devices, clouds, etc. That will leave us enough room for cost/performance optimization.

Jeffwan · 2024-09-11T00:21:49Z

related paper: https://arxiv.org/abs/2404.14527

Jeffwan · 2024-11-13T07:12:31Z

We have no plan to change the orchestration part in v0.2.0. Let's first resolve the cost-efficient serving issue using multiple Deployments with some common labels; that's enough. I will change this issue to a feature and make it part of the heterogeneous RFC.
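As a rough sketch of the multiple-Deployments-with-common-labels idea, the snippet below builds one Deployment manifest per GPU type plus a Service that selects only on the shared label. Everything specific here is an assumption for illustration: the label keys `example.com/model` and `example.com/gpu-type`, the use of that GPU label as a node selector, the image name, the port, and the `nvidia.com/gpu` resource name (which only applies to NVIDIA devices).

```python
import json

# Hypothetical label keys, chosen for illustration only.
MODEL_LABEL = "example.com/model"
GPU_LABEL = "example.com/gpu-type"


def build_deployment(model: str, gpu: str, replicas: int, image: str) -> dict:
    """Return an apps/v1 Deployment manifest pinned to one GPU type."""
    labels = {MODEL_LABEL: model, GPU_LABEL: gpu}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{model}-{gpu}", "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            # Both labels in the selector keep the Deployments from overlapping.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    # Assumes nodes carry this (hypothetical) GPU type label.
                    "nodeSelector": {GPU_LABEL: gpu},
                    "containers": [{
                        "name": "server",
                        "image": image,
                        # Resource name is an assumption; it differs per vendor.
                        "resources": {"limits": {"nvidia.com/gpu": 1}},
                    }],
                },
            },
        },
    }


def build_service(model: str, port: int = 8000) -> dict:
    """Return a v1 Service that selects every pod of the model, on any GPU."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": model, "labels": {MODEL_LABEL: model}},
        "spec": {
            # Only the common label is used, so all GPU types receive traffic.
            "selector": {MODEL_LABEL: model},
            "ports": [{"port": port, "targetPort": port}],
        },
    }


if __name__ == "__main__":
    image = "example.com/server:latest"  # placeholder image name
    manifests = [
        build_deployment("llama", "a100", 3, image),
        build_deployment("llama", "l20", 2, image),
        build_service("llama"),
    ]
    print(json.dumps(manifests, indent=2))
```

Because the Service selects on the common label alone, traffic is spread evenly across pods regardless of GPU type, which is where the traffic routing concern from the original description comes back in.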

Jeffwan · 2024-11-26T08:33:59Z

This is a sub-story of #425. We may use a loose mechanism such as labels to orchestrate the workload in v0.2.0. We can orchestrate such workloads better in v0.3.0 with the model API. Postponing to v0.3.0.

Jeffwan added the kind/enhancement (New feature or request) and area/heterogeneous labels Jul 5, 2024

Jeffwan added this to the v0.1.0-rc.2 milestone Jul 29, 2024

Jeffwan changed the title ~~Model orchestration with heterogeneous hardwares~~ [RFC] Model orchestration with heterogeneous hardwares Jul 29, 2024

Jeffwan added the priority/important-soon label (Must be staffed and worked on either currently, or very soon, ideally in time for the next release.) Jul 29, 2024

Jeffwan modified the milestones: v0.1.0-rc.2, v0.1.0 Aug 29, 2024

Jeffwan self-assigned this Aug 29, 2024

Jeffwan modified the milestones: v0.1.0, v0.2.0 Nov 12, 2024

Jeffwan changed the title ~~[RFC] Model orchestration with heterogeneous hardwares~~ Model orchestration with heterogeneous hardwares Nov 13, 2024

Jeffwan removed this from the v0.2.0 milestone Nov 26, 2024
