
Support distributed kv cache orchestration #583

Merged
9 commits merged from jiaxin/kv-cache-orchestrator into main on Jan 21, 2025

Conversation

Jeffwan
Collaborator

@Jeffwan Jeffwan commented Jan 21, 2025

Pull Request Description

Support distributed kv cache orchestration #570

  1. Introduce a new API called KVCache. It creates the etcd instance, the cache dataplane, and the services the vLLM engine needs to communicate with the cache.
  2. This is an initial version. Some of the API design ideas come from the vineyard operator, but we did a few things differently:
  • Simplify the original design: vineyard has many APIs we do not need, so we keep a slim version.
  • Improve reliability: the original v6d uses a template framework instead of managing resources natively, which causes stability problems. For example, if the etcd instance goes down or is deleted, the controller won't create a new one, which is unacceptable from an HA perspective (see the sketch after this list).
  • Add GPU and workload affinity. This comes from my original change in v6d: Add SchedulingConfig for Enhanced GPU and Pod Affinity Scheduling aibrix/v6d#7 (an illustrative affinity sketch appears after the example output below).
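
For the reliability point above, here is a minimal sketch of the native "recreate if missing" pattern using controller-runtime; the function name, helper shape, and surrounding wiring are assumptions for illustration, not this PR's actual code.

// Sketch only: recreate a missing etcd Pod natively instead of relying on a
// template framework. Assumed imports:
//   "context"
//   corev1 "k8s.io/api/core/v1"
//   apierrors "k8s.io/apimachinery/pkg/api/errors"
//   "sigs.k8s.io/controller-runtime/pkg/client"
func reconcileEtcdPod(ctx context.Context, c client.Client, desired *corev1.Pod) error {
	var existing corev1.Pod
	err := c.Get(ctx, client.ObjectKeyFromObject(desired), &existing)
	if apierrors.IsNotFound(err) {
		// The etcd Pod is gone (node failure, manual deletion, ...):
		// create it again so the KVCache keeps its metadata store.
		return c.Create(ctx, desired)
	}
	// Any other error is surfaced to the caller for requeue.
	return err
}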

Example

apiVersion: orchestration.aibrix.ai/v1alpha1
kind: KVCache
metadata:
  name: aibrix-deepseek-33b-kvcache
  namespace: aibrix-system
  annotations:
    kvcache.orchestration.aibrix.ai/node-affinity-gpu-type: NVIDIA-L20
    kvcache.orchestration.aibrix.ai/pod-affinity-workload: aibrix-deepseek-33b
spec:
  replicas: 1
  service:
    type: ClusterIP
    port: 9600
  cacheSpec:
    image: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/vineyardd:20241120
    imagePullPolicy: IfNotPresent

It will create the following resources:

cache deployment

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
test-aibrix-model-deepseek-coder-33b-kvcache   1/1     1            1           8m54s

individual pod

test-aibrix-model-deepseek-coder-33b-kvcache-etcd-0            1/1     Running             0            51m

services

NAME                                                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
test-aibrix-model-deepseek-coder-33b-kvcache-etcd-0         ClusterIP   10.96.153.146    <none>        2379/TCP,2380/TCP   49m
test-aibrix-model-deepseek-coder-33b-kvcache-etcd-service   ClusterIP   10.100.223.243   <none>        2379/TCP            49m
test-aibrix-model-deepseek-coder-33b-kvcache-rpc            ClusterIP   10.108.217.57    <none>        9600/TCP            49m
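
For illustration of the affinity annotations in the example above: the controller presumably turns them into standard Kubernetes scheduling constraints on the cache pods. The fragment below is a rough sketch of what such constraints could look like; the label keys and the required/preferred split are assumptions, not taken from this PR.

# Hypothetical pod-template fragment derived from the two annotations;
# the node and workload label keys here are assumed, not defined by this PR.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: nvidia.com/gpu.product        # assumed GPU-type node label
              operator: In
              values:
                - NVIDIA-L20
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              model.aibrix.ai/name: aibrix-deepseek-33b   # assumed workload label
          topologyKey: kubernetes.io/hostname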

Related Issues

Resolves: #570

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

@Jeffwan Jeffwan requested a review from varungup90 January 21, 2025 19:07
@Jeffwan Jeffwan force-pushed the jiaxin/kv-cache-orchestrator branch from 9d497e0 to 787c672 on January 21, 2025 22:05
func needsUpdateDeployment(deployment *appsv1.Deployment, found *appsv1.Deployment) bool {
	imageChanged := false
	for i, container := range found.Spec.Template.Spec.Containers {
		if len(deployment.Spec.Template.Spec.Containers) > i {

varungup90 (Collaborator) commented:
We can also have a scenario where the number of containers is not the same. Second, it would be better to have a second loop that matches containers by name rather than depending on the container index (removing that assumption). This can be a future TODO.
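
For illustration, a minimal sketch of the name-based matching suggested above; the function name and structure are hypothetical, not code from this PR (appsv1 is k8s.io/api/apps/v1, as in the snippet above):

// imageChangedByName reports whether any container in the found Deployment
// runs a different image than the desired Deployment's container with the
// same name, removing the dependency on container ordering.
func imageChangedByName(desired, found *appsv1.Deployment) bool {
	want := make(map[string]string, len(desired.Spec.Template.Spec.Containers))
	for _, c := range desired.Spec.Template.Spec.Containers {
		want[c.Name] = c.Image
	}
	for _, c := range found.Spec.Template.Spec.Containers {
		if image, ok := want[c.Name]; ok && image != c.Image {
			return true
		}
	}
	// A differing number of containers could also be treated as a change,
	// which covers the first scenario mentioned above.
	return false
}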

Jeffwan (Collaborator, Author) replied:
These are great suggestions. Sure, I created a new issue to track your proposed improvements: #587.

@Jeffwan
Collaborator Author

Jeffwan commented Jan 21, 2025

I will merge this PR now and address Varun's suggestions in later PRs.

@Jeffwan Jeffwan merged commit ece609e into main Jan 21, 2025
10 checks passed
@Jeffwan Jeffwan deleted the jiaxin/kv-cache-orchestrator branch January 21, 2025 23:43
gangmuk pushed a commit that referenced this pull request Jan 25, 2025
* Add kv cache api for distributed kv orchestration

* Update the kvcache api spec

* Add KV Cache controller initial implementation

* Support affinity Node & Pod settings

* Adjust manifest to orchestration folders

* fix the ci check

* Address review feedback

* Update code based on rebase refactor

* Fix the linter issue

---------

Signed-off-by: Jiaxin Shan <[email protected]>
Development

Successfully merging this pull request may close these issues.

Support cache orchestrator to support distributed kv cache scenarios