
Added LoRA support proposal #216

Merged: 2 commits into vllm-project:main on Mar 4, 2025
Conversation

@wangchen615 (Contributor) commented Mar 3, 2025

This PR starts a proposal design for LoRA support.

Continues progress toward #205.

BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE

  • Make sure the code changes pass the pre-commit checks.
  • Sign off your commit by using -s when doing git commit.
  • PR title: [Feat] LoRA management support, a proposal to support LoRA Adapter Management ([Feature] LoRA Adapter Management Support) for the vLLM Production Stack on Kubernetes.

PR title prefixes:

  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented so that future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.
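For reference, a signed-off commit looks like the following; the commit message shown is just an example, and the trailer name and email are taken from your git user.name and user.email configuration:

```
git commit -s -m "[Feat] Add LoRA adapter management proposal"
# git appends a trailer of the form:
# Signed-off-by: Your Name <you@example.com>
```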

What to Expect for the Reviews

We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11
, Shaoting-Feng or ApostaC.

@ApostaC (Collaborator) left a comment:


Thanks @wangchen615 ! This proposal looks pretty good!

Got a few questions though:

  1. A clarification question: IIUC, the scheduler is on pod-level (which means different pods in the same deployment may get different LoRA adapters), right?
  2. What should the scheduler do if there are newly added pods (i.e., when autoscale happens)?
  3. Would anything special happen when a vLLM pod dies?

Thanks!

@wangchen615 (Contributor, Author) replied:

> Thanks @wangchen615 ! This proposal looks pretty good!
>
> Got a few questions though:
>
> 1. A clarification question: IIUC, the scheduler is on pod-level (which means different pods in the same deployment may get different LoRA adapters), right?

Yes, the scheduler is meant to support algorithms at the pod level, but the initial default algorithm will simply register all adapters belonging to the same base model to all replicas of that base model's deployment, as sketched below.
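For concreteness, here is a minimal sketch of that default policy (not the final design). It assumes the serving pods expose vLLM's dynamic LoRA loading endpoint (/v1/load_lora_adapter, which requires VLLM_ALLOW_RUNTIME_LORA_UPDATING to be enabled); the pod-IP list and the adapter registry passed in are hypothetical inputs from pod discovery:

```python
import requests

def schedule_adapters(base_model: str,
                      pod_ips: list[str],          # hypothetical: replicas serving base_model
                      adapters: dict[str, str]) -> None:   # hypothetical: {lora_name: lora_path}
    """Default policy: load every adapter for `base_model` on every replica."""
    for ip in pod_ips:
        for name, path in adapters.items():
            # vLLM's dynamic LoRA loading endpoint; requires
            # VLLM_ALLOW_RUNTIME_LORA_UPDATING=True on the serving pod.
            resp = requests.post(
                f"http://{ip}:8000/v1/load_lora_adapter",
                json={"lora_name": name, "lora_path": path},
                timeout=10,
            )
            resp.raise_for_status()
```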

> 2. What should the scheduler do if there are newly added pods (i.e., when autoscale happens)?

It will trigger the controller to reconcile, i.e., redo the scheduling of adapters belonging to that base model. The default algorithm will simply register all adapters belonging to that base model to the new replica (see the reconciliation sketch after this reply).

> 3. Would anything special happen when a vLLM pod dies?

We will need a special mechanism to handle request failures at the router for the short period before the adapter-to-pod mapping cache is updated (a possible shape is sketched below). We will come up with more detailed designs once the controller and scheduler algorithms are put to work.
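As a rough illustration of the reconciliation answer above (again, a sketch rather than the proposed controller), a watch loop over the deployment's pods could re-run the default scheduling for each new replica. The model label selector and the get_adapters_for() registry lookup are hypothetical placeholders:

```python
from kubernetes import client, config, watch

def reconcile_on_scale(base_model: str, namespace: str = "default") -> None:
    """Watch replicas of `base_model` and schedule adapters onto new ones."""
    config.load_incluster_config()            # assumes in-cluster credentials
    v1 = client.CoreV1Api()
    seen = set()                              # pod IPs we already scheduled onto
    selector = f"model={base_model}"          # hypothetical label convention
    for event in watch.Watch().stream(v1.list_namespaced_pod,
                                      namespace=namespace,
                                      label_selector=selector):
        pod = event["object"]
        ip = pod.status.pod_ip
        # New pods usually get their IP via a later MODIFIED event, so check both.
        if event["type"] in ("ADDED", "MODIFIED") and ip and ip not in seen:
            seen.add(ip)
            # Default algorithm: register every adapter for this base model on
            # the new replica (schedule_adapters from the sketch above;
            # get_adapters_for is a hypothetical registry lookup).
            schedule_adapters(base_model, [ip], get_adapters_for(base_model))
```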
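And one possible shape for the router-side failure handling mentioned in the last answer: if the cached adapter-to-pod mapping points at a dead pod, rebuild the mapping and retry once. The mapping cache and refresh_mapping() are hypothetical placeholders, not part of the current proposal:

```python
import requests

def route_request(adapter: str, payload: dict,
                  mapping: dict[str, str]) -> requests.Response:
    """Send a request for `adapter`, retrying once if the cached pod is gone."""
    url = f"http://{mapping[adapter]}:8000/v1/completions"
    try:
        return requests.post(url, json=payload, timeout=30)
    except requests.ConnectionError:
        # The cached mapping pointed at a dead pod: rebuild it and retry once.
        mapping = refresh_mapping()   # hypothetical: re-query live pods
        url = f"http://{mapping[adapter]}:8000/v1/completions"
        return requests.post(url, json=payload, timeout=30)
```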

Thanks!

@ApostaC (Collaborator) commented Mar 4, 2025

LGTM! Merging this proposal now, thanks!

@ApostaC merged commit c564acc into vllm-project:main on Mar 4, 2025. 5 checks passed.