
feature: Support LoRA loading for model deployments #205

Open
ApostaC opened this issue Mar 1, 2025 · 0 comments

Labels
feature request New feature or request

Comments

ApostaC (Collaborator) commented Mar 1, 2025

Describe the feature

Large-scale LoRA deployment is already an established pattern in production, so it would be great for production-stack to support dynamic LoRA loading. This would let users apply LoRA adapters without reloading the full model, improving both resource utilization and deployment agility.

More specifically, we want:

  • Enable dynamic loading and unloading of LoRA adapters on deployed vLLM instances.
  • Support specifying LoRA adapters at runtime via API or configuration updates (see the sketch after this list).
  • Documentation and examples for configuring LoRA adapters in the deployment.
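
As a rough illustration of the runtime API above, here is a minimal sketch built on vLLM's dynamic LoRA endpoints (`/v1/load_lora_adapter` and `/v1/unload_lora_adapter`, available when the serving engine is started with `--enable-lora` and `VLLM_ALLOW_RUNTIME_LORA_UPDATING=True`). The base URL, adapter name, and adapter path below are hypothetical placeholders, not part of any existing production-stack API:

```python
import requests

# Hypothetical endpoint for a deployed vLLM instance (placeholder).
BASE_URL = "http://localhost:8000"


def load_lora_adapter(name: str, path: str) -> None:
    """Ask a running vLLM instance to load a LoRA adapter at runtime.

    Uses vLLM's dynamic LoRA endpoint, which requires the server to be
    started with --enable-lora and VLLM_ALLOW_RUNTIME_LORA_UPDATING=True.
    """
    resp = requests.post(
        f"{BASE_URL}/v1/load_lora_adapter",
        json={"lora_name": name, "lora_path": path},
    )
    resp.raise_for_status()


def unload_lora_adapter(name: str) -> None:
    """Unload a previously loaded adapter without touching the base model."""
    resp = requests.post(
        f"{BASE_URL}/v1/unload_lora_adapter",
        json={"lora_name": name},
    )
    resp.raise_for_status()


if __name__ == "__main__":
    # Adapter name and path are illustrative placeholders.
    load_lora_adapter("sql-lora", "/models/adapters/sql-lora")
    # ... serve requests that set "model": "sql-lora" ...
    unload_lora_adapter("sql-lora")
```

Once an adapter is loaded this way, completion requests can target it by setting the `model` field to the adapter name instead of the base model, and unloading frees the adapter without disturbing the base model weights.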

Why do you need this feature?

No response

Additional context

No response
