feat: Support Disaggregated Prefilling (experimental) #7
Comments
Thanks @gaocegege! We are currently discussing potential solutions with @KuntaiDu (the main contributor of the vLLM disaggregated prefill functionality). One potential solution is to integrate the proxy server functionality into the router, so that it does not need extra Kubernetes-level configuration; a rough sketch of this idea follows below.
Will create an RFC issue once we have a more concrete design.
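For illustration, a router-integrated proxy could follow a prefill-then-decode forwarding pattern similar to the example proxy in the vLLM disaggregated prefilling docs. The sketch below is hypothetical: the service URLs, port, and timeout are assumptions, and the actual KV-cache hand-off is performed by the configured vLLM KV connector, not by this code.

```python
# Minimal sketch (not the project's implementation): forward each request to a
# prefill instance first, then to a decode instance.
import copy

import requests

PREFILL_URL = "http://prefill-svc:8000/v1/completions"  # hypothetical Service name
DECODE_URL = "http://decode-svc:8000/v1/completions"    # hypothetical Service name


def route_request(request_body: dict) -> dict:
    """Run prefill on one instance, then generate the completion on another."""
    # Step 1: prefill only. Setting max_tokens=1 makes the prefill instance stop
    # right after processing the prompt; the KV cache reaches the decode
    # instance via the configured kv connector, not via this proxy.
    prefill_body = copy.deepcopy(request_body)
    prefill_body["max_tokens"] = 1
    requests.post(PREFILL_URL, json=prefill_body, timeout=300).raise_for_status()

    # Step 2: send the unmodified request to the decode instance, which reuses
    # the transferred KV cache and produces the full completion.
    decode_resp = requests.post(DECODE_URL, json=request_body, timeout=300)
    decode_resp.raise_for_status()
    return decode_resp.json()
```

Since the router already sits in front of every serving engine, folding this logic into it would remove the need for a separate proxy Deployment and Service at the Kubernetes level.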
Thanks, I am closing this since there will be an RFC.
@gaocegege The router in vLLM does not currently overlap with the existing router inside this project, but in the future we will overwrite the router in vLLM so that it interacts with the router of this project. The router's job roughly splits into two layers:
Sounds reasonable. Thanks for your explanation! I am excited about this RFC then.
Thanks for the project.
The documentation at https://docs.vllm.ai/en/latest/features/disagg_prefill.html introduces a proxy server along with prefill and decode instances. I am uncertain whether the proxy server overlaps with the router in this project.
However, I am confident that this setup is not compatible with the current Helm chart. Ideally, a Kubernetes Custom Resource Definition (CRD) should be implemented instead of a Helm chart to accommodate more complex deployment configurations.
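To illustrate, a CRD could expose the prefill/decode topology as a single custom resource that a controller reconciles into the underlying Deployments. The sketch below is purely hypothetical: the API group, kind, and spec fields do not exist in this project, and only the kubernetes Python client calls are real.

```python
# Hypothetical custom resource for a disaggregated-prefill deployment; the
# group/version/kind and spec schema are invented for illustration only.
from kubernetes import client, config

config.load_kube_config()

vllm_deployment = {
    "apiVersion": "production-stack.vllm.ai/v1alpha1",  # hypothetical API group/version
    "kind": "VLLMDeployment",                            # hypothetical kind
    "metadata": {"name": "llama-disagg"},
    "spec": {
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "prefill": {"replicas": 2, "gpusPerReplica": 1},
        "decode": {"replicas": 4, "gpusPerReplica": 1},
        "kvConnector": "PyNcclConnector",  # assumed connector name
    },
}

# Create the custom object; a controller watching this kind would then create
# the prefill/decode Deployments, Services, and router wiring.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="production-stack.vllm.ai",
    version="v1alpha1",
    namespace="default",
    plural="vllmdeployments",
    body=vllm_deployment,
)
```

A Helm chart can only template these objects at install time, while a controller behind a CRD could keep adjusting them as the prefill/decode topology changes, which is the kind of complexity meant above.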
Just raising this for discussion; it shouldn't be considered a high priority at this time.