feat: Support Disaggregated Prefilling (experimental) #7

Closed

gaocegege opened this issue Jan 23, 2025 · 5 comments
@gaocegege
Collaborator

Thanks for the project.

The documentation at https://docs.vllm.ai/en/latest/features/disagg_prefill.html introduces a proxy server along with prefill and decode instances. I am uncertain whether the proxy server overlaps with the router in this project.

However, I am confident that it is not compatible with the Helm chart. Ideally, a Kubernetes Custom Resource Definition (CRD) should be implemented instead of a Helm chart to accommodate more complex deployment configurations.

Just raising this for discussion—it shouldn't be considered a high priority at this time.
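
For context, here is a minimal sketch of what I understand that proxy to do: run the request through a prefill instance first (with max_tokens=1, so it only builds and transfers the KV cache), then replay it against a decode instance. The URLs, ports, and model name below are placeholders for illustration, not the actual vLLM example code.

```python
# Sketch of the proxy behaviour described in the vLLM disaggregated-prefill
# docs. Endpoints and ports are assumptions for illustration only.
import requests

PREFILL_URL = "http://localhost:8100/v1/completions"  # assumed prefill instance
DECODE_URL = "http://localhost:8200/v1/completions"   # assumed decode instance


def proxy_completion(payload: dict) -> dict:
    # Step 1: prefill pass; max_tokens=1 means the instance effectively only
    # computes the prompt's KV cache and hands it off.
    prefill_payload = {**payload, "max_tokens": 1}
    requests.post(PREFILL_URL, json=prefill_payload, timeout=300).raise_for_status()

    # Step 2: decode pass; the decode instance reuses the transferred KV cache
    # and generates the full completion.
    response = requests.post(DECODE_URL, json=payload, timeout=300)
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    print(proxy_completion({
        "model": "facebook/opt-125m",
        "prompt": "Disaggregated prefilling separates",
        "max_tokens": 32,
    }))
```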

@ApostaC
Collaborator

ApostaC commented Jan 23, 2025

Thanks @gaocegege! We are currently discussing potential solutions with @KuntaiDu (the main contributor of the vLLM disaggregated prefill functionality).

One potential solution is to integrate the proxy server functionality into the router, so it does not need extra k8s-level configuration.

@ApostaC
Collaborator

ApostaC commented Jan 23, 2025

Will create an RFC issue once we have a more concrete design.

@gaocegege
Collaborator Author

Thanks, I am closing this since there will be an RFC.

@KuntaiDu
Collaborator

@gaocegege The router in vLLM currently does not overlap with the existing router in this project, but in the future we will rewrite the router in vLLM so that it interacts with the router of this project.

The router's job splits roughly into two layers:

  • Global router, which handles global request orchestration (e.g. fault tolerance, service discovery, prefix-cache-aware routing) and should be implemented by exterior code (e.g. this project).
  • Local router, which handles the inference of a single request; the vLLM project provides an example implementation. Of course, the local router needs some information from the global router, so we will rewrite this router in the future (see the sketch below).
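
Roughly, something like the following. All names here (GlobalRouter, LocalRouter, InstancePair) are made up purely to illustrate the split in responsibilities; they are not an existing API of vLLM or this project.

```python
# Hypothetical sketch of the two-layer router split; names are illustrative only.
from dataclasses import dataclass

import requests


@dataclass
class InstancePair:
    prefill_url: str
    decode_url: str


class GlobalRouter:
    """Global layer: fault tolerance, service discovery, prefix-cache-aware
    routing across all instances (the part this project would own)."""

    def __init__(self, pairs: list[InstancePair]):
        self.pairs = pairs

    def pick(self, prompt: str) -> InstancePair:
        # Placeholder policy; a real router would prefer instances that already
        # hold a matching prefix cache and would balance load.
        return self.pairs[hash(prompt) % len(self.pairs)]


class LocalRouter:
    """Local layer: drives a single request through prefill then decode
    (vLLM provides an example implementation of this part)."""

    def infer(self, pair: InstancePair, payload: dict) -> dict:
        # Prefill pass only builds/transfers the KV cache for the prompt.
        requests.post(pair.prefill_url, json={**payload, "max_tokens": 1},
                      timeout=300).raise_for_status()
        # Decode pass reuses that cache and generates the tokens.
        return requests.post(pair.decode_url, json=payload, timeout=300).json()
```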

@gaocegege
Collaborator Author

Sounds reasonable. Thanks for the explanation! I am looking forward to the RFC, then.
