Is prefix-aware routing implemented in this repo? #19
Comments
From what I’ve seen in the vllm-dev.slack.com discussions, this feature might roll out in the next few weeks. My guess is that the performance boost comes from session-ID-based routing, which seems similar to how prefix caching works in regular QA tasks: requests from the same session share the same prefix, so keeping a session on one engine keeps that prefix cached. Just my two cents, though.
For the current results shown in the blog post, we use session-ID-based routing, which can achieve most of the prefix-caching potential in general chatting applications. We are working on prefix-aware routing (it's on the roadmap). cc @ApostaC
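For context, here is a minimal sketch of what session-ID-based routing could look like. The backend URLs and function names are illustrative placeholders, not the actual production-stack API; the point is just that hashing the session ID pins every turn of a session to one engine, so that engine's prefix (KV) cache keeps serving the session's shared prefix:

```python
import hashlib

# Hypothetical backend list; a real router would read its endpoints
# from configuration. These URLs are placeholders.
BACKENDS = [
    "http://vllm-0:8000",
    "http://vllm-1:8000",
    "http://vllm-2:8000",
]

def route_by_session(session_id: str) -> str:
    """Pin all requests of a session to one engine so that engine's
    prefix (KV) cache can serve the session's shared prefix."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return BACKENDS[int.from_bytes(digest[:8], "big") % len(BACKENDS)]

# Every turn of session "abc" lands on the same backend:
assert route_by_session("abc") == route_by_session("abc")
```

A consistent-hashing scheme instead of a plain modulo would reduce cache churn when engines are added or removed, but the pinning idea is the same.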
I'm curious about how to implement this feature. Will the router maintain a radix tree to simulate the engines' KV-cache state, similar to the SGLang router?
Good question. I think an alternative way is to simulate the "page" and "eviction" logic at a coarser granularity. |
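To make the coarser-granularity idea concrete, here is a hypothetical sketch (all names and constants below are made up; #59 describes the actual design). The router chunks each prompt into fixed-size blocks, hashes each prefix of blocks, and models every engine's KV cache as an LRU set of those hashes, routing each request to the engine with the longest matched prefix:

```python
from collections import OrderedDict
from typing import List

BLOCK_SIZE = 256            # tokens per simulated "page" (made-up value)
BLOCKS_PER_ENGINE = 10_000  # per-engine cache budget in blocks (made-up)

def prefix_block_hashes(tokens: List[int]) -> List[int]:
    """Hash each block together with the hash of everything before it,
    so each entry identifies an entire prefix, not just one chunk."""
    hashes, running = [], 0
    for i in range(0, len(tokens), BLOCK_SIZE):
        running = hash((running, tuple(tokens[i:i + BLOCK_SIZE])))
        hashes.append(running)
    return hashes

class EngineCacheModel:
    """Approximates one engine's KV cache as an LRU set of block hashes.

    Instead of a radix tree over tokens, the router only tracks which
    coarse blocks an engine has likely cached and evicts in LRU order,
    which mirrors the engine's real paged eviction only approximately.
    """

    def __init__(self):
        self.blocks = OrderedDict()  # block hash -> None, in LRU order

    def matched_prefix_blocks(self, hashes: List[int]) -> int:
        """Length (in blocks) of the longest simulated-cached prefix."""
        n = 0
        for h in hashes:
            if h not in self.blocks:
                break
            n += 1
        return n

    def record(self, hashes: List[int]) -> None:
        """Mark these blocks as cached, evicting LRU blocks over budget."""
        for h in hashes:
            self.blocks[h] = None
            self.blocks.move_to_end(h)
        while len(self.blocks) > BLOCKS_PER_ENGINE:
            self.blocks.popitem(last=False)

def pick_engine(engines: List[EngineCacheModel], tokens: List[int]) -> int:
    """Route to the engine whose simulated cache matches the longest prefix."""
    hashes = prefix_block_hashes(tokens)
    best = max(range(len(engines)),
               key=lambda i: engines[i].matched_prefix_blocks(hashes))
    engines[best].record(hashes)
    return best
```

Compared with a full radix tree, this trades exact per-token prefix matching for one O(1) lookup per block and a simple LRU policy, so the router's view of each engine's cache stays cheap to maintain.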
Makes a lot of sense. Will keep an eye on it. |
Please check #59 for the initial design of prefix-aware routing.
I think we could close this issue since we've got the RFC to keep track of it. What do you think? |
Sure thing. |
The README here says "(WIP) prefix-aware routing":
https://github.com/vllm-project/production-stack/tree/main/src/router
From a very quick scan of the Python files there, I can't see anything that implements it.
Can the results in the blog be reproduced using the code that is currently in this repo?