Implement exact Preble routing algorithm in AIBrix #647

Open
Tracked by #698
Jeffwan opened this issue Feb 11, 2025 · 0 comments
Labels: area/gateway, kind/feature

Jeffwan (Collaborator) commented Feb 11, 2025

🚀 Feature Description and Motivation

Preble (https://arxiv.org/abs/2407.00023) did solid work on prefix-cache and load-aware routing.

The prefix-cache aware routing we are implementing differs slightly from Preble, although we borrow some ideas from their metadata design.

  1. We use hash blocks or linked hash blocks rather than a radix tree. Technically, we are ignoring the efficiency issues for now.
  2. vLLM already maintains its own local "logical" tree, so we do not need one here.
  3. We should consider the "load-aware" part in the new PR. Load cost calculation is essential and should be used together with prefix-cache aware routing. Preble implements load-awareness in three steps: (a) historical load cost, (b) eviction cost, (c) current request cost. We can implement as many load-aware strategies as we can; existing ones such as least-request and least-kv-cache fit under this category and can be reused as well.
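
To make the points above concrete, here is a minimal Python sketch of linked hash blocks combined with a three-term load cost. It is an illustration only, not Preble's actual formulas and not existing AIBrix code: `Pod`, `linked_block_hashes`, `load_cost`, `pick_pod`, `BLOCK_SIZE`, `capacity_blocks`, `recent_load`, and `history_weight` are all hypothetical names and values, and each cost term is a simple stand-in for the step it is named after.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per block; illustrative value only


def linked_block_hashes(tokens, block_size=BLOCK_SIZE):
    """Hash each full block, chaining in the previous block's hash.

    Because every hash depends on its predecessor, a match at block i
    implies the whole prefix up to block i matches.
    """
    hashes, prev = [], b""
    full = len(tokens) - len(tokens) % block_size  # ignore the partial tail
    for start in range(0, full, block_size):
        block = tuple(tokens[start:start + block_size])
        prev = hashlib.sha256(prev + repr(block).encode()).digest()
        hashes.append(prev)
    return hashes


@dataclass
class Pod:
    name: str
    capacity_blocks: int = 8                  # hypothetical cache capacity
    cached: set = field(default_factory=set)  # block hashes assumed cached
    recent_load: float = 0.0                  # stand-in for historical load


def matched_blocks(pod, hashes):
    """Count leading blocks of the request that the pod already holds."""
    n = 0
    for h in hashes:
        if h not in pod.cached:
            break
        n += 1
    return n


def load_cost(pod, hashes, history_weight=0.5):
    """Sum of three stand-in terms (assumptions, not Preble's definitions)."""
    missed = len(hashes) - matched_blocks(pod, hashes)
    historical = history_weight * pod.recent_load  # (a) historical load cost
    overflow = len(pod.cached) + missed - pod.capacity_blocks
    eviction = max(0, overflow)                    # (b) eviction cost
    current = missed                               # (c) current request cost
    return historical + eviction + current


def pick_pod(pods, tokens):
    """Route to the cheapest pod, then record the work on that pod."""
    hashes = linked_block_hashes(tokens)
    best = min(pods, key=lambda p: load_cost(p, hashes))
    best.recent_load += len(hashes) - matched_blocks(best, hashes)
    best.cached.update(hashes)  # eviction itself is not simulated here
    return best
```

With two fresh pods, a request followed by a second request sharing its prefix lands on the same pod, while a pod with a large `recent_load` loses the request despite its cache hit. Existing strategies such as least-request or least-kv-cache could be plugged in by swapping the `historical` term for the corresponding metric.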

Use Case

No response

Proposed Solution

No response

Jeffwan added the area/gateway and kind/feature labels Feb 11, 2025
Jeffwan mentioned this issue Feb 26, 2025