🚀 Feature Description and Motivation

Preble (https://arxiv.org/abs/2407.00023) did solid work on prefix-cache and load-aware routing.

The prefix-cache aware version we are implementing is somewhat different from Preble's, though we borrow some ideas from their metadata design.

We use hash blocks (or linked hash blocks) rather than a radix tree, setting the efficiency trade-offs aside for now. vLLM already maintains its own local "logical" tree, so we do not need to replicate it at the router level.
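Below is a minimal sketch of the hash-block idea, assuming tokenized prompts and a per-endpoint table of previously seen block hashes; `BLOCK_SIZE`, `HashBlockIndex`, `record`, and `match` are illustrative names, not the actual router API.

```python
import hashlib
from collections import defaultdict
from typing import Iterable

BLOCK_SIZE = 16  # tokens per hash block (assumed value)

def chunk_hashes(token_ids: Iterable[int], block_size: int = BLOCK_SIZE) -> list[str]:
    """Split token ids into fixed-size blocks and chain-hash them, so each
    block hash also encodes all preceding blocks (linked hash blocks)."""
    tokens = list(token_ids)
    hashes, prev = [], ""
    for i in range(0, len(tokens) - len(tokens) % block_size, block_size):
        block = tokens[i : i + block_size]
        h = hashlib.sha256((prev + ",".join(map(str, block))).encode()).hexdigest()
        hashes.append(h)
        prev = h
    return hashes

class HashBlockIndex:
    """Tracks which endpoints are likely to hold the KV cache for each block hash."""

    def __init__(self) -> None:
        self._block_to_endpoints: dict[str, set[str]] = defaultdict(set)

    def record(self, endpoint: str, token_ids: Iterable[int]) -> None:
        """Remember that `endpoint` served a request containing these prefix blocks."""
        for h in chunk_hashes(token_ids):
            self._block_to_endpoints[h].add(endpoint)

    def match(self, token_ids: Iterable[int]) -> dict[str, int]:
        """Return, per endpoint, how many leading blocks of this request it has cached."""
        matched = defaultdict(int)
        alive = None  # endpoints that still match every block seen so far
        for h in chunk_hashes(token_ids):
            owners = self._block_to_endpoints.get(h, set())
            alive = set(owners) if alive is None else alive & owners
            if not alive:
                break
            for ep in alive:
                matched[ep] += 1
        return dict(matched)
```

Compared with a radix tree, this trades exact longest-prefix matching for flat hash lookups, which keeps the router-side metadata simple; that seems acceptable here since vLLM keeps the authoritative prefix structure locally.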
We should consider the "load-aware" part in the new PR. Load cost calculation is essential and should be used together with prefix-cache aware routing. Preble implements load-awareness in three steps: (a) historical load cost, (b) eviction cost, and (c) current request cost. We can implement as many load-aware strategies as we like; existing ones such as least-request and least-kv-cache all fit under this category and can be reused as well.
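As a hedged sketch of how those three cost terms could combine with the prefix-match score above: `EndpointStats`, the individual cost formulas, and `pick_endpoint` are assumptions for illustration, not a settled routing interface.

```python
from dataclasses import dataclass

@dataclass
class EndpointStats:
    historical_load: float  # e.g. a moving average of recent request cost (assumed metric)
    pending_requests: int   # in-flight requests, as a least-request policy would track
    kv_cache_usage: float   # fraction of KV blocks in use (0..1), as least-kv-cache would track

def estimate_cost(stats: EndpointStats, matched_blocks: int, total_blocks: int) -> float:
    """Lower is better; each term stands in for one of the three load-aware steps."""
    # (a) historical load cost: what this endpoint has been costing us recently
    historical = stats.historical_load
    # (b) eviction cost: routing here may evict cached blocks when KV usage is high
    eviction = stats.kv_cache_usage ** 2
    # (c) current request cost: prefill work shrinks with the matched prefix
    hit_ratio = matched_blocks / total_blocks if total_blocks else 0.0
    current = (1.0 - hit_ratio) * (1 + stats.pending_requests)
    return historical + eviction + current

def pick_endpoint(matches: dict, total_blocks: int, stats: dict) -> str:
    """Route to the endpoint with the lowest combined cost."""
    return min(stats, key=lambda ep: estimate_cost(stats[ep], matches.get(ep, 0), total_blocks))
```

Dropping the historical and eviction terms and scoring only by `pending_requests` or `kv_cache_usage` recovers least-request and least-kv-cache respectively, which is why those existing strategies should be reusable under the same category.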
Use Case
No response
Proposed Solution
No response