production-stack-router

Usage:

Run pip install -r requirements.txt to install python dependencies, and then run python test_router.py to test the router.

Benchmarking results:

Example:

INFO:test_router:Nreq distribution: [400 200   0 400   0 400   0 400   0 400]
INFO:test_router:Cache-miss length: mean: 382.99, std: 368.24
INFO:test_router:142 out of 424 share-prefix requests were routed to the same endpoint

Abstractions

Prefix Matcher

The prefix matcher matches the request to a set of endpoints.

match(text: str, unavailable_endpoints: Set[str]) -> str: Routes a request text to an available endpoint
update(text: str, endpoint: str) -> None: Updates the matcher state after a request is routed
add_endpoints(endpoints: Set[str]) -> None: Adds new endpoints to the matcher
remove_endpoints(endpoints: Set[str]) -> None: Removes endpoints from the matcher

Unavailable Endpoints Detector

The unavailable endpoints detector identifies overloaded endpoints that should not receive new requests. It has:

detect(endpoints_to_status: Dict[str, EndpointStatus]) -> Set[str]: Returns set of endpoints that are currently unavailable

Hyper-parameters

router_class: can be LCPMatcher (longest common prefix matcher) or SimhashMatcher.
dataset_size: Number of requests to generate for testing (default: 2200)
num_running_requests_threshold: Maximum number of concurrent requests an endpoint can handle before being marked as unavailable (default: 400)
minimum_endpoints: Minimum number of endpoints needed based on dataset size and request threshold
num_endpoints: Total number of endpoints to create, set to 2x minimum required (default: 2 * minimum_endpoints)
continue_chat_probability: Probability that a new request continues an existing conversation (default: 0.2)
tokens_typed_by_user_per_request: Number of tokens in each simulated user request (default: 100)

python router.py --routing-logic LCP --routing-logic "{}" --overload-detector max_kv_cache --unavaiable-endpoint-detector-config "{}"

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
load_balancer		load_balancer
prefix_matcher		prefix_matcher
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt
test_router.py		test_router.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

production-stack-router

Usage:

Benchmarking results:

Abstractions

Prefix Matcher

Unavailable Endpoints Detector

Hyper-parameters

About

Releases

Packages

Languages

KuntaiDu/production-stack-router

Folders and files

Latest commit

History

Repository files navigation

production-stack-router

Usage:

Benchmarking results:

Abstractions

Prefix Matcher

Unavailable Endpoints Detector

Hyper-parameters

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages