Use vllm metrics for routing #274
Conversation
```go
	return metricValue, nil
}

func parseMetricFromBody(body []byte, metricName string) (float64, error) {
```
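The diff only shows the signature of `parseMetricFromBody`. For context, here is a minimal sketch of what a function with this signature might do: scan a Prometheus text-format metrics body (as served by vLLM's `/metrics` endpoint) for the first sample matching `metricName` and return its value. This is an assumption for illustration; the PR's actual implementation may handle labels and errors differently.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMetricFromBody scans a Prometheus text-format body for the first
// sample whose name equals metricName and returns its float value.
// Sketch only: label filtering and timestamp handling are simplified.
func parseMetricFromBody(body []byte, metricName string) (float64, error) {
	sc := bufio.NewScanner(strings.NewReader(string(body)))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		// A sample line looks like: name{labels} value [timestamp]
		name := line
		if i := strings.IndexAny(line, "{ "); i >= 0 {
			name = line[:i]
		}
		if name != metricName {
			continue
		}
		// The value is the first field after the name (and labels, if any).
		rest := line
		if i := strings.Index(line, "}"); i >= 0 {
			rest = line[i+1:]
		} else {
			rest = strings.TrimPrefix(line, name)
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			return 0, fmt.Errorf("malformed sample line: %q", line)
		}
		return strconv.ParseFloat(fields[0], 64)
	}
	return 0, fmt.Errorf("metric %q not found", metricName)
}

func main() {
	body := []byte("# HELP vllm:num_requests_running running requests\n" +
		"vllm:num_requests_running{model=\"m\"} 3\n")
	v, err := parseMetricFromBody(body, "vllm:num_requests_running")
	fmt.Println(v, err)
}
```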
The autoscaler has similar functionality. Let's refactor this part later and make sure the cache and the autoscaler fetcher can use the same library.
Agreed. Let me discuss this with him.
PR looks good to me.
Updated the PR with a cache that pulls metrics once for each pod.
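The "pull metrics once for each pod" idea can be sketched as a small per-pod cache: one refresh scrapes a pod's metrics and stores them, and all subsequent routing lookups read the cached values instead of re-fetching. The type and method names below (`podMetricsCache`, `refreshPod`, `get`) are hypothetical and chosen for illustration, not taken from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// podMetricsCache stores the most recently scraped metric values per pod,
// so routing decisions read from memory rather than hitting /metrics
// on every request. Sketch only; the PR's cache may be structured differently.
type podMetricsCache struct {
	mu      sync.RWMutex
	metrics map[string]map[string]float64 // pod name -> metric name -> value
}

func newPodMetricsCache() *podMetricsCache {
	return &podMetricsCache{metrics: make(map[string]map[string]float64)}
}

// refreshPod replaces the cached values for one pod with freshly
// scraped ones; it is called once per pod per refresh cycle.
func (c *podMetricsCache) refreshPod(pod string, values map[string]float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.metrics[pod] = values
}

// get returns a cached metric value for a pod without re-fetching.
func (c *podMetricsCache) get(pod, metric string) (float64, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.metrics[pod][metric]
	return v, ok
}

func main() {
	c := newPodMetricsCache()
	c.refreshPod("pod-a", map[string]float64{"vllm:num_requests_running": 2})
	v, ok := c.get("pod-a", "vllm:num_requests_running")
	fmt.Println(v, ok)
}
```

The `RWMutex` lets many concurrent routing lookups proceed in parallel while a refresh takes the write lock only briefly to swap in a pod's new value map.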
* Cache bug fix in update pod and model mapping (#259)
* test
* Use vllm metrics for routing
* nit reverts
* update log level
* refactor cache to fetch metrics once
* remove port from random routing
No description provided.