When multiple router replicas are deployed, the QPS routing on a router replica only relies on its in-memory Request Stats.
This means that the routing logic doesn't consider the overall QPS on the serving URL but only the QPS routed via this replica.
This might overload the URL when more requests are reaching it through other router replicas.
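To make the concern concrete, here is a minimal sketch of the behaviour described above. It is not the router's actual code: `LocalQpsRouter`, the sliding-window bookkeeping, and the engine URLs are all hypothetical stand-ins, and the routing rule is simply assumed to be "pick the URL with the lowest locally observed QPS".

```python
from collections import defaultdict, deque


class LocalQpsRouter:
    """Hypothetical router replica that only sees the requests it routed itself."""

    def __init__(self, urls, window_s=1.0):
        self.urls = list(urls)
        self.window_s = window_s
        # url -> timestamps of requests sent by THIS replica only
        self.sent = defaultdict(deque)

    def qps(self, url, now):
        stamps = self.sent[url]
        # Drop timestamps that have fallen out of the sliding window.
        while stamps and now - stamps[0] > self.window_s:
            stamps.popleft()
        return len(stamps) / self.window_s

    def route(self, now):
        # Assumed rule: lowest local QPS wins; ties go to the first URL.
        url = min(self.urls, key=lambda u: self.qps(u, now))
        self.sent[url].append(now)
        return url


urls = ["http://engine-1", "http://engine-2"]  # placeholder serving URLs
replica_a = LocalQpsRouter(urls)
replica_b = LocalQpsRouter(urls)

true_load = defaultdict(int)  # what the engines actually receive
for _ in range(3):
    true_load[replica_a.route(now=0.0)] += 1
# Replica B has seen no traffic, so both URLs look idle to it.
true_load[replica_b.route(now=0.0)] += 1

print(dict(true_load))  # {'http://engine-1': 3, 'http://engine-2': 1}
```

Replica B sends its request to `engine-1` even though that engine already has twice the load of `engine-2`, because B's in-memory stats know nothing about what replica A routed.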
aishwaryaraimule21 changed the title from "Discussion - QPS routing done in cases of multiple router replicas" to "Discussion - QPS routing when there are multiple router replicas" on Feb 21, 2025
Thanks for submitting the issue! It seems that vLLM does not directly report QPS in its metrics endpoint, which is why we calculate the stats ourselves.
Sorry, I got it wrong. We do need to have some global information, but the interface is not defined yet. Feel free to share any thoughts you have on this.
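As one possible starting point for that discussion, here is a rough sketch of what "global information" could look like. Everything in it is an assumption: `SharedRequestStats` and `GlobalQpsRouter` are made-up names, and the in-process object only stands in for whatever external store or metrics source the replicas would actually share.

```python
from collections import defaultdict, deque


class SharedRequestStats:
    """Hypothetical stand-in for a store shared by all router replicas."""

    def __init__(self, window_s=1.0):
        self.window_s = window_s
        self._stamps = defaultdict(deque)  # url -> timestamps from ALL replicas

    def record(self, url, now):
        # Assumes timestamps arrive in non-decreasing order.
        self._stamps[url].append(now)

    def qps(self, url, now):
        stamps = self._stamps[url]
        # Drop timestamps that have fallen out of the sliding window.
        while stamps and now - stamps[0] > self.window_s:
            stamps.popleft()
        return len(stamps) / self.window_s


class GlobalQpsRouter:
    """Hypothetical replica that reads and writes the shared stats."""

    def __init__(self, urls, stats):
        self.urls = list(urls)
        self.stats = stats

    def route(self, now):
        # Lowest overall QPS wins; ties go to the first URL.
        url = min(self.urls, key=lambda u: self.stats.qps(u, now))
        self.stats.record(url, now)
        return url


urls = ["http://engine-1", "http://engine-2"]  # placeholder serving URLs
stats = SharedRequestStats()
replica_a = GlobalQpsRouter(urls, stats)
replica_b = GlobalQpsRouter(urls, stats)

picks = [replica_a.route(0.0) for _ in range(3)] + [replica_b.route(0.0)]
print(picks)
```

With the shared view, replica B's request goes to `engine-2`, the less loaded URL. A real deployment would also have to deal with the latency, consistency, and failure modes of the shared store, which this sketch ignores.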