Discussion - QPS routing when there are multiple router replicas #166

Open
aishwaryaraimule21 opened this issue Feb 21, 2025 · 2 comments
Labels
discussion question Further information is requested

Comments

@aishwaryaraimule21

When multiple router replicas are deployed, QPS routing on each replica relies only on that replica's in-memory request stats.
As a result, the routing logic does not consider the overall QPS on a serving URL, only the QPS routed through that replica.
This can overload a URL when more requests are reaching it through other router replicas.
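
To make the concern concrete, here is a minimal sketch of the situation, not the project's actual router code. The class `LocalQpsRouter`, its methods, and the `http://engine-*` URLs are hypothetical names chosen for illustration. It assumes each replica picks the backend with the lowest QPS according to its own in-memory timestamps.

```python
from collections import defaultdict


class LocalQpsRouter:
    """Hypothetical router replica that only knows about requests it routed itself."""

    def __init__(self, backends, window_s=1.0):
        self.backends = list(backends)
        self.window_s = window_s
        self.sent = defaultdict(list)  # backend URL -> request timestamps

    def record(self, url, now):
        # Remember that this replica sent a request to `url` at time `now`.
        self.sent[url].append(now)

    def local_qps(self, url, now):
        # QPS as seen by this replica only: requests inside the sliding window.
        recent = [t for t in self.sent[url] if now - t <= self.window_s]
        self.sent[url] = recent
        return len(recent) / self.window_s

    def route(self, now):
        # Pick the backend with the lowest *local* QPS (ties go to list order).
        url = min(self.backends, key=lambda u: self.local_qps(u, now))
        self.record(url, now)
        return url


backends = ["http://engine-0", "http://engine-1"]
replica_a = LocalQpsRouter(backends)
replica_b = LocalQpsRouter(backends)

# Replica B has already sent five requests to engine-0 within the last second.
for t in (0.0, 0.1, 0.2, 0.3, 0.4):
    replica_b.record("http://engine-0", t)

# Replica A has seen no traffic, so engine-0 looks idle from its point of view.
chosen = replica_a.route(now=0.5)

# The true load is the sum over all replicas, which neither replica can see.
global_qps = {
    url: replica_a.local_qps(url, 0.5) + replica_b.local_qps(url, 0.5)
    for url in backends
}
print(chosen, global_qps)
```

In this sketch replica A sends its request to `http://engine-0` even though `http://engine-1` is idle across the whole deployment, because the five requests routed by replica B are invisible to it.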

@aishwaryaraimule21 aishwaryaraimule21 changed the title Discussion - QPS routing done in cases of multiple router replicas Discussion - QPS routing when there are multiple router replicas Feb 21, 2025
@gaocegege gaocegege added question Further information is requested discussion labels Feb 23, 2025
@gaocegege
Collaborator

Yes, it may be better to expose these stats as Prometheus metrics and read them from there.
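
One possible shape for this, as a rough sketch only: each replica exports a per-backend request counter, and the router reads a summed rate back from Prometheus. The metric name `router_routed_requests_total` and the `server` label are assumptions made up for illustration, not existing metrics of this project. The response layout follows the Prometheus HTTP API instant-query format, and a canned response is used here so the example runs without a Prometheus server.

```python
import json

# Hypothetical counter each router replica would export, labelled by backend URL.
METRIC = "router_routed_requests_total"


def build_query(window="1m"):
    # Sum the per-replica request rates so each backend gets one global QPS value.
    return f"sum by (server) (rate({METRIC}[{window}]))"


def parse_global_qps(payload):
    """Map backend URL -> QPS from a Prometheus instant-query response body."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed")
    return {
        sample["metric"]["server"]: float(sample["value"][1])
        for sample in payload["data"]["result"]
    }


# Canned response in the shape returned by GET /api/v1/query (values are made up).
sample_response = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"server": "http://engine-0"}, "value": [1740000000, "6"]},
      {"metric": {"server": "http://engine-1"}, "value": [1740000000, "0.5"]}
    ]
  }
}
""")

global_qps = parse_global_qps(sample_response)
least_loaded = min(global_qps, key=global_qps.get)
print(build_query(), least_loaded)
```

A trade-off of this approach is staleness: the values are only as fresh as the Prometheus scrape interval and the rate window, so bursts shorter than that would still be invisible to the routing decision.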

@ApostaC
Collaborator

ApostaC commented Feb 24, 2025

Thanks for submitting the issue! It seems that vLLM does not directly report QPS in its metrics endpoint, which is why we calculate the stats ourselves.


Sorry, I got it wrong. We do need some global information, but the interface is not defined yet. Feel free to share any thoughts you have on this.


3 participants