
Feat: Router observability (Current QPS, router-side queueing delay, etc) Part 1 #119

Merged: 15 commits merged from feat/update-prom-dashboard into vllm-project:main on Feb 19, 2025

Conversation


@sitloboi2012 sitloboi2012 commented Feb 12, 2025

Feature Router observability (Current QPS, router-side queueing delay, etc) Part 1: #78

Due to the size and complexity of this task, I will split this feature into two smaller PRs for easier management:

  • The first sets up the visualization, updates the router for localhost logging, etc.
  • The second connects the router with Prometheus and visualizes the output on Grafana

Description

This PR covers the following main changes for the first part of the Router Observability update:

  • Update the vLLM Grafana dashboard to align with the reference from @YuhanLiu11, combined with some add-ons. The metrics include:

    • Overview of the system:
      • Available vLLM instances
      • Request latency distribution
      • Number of currently healthy vLLM pods
      • Average latency
    • QoS Information:
      • Current QPS
      • Request TTFT distribution
      • Router-side Queueing Delay
      • Average Prefill Length
      • Number of Prefilling Requests
      • Number of Decoding Requests
      • Average Decoding Length
      • Average ITL (Inter-Token Latency)
    • Serving Engine Load:
      • Number of Running Requests
      • Number of Pending Requests
      • GPU KV Usage Percentage
      • GPU KV Cache Hit Rate
      • Number of swapped requests
    • Current Resource Usage:
      • GPU Usage
      • CPU Usage
      • Memory Usage
      • Disk Usage
  • Update router.py to track information related to the vLLM dashboard, and add logging for easier tracking

  • Update the README.md file to also include exposing the port for the Prometheus dashboard

  • Update utils/install-minikube-cluster.sh to update the minikube context and restart Docker when setting up observability, and add logging for easier interpretation

  • Update engine_stats.py and request_stats.py to align with the metrics information
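
For context, "current QPS" and "router-side queueing delay" are sliding-window statistics. The following is only an illustrative stdlib sketch of how such a tracker could work, not the code in this PR; all class and method names here are made up:

```python
# Illustrative sketch (not the PR's implementation): sliding-window QPS
# and router-side queueing delay, tracked per router.
from collections import deque


class WindowedStats:
    def __init__(self, window_s: float = 10.0):
        self.window_s = window_s
        self.arrivals = deque()      # arrival timestamps of recent requests
        self.queue_delays = deque()  # (dispatch timestamp, delay) pairs

    def on_request_arrived(self, now: float) -> None:
        self.arrivals.append(now)

    def on_request_dispatched(self, arrived_at: float, now: float) -> None:
        # Queueing delay = time the request spent in the router before dispatch.
        self.queue_delays.append((now, now - arrived_at))

    def _trim(self, now: float) -> None:
        cutoff = now - self.window_s
        while self.arrivals and self.arrivals[0] < cutoff:
            self.arrivals.popleft()
        while self.queue_delays and self.queue_delays[0][0] < cutoff:
            self.queue_delays.popleft()

    def current_qps(self, now: float) -> float:
        self._trim(now)
        return len(self.arrivals) / self.window_s

    def avg_queueing_delay(self, now: float) -> float:
        self._trim(now)
        if not self.queue_delays:
            return 0.0
        return sum(d for _, d in self.queue_delays) / len(self.queue_delays)
```

With three arrivals in a 10 s window, `current_qps` reports 0.3; once the window slides past them, it drops back to 0.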

Minor update:

  • Update requirements.txt for tests and vllm_router
  • Update observe/install.sh from helm install to helm upgrade to resolve the existing-namespace problem when reinstalling
  • Update base_url in tests/perftest to align with latest v1 endpoint
  • Update documentation across all the files for easier readability and understanding of how the functions work

Next step:

  • Connect router.py tracking to Prometheus and visualize it on Grafana dashboard
  • Add metric calculations in router.py to compute some custom metrics
  • Update README.md file to explain metrics and dashboard information
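
The next step's Prometheus export typically follows the client library's `gauge.labels(...).set(...)` pattern, one gauge per metric labeled by backend server. A self-contained sketch of that pattern (the `MiniGauge` class is a stand-in for `prometheus_client.Gauge` so this runs without dependencies; all metric names and stats here are hypothetical):

```python
class MiniGauge:
    """Illustrative stand-in for prometheus_client.Gauge."""

    def __init__(self, name, documentation, labelnames):
        self.name = name
        self.labelnames = labelnames
        self.values = {}  # label-value tuple -> last observed value

    def labels(self, **kwargs):
        key = tuple(kwargs[n] for n in self.labelnames)
        return _GaugeChild(self, key)


class _GaugeChild:
    def __init__(self, gauge, key):
        self.gauge = gauge
        self.key = key

    def set(self, value):
        self.gauge.values[self.key] = value


# One gauge per metric, labeled by backend server (names are hypothetical).
current_qps = MiniGauge("vllm_current_qps", "Current QPS per server", ("server",))
queueing_delay = MiniGauge(
    "vllm_queueing_delay_seconds", "Router-side queueing delay", ("server",)
)

# In the router's stats loop, each server's stats are pushed into the gauges.
request_stats = {"http://localhost:9000": {"qps": 1.0, "queueing_delay": 0.02}}
for server, stat in request_stats.items():
    current_qps.labels(server=server).set(stat["qps"])
    queueing_delay.labels(server=server).set(stat["queueing_delay"])
```

In the real follow-up, `MiniGauge` would simply be `prometheus_client.Gauge` and the values would be scraped by Prometheus rather than stored in a dict.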

BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE


PR Checklist (Click to Expand)

Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented to ensure future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.
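
For example, in a throwaway repository (the repo, identity, and commit message below are purely illustrative), `git commit -s` appends the trailer automatically:

```shell
# Demonstrate the DCO sign-off in a temporary repo (illustrative only).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Dev"
git config user.email "jane@example.com"
echo demo > demo.txt
git add demo.txt
git commit -q -s -m "[Router] Demo commit"
# The commit body now ends with: Signed-off-by: Jane Dev <jane@example.com>
git log -1 --format=%B
```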

What to Expect for the Reviews

We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11, Shaoting-Feng, or ApostaC.

@sitloboi2012 sitloboi2012 force-pushed the feat/update-prom-dashboard branch from 29260ac to 2ba572e Compare February 12, 2025 17:45

sitloboi2012 commented Feb 12, 2025

@ApostaC @gaocegege @YuhanLiu11 please help me review the first part of the PR. I split it into two small PRs to make it easier to manage, because this feature would require quite a number of files to be changed if done all in one. This also saves time when checking the work 😃


ApostaC commented Feb 12, 2025

@sitloboi2012 Thanks! Really appreciate your contribution!

@YuhanLiu11 @Shaoting-Feng PTAL 🙏

@YuhanLiu11

@sitloboi2012 Thanks for your contribution!

Can you fix the pre-commit errors?

@sitloboi2012 sitloboi2012 force-pushed the feat/update-prom-dashboard branch 2 times, most recently from 69e88fb to 1261ee4 Compare February 13, 2025 03:43
@sitloboi2012

Hi @YuhanLiu11 @Shaoting-Feng, I ran pre-commit again; please help me check this PR.


ApostaC commented Feb 13, 2025

The PR seems to contain a lot of unrelated stuff (the changes in previous PRs). Is it caused by the rebase?

@sitloboi2012 Can you try fixing this? Maybe cherry-pick your modifications? Or create a new PR based on the diff with the latest main?


ApostaC commented Feb 13, 2025

Btw, this is really great work and we appreciate your contribution! Hopefully the PR problem can be fixed easily.

@sitloboi2012

Yep, I will cherry-pick the changes out to clean up the PR.

@sitloboi2012 sitloboi2012 force-pushed the feat/update-prom-dashboard branch from 1261ee4 to 209dac7 Compare February 13, 2025 04:35
@sitloboi2012

@ApostaC should be good now, no more redundant file changes 😃


ApostaC commented Feb 13, 2025

Great! Will take a look soon! Thanks!

@ApostaC ApostaC self-requested a review February 13, 2025 05:14
@sitloboi2012 sitloboi2012 force-pushed the feat/update-prom-dashboard branch from c355195 to c2033eb Compare February 13, 2025 05:15
stats = GetRequestStatsMonitor().get_request_stats(time.time())
for server, stat in stats.items():
current_qps.labels(server=server).set(stat.qps)
avg_decoding_length.labels(server=server).set(stat.ttft)
Why use TTFT as the decoding length here?
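
If TTFT was wired into the wrong gauge by mistake, the fix is presumably to read the decoding-length field instead. A hypothetical sketch of the distinction (the stats class and field names here are illustrative stand-ins, not the repo's actual RequestStats):

```python
# Illustrative only: TTFT (a latency, in seconds) and average decoding
# length (a token count) are different quantities and must not be mixed.
from dataclasses import dataclass


@dataclass
class Stat:  # hypothetical stand-in for the router's per-server stats
    qps: float
    ttft: float
    avg_decoding_length: float


stat = Stat(qps=1.0, ttft=0.25, avg_decoding_length=128.0)

suspicious_value = stat.ttft               # what the diff sets: 0.25 s latency
expected_value = stat.avg_decoding_length  # what the gauge name implies: 128 tokens
```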

es = engine_stats[url]
logstr += (
f" Engine Stats (Dashboard): Running Requests: {es.num_running_requests}, "
f"Queueing Delay (requests): {es.num_queuing_requests}, "

Can you change this name to "Queueing Delay"?

@sitloboi2012

Hi @ApostaC @YuhanLiu11, I updated the PR with some updates related to the metrics and other minor fixes to align with the latest tests we currently have on main. I also updated the PR description, so please have a look through. Cheers 😄

@YuhanLiu11

Thanks for your great contribution! We will take a look soon.


@Shaoting-Feng Shaoting-Feng left a comment


Thanks for your contribution.

The Prometheus Metrics endpoint in the router looks good to me. However, could you please leave the other parts of the router unchanged? I've noticed many changes, such as comment modifications and variable redefinitions, that seem unrelated to this PR.

"""
Route the incoming request to the backend server and stream the response
back to the client.
"""

Could you please keep these comments?

in_router_time = time.time()
request_id = str(uuid.uuid4())

# TODO (ApostaC): merge two awaits into one

Could you please keep these comments?

server_url = GetRoutingLogic().route_request(
endpoints, engine_stats, request_stats, request
)

logger.info(f"Request {request_id} routed to {server_url}")

This information has already been logged in the two lines below.

purpose = "unknown"
else:
purpose = form["purpose"]
purpose = form.get("purpose", "unknown")

Do we need to delete the conditional statement?
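
For reference on the diff above: collapsing the if/else into dict-style `.get` with a default is behavior-preserving for mapping objects (FastAPI/Starlette form data supports `.get`). A standalone check with made-up form data:

```python
# The conditional and the .get form behave identically on a mapping
# (illustrative data; the real code reads an uploaded form's "purpose").
def purpose_conditional(form: dict) -> str:
    if "purpose" not in form:
        purpose = "unknown"
    else:
        purpose = form["purpose"]
    return purpose


def purpose_get(form: dict) -> str:
    return form.get("purpose", "unknown")


# Both agree whether the key is absent or present.
assert purpose_conditional({}) == purpose_get({}) == "unknown"
assert purpose_conditional({"purpose": "fine-tune"}) == purpose_get({"purpose": "fine-tune"})
```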

@@ -181,8 +194,7 @@ async def route_get_file(file_id: str):
@app.get("/v1/files/{file_id}/content")
async def route_get_file_content(file_id: str):
try:
# TODO(gaocegege): Stream the file content with chunks to support
# openai uploads interface.
# TODO: Stream file content in chunks to support large files.

Could you please keep these comments?

raise ValueError(
"Static backends must be provided when using static service discovery."
)


Do we need to delete this conditional statement?

@sitloboi2012 sitloboi2012 force-pushed the feat/update-prom-dashboard branch from ea69391 to 4cb025f Compare February 14, 2025 17:08
@sitloboi2012

Hi @ApostaC @YuhanLiu11 @Shaoting-Feng, I updated the code again based on your comments; please review it once more. Cheers, team 😄

@YuhanLiu11

Hey @sitloboi2012, thanks again for your great contribution! Can you also provide some examples showing that the code runs and produces the expected output?

@sitloboi2012

Yep, sure. Let me update the code to address your comments and add test cases and instructions on how to reproduce the work.

@gaocegege

Please resolve the conflicts.

@sitloboi2012 sitloboi2012 force-pushed the feat/update-prom-dashboard branch from 77e09e3 to feec809 Compare February 18, 2025 00:54

sitloboi2012 commented Feb 18, 2025

Hi @YuhanLiu11, to test this code, you can use the files that are already available. The work connecting the router + Prometheus + Grafana is currently WIP; this PR only focuses on setting up what will be used in the next step, as I mentioned in the PR description.

  • To test vllm-dashboard.json, first run utils/install-minikube-cluster.sh to set up K8s, the Helm chart, etc. Then run observability/install.sh to install kube-prom-stack-grafana. After that you should be able to port-forward, log in to Grafana, and upload vllm-dashboard.json to view the layout.
  • To test the new metrics info in router.py, start src/tests/fake-openai-service with bash run-server.sh 9000 2. Then start the main src/vllm_router/router.py with bash run-router.sh 8000. Finally, trigger the fake request generator with python request_generator.py --qps 1 --num-workers 1. You should then see the metrics being logged to the console.

If it runs successfully, you should see output like this in your terminal:

[2025-02-18 08:31:16,075] DEBUG: Got session id: 12196 (routing_logic.py:153:vllm_router.routing_logic)
INFO:     Routing request 30a133a0-ec2f-4c4d-ba4f-e25ce547daa9 to http://localhost:9000 at 1739842276.0759501, process time = 0.0002
INFO:     Started request 30a133a0-ec2f-4c4d-ba4f-e25ce547daa9 for backend http://localhost:9000
INFO:     127.0.0.1:38004 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO:     
==================================================
Model: fake_model_name
Server: http://localhost:9000
 Engine Stats: Running Requests: 19.0, Queued Requests: 0.0, GPU Cache Hit Rate: 0.00
 Request Stats: QPS: 1.00, Avg Latency: -1, Avg ITL: -1, Prefill Requests: 0, Decoding Requests: 29, Swapped Requests: 0, Finished: 0, Uptime: 28.75 sec
--------------------------------------------------
==================================================

[2025-02-18 08:31:16,839] INFO: Scraping metrics from 1 serving engine(s) (engine_stats.py:114:vllm_router.engine_stats)

@YuhanLiu11

Thanks! Will take a look soon.

@YuhanLiu11

Hey @sitloboi2012, I'm able to follow the instructions you gave and get the expected output.

Can you fix the pre-commit issues (sorry for going back and forth on the tests)? Then I can merge your PR.

Thanks again!

…ree vLLM metrics, operational metrics, router observe metrics, update requirement.txt for router and tests, update install-minikube-cluster to be more logging info, restart docker service and minikube context after the run

Signed-off-by: sitloboi2012 <[email protected]>
Signed-off-by: sitloboi2012 <[email protected]>
…s.py and request_stats.py, update observe/install.sh to helm update to solve the problem already run helm service same name, update dashboard layout with latest metrics, update request_generator to align with the new endpoint /v1

Signed-off-by: sitloboi2012 <[email protected]>
@sitloboi2012 sitloboi2012 force-pushed the feat/update-prom-dashboard branch from 4f190d3 to d7b7f4b Compare February 18, 2025 12:11
Signed-off-by: sitloboi2012 <[email protected]>
@sitloboi2012

Hi @YuhanLiu11, just updated, merged, and ran pre-commit for formatting and linting. Please help me merge and close this PR 😄 Thank you, team!


gaocegege commented Feb 19, 2025

The test "Functionality test for helm chart / Multiple-Models (pull_request)" is flaky. I reran it several times until it passed. Now I think it is ready to merge. /cc @YuhanLiu11


@YuhanLiu11 YuhanLiu11 left a comment


LGTM now. Thanks!

@YuhanLiu11 YuhanLiu11 merged commit 3275cf0 into vllm-project:main Feb 19, 2025
9 checks passed
5 participants