
Feat: Router observability (Current QPS, router-side queueing delay, etc) Part 1 #119

Merged: 15 commits merged from feat/update-prom-dashboard into vllm-project:main on Feb 19, 2025

Conversation


@sitloboi2012 sitloboi2012 commented Feb 12, 2025

Feature Router observability (Current QPS, router-side queueing delay, etc) Part 1: #78

Due to the size and complexity of this task, I will split this feature into two smaller PRs for easier management:

  • The first sets up the visualization, updates the router for localhost logging, etc.
  • The second connects the router with Prometheus and visualizes the output on Grafana

Description

This PR covers the following main changes for the first part of the Router Observability update:

  • Update the vLLM Grafana dashboard to align with the reference from @YuhanLiu11, combined with some add-ons. The metrics include:

    • Overview of the system:
      • Available vLLM instances
      • Request latency distribution
      • Number of currently healthy vLLM pods
      • Average latency
    • QoS Information:
      • Current QPS
      • Request TTFT distribution
      • Router-side Queueing Delay
      • Average Prefill Length
      • Number of Prefilling Requests
      • Number of Decoding Requests
      • Average Decoding Length
      • Average ITL (Inter-Token Latency)
    • Serving Engine Load:
      • Number of Running Requests
      • Number of Pending Requests
      • GPU KV Usage Percentage
      • GPU KV Cache Hit Rate
      • Number of swapped requests
    • Current Resource Usage:
      • GPU Usage
      • CPU Usage
      • Memory Usage
      • Disk Usage
  • Update router.py to track information related to the vLLM dashboard, and add logging for easier tracking

  • Update the README.md file to also include exposing the port for the Prometheus dashboard

  • Update utils/install-minikube-cluster.sh to update the minikube context and restart Docker when setting up observability, and add logging for easier interpretation

  • Update engine_stats.py and request_stats.py to align with the metrics information
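
For context, "current QPS" and "router-side queueing delay" are sliding-window statistics. The following is only an illustrative stdlib sketch of how such a tracker could work, not the code in this PR; all class and method names here are made up:

```python
# Illustrative sketch (not the PR's implementation): sliding-window QPS
# and router-side queueing delay, tracked per router.
from collections import deque


class WindowedStats:
    def __init__(self, window_s: float = 10.0):
        self.window_s = window_s
        self.arrivals = deque()      # arrival timestamps of recent requests
        self.queue_delays = deque()  # (dispatch timestamp, delay) pairs

    def on_request_arrived(self, now: float) -> None:
        self.arrivals.append(now)

    def on_request_dispatched(self, arrived_at: float, now: float) -> None:
        # Queueing delay = time the request spent in the router before dispatch.
        self.queue_delays.append((now, now - arrived_at))

    def _trim(self, now: float) -> None:
        cutoff = now - self.window_s
        while self.arrivals and self.arrivals[0] < cutoff:
            self.arrivals.popleft()
        while self.queue_delays and self.queue_delays[0][0] < cutoff:
            self.queue_delays.popleft()

    def current_qps(self, now: float) -> float:
        self._trim(now)
        return len(self.arrivals) / self.window_s

    def avg_queueing_delay(self, now: float) -> float:
        self._trim(now)
        if not self.queue_delays:
            return 0.0
        return sum(d for _, d in self.queue_delays) / len(self.queue_delays)
```

With three arrivals in a 10 s window, `current_qps` reports 0.3; once the window slides past them, it drops back to 0.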

Minor update:

  • Update requirements.txt for tests and vllm_router
  • Update observe/install.sh from helm install to helm upgrade to resolve the existing-namespace problem when reinstalling
  • Update base_url in tests/perftest to align with latest v1 endpoint
  • Update documentation across all the files for easier readability and understanding of how the functions work

Next step:

  • Connect router.py tracking to Prometheus and visualize it on Grafana dashboard
  • Add metric calculations in router.py to compute some custom metrics
  • Update README.md file to explain metrics and dashboard information
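
The next step's Prometheus export typically follows the client library's `gauge.labels(...).set(...)` pattern, one gauge per metric labeled by backend server. A self-contained sketch of that pattern (the `MiniGauge` class is a stand-in for `prometheus_client.Gauge` so this runs without dependencies; all metric names and stats here are hypothetical):

```python
class MiniGauge:
    """Illustrative stand-in for prometheus_client.Gauge."""

    def __init__(self, name, documentation, labelnames):
        self.name = name
        self.labelnames = labelnames
        self.values = {}  # label-value tuple -> last observed value

    def labels(self, **kwargs):
        key = tuple(kwargs[n] for n in self.labelnames)
        return _GaugeChild(self, key)


class _GaugeChild:
    def __init__(self, gauge, key):
        self.gauge = gauge
        self.key = key

    def set(self, value):
        self.gauge.values[self.key] = value


# One gauge per metric, labeled by backend server (names are hypothetical).
current_qps = MiniGauge("vllm_current_qps", "Current QPS per server", ("server",))
queueing_delay = MiniGauge(
    "vllm_queueing_delay_seconds", "Router-side queueing delay", ("server",)
)

# In the router's stats loop, each server's stats are pushed into the gauges.
request_stats = {"http://localhost:9000": {"qps": 1.0, "queueing_delay": 0.02}}
for server, stat in request_stats.items():
    current_qps.labels(server=server).set(stat["qps"])
    queueing_delay.labels(server=server).set(stat["queueing_delay"])
```

In the real follow-up, `MiniGauge` would simply be `prometheus_client.Gauge` and the values would be scraped by Prometheus rather than stored in a dict.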

BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE


PR Checklist (Click to Expand)

Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented to ensure future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.
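
For example, in a throwaway repository (the repo, identity, and commit message below are purely illustrative), `git commit -s` appends the trailer automatically:

```shell
# Demonstrate the DCO sign-off in a temporary repo (illustrative only).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Dev"
git config user.email "jane@example.com"
echo demo > demo.txt
git add demo.txt
git commit -q -s -m "[Router] Demo commit"
# The commit body now ends with: Signed-off-by: Jane Dev <jane@example.com>
git log -1 --format=%B
```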

What to Expect for the Reviews

We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11, Shaoting-Feng, or ApostaC.

@sitloboi2012 sitloboi2012 force-pushed the feat/update-prom-dashboard branch from 29260ac to 2ba572e Compare February 12, 2025 17:45

sitloboi2012 commented Feb 12, 2025

@ApostaC @gaocegege @YuhanLiu11 please help me review the first part of the PR. I split it into two small PRs to make it easier to manage, because this feature would require quite a number of files to be changed if done all in one. This also saves time when checking the work 😃


ApostaC commented Feb 12, 2025

@sitloboi2012 Thanks! Really appreciate your contribution!

@YuhanLiu11 @Shaoting-Feng PTAL 🙏

@YuhanLiu11

@sitloboi2012 Thanks for your contribution!

Can you fix the pre-commit errors?

@sitloboi2012 sitloboi2012 force-pushed the feat/update-prom-dashboard branch 2 times, most recently from 69e88fb to 1261ee4 Compare February 13, 2025 03:43
@sitloboi2012

Hi @YuhanLiu11 @Shaoting-Feng, I ran pre-commit again; please help me check this PR.


ApostaC commented Feb 13, 2025

The PR seems to contain a lot of unrelated stuff (the changes in previous PRs). Is it caused by the rebase?

@sitloboi2012 Can you try fixing this? Maybe cherry-pick your modifications? Or create a new PR based on the diff with the latest main?


ApostaC commented Feb 13, 2025

Btw, this is really great work and we appreciate your contribution! Hopefully the PR problem can be fixed easily.

@sitloboi2012

Yep, I will cherry-pick the changes out to clean up the PR.

@sitloboi2012 sitloboi2012 force-pushed the feat/update-prom-dashboard branch from 1261ee4 to 209dac7 Compare February 13, 2025 04:35
@sitloboi2012

@ApostaC should be good now, no more redundant file changes 😃


ApostaC commented Feb 13, 2025

Great! Will take a look soon! Thanks!

@ApostaC ApostaC self-requested a review February 13, 2025 05:14
@sitloboi2012 sitloboi2012 force-pushed the feat/update-prom-dashboard branch from c355195 to c2033eb Compare February 13, 2025 05:15
stats = GetRequestStatsMonitor().get_request_stats(time.time())
for server, stat in stats.items():
current_qps.labels(server=server).set(stat.qps)
avg_decoding_length.labels(server=server).set(stat.ttft)
Why use TTFT as the decoding length here?
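
If TTFT was wired into the wrong gauge by mistake, the fix is presumably to read the decoding-length field instead. A hypothetical sketch of the distinction (the stats class and field names here are illustrative stand-ins, not the repo's actual RequestStats):

```python
# Illustrative only: TTFT (a latency, in seconds) and average decoding
# length (a token count) are different quantities and must not be mixed.
from dataclasses import dataclass


@dataclass
class Stat:  # hypothetical stand-in for the router's per-server stats
    qps: float
    ttft: float
    avg_decoding_length: float


stat = Stat(qps=1.0, ttft=0.25, avg_decoding_length=128.0)

suspicious_value = stat.ttft               # what the diff sets: 0.25 s latency
expected_value = stat.avg_decoding_length  # what the gauge name implies: 128 tokens
```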

es = engine_stats[url]
logstr += (
f" Engine Stats (Dashboard): Running Requests: {es.num_running_requests}, "
f"Queueing Delay (requests): {es.num_queuing_requests}, "

Can you change this name to "Queueing Delay"?

@sitloboi2012

Hi @ApostaC @YuhanLiu11, I updated the PR with some updates related to the metrics and other minor fixes to align with the latest tests we currently have on main. I also updated the PR description, so please have a look through. Cheers 😄

@YuhanLiu11

Thanks for your great contribution! We will take a look soon.


@Shaoting-Feng Shaoting-Feng left a comment


Thanks for your contribution.

The Prometheus Metrics endpoint in the router looks good to me. However, could you please leave the other parts of the router unchanged? I've noticed many changes, such as comment modifications and variable redefinitions, that seem unrelated to this PR.

"""
Route the incoming request to the backend server and stream the response
back to the client.
"""

Could you please keep these comments?

in_router_time = time.time()
request_id = str(uuid.uuid4())

# TODO (ApostaC): merge two awaits into one

Could you please keep these comments?

server_url = GetRoutingLogic().route_request(
endpoints, engine_stats, request_stats, request
)

logger.info(f"Request {request_id} routed to {server_url}")

This information has already been logged in the two lines below.

purpose = "unknown"
else:
purpose = form["purpose"]
purpose = form.get("purpose", "unknown")

Do we need to delete the conditional statement?
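
For reference on the diff above: collapsing the if/else into dict-style `.get` with a default is behavior-preserving for mapping objects (FastAPI/Starlette form data supports `.get`). A standalone check with made-up form data:

```python
# The conditional and the .get form behave identically on a mapping
# (illustrative data; the real code reads an uploaded form's "purpose").
def purpose_conditional(form: dict) -> str:
    if "purpose" not in form:
        purpose = "unknown"
    else:
        purpose = form["purpose"]
    return purpose


def purpose_get(form: dict) -> str:
    return form.get("purpose", "unknown")


# Both agree whether the key is absent or present.
assert purpose_conditional({}) == purpose_get({}) == "unknown"
assert purpose_conditional({"purpose": "fine-tune"}) == purpose_get({"purpose": "fine-tune"})
```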

@@ -181,8 +194,7 @@ async def route_get_file(file_id: str):
@app.get("/v1/files/{file_id}/content")
async def route_get_file_content(file_id: str):
try:
# TODO(gaocegege): Stream the file content with chunks to support
# openai uploads interface.
# TODO: Stream file content in chunks to support large files.

Could you please keep these comments?

raise ValueError(
"Static backends must be provided when using static service discovery."
)


Do we need to delete this conditional statement?

@sitloboi2012 sitloboi2012 force-pushed the feat/update-prom-dashboard branch from ea69391 to 4cb025f Compare February 14, 2025 17:08
@sitloboi2012

Hi @ApostaC @YuhanLiu11 @Shaoting-Feng, I updated the code again based on your comments; please review it once more. Cheers, team 😄

@YuhanLiu11

Hey @sitloboi2012, thanks again for your great contribution! Can you also provide some examples showing that the code runs and produces the expected output?

@sitloboi2012

Yep, sure. Let me update the code to address your comments and add test cases and instructions on how to reproduce the work.

@gaocegege

Please resolve the conflicts.

@sitloboi2012 sitloboi2012 force-pushed the feat/update-prom-dashboard branch from 77e09e3 to feec809 Compare February 18, 2025 00:54

sitloboi2012 commented Feb 18, 2025

Hi @YuhanLiu11, to test this code, you can use the files that are already available. The work connecting the router + Prometheus + Grafana is currently WIP; this PR only focuses on setting up what will be used in the next step, as I mentioned in the PR description.

  • To test vllm-dashboard.json, first run utils/install-minikube-cluster.sh to set up K8s, the Helm chart, etc. Then run observability/install.sh to install kube-prom-stack-grafana. After that you should be able to port-forward, log in to Grafana, and upload vllm-dashboard.json to view the layout.
  • To test the new metrics info in router.py, start src/tests/fake-openai-service with bash run-server.sh 9000 2. Then start the main src/vllm_router/router.py with bash run-router.sh 8000. Finally, trigger the fake request generator with python request_generator.py --qps 1 --num-workers 1. You should then see the metrics being logged to the console.

If it runs successfully, you should see output like this in your terminal:

[2025-02-18 08:31:16,075] DEBUG: Got session id: 12196 (routing_logic.py:153:vllm_router.routing_logic)
INFO:     Routing request 30a133a0-ec2f-4c4d-ba4f-e25ce547daa9 to http://localhost:9000 at 1739842276.0759501, process time = 0.0002
INFO:     Started request 30a133a0-ec2f-4c4d-ba4f-e25ce547daa9 for backend http://localhost:9000
INFO:     127.0.0.1:38004 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO:     
==================================================
Model: fake_model_name
Server: http://localhost:9000
 Engine Stats: Running Requests: 19.0, Queued Requests: 0.0, GPU Cache Hit Rate: 0.00
 Request Stats: QPS: 1.00, Avg Latency: -1, Avg ITL: -1, Prefill Requests: 0, Decoding Requests: 29, Swapped Requests: 0, Finished: 0, Uptime: 28.75 sec
--------------------------------------------------
==================================================

[2025-02-18 08:31:16,839] INFO: Scraping metrics from 1 serving engine(s) (engine_stats.py:114:vllm_router.engine_stats)

@YuhanLiu11

Thanks! Will take a look soon.

@YuhanLiu11

Hey @sitloboi2012, I'm able to follow the instructions you gave and get the expected output.

Can you fix the pre-commit issues (sorry for going back and forth on the tests)? Then I can merge your PR.

Thanks again!

…ree vLLM metrics, operational metrics, router observe metrics, update requirement.txt for router and tests, update install-minikube-cluster to be more logging info, restart docker service and minikube context after the run

Signed-off-by: sitloboi2012 <[email protected]>
Signed-off-by: sitloboi2012 <[email protected]>
…s.py and request_stats.py, update observe/install.sh to helm update to solve the problem already run helm service same name, update dashboard layout with latest metrics, update request_generator to align with the new endpoint /v1

Signed-off-by: sitloboi2012 <[email protected]>
@sitloboi2012 sitloboi2012 force-pushed the feat/update-prom-dashboard branch from 4f190d3 to d7b7f4b Compare February 18, 2025 12:11
Signed-off-by: sitloboi2012 <[email protected]>
@sitloboi2012

Hi @YuhanLiu11, just updated, merged, and ran pre-commit for formatting and linting. Please help me merge and close this PR 😄 Thank you, team!


gaocegege commented Feb 19, 2025

The test "Functionality test for helm chart / Multiple-Models (pull_request)" is flaky. I reran it several times until it passed. Now I think it is ready to merge. /cc @YuhanLiu11


@YuhanLiu11 YuhanLiu11 left a comment


LGTM now. Thanks!

@YuhanLiu11 YuhanLiu11 merged commit 3275cf0 into vllm-project:main Feb 19, 2025
9 checks passed
5 participants