
Add vllm graceful termination configuration #568

Merged — 2 commits into main, Jan 16, 2025

Conversation

@nwangfw (Collaborator) commented Jan 13, 2025:

Pull Request Description

This PR adds graceful termination after a workload pod is deleted by the podautoscaler. In the current configuration, the pod is given 5 minutes to finish running and pending requests before it is actually deleted.
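For context, a minimal sketch of what this looks like on the workload pod spec is shown below (standard Kubernetes Pod fields; the container name and layout here are illustrative, while the concrete manifest and value changed by this PR appear in the diff and review thread further down):

spec:
  template:
    spec:
      # Upper bound the kubelet waits after the pod is marked for deletion
      # before force-killing containers; the PR initially proposes 300s.
      terminationGracePeriodSeconds: 300
      containers:
        - name: vllm-openai   # illustrative container name
          # image, args, ports, and the preStop drain hook go here

The grace period is only an upper bound: the pod can exit earlier once its preStop hook returns and the main process handles SIGTERM.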

Related Issues

Resolves: #553

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines (Expand for Details)

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

@nwangfw force-pushed the ning/vllm-graceful-shutdown branch from e52e3de to efea10c on January 13, 2025 22:56
@@ -22,6 +22,7 @@ spec:
       labels:
         model.aibrix.ai/name: deepseek-coder-7b-instruct
     spec:
+      terminationGracePeriodSeconds: 300
Collaborator:
Even considering we may have requests in the queue, this is probably a little long. Our longest query probably takes ~30s. Did you see long requests there? I did see that your preStop hook exits if there are no running or waiting requests, but in some extreme cases vLLM may hit an issue that delays the terminating process to the full 300s.

@nwangfw (Collaborator, Author) — Jan 15, 2025:

In my tests, the terminating process will be delayed up to 300s only if there are hundreds of pending requests. In other cases, the termination process will not be that long.

I think 30s should be good enough for a single long query. But what if there are multiple pending requests? Should we triple it, say 90s?

Collaborator:

Agree. 60 or 90s makes more sense.

Collaborator:

I committed a suggestion to change it to 60 for now.

@nwangfw (Collaborator, Author):

Thanks!

# Query vLLM's Prometheus metrics for in-flight and queued request counts
RUNNING=$(curl -s http://localhost:8000/metrics | grep 'vllm:num_requests_running' | grep -v '#' | awk '{print $2}')
WAITING=$(curl -s http://localhost:8000/metrics | grep 'vllm:num_requests_waiting' | grep -v '#' | awk '{print $2}')
# Safe to stop once both gauges read zero; log to the main container's stdout
if [ "$RUNNING" = "0.0" ] && [ "$WAITING" = "0.0" ]; then
  echo "Terminating: No active or waiting requests, safe to terminate" >> /proc/1/fd/1
Collaborator:
Em, here you forward the logs to the main container's output. This way works. Technically, we could also check FailedPreStopHook. I am OK with this approach.

Collaborator:

BTW, in this approach we handle termination manually. Did you check whether vLLM itself handles termination? For example, when it receives SIGTERM, does it exit immediately or wait for requests to finish?

@nwangfw (Collaborator, Author):

I tested this, and vLLM exits immediately.
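Since vLLM exits immediately on SIGTERM, the draining has to happen in the preStop hook, which Kubernetes runs to completion (or until the grace period expires) before sending SIGTERM. Below is a rough sketch of the overall shape of such a hook, assuming the vLLM metrics endpoint on localhost:8000; the polling loop and interval are illustrative, not necessarily identical to what this PR merges:

lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - |
          # Poll vLLM's metrics until there are no running or waiting requests,
          # then return so the kubelet can deliver SIGTERM. If requests never
          # drain, terminationGracePeriodSeconds still force-kills the pod.
          while true; do
            RUNNING=$(curl -s http://localhost:8000/metrics | grep 'vllm:num_requests_running' | grep -v '#' | awk '{print $2}')
            WAITING=$(curl -s http://localhost:8000/metrics | grep 'vllm:num_requests_waiting' | grep -v '#' | awk '{print $2}')
            if [ "$RUNNING" = "0.0" ] && [ "$WAITING" = "0.0" ]; then
              echo "Terminating: No active or waiting requests, safe to terminate" >> /proc/1/fd/1
              break
            fi
            sleep 5
          done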

@Jeffwan (Collaborator) commented Jan 15, 2025:

/cc this is a change @brosoul should be aware of.

@Jeffwan merged commit 0e9dd75 into main on Jan 16, 2025 (2 checks passed).
@Jeffwan deleted the ning/vllm-graceful-shutdown branch on January 16, 2025 05:47.
gangmuk pushed a commit that referenced this pull request Jan 25, 2025
* vllm gracefull termination configed

* Update benchmarks/autoscaling/7b.yaml

Signed-off-by: Jiaxin Shan <[email protected]>

---------

Signed-off-by: Jiaxin Shan <[email protected]>
Co-authored-by: Jiaxin Shan <[email protected]>
Development

Successfully merging this pull request may close these issues.

Add grace shutdown configuration for vllm pod