Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Router] Support Batch API part 2 #109

Merged
merged 2 commits into from
Feb 17, 2025
Merged

Conversation

gaocegege
Copy link
Collaborator

@gaocegege gaocegege commented Feb 11, 2025

Ref #47

This PR introduces a new abstraction called BatchProcessor along with a default implementation, LocalBatchProcessor, which utilizes local synchronous tasks and SQLite to support batch APIs.

This PR works with this example https://github.com/vllm-project/production-stack/pull/109/files#diff-27b40f07027e82bfd7a96ae291892a045f6fcf3c1954ca8d357508b9b233066a. But currently, the core logic for handling batch requests to the backend inferences is not included https://github.com/vllm-project/production-stack/pull/109/files#diff-fcc40545eb2d5c49343a84bf1a2f326966017ed5651395c3f51092a459cbb755R180. I’ve identified the need for some refactoring of the HTTPXClientWrapper and the router to enable this during the implementation process.

I suggest merging this PR first, and then I will submit another to refactor the existing router.py to further implement the batch API. It will not affect the original behavior since there is a flag enable-batch-api to control.


PR Checklist (Click to Expand)

Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR need to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code need to be well-documented to ensure future contributors can easily understand the code.
  • Please include sufficient tests to ensure the change is stay correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.

What to Expect for the Reviews

We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11
, Shaoting-Feng or ApostaC.

@gaocegege gaocegege marked this pull request as ready for review February 11, 2025 12:10
@gaocegege gaocegege requested a review from ApostaC February 11, 2025 12:11
@ApostaC
Copy link
Collaborator

ApostaC commented Feb 11, 2025

Thanks! We will take a look soon!
cc @YuhanLiu11 @Shaoting-Feng

Copy link
Collaborator

@Shaoting-Feng Shaoting-Feng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contributions! Just left some comments.

@Shaoting-Feng
Copy link
Collaborator

Could you add more supports to run your example?

  1. The change in helm charts: the deployment of router (to enable batch API) and the examples values.yaml
  2. The Dockerfile to build router and install the new packages

Signed-off-by: Ce Gao <[email protected]>
Signed-off-by: Ce Gao <[email protected]>
@gaocegege
Copy link
Collaborator Author

gaocegege commented Feb 15, 2025

Could you add more supports to run your example?

  1. The change in helm charts: the deployment of router (to enable batch API) and the examples values.yaml
  2. The Dockerfile to build router and install the new packages

Hi, I believe it's not the right time since the batch API isn't ready yet, and we haven't implemented the routing logic (https://github.com/vllm-project/production-stack/pull/109/files#diff-fcc40545eb2d5c49343a84bf1a2f326966017ed5651395c3f51092a459cbb755R179). I plan to address this in a future PR.

I suggest merging this PR first, and then I will submit another to refactor the existing router.py to further implement the batch API. It will not affect the original behavior since there is a flag enable-batch-api to control.

@gaocegege
Copy link
Collaborator Author

Please take a look, thanks for your time 😄

@ApostaC
Copy link
Collaborator

ApostaC commented Feb 16, 2025

Hey @gaocegege , sorry I was pretty busy with some other stuff these days. Will take a look at this asap!

Copy link
Collaborator

@Shaoting-Feng Shaoting-Feng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in the router and the examples look good to me. Thanks for adding v1 to the @app.post in your latest commit.

@Shaoting-Feng Shaoting-Feng merged commit 3084283 into vllm-project:main Feb 17, 2025
8 checks passed
@gaocegege gaocegege deleted the batch branch February 17, 2025 07:08
sitloboi2012 pushed a commit to sitloboi2012/production-stack that referenced this pull request Feb 17, 2025
* feat: Support batch

Signed-off-by: Ce Gao <[email protected]>

* fix: Add TODO

Signed-off-by: Ce Gao <[email protected]>

---------

Signed-off-by: Ce Gao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants