-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Router] Support Batch API part 2 #109
Conversation
Thanks! We will take a look soon! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contributions! Just left some comments.
Could you add more supports to run your example?
|
Signed-off-by: Ce Gao <[email protected]>
Signed-off-by: Ce Gao <[email protected]>
Hi, I believe it's not the right time since the batch API isn't ready yet, and we haven't implemented the routing logic (https://github.com/vllm-project/production-stack/pull/109/files#diff-fcc40545eb2d5c49343a84bf1a2f326966017ed5651395c3f51092a459cbb755R179). I plan to address this in a future PR. I suggest merging this PR first, and then I will submit another to refactor the existing router.py to further implement the batch API. It will not affect the original behavior since there is a flag enable-batch-api to control. |
Please take a look, thanks for your time 😄 |
Hey @gaocegege , sorry I was pretty busy with some other stuff these days. Will take a look at this asap! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in the router and the examples look good to me. Thanks for adding v1
to the @app.post
in your latest commit.
* feat: Support batch Signed-off-by: Ce Gao <[email protected]> * fix: Add TODO Signed-off-by: Ce Gao <[email protected]> --------- Signed-off-by: Ce Gao <[email protected]>
Ref #47
This PR introduces a new abstraction called BatchProcessor along with a default implementation, LocalBatchProcessor, which utilizes local synchronous tasks and SQLite to support batch APIs.
This PR works with this example https://github.com/vllm-project/production-stack/pull/109/files#diff-27b40f07027e82bfd7a96ae291892a045f6fcf3c1954ca8d357508b9b233066a. But currently, the core logic for handling batch requests to the backend inferences is not included https://github.com/vllm-project/production-stack/pull/109/files#diff-fcc40545eb2d5c49343a84bf1a2f326966017ed5651395c3f51092a459cbb755R180. I’ve identified the need for some refactoring of the HTTPXClientWrapper and the router to enable this during the implementation process.
I suggest merging this PR first, and then I will submit another to refactor the existing router.py to further implement the batch API. It will not affect the original behavior since there is a flag
enable-batch-api
to control.PR Checklist (Click to Expand)
Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.
PR Title and Classification
Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:
[Bugfix]
for bug fixes.[CI/Build]
for build or continuous integration improvements.[Doc]
for documentation fixes and improvements.[Feat]
for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).[Router]
for changes to thevllm_router
(e.g., routing algorithm, router observability, etc.).[Misc]
for PRs that do not fit the above categories. Please use this sparingly.Note: If the PR spans more than one category, please include all relevant prefixes.
Code Quality
The PR need to meet the following code quality standards:
pre-commit
to format your code. SeeREADME.md
for installation.DCO and Signed-off-by
When contributing changes to this project, you must agree to the DCO. Commits must include a
Signed-off-by:
header which certifies agreement with the terms of the DCO.Using
-s
withgit commit
will automatically add this header.What to Expect for the Reviews
We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11
, Shaoting-Feng or ApostaC.