-
-
Notifications
You must be signed in to change notification settings - Fork 6.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[V1][Core] Generic mechanism for handling engine utility methods #13060
Conversation
We now have a number of utility / "control" operations that need to be called on the engine (add_lora, profile, sleep, wakeup, ...). It should be possible to call these in a synchronous manner to know when the operation is complete and whether it succeeded. Some operations in future may require results to be returned. These changes centralize the mechanism for doing this in V1, and add the result-returning part. Signed-off-by: Nick Hill <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Nick Hill <[email protected]> # Conflicts: # vllm/v1/engine/__init__.py # vllm/v1/engine/async_llm.py
Signed-off-by: Nick Hill <[email protected]>
# Conflicts: # vllm/v1/engine/core.py
Signed-off-by: Nick Hill <[email protected]>
This pull request has merge conflicts that must be resolved before it can be |
…ty-funcs # Conflicts: # vllm/v1/engine/__init__.py # vllm/v1/engine/core.py # vllm/v1/engine/core_client.py
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
@youkaichao this should be ready to go 🤞 |
@robertgshaw2-redhat can you help review this? I'm not familiar with this part of the code, but I will need it in #12987 🥺 |
This pull request has merge conflicts that must be resolved before it can be |
# Conflicts: # vllm/v1/engine/core.py
msgspec.convert(v, type=p.annotation) if isclass(p.annotation) | ||
and issubclass(p.annotation, msgspec.Struct) | ||
and not isinstance(v, p.annotation) else v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aesthetically, a helper function might clean up this code, i.e.
return tuple(msgspec.convert(v, type=p.annotation)
if needs_conversion(v,p) else v
for v, p in zip(args, arg_types))
however this is the engine core, perhaps multiple helper-function calls would be too costly.
vllm/v1/engine/core_client.py
Outdated
# Ensure that the outputs socket processing thread does not have | ||
# a ref to the client which prevents gc. | ||
output_socket = self.output_socket | ||
decoder = self.decoder | ||
utility_results = self.utility_results | ||
outputs_queue = self.outputs_queue | ||
|
||
(frame, ) = self.output_socket.recv_multipart(copy=False) | ||
return self.decoder.decode(frame.buffer) | ||
def process_outputs_socket(): | ||
while True: | ||
(frame, ) = output_socket.recv_multipart(copy=False) | ||
outputs = decoder.decode(frame.buffer) | ||
if outputs.utility_output: | ||
_process_utility_output(outputs.utility_output, | ||
utility_results) | ||
else: | ||
outputs_queue.put_nowait(outputs) | ||
|
||
# Process outputs from engine in separate thread. | ||
Thread(target=process_outputs_socket, daemon=True).start() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a strong opinion, but this code to launch the engine output processing thread could go in a separate helper function.
@@ -236,6 +275,17 @@ def _send_input(self, request_type: EngineCoreRequestType, | |||
msg = (request_type.value, self.encoder.encode(request)) | |||
self.input_socket.send_multipart(msg, copy=False) | |||
|
|||
def _call_utility(self, method: str, *args, unary: bool = False) -> Any: | |||
call_id = uuid.uuid1().int >> 64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you use call_id = uuid.uuid1().int >> 64
in at least two places, perhaps makes sense to have a new_call_id()
helper function?
method: str, | ||
*args, | ||
unary: bool = False) -> Any: | ||
call_id = uuid.uuid1().int >> 64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second occurrence which could be replaced with a helper function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Nick, I left a little bit of feedback. Thanks for the PR!
Signed-off-by: Nick Hill <[email protected]>
Thanks for the great comments @afeldman-nm. I've addressed some of them. Re helper functions it's a bit subjective but I personally lean towards being more selective; the indirection can make the code less readable, especially when the logic itself is trivial. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank for the great work!
…t#13060) Signed-off-by: Nick Hill <[email protected]>
…t#13060) Signed-off-by: Nick Hill <[email protected]>
…t#13060) Signed-off-by: Nick Hill <[email protected]>
…t#13060) Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Linkun Chen <[email protected]>
We now have a number of utility / "control" operations that need to be called on the engine (
add_lora
,profile
,sleep
,wakeup
, ...). It should be possible to call these in a synchronous manner to know when the operation is complete and whether it succeeded. Some operations in future may require results to be returned.These changes centralize the mechanism for doing this in V1, and add the result-returning part.