The current example shows how to run the gemlite backend with the LLM class by applying the patch in the same process. However, this approach doesn't work if a user wants to run vLLM's OpenAI-compatible API server with the MQLLMEngine, which is generally more suitable for production loads.
To use that engine with the OpenAI API server, we need to patch vLLM's engine.py directly. The reason is that it uses the spawn method to create a child process, so a patch applied in the parent process does not carry over to the child.
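To illustrate why an in-process patch is lost, here is a minimal sketch. It does not use vLLM or gemlite: `patched_dumps` is a hypothetical stand-in for the backend patch, and `subprocess` is used to start a fresh interpreter, which is the relevant property of a spawned child (it starts from a clean Python process rather than a copy of the parent's memory).

```python
import json
import subprocess
import sys


def patched_dumps(obj, **kwargs):
    """Hypothetical stand-in for a backend patch; it lives only in this process."""
    return "patched"


def child_sees_patch() -> bool:
    """Start a fresh interpreter and check whether json.dumps is patched there."""
    code = "import json; print(json.dumps({}) == 'patched')"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


def run_demo():
    """Return (patch visible in parent, patch visible in fresh child)."""
    original = json.dumps
    json.dumps = patched_dumps  # monkey patch applied in the parent only
    try:
        parent_patched = json.dumps({}) == "patched"
        child_patched = child_sees_patch()
    finally:
        json.dumps = original  # always restore the real function
    return parent_patched, child_patched


if __name__ == "__main__":
    print(run_demo())
```

The parent sees its own patch while the fresh interpreter does not, which is why the patch has to live in code that the child process itself imports.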
Here is my simple script to apply the patch to engine.py. I am not sure how you would like to incorporate this, but I wanted to share it.
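As a rough starting point for anyone wanting to do something similar, below is a sketch of the general idea of patching a source file on disk. It is not the script mentioned above. The marker string, the `apply_patch` helper, and the choice to append the patch at the end of the file are all assumptions made for illustration. The path to the real engine.py and the actual patch lines must be supplied by the user.

```python
import tempfile
from pathlib import Path

# Hypothetical marker used to detect that the file was already patched.
MARKER = "# gemlite-patch (hypothetical marker)"


def apply_patch(target: Path, patch_lines: list[str]) -> bool:
    """Append patch_lines to target once; return True if the file was changed."""
    source = target.read_text()
    if MARKER in source:
        return False  # already patched, leave the file alone

    # Keep a backup next to the original so the change can be reverted.
    backup = target.with_name(target.name + ".bak")
    backup.write_text(source)

    # Assumption: appending at module level is enough for the patch to run on import.
    patched = source.rstrip("\n") + "\n\n" + MARKER + "\n" + "\n".join(patch_lines) + "\n"
    target.write_text(patched)
    return True


if __name__ == "__main__":
    # Demonstrate on a throwaway file instead of a real vLLM install.
    with tempfile.TemporaryDirectory() as tmp:
        fake_engine = Path(tmp) / "engine.py"
        fake_engine.write_text("def run():\n    pass\n")
        print(apply_patch(fake_engine, ["PATCHED = True"]))  # first call changes the file
        print(apply_patch(fake_engine, ["PATCHED = True"]))  # second call is a no-op
```

Making the patch idempotent and keeping a backup matters here because the file lives inside an installed package, and upgrading vLLM replaces it, so the patch would need to be reapplied.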
Thanks Kerem! We internally use vLLM with Ray via LLM, but this could indeed be useful for people using it via the OpenAI API server, unless they do it manually in engine.py!
Maybe we can put it in examples/vllm_openaiserver.py or something?
Sounds good. If you are fine with adding it as an example, I can do that, and maybe we can add a short note in the README for those who want to use the API server in vLLM. By the way, is there a reason you prefer Ray? The native vLLM FastAPI server has been working fine for us so far, but I would love to learn more about Ray's advantages. Thanks!
It's because we support different backends, not just vLLM, since we also need to run other non-LLM models.
We have the SDK code open-source by the way: https://github.com/mobiusml/aana_sdk
Congrats on the vLLM update!