
[WIP, RFC] Production Stack on Ray Serve #195

Open
Hanchenli opened this issue Feb 27, 2025 · 4 comments
@Hanchenli (Collaborator)

This issue explains the planned structure of the Production Stack on Ray Serve.
The router will be a Ray Serve DeploymentHandle, with FastAPI set as the ingress for OpenAI API compatibility.

Each inference node will initialize with a subprocess running a vLLM + LMCache OpenAI-compatible server. The current design has the following functions:
- `report_status`, which returns the server status, including `model_name`.
- `streaming_response`, which returns a streaming response to the router for POST requests such as `/v1/completions`.
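A minimal pure-Python sketch of that per-node interface, with the method names taken from the RFC but all internals assumed (in the real design `streaming_response` would proxy to the subprocess's OpenAI-compatible endpoint rather than yield canned chunks):

```python
import asyncio
from dataclasses import dataclass


@dataclass
class InferenceNode:
    """Hypothetical stand-in for an inference-node deployment whose
    subprocess runs a vLLM + LMCache OpenAI-compatible server."""
    model_name: str
    port: int = 8000
    healthy: bool = True

    def report_status(self) -> dict:
        # Returns server status, including model_name, so the router can
        # discover which model each replica serves and whether it is up.
        return {"model_name": self.model_name,
                "healthy": self.healthy,
                "port": self.port}

    async def streaming_response(self, request: dict):
        # Placeholder: the real method would forward `request` to the
        # subprocess's /v1/completions endpoint and stream its chunks back.
        for chunk in ("Hello", " world"):
            yield {"choices": [{"text": chunk}]}


async def _demo():
    node = InferenceNode(model_name="demo-model")
    chunks = [c async for c in node.streaming_response({"prompt": "Hi"})]
    return node.report_status(), chunks


status, chunks = asyncio.run(_demo())
```

Keeping `streaming_response` an async generator lets the router forward chunks to the client as they arrive instead of buffering the whole completion.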

Session-based routing will be implemented with a dictionary in the router.
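A dictionary-based sticky-routing scheme like the one described could look roughly like this. This is a hypothetical sketch: the `SessionRouter` name, the round-robin fallback for new sessions, and representing replicas as plain values are all assumptions, not the RFC's actual design:

```python
import itertools


class SessionRouter:
    """Sticky routing sketch: a session id maps to one replica so that
    repeated requests from the same session hit the same node (which
    helps KV-cache reuse on that replica)."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        # The dictionary mentioned in the RFC: session_id -> replica.
        self.sessions = {}
        # Assumed fallback policy for unseen/absent sessions: round-robin.
        self._rr = itertools.cycle(self.replicas)

    def route(self, session_id=None):
        if session_id is None:
            # No session info: plain round-robin.
            return next(self._rr)
        if session_id not in self.sessions:
            # First request for this session: pin it to the next replica.
            self.sessions[session_id] = next(self._rr)
        return self.sessions[session_id]
```

Usage: `router.route("user-42")` returns the same replica on every call for that session, while calls without a session id rotate across replicas.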

@ApostaC ApostaC changed the title [WIP] Production Stack on Ray Serve [WIP, RFC] Production Stack on Ray Serve Feb 28, 2025
@Hanchenli (Collaborator, Author)

Note that Ray Serve has just open-sourced their LLM serving stack here: https://github.com/ray-project/ray/tree/master/python/ray/llm. The design is similar but comes without smart routing. We will keep our original design but also try to reuse some code from that repository.

@thousandhu


Hi @Hanchenli, do you have a roadmap or design doc for Ray LLM?

@gaocegege (Collaborator)

By the way, do we have a plan for SkyPilot?

@Hanchenli (Collaborator, Author)

Not currently. We do plan to support Terraform in order to support multiple clouds.
