[WIP, RFC] Production Stack on Ray Serve #195
Note that Ray Serve just open-sourced their LLM serving stack here: https://github.com/ray-project/ray/tree/master/python/ray/llm. The design is similar but does not include smart routing. We will keep our original design but also try to reuse some code from that link.
Hi @Hanchenli, do you have a roadmap or design doc for Ray LLM?
By the way, do we have a plan for SkyPilot support?
Not currently. We do plan to add Terraform support to cover multiple clouds.
This issue explains the structure of the upcoming Production Stack on Ray Serve.
The router will be reached through a DeploymentHandle, with FastAPI set as the Ray Serve ingress for OpenAI API compatibility.
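A minimal sketch of what that ingress could look like, assuming the standard Ray Serve FastAPI integration. The class name `Router`, the `inference_node` handle, and the `streaming_response` method it calls (described below) are illustrative, not the actual implementation:

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from ray import serve
from ray.serve.handle import DeploymentHandle

app = FastAPI()


@serve.deployment
@serve.ingress(app)
class Router:
    def __init__(self, inference_node: DeploymentHandle):
        # Handle to a downstream inference-node deployment.
        self.inference_node = inference_node

    @app.post("/v1/completions")
    async def completions(self, request: dict):
        # Forward the OpenAI-style request body to the inference node and
        # stream the generated chunks back to the client.
        gen = self.inference_node.options(stream=True).streaming_response.remote(request)

        async def stream():
            async for chunk in gen:
                yield chunk

        return StreamingResponse(stream(), media_type="text/event-stream")
```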
The inference nodes will each initialize with a subprocess running a vLLM + LMCache OpenAI-compatible server. The current design has the following functions (see the sketch after the list):
report_status, which returns the server status, including model_name.
streaming_response, which returns a streaming response to the router for POST requests such as "v1/completions".
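A hedged sketch of such an inference-node deployment. The exact command for the vLLM + LMCache server is an assumption; plain vLLM's OpenAI entrypoint is used as a stand-in, and the port and class name are illustrative:

```python
import subprocess

import httpx
from ray import serve


@serve.deployment
class InferenceNode:
    def __init__(self, model_name: str, port: int = 8100):
        self.model_name = model_name
        self.port = port
        # Launch an OpenAI-compatible vLLM server as a subprocess.
        # (In the actual stack this would be the vLLM + LMCache server.)
        self.proc = subprocess.Popen(
            ["python", "-m", "vllm.entrypoints.openai.api_server",
             "--model", model_name, "--port", str(port)]
        )

    def report_status(self) -> dict:
        # Return basic server status, including the served model name.
        return {"model_name": self.model_name, "alive": self.proc.poll() is None}

    async def streaming_response(self, request: dict):
        # Proxy the request to the local OpenAI-compatible server and
        # stream the response chunks back to the router.
        url = f"http://localhost:{self.port}/v1/completions"
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream("POST", url, json=request) as resp:
                async for chunk in resp.aiter_bytes():
                    yield chunk
```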
Session-based routing will be implemented with a dictionary in the router.
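A minimal sketch of that dictionary-based session affinity, assuming requests carry a session identifier (the field name `session_id` and the round-robin fallback are assumptions for illustration):

```python
from ray.serve.handle import DeploymentHandle


class SessionRouter:
    def __init__(self, inference_nodes: list[DeploymentHandle]):
        self.inference_nodes = inference_nodes
        # Maps session id -> inference node handle that served it before.
        self.session_to_node: dict[str, DeploymentHandle] = {}
        self._next = 0

    def pick_node(self, session_id: str) -> DeploymentHandle:
        # Reuse the node previously assigned to this session so its KV cache
        # can be reused; otherwise assign a node round-robin and remember it.
        if session_id not in self.session_to_node:
            node = self.inference_nodes[self._next % len(self.inference_nodes)]
            self._next += 1
            self.session_to_node[session_id] = node
        return self.session_to_node[session_id]
```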