vLLM Production Stack Tutorials

Welcome to the tutorials for vLLM Production Stack! This series of tutorials is designed to guide you through setting up and using the vLLM Production Stack efficiently. Whether you're new to Kubernetes, Helm, or vLLM, or looking to deepen your understanding of advanced features like multi-model management and KV cache offloading, this series has you covered.

Table of Contents

  1. Install Kubernetes Environment: Learn how to set up a Kubernetes environment as the foundation for running vLLM Production Stack.

  2. Minimal Helm Installation: A step-by-step guide to deploying vLLM Production Stack using Helm with minimal configuration (a rough sketch of this flow appears after this list).

  3. Basic vLLM Configuration: Learn how to customize vLLM options when using vLLM Production Stack.

  4. Load Model from Persistent Volume: Discover how to load models from a persistent volume to ensure efficient resource usage.

  5. Launch Multiple Models: Learn how to deploy and manage multiple models simultaneously in your vLLM environment.

  6. Offload KV Cache: Understand how to offload the KV cache to CPU to improve performance in production use cases.
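
For orientation before you start, the sketch below shows roughly what the Helm-based deployment in Tutorials 2 and 3 looks like. It is illustrative only: the repository URL, chart name, release name, and values file name are assumptions here, and the individual tutorials give the exact names and example values files to use.

```bash
# Illustrative sketch only -- repo URL, chart name, and values file are
# assumptions; follow the tutorials for the exact names and files.

# Add the production-stack Helm repository and refresh the index (Tutorial 2)
helm repo add vllm https://vllm-project.github.io/production-stack
helm repo update

# Install the stack with a minimal values file; the tutorials provide example
# values files that set the model, GPU resources, and other vLLM options
# (Tutorials 2 and 3)
helm install vllm vllm/vllm-stack -f minimal-values.yaml

# Verify that the serving engine and router pods come up
kubectl get pods
```

The later tutorials build on the same release by adjusting the values file, for example mounting a persistent volume for model weights, listing multiple models, or enabling CPU offloading of the KV cache.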

Getting Started

These tutorials are designed to be followed sequentially for beginners, but you can also jump to a specific tutorial based on your needs. Each tutorial includes:

  • Prerequisites
  • Detailed steps
  • Commands to execute
  • Expected outputs
  • Explanations to enhance your understanding

Feedback and Contributions

If you encounter any issues or have suggestions for improving these tutorials, feel free to contribute by opening a pull request or an issue on our GitHub repository.

Happy learning!