Make markdownlint happy

Signed-off-by: Harry Mellor <[email protected]>
hmellor committed Jan 29, 2025
1 parent e8a26da commit c953ce2

Showing 12 changed files with 262 additions and 222 deletions.
5 changes: 5 additions & 0 deletions .markdownlint.yaml
@@ -0,0 +1,5 @@
MD013: false # line-length
MD028: false # no-blanks-blockquote
MD029: # ol-prefix
style: ordered
MD033: false # no-inline-html
13 changes: 4 additions & 9 deletions README.md
@@ -1,13 +1,12 @@
# vLLM Production Stack: reference stack for production vLLM deployment


The **vLLM Production Stack** project provides a reference implementation of how to build an inference stack on top of vLLM, which allows you to:

- 🚀 Scale from single vLLM instance to distributed vLLM deployment without changing any application code
- 💻 Monitor the deployment through a web dashboard
- 😄 Enjoy the performance benefits brought by request routing and KV cache offloading

## Latest News:
## Latest News

- 🔥 vLLM Production Stack is released! Check out our [release blogs](https://blog.lmcache.ai/2025-01-21-stack-release) [01-22-2025]
- ✨ Join us in the #production-stack channel of the vLLM [slack](https://slack.vllm.ai/), the LMCache [slack](https://join.slack.com/t/lmcacheworkspace/shared_invite/zt-2viziwhue-5Amprc9k5hcIdXT7XevTaQ), or fill out this [interest form](https://forms.gle/wSoeNpncmPVdXppg8) for a chat!
@@ -20,7 +19,6 @@ The stack is set up using [Helm](https://helm.sh/docs/), and contains the follow
- **Request router**: Directs requests to appropriate backends based on routing keys or session IDs to maximize KV cache reuse.
- **Observability stack**: Monitors the metrics of the backends through [Prometheus](https://github.com/prometheus/prometheus) + [Grafana](https://grafana.com/)


<img src="https://github.com/user-attachments/assets/8f05e7b9-0513-40a9-9ba9-2d3acca77c0c" alt="Architecture of the stack" width="800"/>

## Roadmap
@@ -42,6 +40,7 @@ We are actively working on this project and will release the following features
### Deployment

The vLLM Production Stack can be deployed via Helm charts. Clone the repository locally and execute the following commands for a minimal deployment:

```bash
git clone https://github.com/vllm-project/production-stack.git
cd production-stack/
@@ -55,21 +54,18 @@ To validate the installation and send a query to the stack, refer to [this tut

For more information about customizing the helm chart, please refer to [values.yaml](https://github.com/vllm-project/production-stack/blob/main/helm/values.yaml) and our other [tutorials](https://github.com/vllm-project/production-stack/tree/main/tutorials).


### Uninstall

```bash
sudo helm uninstall vllm
```


## Grafana Dashboard

### Features

The Grafana dashboard provides the following insights:


1. **Available vLLM Instances**: Displays the number of healthy instances.
2. **Request Latency Distribution**: Visualizes end-to-end request latency.
3. **Time-to-First-Token (TTFT) Distribution**: Monitors response times for token generation.
@@ -98,7 +94,6 @@ The router ensures efficient request distribution among backends. It supports:
- Session-ID based routing
- (WIP) prefix-aware routing
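
To make session-ID based routing concrete, here is a minimal, illustrative sketch of sticky-session routing in Python. It is not the router's actual implementation; the backend URLs and the idea of hashing a session ID are assumptions used purely for illustration.

```python
import hashlib

# Hypothetical backend list; in the real stack these would be the serving
# engine URLs passed to the router (e.g. via --static-backends).
BACKENDS = [
    "http://localhost:9001",
    "http://localhost:9002",
    "http://localhost:9003",
]


def pick_backend(session_id: str) -> str:
    """Deterministically map a session ID to one backend so that requests
    from the same session keep hitting the same vLLM instance and can
    reuse its KV cache."""
    digest = hashlib.sha256(session_id.encode()).hexdigest()
    return BACKENDS[int(digest, 16) % len(BACKENDS)]


if __name__ == "__main__":
    for sid in ("user-a", "user-b", "user-a"):
        # "user-a" resolves to the same backend on both calls.
        print(sid, "->", pick_backend(sid))
```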


## Contributing

Contributions are welcome! Please follow the standard GitHub flow:
@@ -109,12 +104,12 @@ Contributions are welcome! Please follow the standard GitHub flow:

We use `pre-commit` for formatting; it is installed as follows:

```console
```bash
pip install -r requirements-lint.txt
pre-commit install
```

> You can read more about `pre-commit` at https://pre-commit.com.
> You can read more about `pre-commit` at <https://pre-commit.com>.
## License

4 changes: 2 additions & 2 deletions helm/README.md
@@ -2,14 +2,14 @@

This helm chart lets users deploy multiple serving engines and a router into the Kubernetes cluster.

## Key features:
## Key features

- Support running multiple serving engines with multiple different models
- Load the model weights directly from the existing PersistentVolumes

## Prerequisites

1. A running Kubernetes cluster with GPU. (You can set it up through `minikube`: https://minikube.sigs.k8s.io/docs/tutorials/nvidia/)
1. A running Kubernetes cluster with GPU. (You can set it up through `minikube`: <https://minikube.sigs.k8s.io/docs/tutorials/nvidia/>)
2. [Helm](https://helm.sh/docs/intro/install/)

## Install the helm chart
3 changes: 2 additions & 1 deletion src/tests/README.md
@@ -15,6 +15,7 @@ MODEL = "meta-llama/Llama-3.1-8B-Instruct"
```

Then, execute the following command in a terminal:

```bash
python3 test-openai.py
```
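
For reference, a functionality test of this kind can be as small as the sketch below, which points the `openai` Python client at a locally running server. It is only an illustration of the idea; the actual `test-openai.py` may differ, and the base URL and API key are assumptions.

```python
from openai import OpenAI  # pip install openai

MODEL = "meta-llama/Llama-3.1-8B-Instruct"

# Assumed local endpoint; adjust to wherever your engine or router listens.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=32,
)
print(response.choices[0].message.content)
```
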
@@ -30,7 +31,7 @@ The `perftest/` folder contains the performance test scripts for the router. Spe
- `run-server.sh` and `run-multi-server.sh`: launch one or multiple mock-up OpenAI API servers
- `clean-up.sh`: kills the mock-up OpenAI API server processes.
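
As a rough illustration of what such a request generator does, the sketch below fires a fixed number of concurrent requests at a mock-up server and reports latency statistics. It is not one of the scripts in `perftest/`; the URL, concurrency, and payload are assumptions for the example.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed mock-up OpenAI-compatible endpoint (e.g. one started by run-server.sh).
URL = "http://localhost:8000/v1/completions"
PAYLOAD = json.dumps({
    "model": "fake-model",
    "prompt": "Hello",
    "max_tokens": 10,
}).encode()


def one_request(_: int) -> float:
    """Send a single completion request and return its end-to-end latency."""
    start = time.perf_counter()
    req = urllib.request.Request(
        URL, data=PAYLOAD, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req).read()
    return time.perf_counter() - start


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=8) as pool:
        latencies = sorted(pool.map(one_request, range(100)))
    # latencies[94] is the 95th of 100 sorted samples, i.e. an approximate p95.
    print(f"mean={sum(latencies) / len(latencies):.3f}s  p95={latencies[94]:.3f}s")
```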

### Example router performance test:
### Example router performance test

Here's an example setup of running the router performance test:

3 changes: 2 additions & 1 deletion src/vllm_router/README.md
@@ -53,7 +53,8 @@ docker build -t <image_name>:<tag> -f docker/Dockerfile .
## Example commands to run the router

**Example 1:** running the router locally at port 8000 in front of multiple serving engines:
```

```bash
python3 router.py --port 8000 \
--service-discovery static \
--static-backends "http://localhost:9001,http://localhost:9002,http://localhost:9003" \
21 changes: 5 additions & 16 deletions tutorials/00-install-kubernetes-env.md
@@ -4,8 +4,6 @@

This tutorial guides you through the process of setting up a Kubernetes environment on a GPU-enabled server. We will install and configure `kubectl`, `helm`, and `minikube`, ensuring GPU compatibility for workloads requiring accelerated computing. By the end of this tutorial, you will have a fully functional Kubernetes environment ready to deploy the vLLM Production Stack.

---

## Table of Contents

- [Introduction](#introduction)
@@ -17,8 +15,6 @@ This tutorial guides you through the process of setting up a Kubernetes environm
- [Step 3: Installing Minikube with GPU Support](#step-3-installing-minikube-with-gpu-support)
- [Step 4: Verifying GPU Configuration](#step-4-verifying-gpu-configuration)

---

## Prerequisites

Before you begin, ensure the following:
@@ -35,8 +31,6 @@ Before you begin, ensure the following:
- A Linux-based operating system (e.g., Ubuntu 20.04 or later).
- Basic understanding of Linux shell commands.

---

## Steps

### Step 1: Installing kubectl
@@ -71,8 +65,6 @@ Before you begin, ensure the following:
Client Version: v1.32.1
```

---

### Step 2: Installing Helm

1. Execute the script `install-helm.sh`:
@@ -99,8 +91,6 @@ Before you begin, ensure the following:
version.BuildInfo{Version:"v3.17.0", GitCommit:"301108edc7ac2a8ba79e4ebf5701b0b6ce6a31e4", GitTreeState:"clean", GoVersion:"go1.23.4"}
```

---

### Step 3: Installing Minikube with GPU Support

1. Execute the script `install-minikube-cluster.sh`:
@@ -116,6 +106,7 @@

3. **Expected Output:**
If everything goes smoothly, you should see example output like the following:

```plaintext
😄 minikube v1.35.0 on Ubuntu 22.04 (kvm/amd64)
❗ minikube skips various validations when --force is supplied; this may lead to unexpected behavior
@@ -135,8 +126,6 @@
TEST SUITE: None
```
---
### Step 4: Verifying GPU Configuration
1. Ensure Minikube is running:
@@ -145,7 +134,7 @@
sudo minikube status
```
Expected Output:
Expected output:
```plaintext
minikube
@@ -162,7 +151,7 @@
sudo kubectl describe nodes | grep -i gpu
```
Expected Output:
Expected output:
```plaintext
nvidia.com/gpu: 1
@@ -181,12 +170,12 @@
sudo kubectl logs gpu-test
```
You should see the nvidia-smi output from the terminal
---
You should see the nvidia-smi output from the terminal
## Conclusion
By following this tutorial, you have successfully set up a Kubernetes environment with GPU support on your server. You are now ready to deploy and test vLLM Production Stack on Kubernetes. For further configuration and workload-specific setups, consult the official documentation for `kubectl`, `helm`, and `minikube`.
What's next:
- [01-minimal-helm-installation](https://github.com/vllm-project/production-stack/blob/main/tutorials/01-minimal-helm-installation.md)
43 changes: 35 additions & 8 deletions tutorials/01-minimal-helm-installation.md
@@ -1,9 +1,11 @@
# Tutorial: Minimal Setup of the vLLM Production Stack

## Introduction

This tutorial guides you through a minimal setup of the vLLM Production Stack using one vLLM instance with the `facebook/opt-125m` model. By the end of this tutorial, you will have a working deployment of vLLM on a Kubernetes environment with GPU.

## Table of Contents

- [Introduction](#introduction)
- [Table of Contents](#table-of-contents)
- [Prerequisites](#prerequisites)
@@ -12,11 +14,12 @@ This tutorial guides you through a minimal setup of the vLLM Production Stack us
- [2. Validate Installation](#2-validate-installation)
- [3. Send a Query to the Stack](#3-send-a-query-to-the-stack)
- [3.1. Forward the Service Port](#31-forward-the-service-port)
- [3.2. Query the OpenAI-Compatible API](#32-query-the-openai-compatible-api)
- [3.2. Query the OpenAI-Compatible API to list the available models](#32-query-the-openai-compatible-api-to-list-the-available-models)
- [3.3. Query the OpenAI Completion Endpoint](#33-query-the-openai-completion-endpoint)
- [4. Uninstall](#4-uninstall)

## Prerequisites

1. A Kubernetes environment with GPU support. If not set up, follow the [00-install-kubernetes-env](00-install-kubernetes-env.md) guide.
2. Helm installed. Refer to the [install-helm.sh](install-helm.sh) script for instructions.
3. kubectl installed. Refer to the [install-kubectl.sh](install-kubectl.sh) script for instructions.
@@ -27,7 +30,8 @@ This tutorial guides you through a minimal setup of the vLLM Production Stack us

### 1. Deploy vLLM Instance

#### Step 1.1: Use Predefined Configuration
#### 1.1: Use Predefined Configuration

The vLLM Production Stack repository provides a predefined configuration file, `values-minimal-example.yaml`, located at `tutorials/assets/values-minimal-example.yaml`. This file contains the following content:

```yaml
@@ -48,6 +52,7 @@ servingEngineSpec:
```
Explanation of the key fields:
- **`modelSpec`**: Defines the model configuration, including:
- `name`: A name for the model deployment.
- `repository`: Docker repository hosting the model image.
@@ -58,47 +63,63 @@ Explanation of the key fields:
- **`requestGPU`**: Specifies the number of GPUs required.
- **`pvcStorage`**: Allocates persistent storage for the model.

#### Step 1.2: Deploy the Helm Chart
#### 1.2: Deploy the Helm Chart

Deploy the Helm chart using the predefined configuration file:

```bash
helm repo add vllm https://vllm-project.github.io/production-stack
helm install vllm vllm/production-stack -f tutorials/assets/values-minimal-example.yaml
```

Explanation of the command:

- `vllm` in the first command: The Helm repository.
- `vllm` in the second command: The name of the Helm release.
- `-f tutorials/assets/values-minimal-example.yaml`: Specifies the predefined configuration file.

### 2. Validate Installation

#### Step 2.1: Monitor Deployment Status
#### 2.1: Monitor Deployment Status

Monitor the deployment status using:

```bash
sudo kubectl get pods
```

Expected output:

- Pods for the `vllm` deployment should transition to `Ready` and the `Running` state.
```

```plaintext
NAME READY STATUS RESTARTS AGE
vllm-deployment-router-859d8fb668-2x2b7 1/1 Running 0 2m38s
vllm-opt125m-deployment-vllm-84dfc9bd7-vb9bs 1/1 Running 0 2m38s
```

_Note_: It may take some time for the containers to download the Docker images and LLM weights.
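
If you would rather wait programmatically than re-run the command by hand, a small helper like the sketch below polls the same `kubectl get pods` output until every pod reports `Running`. This is a convenience sketch only; drop `sudo` if your setup does not require it, and adjust the timeout to taste.

```python
import subprocess
import time


def wait_for_pods(timeout_s: int = 600, poll_s: int = 10) -> None:
    """Poll `kubectl get pods` until every pod is Running, or raise on timeout."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        out = subprocess.run(
            ["sudo", "kubectl", "get", "pods", "--no-headers"],
            capture_output=True, text=True, check=True,
        ).stdout
        # The third column of `kubectl get pods` output is the STATUS field.
        statuses = [line.split()[2] for line in out.splitlines() if line.strip()]
        if statuses and all(status == "Running" for status in statuses):
            print("All pods are Running")
            return
        time.sleep(poll_s)
    raise TimeoutError("Pods did not reach the Running state in time")


if __name__ == "__main__":
    wait_for_pods()
```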

### 3. Send a Query to the Stack

#### Step 3.1: Forward the Service Port
#### 3.1: Forward the Service Port

Expose the `vllm-router-service` port to the host machine:

```bash
sudo kubectl port-forward svc/vllm-router-service 30080:80
```

#### Step 3.2: Query the OpenAI-Compatible API to list the available models
#### 3.2: Query the OpenAI-Compatible API to list the available models

Test the stack's OpenAI-compatible API by querying the available models:

```bash
curl -o- http://localhost:30080/models
```

Expected output:

```json
{
"object": "list",
@@ -114,8 +135,10 @@ Expected output:
}
```
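
As an alternative to `curl`, the same check can be done from Python. The short script below is illustrative only and assumes the port-forward from step 3.1 is still active.

```python
import json
import urllib.request

# Assumes `kubectl port-forward svc/vllm-router-service 30080:80` is running.
with urllib.request.urlopen("http://localhost:30080/models") as resp:
    models = json.load(resp)

for model in models.get("data", []):
    print(model["id"])  # e.g. facebook/opt-125m
```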

#### Step 3.3: Query the OpenAI Completion Endpoint
#### 3.3: Query the OpenAI Completion Endpoint

Send a query to the OpenAI `/completions` endpoint to generate a completion for a prompt:

```bash
curl -X POST http://localhost:30080/completions \
-H "Content-Type: application/json" \
@@ -125,7 +148,9 @@ curl -X POST http://localhost:30080/completions \
"max_tokens": 10
}'
```

Expected output:

```json
{
"id": "completion-id",
@@ -141,11 +166,13 @@ Expected output:
]
}
```

This demonstrates the model generating a continuation for the provided prompt.
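
The completion request can likewise be sent from Python. This sketch simply mirrors the `curl` call above and makes the same assumption that the port-forward is active.

```python
import json
import urllib.request

payload = json.dumps({
    "model": "facebook/opt-125m",
    "prompt": "Once upon a time,",
    "max_tokens": 10,
}).encode()

request = urllib.request.Request(
    "http://localhost:30080/completions",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as resp:
    completion = json.load(resp)

print(completion["choices"][0]["text"])
```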

### 4. Uninstall

To remove the deployment, run:

```bash
sudo helm uninstall vllm
```