Add pre-commit based linting and formatting (vllm-project#35)
* Add pre-commit workflow

* Add actionlint

* Add generic hooks

* Add black, isort, shellcheck

* Add requirements and markdown linting

* Add toml

* Add Dockerfile

* Add codespell

* Use Node.js version of `markdownlint`

* Add `requirements-lint.txt`

* Use CLI version of Node.js `markdownlint`

* Add `pre-commit` instructions to `Contributing`

* `pre-commit run -a` automatic fixes

* Exclude helm templates from `check-yaml`

* Comment hooks that require installed tools

* Make `codespell` happy

* Make `actionlint` happy

* Disable `shellcheck` until it can be installed properly

* Make `markdownlint` happy

* Add note about running pre-commit

---------

Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: 0xThresh.eth <[email protected]>
hmellor authored and 0xThresh committed Jan 30, 2025
1 parent aca24d6 commit 3246edd
Showing 54 changed files with 993 additions and 671 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -28,7 +28,7 @@ jobs:
uses: actions/checkout@v4

- name: Set up Python
- uses: actions/setup-python@v2
+ uses: actions/setup-python@v5
with:
python-version: '3.8'

1 change: 0 additions & 1 deletion .github/workflows/helm-lint.yml
@@ -23,4 +23,3 @@ jobs:
- name: Lint open-webui Helm Chart
run: |
helm lint ./helm
5 changes: 2 additions & 3 deletions .github/workflows/helm-release.yml
@@ -24,7 +24,7 @@ jobs:
git config user.name "$GITHUB_ACTOR"
git config user.email "[email protected]"
- # Could add Prometheus as a dependent chart here if desired
+ # Could add Prometheus as a dependent chart here if desired
# - name: Add Dependency Repos
# run: |
# helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
@@ -52,6 +52,5 @@ jobs:
break
fi
REPO=$(echo '${{ github.repository }}' | tr '[:upper:]' '[:lower:]')
- helm push "${pkg}" oci://ghcr.io/$REPO
+ helm push "${pkg}" "oci://ghcr.io/$REPO"
done
17 changes: 17 additions & 0 deletions .github/workflows/matchers/actionlint.json
@@ -0,0 +1,17 @@
{
"problemMatcher": [
{
"owner": "actionlint",
"pattern": [
{
"regexp": "^(?:\\x1b\\[\\d+m)?(.+?)(?:\\x1b\\[\\d+m)*:(?:\\x1b\\[\\d+m)*(\\d+)(?:\\x1b\\[\\d+m)*:(?:\\x1b\\[\\d+m)*(\\d+)(?:\\x1b\\[\\d+m)*: (?:\\x1b\\[\\d+m)*(.+?)(?:\\x1b\\[\\d+m)* \\[(.+?)\\]$",
"file": 1,
"line": 2,
"column": 3,
"message": 4,
"code": 5
}
]
}
]
}
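The matcher's regexp above is dense mostly because it tolerates ANSI color codes around each field of actionlint's `file:line:col: message [rule]` output. As a quick sanity check (not part of this commit; the sample line and rule name are made up), the same pattern can be exercised in Python:

```python
import re

# Same regexp as the problem matcher above, with the JSON string escaping
# (\\x1b, \\d, ...) undone so Python's re module can compile it directly.
PATTERN = re.compile(
    r"^(?:\x1b\[\d+m)?(.+?)(?:\x1b\[\d+m)*:(?:\x1b\[\d+m)*(\d+)"
    r"(?:\x1b\[\d+m)*:(?:\x1b\[\d+m)*(\d+)(?:\x1b\[\d+m)*: "
    r"(?:\x1b\[\d+m)*(.+?)(?:\x1b\[\d+m)* \[(.+?)\]$"
)

# Hypothetical uncolored actionlint output line in file:line:col: message [rule] shape
line = ".github/workflows/ci.yml:28:9: shell command looks broken [shellcheck]"
m = PATTERN.match(line)

# Groups map to the matcher's file/line/column/message/code fields
print(m.group(1), m.group(2), m.group(3), m.group(5))
# → .github/workflows/ci.yml 28 9 shellcheck
```

The `(?:\x1b\[\d+m)*` runs are what let the same pattern also match actionlint's colorized output, where escape sequences wrap each separator.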
17 changes: 17 additions & 0 deletions .github/workflows/pre-commit.yml
@@ -0,0 +1,17 @@
name: pre-commit

on:
pull_request:
push:
branches: [main]

jobs:
pre-commit:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
with:
python-version: "3.12"
- run: echo "::add-matcher::.github/workflows/matchers/actionlint.json"
- uses: pre-commit/action@2c7b3805fd2a0fd8c1884dcaebf91fc102a13ecd # v3.0.1
5 changes: 5 additions & 0 deletions .markdownlint.yaml
@@ -0,0 +1,5 @@
MD013: false # line-length
MD028: false # no-blanks-blockquote
MD029: # ol-prefix
style: ordered
MD033: false # no-inline-html
45 changes: 45 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,45 @@
repos:
- repo: https://github.com/rhysd/actionlint
rev: v1.7.7
hooks:
- id: actionlint
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: check-json
- id: check-toml
- id: check-yaml
exclude: ^helm/templates/
- id: end-of-file-fixer
- id: requirements-txt-fixer
- id: trailing-whitespace
# TODO: Enable these hooks when environment issues are resolved
# - repo: https://github.com/hadolint/hadolint
# rev: v2.12.0
# hooks:
# - id: hadolint
# - repo: https://github.com/gruntwork-io/pre-commit
# rev: v0.1.25
# hooks:
# - id: helmlint
- repo: https://github.com/psf/black
rev: '25.1.0'
hooks:
- id: black
- repo: https://github.com/pycqa/isort
rev: '6.0.0'
hooks:
- id: isort
# TODO: Enable this hook when environment issues are resolved
# - repo: https://github.com/koalaman/shellcheck-precommit
# rev: v0.10.0
# hooks:
# - id: shellcheck
- repo: https://github.com/igorshubovych/markdownlint-cli
rev: v0.44.0
hooks:
- id: markdownlint
- repo: https://github.com/codespell-project/codespell
rev: v2.4.1
hooks:
- id: codespell
27 changes: 18 additions & 9 deletions README.md
@@ -1,13 +1,12 @@
- # vLLM Production Stack: reference stack for production vLLM deployment

# vLLM Production Stack: reference stack for production vLLM deployment

The **vLLM Production Stack** project provides a reference implementation of how to build an inference stack on top of vLLM, which allows you to:

- 🚀 Scale from a single vLLM instance to a distributed vLLM deployment without changing any application code
- 💻 Monitor the stack through a web dashboard
- 😄 Enjoy the performance benefits brought by request routing and KV cache offloading

- ## Latest News:
+ ## Latest News

- 🔥 vLLM Production Stack is released! Check out our [release blogs](https://blog.lmcache.ai/2025-01-21-stack-release) [01-22-2025]
- ✨Join us at #production-stack channel of vLLM [slack](https://slack.vllm.ai/), LMCache [slack](https://join.slack.com/t/lmcacheworkspace/shared_invite/zt-2viziwhue-5Amprc9k5hcIdXT7XevTaQ), or fill out this [interest form](https://forms.gle/wSoeNpncmPVdXppg8) for a chat!
@@ -20,7 +19,6 @@ The stack is set up using [Helm](https://helm.sh/docs/), and contains the follow
- **Request router**: Directs requests to appropriate backends based on routing keys or session IDs to maximize KV cache reuse.
- **Observability stack**: monitors the metrics of the backends through [Prometheus](https://github.com/prometheus/prometheus) + [Grafana](https://grafana.com/)


<img src="https://github.com/user-attachments/assets/8f05e7b9-0513-40a9-9ba9-2d3acca77c0c" alt="Architecture of the stack" width="800"/>

## Roadmap
@@ -42,6 +40,7 @@ We are actively working on this project and will release the following features
### Deployment

vLLM Production Stack can be deployed via Helm charts. Clone the repo locally and execute the following commands for a minimal deployment:

```bash
git clone https://github.com/vllm-project/production-stack.git
cd production-stack/
@@ -55,21 +54,18 @@ To validate the installation and send a query to the stack, refer to [this tut

For more information about customizing the helm chart, please refer to [values.yaml](https://github.com/vllm-project/production-stack/blob/main/helm/values.yaml) and our other [tutorials](https://github.com/vllm-project/production-stack/tree/main/tutorials).


### Uninstall

```bash
sudo helm uninstall vllm
```


## Grafana Dashboard

### Features

The Grafana dashboard provides the following insights:


1. **Available vLLM Instances**: Displays the number of healthy instances.
2. **Request Latency Distribution**: Visualizes end-to-end request latency.
3. **Time-to-First-Token (TTFT) Distribution**: Monitors response times for token generation.
@@ -98,7 +94,6 @@ The router ensures efficient request distribution among backends. It supports:
- Session-ID based routing
- (WIP) prefix-aware routing


## Contributing

Contributions are welcome! Please follow the standard GitHub flow:
@@ -107,11 +102,25 @@ Contributions are welcome! Please follow the standard GitHub flow:
2. Create a feature branch.
3. Submit a pull request with detailed descriptions.

We use `pre-commit` for formatting; it is installed as follows:

```bash
pip install -r requirements-lint.txt
pre-commit install
```

It will run automatically before every commit. You can also run it manually on all files with:

```bash
pre-commit run --all-files
```

> You can read more about `pre-commit` at <https://pre-commit.com>.

## License

This project is licensed under the MIT License. See the `LICENSE` file for details.

---

For any issues or questions, feel free to open an issue or contact the maintainers.

6 changes: 3 additions & 3 deletions helm/README.md
@@ -2,14 +2,14 @@

This helm chart lets users deploy multiple serving engines and a router into the Kubernetes cluster.

- ## Key features:
+ ## Key features

- Support running multiple serving engines with multiple different models
- - Load the model weights directly from the existing PersistentVolumes
+ - Load the model weights directly from the existing PersistentVolumes

## Prerequisites

- 1. A running Kubernetes cluster with GPU. (You can set it up through `minikube`: https://minikube.sigs.k8s.io/docs/tutorials/nvidia/)
+ 1. A running Kubernetes cluster with GPU. (You can set it up through `minikube`: <https://minikube.sigs.k8s.io/docs/tutorials/nvidia/>)
2. [Helm](https://helm.sh/docs/intro/install/)

## Install the helm chart
2 changes: 1 addition & 1 deletion helm/ct.yaml
@@ -1,3 +1,3 @@
chart-dirs:
- charts
- validate-maintainers: false
+ validate-maintainers: false
2 changes: 1 addition & 1 deletion helm/lintconf.yaml
@@ -39,4 +39,4 @@ rules:
type: unix
trailing-spaces: enable
truthy:
- level: warning
+ level: warning
10 changes: 5 additions & 5 deletions helm/templates/deployment-vllm-multi.yaml
@@ -67,7 +67,7 @@ spec:
value: /data
{{- if $modelSpec.hf_token }}
- name: HF_TOKEN
- valueFrom:
+ valueFrom:
secretKeyRef:
name: {{ .Release.Name }}-secrets
key: hf_token_{{ $modelSpec.name }}
@@ -89,7 +89,7 @@ spec:
value: "{{ $modelSpec.lmcacheConfig.cpuOffloadingBufferSize }}"
{{- end }}
{{- if $modelSpec.lmcacheConfig.diskOffloadingBufferSize }}
- - name: LMCACHE_LOCAL_DISK
+ - name: LMCACHE_LOCAL_DISK
value: "True"
- name: LMCACHE_MAX_LOCAL_DISK_SIZE
value: "{{ $modelSpec.lmcacheConfig.diskOffloadingBufferSize }}"
@@ -99,7 +99,7 @@ spec:
envFrom:
- configMapRef:
name: "{{ .Release.Name }}-configs"
- {{- end }}
+ {{- end }}
ports:
- name: {{ include "chart.container-port-name" . }}
containerPort: {{ include "chart.container-port" . }}
@@ -123,7 +123,7 @@ spec:

{{- if .Values.servingEngineSpec.runtimeClassName }}
runtimeClassName: nvidia
- {{- end }}
+ {{- end }}
{{- if $modelSpec.nodeSelectorTerms}}
affinity:
nodeAffinity:
@@ -132,7 +132,7 @@ spec:
{{- with $modelSpec.nodeSelectorTerms }}
{{- toYaml . | nindent 12 }}
{{- end }}
{{- end }}
{{- end }}
{{- end }}
---
{{- end }}
1 change: 0 additions & 1 deletion helm/templates/role.yaml
@@ -7,4 +7,3 @@ rules:
- apiGroups: [""] # "" indicates the core API group
resources: ["pods"]
verbs: ["get", "watch", "list"]

1 change: 0 additions & 1 deletion helm/templates/serviceaccount.yaml
@@ -3,4 +3,3 @@ kind: ServiceAccount
metadata:
name: "{{ .Release.Name }}-router-service-account"
namespace: {{ .Release.Namespace }}

2 changes: 1 addition & 1 deletion helm/test.sh
@@ -1,2 +1,2 @@
- #helm upgrade --install --create-namespace --namespace=ns-vllm test-vllm . -f values-yihua.yaml
+ #helm upgrade --install --create-namespace --namespace=ns-vllm test-vllm . -f values-yihua.yaml
helm upgrade --install test-vllm . -f values-additional.yaml #--create-namespace --namespace=vllm
3 changes: 1 addition & 2 deletions helm/values.schema.json
@@ -140,7 +140,7 @@
}
}
},
- "runtimeClassName": {
+ "runtimeClassName": {
"type": "string"
}
}
@@ -170,4 +170,3 @@
}
}
}

20 changes: 10 additions & 10 deletions helm/values.yaml
@@ -51,13 +51,13 @@ servingEngineSpec:
#
# requestCPU: 10
# requestMemory: "64Gi"
- # requestGPU: 1
+ # requestGPU: 1
#
# pvcStorage: "50Gi"
- # pvcMatchLabels:
+ # pvcMatchLabels:
# model: "mistral"
#
- # vllmConfig:
+ # vllmConfig:
# enableChunkedPrefill: false
# enablePrefixCaching: false
# maxModelLen: 16384
@@ -80,14 +80,14 @@ servingEngineSpec:
# - "NVIDIA-RTX-A6000"
modelSpec: []

- # -- Container port
+ # -- Container port
containerPort: 8000
- # -- Service port
+ # -- Service port
servicePort: 80

# -- Set other environment variables from config map
configs: {}

# -- Readiness probe configuration
startupProbe:
# -- Number of seconds after the container has started before startup probe is initiated
@@ -102,7 +102,7 @@ servingEngineSpec:
path: /health
# -- Name or number of the port to access on the container, on which the server is listening
port: 8000

# -- Liveness probe configuration
livenessProbe:
# -- Number of seconds after the container has started before liveness probe is initiated
@@ -117,7 +117,7 @@ servingEngineSpec:
path: /health
# -- Name or number of the port to access on the container, on which the server is listening
port: 8000

# -- Disruption Budget Configuration
maxUnavailablePodDisruptionBudget: ""

@@ -135,7 +135,7 @@ servingEngineSpec:
routerSpec:
# -- Number of replicas
replicaCount: 1

# -- Container port
containerPort: 8000

2 changes: 1 addition & 1 deletion observability/README.md
@@ -4,7 +4,7 @@

## Deploy the observability stack

- The observability stack is based on [kube-prom-stack](https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/README.md).
+ The observability stack is based on [kube-prom-stack](https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/README.md).

To launch the observability stack:

1 change: 0 additions & 1 deletion observability/upgrade.sh
@@ -1,4 +1,3 @@
helm upgrade kube-prom-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring \
-f "values.yaml"
