[Doc] Update aibrix documentation #533

Merged
merged 3 commits on Dec 16, 2024
42 changes: 42 additions & 0 deletions docs/source/development/development.rst
@@ -3,3 +3,45 @@
===========
Development
===========

Build and Run
-------------

We encourage contributors to build and test AIBrix in a local development environment for most cases.
If you use a MacBook, `Docker for Desktop <https://www.docker.com/products/docker-desktop/>`_ is the most convenient tool to use.

The following command builds the ``nightly`` docker images.

.. code-block:: bash

make docker-build-all
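
If your dev cluster runs on kind, you can then load the freshly built images into it. The image names below are illustrative assumptions; match them to the tags that ``make docker-build-all`` actually produces.

.. code-block:: bash

    # load locally built nightly images into the kind cluster (image names are hypothetical)
    kind load docker-image aibrix/controller-manager:nightly
    kind load docker-image aibrix/gateway-plugins:nightly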

Run the following commands to quickly deploy the latest code changes to your dev Kubernetes environment.

.. code-block:: bash

kubectl create -f config/dependency
kubectl create -f config/default
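
To verify the deployment, check that the AIBrix pods become ready:

.. code-block:: bash

    kubectl get pods -n aibrix-system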


If you want to clean up everything and reinstall the latest code:

.. code-block:: bash

kubectl delete -f config/default
kubectl delete -f config/dependency

Mocked CPU App
--------------

To run the control plane and data plane end-to-end in development environments, we built a mocked app that simulates a model server.
It currently supports basic model inference, metrics, and the LoRA feature. Feel free to enrich the features. Check the ``development`` folder for more details.
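
A minimal sketch of how the mocked app might be exercised locally (the Dockerfile path, image tag, and port are assumptions; check the ``development`` folder for the actual layout):

.. code-block:: bash

    # build the mocked model server image (path and tag are hypothetical)
    docker build -t aibrix/vllm-mock:nightly -f development/app/Dockerfile development/app

    # run it and hit the OpenAI-style endpoint it mocks
    docker run -d -p 8000:8000 aibrix/vllm-mock:nightly
    curl http://localhost:8000/v1/models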


Test on GPU Cluster
-------------------

If you need to test models in a real GPU environment, we highly recommend the `Lambda Labs <https://lambdalabs.com/>`_ platform for installing and testing kind-based deployments.

.. attention::
Kind itself doesn't support GPUs yet. To use kind with GPU support, check out `nvkind <https://github.com/klueska/nvkind>`_.
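
As a sketch, creating a GPU-enabled kind cluster with nvkind might look like the following; the exact command and flags are assumptions that may differ across nvkind versions, so consult its README.

.. code-block:: bash

    # create a kind cluster with host GPUs passed through (flags are illustrative)
    nvkind cluster create --name aibrix-gpu

    # confirm the nodes advertise GPU resources
    kubectl describe nodes | grep nvidia.com/gpu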
29 changes: 14 additions & 15 deletions docs/source/development/release.rst
@@ -10,10 +10,17 @@ Release
This process outlines the steps required to create and publish a release for the AIBrix GitHub project.
Follow these steps to ensure a smooth and consistent release cycle.

1. Prepare the code
-----------------------------
Prepare the code
----------------

Option 1: RC version release
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For an RC release like ``v0.2.0-rc.1``, there's no need to check out a new branch; cut the tag & release
directly against the ``main`` branch.
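
For example (the tag name is illustrative):

.. code-block:: bash

    git checkout main
    git pull origin main
    git tag v0.2.0-rc.1
    git push origin v0.2.0-rc.1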

Option 1 minor version release

Option 2: minor version release
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For new minor version release like ``v0.1.0``, please checkout a new branch named ``release-0.1``.
@@ -25,24 +32,16 @@ For new minor version release like ``v0.1.0``, please checkout a new branch name
git push origin release-0.1

.. note::
If origin doesn't points to upstream, let's say you fork the remote, ``upstream`` or other remotes should be right remote to push to.
Here we assume ``origin`` points to upstream; if it doesn't, another remote such as ``upstream`` is the right one to push to.

Option 2: patch version release
Option 3: patch version release
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Cut a PR to sync `main` branch changes to `release-0.1`, a example PR is like `Sync main branch changes to release-0.1 for rc4 release <https://github.com/aibrix/aibrix/pull/312>`_
Bug fixes should be merged into ``main`` first, then cherry-picked to the target release branch such as ``release-0.1``.
Due to changes on ``main``, a fix may not apply cleanly to ``release-0.1``; if that's the case, cut a PR against the release branch directly.
For a patch version like ``v0.1.1``, please reuse the release branch ``release-0.1``, which should have been created earlier for the minor version release.
For a patch release, we do not rebase ``main`` because that would introduce new features; all fixes have to be cherry-picked or submitted as PRs against ``release-0.1`` directly.
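
For example, a typical cherry-pick onto the release branch looks like this (the commit SHA is a placeholder):

.. code-block:: bash

    git checkout release-0.1
    git cherry-pick <bugfix-commit-sha>
    git push origin release-0.1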

.. code-block:: bash

git checkout release-0.1
git fetch origin
git rebase origin/release-0.1

# not need to push, it should be update to date.


Cut a PR
--------

2 changes: 1 addition & 1 deletion docs/source/features/autoscaling.rst
@@ -16,7 +16,7 @@ In the following sections, we will demonstrate how users can create various type
KPA Autoscaler
--------------

The KPA, inspired by Knative, maintains two time windows: a longer "stable window" and a shorter "panic window". It rapidly scales up resources in response to sudden spikes in traffic based on the panic window measurements.
The KPA, inspired by Knative, maintains two time windows: a longer ``stable window`` and a shorter ``panic window``. It rapidly scales up resources in response to sudden spikes in traffic based on the panic window measurements.

Unlike other solutions that might rely on Prometheus for gathering deployment metrics, AIBrix fetches and maintains metrics internally, enabling faster response times.
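
A rough pseudocode sketch of the two-window mechanism (this follows Knative's KPA semantics, from which the design is inspired; AIBrix's exact thresholds and defaults may differ):

.. code-block:: bash

    # panic_ratio = desired_pods(panic_window) / ready_pods
    # if panic_ratio >= panic_threshold (default 200%):
    #     enter panic mode: scale to desired_pods(panic_window), never scale down
    # else:
    #     scale to desired_pods(stable_window)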

9 changes: 5 additions & 4 deletions docs/source/features/gateway-plugins.rst
@@ -53,14 +53,15 @@ To set up rate limiting, add the user header in the request, like this:
Routing Strategies
------------------

Gateway supports two routing strategies right now.
1. least-request: routes request to a pod with least ongoing request.
2. throughput: routes request to a pod which has processed lowest tokens.
Gateway supports three routing strategies right now.

* random: routes the request to a random pod.
* least-request: routes the request to the pod with the fewest ongoing requests.
* throughput: routes the request to the pod that has processed the fewest tokens.

.. code-block:: bash

curl -v http://localhost:8888/v1/chat/completions \
-H "user: your-user-name" \
-H "routing-strategy: least-request" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer any_key" \
33 changes: 19 additions & 14 deletions docs/source/getting_started/installation.rst
@@ -22,8 +22,11 @@ Stable Version

.. code:: bash

kubectl apply -f https://github.com/aibrix/aibrix/releases/download/v0.1.1/aibrix-dependency-v0.2.0-rc.1.yaml
kubectl apply -f https://github.com/aibrix/aibrix/releases/download/v0.1.1/aibrix-core-v0.2.0-rc.1.yaml
# Install component dependencies
kubectl apply -f https://github.com/aibrix/aibrix/releases/download/v0.2.0-rc.1/aibrix-dependency-v0.2.0-rc.1.yaml

# Install aibrix components
kubectl apply -f https://github.com/aibrix/aibrix/releases/download/v0.2.0-rc.1/aibrix-core-v0.2.0-rc.1.yaml
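
After applying both manifests, you can verify that the control plane pods come up (the same check shown in the quickstart):

.. code-block:: bash

    kubectl get pods -n aibrix-system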


Nightly Version
@@ -37,33 +40,35 @@ Nightly Version

# Install component dependencies
kubectl create -k config/dependency

# Install aibrix components
kubectl create -k config/default


Install Individual AIBrix Components
------------------------------------


Autoscaler
^^^^^^^^^^

.. code:: bash

# autoscaler controller
kubectl apply -k config/standalone/autoscaler-controller/

# distributed inference orchestrations controller

Distributed Inference
^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

kubectl apply -k config/standalone/distributed-inference-controller/

# model adapter controller
kubectl apply -k config/standalone/model-adapter-controller


Model Adapter (LoRA)
^^^^^^^^^^^^^^^^^^^^

Install AIBrix on Kind Cluster
------------------------------
.. code:: bash

.. attention::
Kind itself doesn't support GPU yet. In order to use the kind version with GPU support, feel free to checkout `nvkind <https://github.com/klueska/nvkind>`_.
kubectl apply -k config/standalone/model-adapter-controller

We use `Lambda Labs <https://lambdalabs.com/>`_ platform to install and test kind based deployment.

TODO
53 changes: 26 additions & 27 deletions docs/source/getting_started/quickstart.rst
@@ -7,14 +7,28 @@ Quickstart
Install AIBrix
^^^^^^^^^^^^^^

Get your Kubernetes cluster ready, then run the following commands to install AIBrix components in your cluster.

.. note::
If following way doesn't work for you, please check installation guidance for more installation options.
If you just want to install specific components or a specific version, please check the installation guide for more options.

.. code-block:: bash

kubectl apply -f https://github.com/aibrix/aibrix/releases/download/v0.2.0-rc.1/aibrix-dependency-v0.2.0-rc.1.yaml
kubectl apply -f https://github.com/aibrix/aibrix/releases/download/v0.2.0-rc.1/aibrix-core-v0.2.0-rc.1.yaml

Wait a few minutes, then run ``kubectl get pods -n aibrix-system`` and check the pod status until they are ready.

.. code-block:: bash

NAME READY STATUS RESTARTS AGE
aibrix-controller-manager-56576666d6-gsl8s 1/1 Running 0 5h24m
aibrix-gateway-plugins-c6cb7545-r4xwj 1/1 Running 0 5h24m
aibrix-gpu-optimizer-89b9d9895-t8wnq 1/1 Running 0 5h24m
aibrix-kuberay-operator-6dcf94b49f-l4522 1/1 Running 0 5h24m
aibrix-metadata-service-6b4d44d5bd-h5g2r 1/1 Running 0 5h24m
aibrix-redis-master-84769768cb-fsq45 1/1 Running 0 5h24m


Deploy base model
^^^^^^^^^^^^^^^^^
@@ -28,16 +42,16 @@ Save yaml as `deployment.yaml` and run `kubectl apply -f deployment.yaml`.
metadata:
labels:
# Note: The label value `model.aibrix.ai/name` here must match with the service name.
model.aibrix.ai/name: llama-2-7b-hf
model.aibrix.ai/name: qwen25-7b-instruct
model.aibrix.ai/port: "8000"
adapter.model.aibrix.ai/enabled: "true"
name: llama-2-7b-hf
name: qwen25-7b-instruct
namespace: default
spec:
replicas: 1
selector:
matchLabels:
model.aibrix.ai/name: llama-2-7b-hf
model.aibrix.ai/name: qwen25-7b-instruct
strategy:
rollingUpdate:
maxSurge: 25%
@@ -46,7 +60,7 @@ Save yaml as `deployment.yaml` and run `kubectl apply -f deployment.yaml`.
template:
metadata:
labels:
model.aibrix.ai/name: llama-2-7b-hf
model.aibrix.ai/name: qwen25-7b-instruct
spec:
containers:
- command:
@@ -58,10 +72,10 @@ Save yaml as `deployment.yaml` and run `kubectl apply -f deployment.yaml`.
- --port
- "8000"
- --model
- meta-llama/Llama-2-7b-hf
- Qwen/Qwen2.5-7B-Instruct
- --served-model-name
# Note: The `--served-model-name` argument value must also match the Service name and the Deployment label `model.aibrix.ai/name`
- llama-2-7b-hf
- qwen25-7b-instruct
- --trust-remote-code
- --enable-lora
env:
@@ -116,12 +130,12 @@ Save yaml as `service.yaml` and run `kubectl apply -f service.yaml`.
metadata:
labels:
# Note: The Service name must match the label value `model.aibrix.ai/name` in the Deployment
model.aibrix.ai/name: llama-2-7b-hf
model.aibrix.ai/name: qwen25-7b-instruct
prometheus-discovery: "true"
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
name: llama-2-7b-hf
name: qwen25-7b-instruct
namespace: default
spec:
ports:
@@ -134,7 +148,7 @@ Save yaml as `service.yaml` and run `kubectl apply -f service.yaml`.
protocol: TCP
targetPort: 8080
selector:
model.aibrix.ai/name: llama-2-7b-hf
model.aibrix.ai/name: qwen25-7b-instruct
type: ClusterIP

.. note::
@@ -145,20 +159,6 @@ Save yaml as `service.yaml` and run `kubectl apply -f service.yaml`.
2. The `--served-model-name` argument value in the `Deployment` command is also consistent with the `Service` name and `model.aibrix.ai/name` label.


Register a user to authenticate the gateway
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

kubectl -n aibrix-system port-forward svc/aibrix-gateway-users 8090:8090

.. code-block:: bash

curl http://localhost:8090/CreateUser \
-H "Content-Type: application/json" \
-d '{"name": "test-user","rpm": 100,"tpm": 10000}'



Invoke the model endpoint using gateway api
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -174,10 +174,9 @@

curl -v http://localhost:8888/v1/completions \
-H "Content-Type: application/json" \
-H "user: test-user" \
-H "model: meta-llama/Llama-2-7b-hf" \
-H "model: qwen25-7b-Instruct" \
-d '{
"model": "meta-llama/llama-2-7b-hf",
"model": "qwen25-7b-Instruct",
"prompt": "San Francisco is a",
"max_tokens": 128,
"temperature": 0