[Fix] Minor Fixs for Tutorial and Bumped version to 0.0.9 (#154)
* Updated AWS, bumped release version

Signed-off-by: hanchenli <[email protected]>

* minor fix to readme in aws

Signed-off-by: hanchenli <[email protected]>

* minor fix to readme in aws

Signed-off-by: hanchenli <[email protected]>

* fix format

Signed-off-by: hanchenli <[email protected]>

* fix folder

Signed-off-by: hanchenli <[email protected]>

---------

Signed-off-by: hanchenli <[email protected]>
Hanchenli authored Feb 19, 2025
1 parent a5717ac commit 4c3aeef
Showing 8 changed files with 65 additions and 64 deletions.
11 changes: 6 additions & 5 deletions README.md
@@ -17,11 +17,12 @@
## Step-By-Step Tutorials

0. How To [*Install Kubernetes (kubectl, helm, minikube, etc)*](https://github.com/vllm-project/production-stack/blob/main/tutorials/00-install-kubernetes-env.md)?
-1. How To [*Setup a Minimal vLLM Production Stack*](https://github.com/vllm-project/production-stack/blob/main/tutorials/01-minimal-helm-installation.md)?
-2. How To [*Customize vLLM Configs (optional)*](https://github.com/vllm-project/production-stack/blob/main/tutorials/02-basic-vllm-config.md)?
-3. How to [*Load Your LLM Weights*](https://github.com/vllm-project/production-stack/blob/main/tutorials/03-load-model-from-pv.md)?
-4. How to [*Launch Different LLMs in vLLM Production Stack*](https://github.com/vllm-project/production-stack/blob/main/tutorials/04-launch-multiple-model.md)?
-5. How to [*Enable KV Cache Offloading with LMCache*](https://github.com/vllm-project/production-stack/blob/main/tutorials/05-offload-kv-cache.md)?
+1. How to [*Deploy Production Stack on Major Cloud Platforms (AWS, GCP, Azure)*](https://github.com/vllm-project/production-stack/blob/main/tutorials/cloud_deployments)?
+2. How To [*Setup a Minimal vLLM Production Stack*](https://github.com/vllm-project/production-stack/blob/main/tutorials/01-minimal-helm-installation.md)?
+3. How To [*Customize vLLM Configs (optional)*](https://github.com/vllm-project/production-stack/blob/main/tutorials/02-basic-vllm-config.md)?
+4. How to [*Load Your LLM Weights*](https://github.com/vllm-project/production-stack/blob/main/tutorials/03-load-model-from-pv.md)?
+5. How to [*Launch Different LLMs in vLLM Production Stack*](https://github.com/vllm-project/production-stack/blob/main/tutorials/04-launch-multiple-model.md)?
+6. How to [*Enable KV Cache Offloading with LMCache*](https://github.com/vllm-project/production-stack/blob/main/tutorials/05-offload-kv-cache.md)?

## Architecture

6 changes: 4 additions & 2 deletions deployment_on_cloud/aws/Readme.md
@@ -1,16 +1,18 @@
# Setting up EKS vLLM stack with one command

This script automatically configures a EKS LLM inference cluster.
-Make sure your AWS cli is set up, logged in, and region set up. You have eksctl, kubectl, helm installed.
+Make sure your AWS cli (v2) is installed, logged in, and region set up. You have eksctl, kubectl, helm installed.

Modify fields production_stack_specification.yaml and execute as:

```bash
bash entry_point.sh YOUR_AWSREGION YAML_FILE_PATH
```

-Clean up the service (not the VPC) with:
+Clean up the service with:

```bash
bash clean_up.sh production-stack YOUR_AWSREGION
```

+You may also want to manually delete the VPC and clean up the cloud formation in the AWS Console.
95 changes: 46 additions & 49 deletions deployment_on_cloud/aws/clean_up.sh
@@ -54,9 +54,9 @@ for TG_ARN in $TG_ARNs; do
aws elbv2 delete-target-group --target-group-arn "$TG_ARN" --region "$REGION"
done

-# Delete NAT Gateways
+# # Delete NAT Gateways
echo "Deleting NAT Gateways..."
-NAT_GATEWAYS=$(aws ec2 describe-nat-gateways --filter "Name=tag:eks:cluster-name,Values=$CLUSTER_NAME" --query "NatGateways[].NatGatewayId" --output text --region "$REGION")
+NAT_GATEWAYS=$(aws ec2 describe-nat-gateways --filter "Name=tag:Name,Values=eksctl-${CLUSTER_NAME}-cluster/NATGateway" --query "NatGateways[].NatGatewayId" --output text --region "$REGION")
for NAT_ID in $NAT_GATEWAYS; do
aws ec2 delete-nat-gateway --nat-gateway-id "$NAT_ID" --region "$REGION"
echo "Waiting for NAT Gateway $NAT_ID to be deleted..."
@@ -72,30 +72,27 @@ for EIP in $EIP_ALLOCS; do
done

# Release EFS and the created security group
-while read -r fs_id; do
-echo "Processing File System: $fs_id"
+read -r fs_id < temp.txt
+echo "Processing File System: $fs_id"

-# Get the list of mount targets
-mount_targets=$(aws efs describe-mount-targets --file-system-id "$fs_id" --query "MountTargets[*].MountTargetId" --output text)
+# Get the list of mount targets
+mount_targets=$(aws efs describe-mount-targets --file-system-id "$fs_id" --query "MountTargets[*].MountTargetId" --output text)

-# Delete each mount target
-for mt_id in $mount_targets; do
-echo "Deleting Mount Target: $mt_id"
-aws efs delete-mount-target --mount-target-id "$mt_id"
-done
-
-# Wait for mount targets to be deleted (optional, prevents API conflicts)
-while [[ -n $(aws efs describe-mount-targets --file-system-id "$fs_id" --query "MountTargets[*].MountTargetId" --output text) ]]; do
-echo "Waiting for mount targets to be deleted..."
-sleep 10
-done
-
-# Delete the file system
-echo "Deleting File System: $fs_id"
-aws efs delete-file-system --file-system-id "$fs_id"
+# Delete each mount target
+for mt_id in $mount_targets; do
+echo "Deleting Mount Target: $mt_id"
+aws efs delete-mount-target --mount-target-id "$mt_id"
+done

-done < temp.txt
+# Wait for mount targets to be deleted (optional, prevents API conflicts)
+while [[ -n $(aws efs describe-mount-targets --file-system-id "$fs_id" --query "MountTargets[*].MountTargetId" --output text) ]]; do
+echo "Waiting for mount targets to be deleted..."
+sleep 10
+done

+# Delete the file system
+echo "Deleting File System: $fs_id"
+aws efs delete-file-system --file-system-id "$fs_id"

for sg in $(aws ec2 describe-security-groups --filters "Name=group-name,Values=efs-sg" --query "SecurityGroups[*].GroupId" --output text); do

@@ -113,32 +110,32 @@ aws eks delete-cluster --name "$CLUSTER_NAME" --region "$REGION"
echo "Waiting for cluster $CLUSTER_NAME to be deleted..."
aws eks wait cluster-deleted --name "$CLUSTER_NAME" --region "$REGION"

-# Delete CloudFormation Stack
-echo "Checking if CloudFormation stack exists for EKS cluster..."
-STACK_NAME="eksctl-${CLUSTER_NAME}-cluster"
-STACK_STATUS=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" --region "$REGION" --query "Stacks[0].StackStatus" --output text 2>/dev/null)
-
-if [ -n "$STACK_STATUS" ]; then
-echo "Deleting CloudFormation stack: $STACK_NAME"
-aws cloudformation delete-stack --stack-name "$STACK_NAME" --region "$REGION"
-echo "Waiting for CloudFormation stack $STACK_NAME to be deleted..."
-aws cloudformation wait stack-delete-complete --stack-name "$STACK_NAME" --region "$REGION"
-echo "CloudFormation stack $STACK_NAME has been deleted successfully!"
-else
-echo "CloudFormation stack $STACK_NAME not found, skipping..."
-fi
-
-STACK_NAME="eksctl-${CLUSTER_NAME}-cluster-nodegroup-gpu-nodegroup"
-STACK_STATUS=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" --region "$REGION" --query "Stacks[0].StackStatus" --output text 2>/dev/null)
-
-if [ -n "$STACK_STATUS" ]; then
-echo "Deleting CloudFormation stack: $STACK_NAME"
-aws cloudformation delete-stack --stack-name "$STACK_NAME" --region "$REGION"
-echo "Waiting for CloudFormation stack $STACK_NAME to be deleted..."
-aws cloudformation wait stack-delete-complete --stack-name "$STACK_NAME" --region "$REGION"
-echo "CloudFormation stack $STACK_NAME has been deleted successfully!"
-else
-echo "CloudFormation stack $STACK_NAME not found, skipping..."
-fi
+# Clean up VPC
+# echo "Cleaning up VPC..."
+# VPC_ID=$(aws ec2 describe-vpcs \
+# --filters "Name=tag:Name,Values=eksctl-${CLUSTER_NAME}-cluster/VPC" \
+# --query "Vpcs[0].VpcId" \
+# --output text \
+# --region "$REGION")
+# if [ -n "$VPC_ID" ]; then
+# echo "Deleting VPC: $VPC_ID"
+# aws ec2 delete-vpc --vpc-id "$VPC_ID" --region "$REGION"
+# else
+# echo "VPC not found, skipping..."
+# fi
+
+# Delete CloudFormation Stackecho "Deleting CloudFormation stacks..."
+# STACKS=( "eksctl-${CLUSTER_NAME}-cluster" "eksctl-${CLUSTER_NAME}-cluster-nodegroup-gpu-nodegroup" )
+# for STACK_NAME in "${STACKS[@]}"; do
+# STACK_STATUS=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" --region "$REGION" --query "Stacks[0].StackStatus" --output text 2>/dev/null)
+# if [ -n "$STACK_STATUS" ]; then
+# echo "Deleting CloudFormation stack: $STACK_NAME"
+# aws cloudformation delete-stack --stack-name "$STACK_NAME" --region "$REGION"
+# echo "Waiting for CloudFormation stack $STACK_NAME to be deleted..."
+# aws cloudformation wait stack-delete-complete --stack-name "$STACK_NAME" --region "$REGION"
+# else
+# echo "CloudFormation stack $STACK_NAME not found, skipping..."
+# fi
+# done

echo "EKS cluster $CLUSTER_NAME cleanup completed successfully!"
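
With the CloudFormation deletion now commented out in `clean_up.sh` and left to the AWS Console, it may help to see which stacks the removed code targeted before deleting anything by hand. The following is only a sketch under assumptions, not part of this commit: `eksctl_stack_names` and `print_stack_cleanup` are hypothetical helper names, `us-west-2` is a placeholder region, and the commands are printed rather than executed.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper (not part of clean_up.sh).
# It prints the CloudFormation commands that the removed block used to run.

# Emit the two stack names that the earlier version of clean_up.sh deleted,
# in the same order.
eksctl_stack_names() {
  local cluster_name="$1"
  printf '%s\n' \
    "eksctl-${cluster_name}-cluster" \
    "eksctl-${cluster_name}-cluster-nodegroup-gpu-nodegroup"
}

# Print (do not run) one delete/wait pair per stack.
print_stack_cleanup() {
  local cluster_name="$1" region="$2" stack
  # Stack names contain no whitespace, so word splitting is safe here.
  for stack in $(eksctl_stack_names "$cluster_name"); do
    echo "aws cloudformation delete-stack --stack-name \"$stack\" --region \"$region\""
    echo "aws cloudformation wait stack-delete-complete --stack-name \"$stack\" --region \"$region\""
  done
}

# Placeholder arguments: the cluster name from the README example and an assumed region.
print_stack_cleanup "production-stack" "us-west-2"
```

Reviewing the printed commands first keeps the manual step explicit; whether the stacks can be deleted in this order depends on the state the rest of the cleanup left behind.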
2 changes: 1 addition & 1 deletion deployment_on_cloud/aws/entry_point.sh
@@ -37,7 +37,7 @@ eksctl create iamserviceaccount \

#create pv after modify the filesys id to be the filesys id
#storage needed is based on model weights
-EFS_ID=$(cat temp.text)
+EFS_ID=$(cat temp.txt)

cat <<EOF > efs-pv.yaml
apiVersion: v1
2 changes: 1 addition & 1 deletion deployment_on_cloud/aws/set_up_efs.sh
@@ -70,4 +70,4 @@ done

echo "EFS setup complete!"
echo "File System ID: $EFS_ID"
-echo "$EFS_ID" > temp.text
+echo "$EFS_ID" > temp.txt
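
The `temp.text` to `temp.txt` rename shows how easily the hand-off file name can drift between the script that writes the EFS ID and the scripts that read it. One possible guard, sketched here under assumptions, is to define the name once and fail loudly when the file is missing or empty. `EFS_ID_FILE`, `save_efs_id`, and `load_efs_id` are hypothetical names that do not exist in the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not in the repository) for passing the EFS ID between scripts.

# Single place that names the hand-off file; defaults to the name used after this commit.
EFS_ID_FILE="${EFS_ID_FILE:-temp.txt}"

# Write the file system ID, mirroring `echo "$EFS_ID" > temp.txt` in set_up_efs.sh.
save_efs_id() {
  printf '%s\n' "$1" > "$EFS_ID_FILE"
}

# Print the first line of the file; return non-zero if the file is missing or the ID is empty.
load_efs_id() {
  local fs_id=""
  if [ ! -f "$EFS_ID_FILE" ]; then
    echo "error: $EFS_ID_FILE not found; run set_up_efs.sh first" >&2
    return 1
  fi
  # `read` returns non-zero on an empty file; the emptiness check below handles that case.
  read -r fs_id < "$EFS_ID_FILE" || true
  if [ -z "$fs_id" ]; then
    echo "error: $EFS_ID_FILE is empty" >&2
    return 1
  fi
  printf '%s\n' "$fs_id"
}
```

A caller could then write `EFS_ID=$(load_efs_id) || exit 1` where `entry_point.sh` currently uses `EFS_ID=$(cat temp.txt)`, so a missing file stops the run instead of producing an empty ID.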
2 changes: 1 addition & 1 deletion helm/Chart.yaml
@@ -15,7 +15,7 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 0.0.8
+version: 0.0.9

maintainers:
- name: apostac
@@ -6,14 +6,13 @@ This guide walks you through the script that sets up a vLLM production-stack on

Before running this setup, ensure you have:

-1. AWS CLI installed and configured with credential and region set up.
-2. AWS eksctl
-3. Kubectl
-4. Helm
+1. AWS CLI (version higher than v2) installed and configured with credential and region [[Link]](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
+2. AWS eksctl [[Link]](https://eksctl.io/installation/)
+3. Kubectl and Helm [[Link]](https://github.com/vllm-project/production-stack/blob/main/tutorials/00-install-kubernetes-env.md)

## TLDR

-To run the service
+To run the service, go into the "deployment_on_cloud/aws" folder and run:

```bash
bash entry_point.sh YOUR_AWSREGION EXAMPLE_YAML_PATH
@@ -243,6 +242,8 @@ This step cleans up EKS, mount-points, created security groups, EFS.
bash clean_up.sh "$CLUSTER_NAME" "$AWS_REGION"
```

+You may also want to manually delete the VPC and clean up the cloud formation in the AWS Console.

## Summary

This tutorial covers: