
End-to-end benchmark pipeline for autoscalers and routing policies #650

Merged (9 commits) on Feb 12, 2025

Conversation

@gangmuk (Collaborator) commented Feb 11, 2025

Pull Request Description

This PR contains an end-to-end benchmark pipeline for autoscaling experiments and routing-policy experiments.

Currently, the main benchmark scripts are located in the aibrix/benchmarks/autoscaling directory, including overnight_run.sh and run-test.sh, together with the k8s YAML manifests and Python helper scripts.

The client is in aibrix/benchmarks/generator/client.py.

The user specifies the target autoscaler, the routing policy, and the workload trace file.

The pipeline should (see the orchestration sketch below):

  1. run the benchmark with the client from a fresh state (restarting the deployments, etc.)
  2. collect related logs from the pods and the k8s API server
  3. collect the client-side performance numbers
  4. generate a report
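
A minimal orchestration sketch of those four steps, assuming the script names from this PR (the flags of `run-test.sh`/`client.py`, the pod label selector, and the plot script name are illustrative assumptions, not the actual interfaces):

```bash
#!/bin/bash
# Hypothetical wrapper around the scripts in benchmarks/autoscaling/ (flags are assumptions).
set -euo pipefail

autoscaler="kpa"                                   # target autoscaler
routing="random"                                   # routing policy
workload="workload/8min_up_and_down.jsonl"         # workload trace file
outdir="results/${autoscaler}_${routing}_$(date +%Y%m%d_%H%M%S)"
mkdir -p "${outdir}"

# 1. Fresh state: reset to one replica and wait until all pods are Ready.
python3 set_num_replicas.py --replicas 1
python3 check_k8s_is_ready.py

# Track the pod count in the background while the benchmark runs.
python3 count_num_pods.py --output "${outdir}/num_pods.csv" &
pod_counter_pid=$!

# 2./3. Drive the workload with the client and collect client-side numbers.
python3 ../generator/client.py \
  --workload "${workload}" \
  --routing-policy "${routing}" \
  --output "${outdir}/client_metrics.jsonl"
kill "${pod_counter_pid}"

# 2. Collect logs from the pods and the k8s API server (label selector is a guess).
kubectl logs -l model.aibrix.ai/name=deepseek-llm-7b-chat --tail=-1 > "${outdir}/pod.log"
kubectl get events --sort-by=.metadata.creationTimestamp > "${outdir}/k8s_events.log"

# 4. Generate the report (plot script name is hypothetical).
python3 plot.py --input "${outdir}" --output "${outdir}/report"
```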

WIP items:

  • Improving aibrix/benchmarks/generator/client.py (a metrics sketch follows this list)
    • TTFT
    • TPOT
    • Goodput
  • Improving the plot script
    • Tokens/s time-series plot
    • TTFT time-series plot
    • TPOT time-series plot
    • Goodput time-series plot
  • README.md
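
For reference, a sketch of how the client could derive TTFT, TPOT, and goodput from per-request timestamps (a minimal illustration under assumed field names and SLO thresholds, not the actual client.py implementation):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RequestRecord:
    """Per-request timing captured by the client (field names are illustrative)."""
    start_time: float        # request sent (seconds)
    first_token_time: float  # first output token received
    end_time: float          # last output token received
    output_tokens: int       # number of generated tokens

def ttft(r: RequestRecord) -> float:
    """Time to first token."""
    return r.first_token_time - r.start_time

def tpot(r: RequestRecord) -> float:
    """Time per output token, excluding the first token."""
    if r.output_tokens <= 1:
        return 0.0
    return (r.end_time - r.first_token_time) / (r.output_tokens - 1)

def goodput(records: List[RequestRecord],
            ttft_slo: float = 1.0, tpot_slo: float = 0.05) -> float:
    """Fraction of requests meeting both SLOs (thresholds are placeholders)."""
    if not records:
        return 0.0
    ok = sum(1 for r in records if ttft(r) <= ttft_slo and tpot(r) <= tpot_slo)
    return ok / len(records)
```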

Related Issues

Resolves: #[Insert issue number(s)]

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

@gangmuk requested a review from Jeffwan on February 11, 2025 21:27
@Jeffwan (Collaborator) commented Feb 11, 2025

Move the TODO items to a separate issue; please create it.

@Jeffwan (Collaborator) commented Feb 11, 2025

What's the purpose of check_k8s_is_ready.py, count_num_pods.py, and set_num_replicas.py? Are they being integrated anywhere else?

@@ -0,0 +1,203 @@
{"timestamp": 21, "requests": [{"Prompt Length": 909, "Output Length": 22, "prompt": "Here is the introduction I have so far:\n\n# \\*\\*Introduction\\*\\*\n\nThis is a comprehensive introduction meant to bring you, the reader, up to speed with the current outline and motivations of the project.\n\n## \\*\\*What is \u2018The Journey\u2019\\*\\*\n\nThe Journey, derived from The Hero\u2019s Journey, is a theoretical roadmap to the development of the key elements that evoke powerful emotional reactions in people. The Hero\u2019s Journey is a structure that has been followed by some of the greatest stories ever told and ever lived.\n\nThe version of this journey described throughout the document is tailored for Kadence and is meant to serve as a reference point and a workspace for ideas, planning, and the execution of specialized tactics in order to continuously develop and progress the story which underlies the public representation of the ideas covered in this document.\n\nThe Journey, its associated ambitions, milestones, challenges, and gimmicks are experimental in nature, and thus are used to further our own research into the business of artist development, and quite possible leaving our mark on the World.\n\n## \\*\\*What is within this document?\\*\\*\n\nThis document contains a whole lot of useful information about characters, possible pathways of progression, theoretical understandings from The Hero\u2019s Journey, and much more. Overall, this document is a collection of all types of information that is relevant to the project undertakings described above.\n\n## \\*\\*How should this document be used?\\*\\*\n\nThis document should be seen strictly as an experimental guideline line used to plan and execute experimental content and plot lines with the intent of learning from experiment results and making changes to procedures in the future where necessary. With regards to content, the content database provided in this document will be used to visualize the potential timeline of events that will transpire once the official process has gone underway (ie. when the first piece of planned content is released to the public and the timeline must be adhered to.)\n\nIn addition to the content calendar, the document will be the gathering place for information deemed useful during the planning and execution process of projects such as the Docu-Series. This information serves to fuel the end-user content that is scheduled and created. By using the Hero\u2019s Journey as a guideline, maximum impact can be gradually attained via meticulous planning and execution of ordered story elements once it is distilled into its relevant parts here inside this document.\n\n## \\*\\*What is The Story\\*\\*\n\nThe Story is a character arch guideline for the Docu-series that is derived from the content of this page. It occurs over a discrete time period, subtly growing in both complexity and depth. The point of using a story is simple, it allows us as the creators of content to understand what type of activities, emotions, themes, places, people, and other story elements to include in order to progress the story, in film format, from a clear beginning to a decisive end without relying on specific events or in real life occurrences that might be outside of our control. 
By determining the characters in the story, as well as their personalities, aspirations, fears, hopes, and desires, we will be able to translate the implied reality of those characters into practical actions and plot points that can be made in the real world to add a touch of fantasy-like takeaways to the project.\n\nBy taking the time to understand both the created characters and their real life counterparts, we ensure maximum compatibility with your (you reading this document) own personal journey, as well as the journey of the characters within the story. This allows us to create a seamless and coherent narrative that will captivate and engage our audience, while also providing a meaningful and impactful experience for everyone involved.\n\nIn order to fully realize the potential of this project and to bring [The Story] to life, it is important to have a clear understanding of the key elements of The Hero\u2019s Journey, as well as an understanding of the specific goals and objectives of the project. With this information in hand, we can begin to craft a unique and compelling story that will capture the hearts and minds of our audience.\n\n### The Journey Ahead\n\nAs we embark on this journey, it is important to remember that this is an experiment and that the path ahead may be uncertain. However, by using The Hero\u2019s Journey as our guide, we can be sure to make the most of every opportunity that comes our way. With careful planning and execution, we can turn this project into something truly special and leave a lasting impact on the world.\n\"\n\n What is the next portion I should write?"}]}
Collaborator:

Can we just document the steps to generate

  • 5s
  • 8min_up_and_down.jsonl

etc.? Let's avoid checking in too many static files.

Collaborator Author:

Okay. It will be done in a separate client-code update PR.


TODO

`./overnight_run.sh workload/8min_up_and_down.jsonl`
Collaborator:

I remember the experiment lasts 20 minutes; what does 8min mean here?

Collaborator Author:

I was making the scripts general, but I can make them autoscaling-only. I will update it.


# If you don't want to deploy any autoscaler use none. e.g., autoscalers="none"
autoscalers="hpa kpa apa optimizer-kpa"
routing_policies="random least-request least-kv-cache least-busy-time least-latency throughput"
Collaborator:

We should split the tests into two separate tests. Can we just use the naive routing policy instead of the customized routing policies for the autoscaling experiments?
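
If the tests are split as suggested, the outer loops could look roughly like this (a sketch only; run-test.sh and its flags are assumptions about the eventual interface):

```bash
# Autoscaling experiments: sweep the autoscalers with the plain random routing policy.
for autoscaler in hpa kpa apa optimizer-kpa; do
  ./run-test.sh --autoscaler "${autoscaler}" --routing random \
      --workload workload/8min_up_and_down.jsonl
done

# Routing experiments: fixed replica count (no autoscaler), sweep the routing policies.
for routing in random least-request least-kv-cache least-busy-time least-latency throughput; do
  ./run-test.sh --autoscaler none --routing "${routing}" \
      --workload workload/8min_up_and_down.jsonl
done
```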

@Jeffwan (Collaborator) commented Feb 11, 2025

/cc @nwangfw @OrdinaryCrazy Please help take a look. You guys should be super familiar with all the setups.

@Jeffwan (Collaborator) commented Feb 11, 2025

Could you please attach the deployment and autoscaling configurations for the autoscaling experiment?

@gangmuk (Collaborator, Author) commented Feb 11, 2025

check_k8s_is_ready.py checks that all the pods are in the Ready state before starting the experiment.
count_num_pods.py runs in the background, periodically counting the number of pods.
set_num_replicas.py makes the experiments start from the same number of instances (1 in the autoscaling experiments).
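
A minimal sketch of what such a readiness check could look like, wrapping `kubectl get pods -o json` (illustrative only, not necessarily the script in this PR; the label selector is a guess):

```python
import json
import subprocess
import time

def all_pods_ready(namespace: str = "default", label_selector: str = "") -> bool:
    """Return True if every matching pod reports the Ready condition."""
    cmd = ["kubectl", "get", "pods", "-n", namespace, "-o", "json"]
    if label_selector:
        cmd += ["-l", label_selector]
    pods = json.loads(subprocess.check_output(cmd))["items"]
    if not pods:
        return False
    for pod in pods:
        conditions = pod.get("status", {}).get("conditions", []) or []
        if not any(c["type"] == "Ready" and c["status"] == "True" for c in conditions):
            return False
    return True

def wait_until_ready(timeout_s: int = 600, poll_s: int = 5, **kwargs) -> None:
    """Poll until all pods are Ready or the timeout is hit."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if all_pods_ready(**kwargs):
            return
        time.sleep(poll_s)
    raise TimeoutError("pods did not become Ready in time")

if __name__ == "__main__":
    # The label selector below is a hypothetical example, not the actual deployment label.
    wait_until_ready(label_selector="model.aibrix.ai/name=deepseek-llm-7b-chat")
```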

@Jeffwan (Collaborator) commented Feb 12, 2025

I mean these can be done with kubectl; do you need such files for programmatic integration?

@gangmuk (Collaborator, Author) commented Feb 12, 2025

You mean by checking `kubectl get pod`, for example?

@gangmuk (Collaborator, Author) commented Feb 12, 2025

The scripts automate it. The pipeline restarts the deployments first, and the experiment starts when everything is ready. It is part of the script, if that is what you were asking.

@Jeffwan (Collaborator) commented Feb 12, 2025

@gangmuk makes sense.

Collaborator:

Seems that optimizer-kpa.yaml is the same as hetro-autoscaler.yaml. Why do we want to commit both of them?

@nwangfw (Collaborator) commented Feb 12, 2025:

We compared the heter-gpu deployment YAML and deploy.yaml side by side. I am wondering whether we can remove the heter-gpu folder in this PR and use deploy.yaml for all experiments.

Collaborator:

Why do we need to have this 8_replica_hpa.yaml in this PR? Is it used for routing experiments?

@gangmuk (Collaborator, Author) commented Feb 12, 2025

I updated the README and removed the redundant YAML files. Please take a look.

It works with the updated client, which is being done by Le in a separate PR.

apiVersion: autoscaling.aibrix.ai/v1alpha1
kind: PodAutoscaler
metadata:
name: podautoscaler-deepseek-llm-7b-chat-v100-apa
Collaborator:

Seems the indent is not correct. Can you change it to a 2-space indent to align with the other files?

Collaborator:

Let's remove the detailed GPU card specs like v100 from the names in this folder.

Collaborator:

Same comment as for the name above.

spec:
scalingStrategy: "APA"
minReplicas: 1
maxReplicas: 10
Collaborator:

You only have 8 pods; why set it to 10 here?
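
Addressing both comments, the manifest might end up looking roughly like this (a sketch only: 2-space indent, maxReplicas matching the 8 available pods, GPU card spec dropped from the name; any fields beyond those shown in the diff are omitted rather than guessed):

```yaml
apiVersion: autoscaling.aibrix.ai/v1alpha1
kind: PodAutoscaler
metadata:
  name: podautoscaler-deepseek-llm-7b-chat-apa
spec:
  scalingStrategy: "APA"
  minReplicas: 1
  maxReplicas: 8   # only 8 pods are available in the test cluster
  # (scale target and metric settings from the original file omitted in this sketch)
```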

apiVersion: autoscaling.aibrix.ai/v1alpha1
kind: PodAutoscaler
metadata:
name: podautoscaler-deepseek-llm-7b-chat-v100-hpa
Collaborator:

Same indent issue as above.

@gangmuk marked this pull request as draft February 12, 2025 22:46
@gangmuk changed the title from "[WIP] End-to-end benchmark pipeline for autoscalers and routing policies" to "End-to-end benchmark pipeline for autoscalers and routing policies" Feb 12, 2025
@Jeffwan marked this pull request as ready for review February 12, 2025 23:58
@Jeffwan merged commit 7a341d6 into main Feb 12, 2025
2 checks passed
@Jeffwan deleted the benchmark_pipeline branch February 12, 2025 23:58
@gangmuk restored the benchmark_pipeline branch February 13, 2025 01:11
varungup90 pushed a commit that referenced this pull request Feb 20, 2025

* Added related benchmark scripts

* Added a set of workload traces

* Removed experiment result files

* Init Readme(WIP)

* Remove routing part in script

* Added k8s manifest (deployment and autoscalers)

* Removed redundant yaml files

* README for autoscaling experiment

* Updated README

---------

Co-authored-by: Gangmuk <[email protected]>
Signed-off-by: Varun Gupta <[email protected]>
@Jeffwan deleted the benchmark_pipeline branch February 26, 2025 19:24