End-to-end benchmark pipeline for autoscalers and routing policies #650

gangmuk · 2025-02-11T21:27:08Z

Pull Request Description

This PR adds an end-to-end benchmark pipeline for autoscaling and routing policy experiments.

Currently, the main benchmark scripts live in the aibrix/benchmarks/autoscaling directory: overnight_run.sh and run-test.sh, along with the k8s YAML files and Python scripts.

The client is in aibrix/benchmarks/generator/client.py.

The user specifies the target autoscaler, routing policy, and workload trace file.

The pipeline should:

run the benchmark with the client from a fresh state (restarting the deployment, etc.)
collect related logs from the pods and the k8s API server
collect the client-side performance numbers
generate a report
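
The four stages listed above could be wired together roughly as follows. This is only a sketch to show the ordering, not the code in overnight_run.sh or run-test.sh: `ExperimentConfig`, `Pipeline`, and every step function are hypothetical names, and each step is a placeholder that records what it would do instead of touching a cluster.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ExperimentConfig:
    """The three user inputs named in the PR description."""
    autoscaler: str       # e.g. "hpa", "kpa", "apa", "optimizer-kpa", or "none"
    routing_policy: str   # e.g. "random"
    workload_path: str    # path to a .jsonl workload trace


# A step receives the config and a shared results dict that it may update.
Step = Callable[[ExperimentConfig, Dict], None]


@dataclass
class Pipeline:
    steps: List[Tuple[str, Step]] = field(default_factory=list)

    def add(self, name: str, step: Step) -> "Pipeline":
        self.steps.append((name, step))
        return self

    def run(self, config: ExperimentConfig) -> Dict:
        results: Dict = {"completed": []}
        for name, step in self.steps:
            step(config, results)              # steps run strictly in order
            results["completed"].append(name)
        return results


# Placeholder steps (hypothetical): real ones would call kubectl and the client.
def reset_deployment(config: ExperimentConfig, results: Dict) -> None:
    results["fresh_state"] = True


def run_client(config: ExperimentConfig, results: Dict) -> None:
    results["workload"] = config.workload_path


def collect_logs(config: ExperimentConfig, results: Dict) -> None:
    results["logs"] = []


def collect_client_metrics(config: ExperimentConfig, results: Dict) -> None:
    results["client_metrics"] = {}


def generate_report(config: ExperimentConfig, results: Dict) -> None:
    results["report"] = (
        f"autoscaler={config.autoscaler} "
        f"routing={config.routing_policy} "
        f"workload={results['workload']}"
    )


def build_pipeline() -> Pipeline:
    return (
        Pipeline()
        .add("reset", reset_deployment)
        .add("client", run_client)
        .add("logs", collect_logs)
        .add("metrics", collect_client_metrics)
        .add("report", generate_report)
    )
```

Keeping each stage as a separate callable makes it easy to rerun only the report step on previously collected results.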

WIP items:

Improving aibrix/benchmarks/generator/client.py
- TTFT
- TPOT
- Goodput
Improving the plot script
- Tokens/s time series plot
- TTFT time series plot
- TPOT time series plot
- Goodput time series plot
README.md
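
For reference, the three client-side metrics in the WIP list are commonly computed from per-token arrival times as sketched below. This is an illustration under stated assumptions, not the logic in client.py: `RequestTiming` is a hypothetical record, TPOT is taken as the mean gap between output tokens after the first one, and goodput is taken as the rate of requests that meet both a TTFT and a TPOT target (definitions of goodput vary).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RequestTiming:
    """Hypothetical per-request record; all times are in seconds."""
    start: float               # when the request was sent
    token_times: List[float]   # arrival time of each output token, in order


def ttft(r: RequestTiming) -> float:
    """Time to first token: first token arrival minus send time."""
    return r.token_times[0] - r.start


def tpot(r: RequestTiming) -> Optional[float]:
    """Time per output token: mean inter-token gap after the first token.

    Returns None when fewer than two tokens arrived, since no gap exists.
    """
    n = len(r.token_times)
    if n < 2:
        return None
    return (r.token_times[-1] - r.token_times[0]) / (n - 1)


def goodput(requests: List[RequestTiming], duration_s: float,
            ttft_slo: float, tpot_slo: float) -> float:
    """Requests per second that satisfy both targets (assumed definition)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    good = 0
    for r in requests:
        if not r.token_times:
            continue  # a request that produced no tokens never counts
        per_token = tpot(r)
        if ttft(r) <= ttft_slo and (per_token is None or per_token <= tpot_slo):
            good += 1
    return good / duration_s
```

Bucketing these values by request start time would give the time series that the plot script items refer to.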

Related Issues

Resolves: #[Insert issue number(s)]

Jeffwan · 2025-02-11T22:12:32Z

Please create a separate issue and move the TODO items there.

Jeffwan · 2025-02-11T22:13:37Z

What's the purpose of check_k8s_is_ready.py, count_num_pods.py, and set_num_replicas.py? Are they being integrated somewhere else?

Jeffwan · 2025-02-11T22:08:03Z

benchmarks/autoscaling/workload/5s.jsonl

@@ -0,0 +1,203 @@
+{"timestamp": 21, "requests": [{"Prompt Length": 909, "Output Length": 22, "prompt": "Here is the introduction I have so far:\n\n# \\*\\*Introduction\\*\\*\n\nThis is a comprehensive introduction meant to bring you, the reader, up to speed with the current outline and motivations of the project.\n\n## \\*\\*What is \u2018The Journey\u2019\\*\\*\n\nThe Journey, derived from The Hero\u2019s Journey, is a theoretical roadmap to the development of the key elements that evoke powerful emotional reactions in people. The Hero\u2019s Journey is a structure that has been followed by some of the greatest stories ever told and ever lived.\n\nThe version of this journey described throughout the document is tailored for Kadence and is meant to serve as a reference point and a workspace for ideas, planning, and the execution of specialized tactics in order to continuously develop and progress the story which underlies the public representation of the ideas covered in this document.\n\nThe Journey, its associated ambitions, milestones, challenges, and gimmicks are experimental in nature, and thus are used to further our own research into the business of artist development, and quite possible leaving our mark on the World.\n\n## \\*\\*What is within this document?\\*\\*\n\nThis document contains a whole lot of useful information about characters, possible pathways of progression, theoretical understandings from The Hero\u2019s Journey, and much more. Overall, this document is a collection of all types of information that is relevant to the project undertakings described above.\n\n## \\*\\*How should this document be used?\\*\\*\n\nThis document should be seen strictly as an experimental guideline line used to plan and execute experimental content and plot lines with the intent of learning from experiment results and making changes to procedures in the future where necessary. 
With regards to content, the content database provided in this document will be used to visualize the potential timeline of events that will transpire once the official process has gone underway (ie. when the first piece of planned content is released to the public and the timeline must be adhered to.)\n\nIn addition to the content calendar, the document will be the gathering place for information deemed useful during the planning and execution process of projects such as the Docu-Series. This information serves to fuel the end-user content that is scheduled and created. By using the Hero\u2019s Journey as a guideline, maximum impact can be gradually attained via meticulous planning and execution of ordered story elements once it is distilled into its relevant parts here inside this document.\n\n## \\*\\*What is The Story\\*\\*\n\nThe Story is a character arch guideline for the Docu-series that is derived from the content of this page. It occurs over a discrete time period, subtly growing in both complexity and depth. The point of using a story is simple, it allows us as the creators of content to understand what type of activities, emotions, themes, places, people, and other story elements to include in order to progress the story, in film format, from a clear beginning to a decisive end without relying on specific events or in real life occurrences that might be outside of our control. By determining the characters in the story, as well as their personalities, aspirations, fears, hopes, and desires, we will be able to translate the implied reality of those characters into practical actions and plot points that can be made in the real world to add a touch of fantasy-like takeaways to the project.\n\nBy taking the time to understand both the created characters and their real life counterparts, we ensure maximum compatibility with your (you reading this document) own personal journey, as well as the journey of the characters within the story. 
This allows us to create a seamless and coherent narrative that will captivate and engage our audience, while also providing a meaningful and impactful experience for everyone involved.\n\nIn order to fully realize the potential of this project and to bring [The Story] to life, it is important to have a clear understanding of the key elements of The Hero\u2019s Journey, as well as an understanding of the specific goals and objectives of the project. With this information in hand, we can begin to craft a unique and compelling story that will capture the hearts and minds of our audience.\n\n### The Journey Ahead\n\nAs we embark on this journey, it is important to remember that this is an experiment and that the path ahead may be uncertain. However, by using The Hero\u2019s Journey as our guide, we can be sure to make the most of every opportunity that comes our way. With careful planning and execution, we can turn this project into something truly special and leave a lasting impact on the world.\n\"\n\n What is the next portion I should write?"}]}


Can we just document the steps to generate 5s, 8min_up_and_down.jsonl, etc.? Let's avoid checking in too many static files.

Okay. That will be done in a separate PR that updates the client code.
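
As a rough idea of what documented generation steps could look like, the sketch below writes entries in the same shape as the diff line above (`timestamp`, then a `requests` list with `Prompt Length`, `Output Length`, and `prompt`). The function names, the fixed spacing between entries, and the unit of `timestamp` are assumptions; the actual generator is left to the separate client PR.

```python
import json
from typing import Dict, List, Tuple

# (prompt text, prompt length in tokens, output length in tokens)
Sample = Tuple[str, int, int]


def build_trace(samples: List[Sample], interval: int, start: int = 0) -> List[Dict]:
    """Create one trace entry per sample, spaced `interval` time units apart.

    The unit of `timestamp` is not stated in the thread, so it is left
    abstract here.
    """
    entries = []
    for i, (prompt, prompt_len, output_len) in enumerate(samples):
        entries.append({
            "timestamp": start + i * interval,
            "requests": [{
                "Prompt Length": prompt_len,
                "Output Length": output_len,
                "prompt": prompt,
            }],
        })
    return entries


def to_jsonl(entries: List[Dict]) -> str:
    """Serialize entries as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(entry) + "\n" for entry in entries)


def write_trace(path: str, entries: List[Dict]) -> None:
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(entries))
```

With a script like this in the repository, only the recipe and its parameters need to be committed, not each generated trace.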

Jeffwan · 2025-02-11T22:09:03Z

benchmarks/autoscaling/README.md

+
+TODO
+
+`./overnight_run.sh workload/8min_up_and_down.jsonl`


I remember the experiment lasts 20 minutes; what does 8min mean here?

I was making the scripts general, but I can make them autoscaling-only. I will update it.

Jeffwan · 2025-02-11T22:10:53Z

benchmarks/autoscaling/overnight_run.sh

+
+# If you don't want to deploy any autoscaler use none. e.g., autoscalers="none"
+autoscalers="hpa kpa apa optimizer-kpa"
+routing_policies="random least-request least-kv-cache least-busy-time least-latency throughput"


We should split the tests into two separate tests. Can we just use the naive way instead of customized routing policies for the autoscaling experiments?
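
One way to read this suggestion, sketched in Python for illustration: vary the autoscaler while holding routing fixed, and vary the routing policy while no autoscaler is deployed. The two lists are copied from the diff above; treating `random` as the naive routing policy and using `none` for the routing runs are assumptions, and the helper names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

AUTOSCALERS = ["hpa", "kpa", "apa", "optimizer-kpa"]
ROUTING_POLICIES = [
    "random", "least-request", "least-kv-cache",
    "least-busy-time", "least-latency", "throughput",
]

# Each experiment is an (autoscaler, routing_policy) pair.
Experiment = Tuple[str, str]


def autoscaling_experiments(autoscalers: List[str],
                            baseline_routing: str = "random") -> List[Experiment]:
    """Vary only the autoscaler; routing stays on one baseline policy."""
    return [(a, baseline_routing) for a in autoscalers]


def routing_experiments(policies: List[str],
                        autoscaler: str = "none") -> List[Experiment]:
    """Vary only the routing policy; no autoscaler is deployed."""
    return [(autoscaler, p) for p in policies]


def full_cross_product(autoscalers: List[str],
                       policies: List[str]) -> List[Experiment]:
    """Every combination, i.e. what iterating over both lists would run."""
    return list(product(autoscalers, policies))
```

With the lists above, the split gives 4 + 6 = 10 runs instead of the 24 runs of the full cross product.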

Jeffwan · 2025-02-11T22:14:59Z

/cc @nwangfw @OrdinaryCrazy Please help take a look. You guys should be super familiar with all the setups

Jeffwan · 2025-02-11T23:35:20Z

could you please attach the deployment and autoscaling configurations for the autoscaling experiment?

gangmuk · 2025-02-11T23:43:51Z

check_k8s_is_ready.py checks that all the pods are in the ready state before the experiment starts.
count_num_pods.py runs in the background and counts the number of pods periodically.
set_num_replicas.py makes every experiment start with the same number of instances (1 in the autoscaling experiments).
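
A minimal sketch of what such helpers can look like when built on top of kubectl, for illustration only (this is not the content of the three scripts). It assumes kubectl is on the PATH; the function names and the default namespace are hypothetical. The readiness test reads the `Ready` condition from the pod status in the output of `kubectl get pods -o json`.

```python
import json
import subprocess
from typing import Dict, List, Tuple


def pod_is_ready(pod: Dict) -> bool:
    """A pod is ready when its status has a condition Ready == "True"."""
    conditions = pod.get("status", {}).get("conditions", [])
    return any(c.get("type") == "Ready" and c.get("status") == "True"
               for c in conditions)


def count_ready(pod_list: Dict) -> Tuple[int, int]:
    """Return (ready pods, total pods) for a parsed pod list."""
    items = pod_list.get("items", [])
    return sum(1 for p in items if pod_is_ready(p)), len(items)


def all_pods_ready(pod_list: Dict) -> bool:
    ready, total = count_ready(pod_list)
    return total > 0 and ready == total


def fetch_pods(namespace: str = "default") -> Dict:
    """Run `kubectl get pods -o json` and parse the result."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def scale_command(deployment: str, replicas: int,
                  namespace: str = "default") -> List[str]:
    """Build the kubectl command that resets a deployment's replica count."""
    return ["kubectl", "scale", "deployment", deployment,
            f"--replicas={replicas}", "-n", namespace]
```

Polling `all_pods_ready(fetch_pods())` in a loop with a sleep gives the wait-until-ready gate, and calling `count_ready` on a timer gives the periodic pod count.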

Jeffwan · 2025-02-12T01:53:34Z

I mean these can be done with kubectl. Do you need these files for programmatic integration?

gangmuk · 2025-02-12T01:55:25Z

You mean by checking kubectl get pod for example?

gangmuk · 2025-02-12T02:20:37Z

The scripts automate it. The pipeline restarts the deployments first, and the experiment starts once everything is ready. They are part of the pipeline script, if that is what you were asking.

Jeffwan · 2025-02-12T05:19:30Z

@gangmuk makes sense.

nwangfw · 2025-02-12T16:38:50Z

benchmarks/autoscaling/deepseek-llm-7b-chat-v100/optimizer-kpa.yaml

It seems that optimizer-kpa.yaml is the same as hetro-autoscaler.yaml. Why do we want to commit both of them?

nwangfw · 2025-02-12T17:50:09Z

benchmarks/autoscaling/deepseek-llm-7b-chat-v100/heter-gpu/deployment.yaml

We have compared heter-gpu/deployment.yaml with deploy.yaml. I am wondering whether we can remove the heter-gpu folder in this PR and use deploy.yaml for all experiments.

nwangfw · 2025-02-12T18:02:44Z

benchmarks/autoscaling/deepseek-llm-7b-chat-v100/8_replica_hpa.yaml

Why do we need to have this 8_replica_hpa.yaml in this PR? Is it used for routing experiments?

gangmuk · 2025-02-12T21:29:54Z

I updated the README and removed the redundant YAML files. Please take a look.

It works with the updated client, which will be done by Le in a separate PR.

Jeffwan · 2025-02-12T22:35:20Z

benchmarks/autoscaling/deepseek-llm-7b-chat-v100/apa.yaml

+apiVersion: autoscaling.aibrix.ai/v1alpha1
+kind: PodAutoscaler
+metadata:
+    name: podautoscaler-deepseek-llm-7b-chat-v100-apa


It seems the indent is not correct. Can you change it to a 2-space indent to align with the other files?

Let's remove detailed GPU card specs like v100 from this folder.

Same for the name.

Jeffwan · 2025-02-12T22:35:33Z

benchmarks/autoscaling/deepseek-llm-7b-chat-v100/apa.yaml

+spec:
+    scalingStrategy: "APA"
+    minReplicas: 1
+    maxReplicas: 10


you only have 8 pods, why set to 10 here?

Jeffwan · 2025-02-12T22:36:57Z

benchmarks/autoscaling/deepseek-llm-7b-chat-v100/hpa.yaml

+apiVersion: autoscaling.aibrix.ai/v1alpha1
+kind: PodAutoscaler
+metadata:
+    name: podautoscaler-deepseek-llm-7b-chat-v100-hpa


same indent issues

* Added related benchmark scripts
* Added a set of workload traces
* Removed experiment result files
* Init Readme(WIP)
* Remove routing part in script
* Added k8s manifest (deployment and autoscalers)
* Removed redundant yaml files
* README for autoscaling experiment
* Updated README

Co-authored-by: Gangmuk <[email protected]>
Signed-off-by: Varun Gupta <[email protected]>

Gangmuk added 4 commits February 11, 2025 13:13

Added related benchmark scripts

a8a6f61

Added a set of workload traces

e616f80

Removed experiment result files

18c1225

Init Readme(WIP)

ae40ba9

gangmuk requested a review from Jeffwan February 11, 2025 21:27

Jeffwan reviewed Feb 11, 2025

Gangmuk added 2 commits February 12, 2025 00:22

Remove routing part in script

67fd469

Added k8s manifest (deployment and autoscalers)

84b8511

nwangfw reviewed Feb 12, 2025

Gangmuk added 3 commits February 12, 2025 10:34

Removed redundant yaml files

e633877

README for autoscaling experiment

0963a92

Updated README

597ddd3

Jeffwan reviewed Feb 12, 2025

gangmuk marked this pull request as draft February 12, 2025 22:46

gangmuk changed the title ~~[WIP] End-to-end benchmark pipeline for autoscalers and routing policies~~ End-to-end benchmark pipeline for autoscalers and routing policies Feb 12, 2025

Jeffwan marked this pull request as ready for review February 12, 2025 23:58

Jeffwan merged commit 7a341d6 into main Feb 12, 2025
2 checks passed

Jeffwan deleted the benchmark_pipeline branch February 12, 2025 23:58

gangmuk restored the benchmark_pipeline branch February 13, 2025 01:11

Jeffwan mentioned this pull request Feb 13, 2025

Improve the autoscaling benchmark scripts #666

Open

10 tasks

Jeffwan deleted the benchmark_pipeline branch February 26, 2025 19:24
