End-to-end benchmark pipeline for autoscalers and routing policies #650
Please move the TODO items to a separate issue and create it.
what's the purpose of |
@@ -0,0 +1,203 @@
{"timestamp": 21, "requests": [{"Prompt Length": 909, "Output Length": 22, "prompt": "Here is the introduction I have so far:\n\n# \\*\\*Introduction\\*\\*\n\nThis is a comprehensive introduction meant to bring you, the reader, up to speed with the current outline and motivations of the project.\n\n## \\*\\*What is \u2018The Journey\u2019\\*\\*\n\nThe Journey, derived from The Hero\u2019s Journey, is a theoretical roadmap to the development of the key elements that evoke powerful emotional reactions in people. The Hero\u2019s Journey is a structure that has been followed by some of the greatest stories ever told and ever lived.\n\nThe version of this journey described throughout the document is tailored for Kadence and is meant to serve as a reference point and a workspace for ideas, planning, and the execution of specialized tactics in order to continuously develop and progress the story which underlies the public representation of the ideas covered in this document.\n\nThe Journey, its associated ambitions, milestones, challenges, and gimmicks are experimental in nature, and thus are used to further our own research into the business of artist development, and quite possible leaving our mark on the World.\n\n## \\*\\*What is within this document?\\*\\*\n\nThis document contains a whole lot of useful information about characters, possible pathways of progression, theoretical understandings from The Hero\u2019s Journey, and much more. Overall, this document is a collection of all types of information that is relevant to the project undertakings described above.\n\n## \\*\\*How should this document be used?\\*\\*\n\nThis document should be seen strictly as an experimental guideline line used to plan and execute experimental content and plot lines with the intent of learning from experiment results and making changes to procedures in the future where necessary. 
With regards to content, the content database provided in this document will be used to visualize the potential timeline of events that will transpire once the official process has gone underway (ie. when the first piece of planned content is released to the public and the timeline must be adhered to.)\n\nIn addition to the content calendar, the document will be the gathering place for information deemed useful during the planning and execution process of projects such as the Docu-Series. This information serves to fuel the end-user content that is scheduled and created. By using the Hero\u2019s Journey as a guideline, maximum impact can be gradually attained via meticulous planning and execution of ordered story elements once it is distilled into its relevant parts here inside this document.\n\n## \\*\\*What is The Story\\*\\*\n\nThe Story is a character arch guideline for the Docu-series that is derived from the content of this page. It occurs over a discrete time period, subtly growing in both complexity and depth. The point of using a story is simple, it allows us as the creators of content to understand what type of activities, emotions, themes, places, people, and other story elements to include in order to progress the story, in film format, from a clear beginning to a decisive end without relying on specific events or in real life occurrences that might be outside of our control. By determining the characters in the story, as well as their personalities, aspirations, fears, hopes, and desires, we will be able to translate the implied reality of those characters into practical actions and plot points that can be made in the real world to add a touch of fantasy-like takeaways to the project.\n\nBy taking the time to understand both the created characters and their real life counterparts, we ensure maximum compatibility with your (you reading this document) own personal journey, as well as the journey of the characters within the story. 
This allows us to create a seamless and coherent narrative that will captivate and engage our audience, while also providing a meaningful and impactful experience for everyone involved.\n\nIn order to fully realize the potential of this project and to bring [The Story] to life, it is important to have a clear understanding of the key elements of The Hero\u2019s Journey, as well as an understanding of the specific goals and objectives of the project. With this information in hand, we can begin to craft a unique and compelling story that will capture the hearts and minds of our audience.\n\n### The Journey Ahead\n\nAs we embark on this journey, it is important to remember that this is an experiment and that the path ahead may be uncertain. However, by using The Hero\u2019s Journey as our guide, we can be sure to make the most of every opportunity that comes our way. With careful planning and execution, we can turn this project into something truly special and leave a lasting impact on the world.\n\"\n\n What is the next portion I should write?"}]} |
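For readers unfamiliar with the trace format, the record above is one JSON object per line with a `timestamp` and a `requests` list, where each request carries `Prompt Length`, `Output Length`, and `prompt`. A minimal sketch of how such a file could be summarized follows. The helper name `summarize_trace` and the second sample record are hypothetical, the `"..."` prompt is a placeholder, and this is not the actual client code.

```python
import json
from io import StringIO


def summarize_trace(lines):
    """Return one (timestamp, request count, prompt tokens, output tokens) tuple per record."""
    summary = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        requests = record["requests"]
        summary.append((
            record["timestamp"],
            len(requests),
            sum(r["Prompt Length"] for r in requests),
            sum(r["Output Length"] for r in requests),
        ))
    return summary


# Hypothetical two-record trace. Only the first record's lengths come from
# the diff above; the prompt text is replaced by a placeholder.
sample = StringIO(
    '{"timestamp": 21, "requests": [{"Prompt Length": 909, "Output Length": 22, "prompt": "..."}]}\n'
    '{"timestamp": 42, "requests": []}\n'
)
for row in summarize_trace(sample):
    print(row)
```

A summary like this could also back a documented generation step, since it lets a regenerated trace be compared against the one used in the experiment without committing the full file.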
Can we just document the steps to generate
- 5s
- 8min_up_and_down.jsonl

etc.? Let's avoid checking in too many static files.
Okay. It will be done in a separate client code update PR.
benchmarks/autoscaling/README.md
Outdated
TODO
`./overnight_run.sh workload/8min_up_and_down.jsonl`
I remember the experiment lasts 20 minutes. What does 8min mean here?
I was making the scripts general, but I can make them autoscaling-only. I will update it.
# If you don't want to deploy any autoscaler use none. e.g., autoscalers="none"
autoscalers="hpa kpa apa optimizer-kpa"
routing_policies="random least-request least-kv-cache least-busy-time least-latency throughput"
We should split the tests into two separate tests. Can we just use a naive approach instead of customized routing policies for the autoscaling experiments?
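To make the size difference concrete, here is a rough sketch of the combined matrix versus the split suggested in this comment. The two lists are copied from the script; the function names are hypothetical, and using `random` as the fixed "naive" routing policy is an assumption, not something the thread confirms.

```python
from itertools import product

# Values copied from the script under review.
AUTOSCALERS = "hpa kpa apa optimizer-kpa".split()
ROUTING_POLICIES = (
    "random least-request least-kv-cache least-busy-time least-latency throughput".split()
)


def autoscaling_runs(autoscalers, fixed_routing="random"):
    # Assumption: "random" stands in for the "naive" routing mentioned above.
    return [(autoscaler, fixed_routing) for autoscaler in autoscalers]


def routing_runs(policies):
    # autoscalers="none" means no autoscaler is deployed, per the script comment.
    return [("none", policy) for policy in policies]


combined = list(product(AUTOSCALERS, ROUTING_POLICIES))
split = autoscaling_runs(AUTOSCALERS) + routing_runs(ROUTING_POLICIES)

print(len(combined))  # 24 runs when every autoscaler is paired with every policy
print(len(split))     # 10 runs when the two experiments are kept separate
```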
/cc @nwangfw @OrdinaryCrazy Please help take a look. You guys should be super familiar with all the setups.
Could you please attach the deployment and autoscaling configurations for the autoscaling experiment?
I mean these can be done by
You mean by checking
The scripts are there to automate it. They restart the deployments first, and the experiment starts when everything is ready. It is part of the script, if that is what you were asking.
@gangmuk makes sense.
Seems that `optimizer-kpa.yaml` is the same as `hetro-autoscaler.yaml`. Why do we want to commit both of them?
We have double-checked the `heter-gpu/deployment` yaml against `deploy.yaml`. I am wondering whether we can remove the heter-gpu folder in this PR and use `deploy.yaml` for all experiments.
Why do we need to have this `8_replica_hpa.yaml` in this PR? Is it used for routing experiments?
I updated the README and removed redundant yaml files. Please take a look. It works with the updated client, which will be done by Le in a separate PR.
apiVersion: autoscaling.aibrix.ai/v1alpha1
kind: PodAutoscaler
metadata:
name: podautoscaler-deepseek-llm-7b-chat-v100-apa
Seems the indent is not correct. Can you change to an indent of 2 to align with the other files?
Let's remove detailed GPU card specs like v100 in this folder.
same as name
spec:
scalingStrategy: "APA"
minReplicas: 1
maxReplicas: 10
You only have 8 pods, so why set it to 10 here?
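A mismatch like this could be caught before an experiment starts. The sketch below checks the replica bounds of a spec against the number of pods the cluster can host. It operates on a plain dict with the field names from the manifest above; the function name `check_replica_bounds` and the `available_pods` parameter are hypothetical and not part of the PR.

```python
def check_replica_bounds(spec, available_pods):
    """Return a list of human-readable problems with the replica bounds in `spec`."""
    problems = []
    min_replicas = spec["minReplicas"]
    max_replicas = spec["maxReplicas"]
    if min_replicas > max_replicas:
        problems.append(
            f"minReplicas ({min_replicas}) exceeds maxReplicas ({max_replicas})"
        )
    if max_replicas > available_pods:
        problems.append(
            f"maxReplicas ({max_replicas}) exceeds available pods ({available_pods})"
        )
    return problems


# Values taken from the manifest and the comment above.
spec = {"scalingStrategy": "APA", "minReplicas": 1, "maxReplicas": 10}
for problem in check_replica_bounds(spec, available_pods=8):
    print(problem)
```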
apiVersion: autoscaling.aibrix.ai/v1alpha1
kind: PodAutoscaler
metadata:
name: podautoscaler-deepseek-llm-7b-chat-v100-hpa
same indent issues
* Added related benchmark scripts
* Added a set of workload traces
* Removed experiment result files
* Init Readme(WIP)
* Remove routing part in script
* Added k8s manifest (deployment and autoscalers)
* Removed redundant yaml files
* README for autoscaling experiment
* Updated README

---------

Co-authored-by: Gangmuk <[email protected]>
Signed-off-by: Varun Gupta <[email protected]>
Pull Request Description
This PR contains an end-to-end benchmark pipeline for autoscaling experiments and routing policy experiments.
Currently, the main benchmark scripts are located in the `aibrix/benchmarks/autoscaling` directory, including `overnight_run.sh` and `run-test.sh`, along with k8s yaml files and Python scripts. The client is in `aibrix/benchmarks/generator/client.py`.
The user should input the target autoscaler, routing policy, and workload trace file.
The pipeline should do
WIP items: