Build autoscaler abstractions like fetcher, client and scaler #300

Merged
Jeffwan merged 8 commits into main from jiaxin/autoscaling-refactor-changes
Oct 17, 2024

Conversation

Jeffwan
Collaborator

@Jeffwan Jeffwan commented Oct 16, 2024

Pull Request Description

This pull request includes several significant updates to the Pod Autoscaling algorithms and their related components. The changes introduce new scaling algorithms, enhance the existing ones, and update the metrics fetching mechanisms. Below are the most important changes grouped by theme.

New and Enhanced Scaling Algorithms:

  • Introduced the ScalingAlgorithm interface and extracted the ApaScalingAlgorithm and KpaScalingAlgorithm types, each with its own ComputeTargetReplicas method.
  • Refactored the metrics client to use a new PodMetricClient structure and introduced methods for updating and fetching pod metrics.
  • Added a new MetricFetcher interface and implemented it with the RestMetricsFetcher, ResourceMetricsFetcher, and CustomMetricsFetcher types. (pkg/controller/podautoscaler/metrics/fetcher.go)
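
For readers unfamiliar with the shape of such an abstraction, here is a minimal sketch of what a ScalingAlgorithm interface with a ComputeTargetReplicas method could look like. Only the names ScalingAlgorithm and ComputeTargetReplicas come from this description; the method signature, the ScalingContext fields, and the ProportionalAlgorithm type are assumptions made for illustration and are not the APA or KPA logic in this PR.

```go
package main

import (
	"fmt"
	"math"
)

// ScalingContext carries the inputs an algorithm needs.
// All fields are illustrative assumptions, not the PR's actual type.
type ScalingContext struct {
	TargetValuePerPod  float64 // desired metric value per pod
	CurrentValuePerPod float64 // observed metric value per pod
	Tolerance          float64 // relative band around the target where no scaling happens
	MinReplicas        int32
	MaxReplicas        int32
}

// ScalingAlgorithm uses the interface and method names from the PR description;
// the signature itself is an assumption.
type ScalingAlgorithm interface {
	ComputeTargetReplicas(currentPodCount float64, ctx ScalingContext) int32
}

// ProportionalAlgorithm is a hypothetical stand-in: it scales the pod count by
// the ratio of observed to target load, unless the ratio is within the tolerance.
type ProportionalAlgorithm struct{}

func (ProportionalAlgorithm) ComputeTargetReplicas(currentPodCount float64, ctx ScalingContext) int32 {
	desired := currentPodCount
	if ctx.TargetValuePerPod > 0 {
		ratio := ctx.CurrentValuePerPod / ctx.TargetValuePerPod
		if math.Abs(ratio-1) > ctx.Tolerance {
			desired = math.Ceil(currentPodCount * ratio)
		}
	}
	return clamp(int32(desired), ctx.MinReplicas, ctx.MaxReplicas)
}

// clamp restricts v to the inclusive range [lo, hi].
func clamp(v, lo, hi int32) int32 {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

func main() {
	var algo ScalingAlgorithm = ProportionalAlgorithm{}
	ctx := ScalingContext{
		TargetValuePerPod:  50,
		CurrentValuePerPod: 80,
		Tolerance:          0.1,
		MinReplicas:        1,
		MaxReplicas:        10,
	}
	// 4 pods at 80 per pod against a target of 50 per pod: ceil(4 * 1.6) = 7.
	fmt.Println(algo.ComputeTargetReplicas(4, ctx))
}
```

Keeping the algorithm behind an interface lets a scaler swap one policy for another without touching the code that fetches metrics or applies the replica count.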

These changes collectively enhance the autoscaling capabilities and metrics handling of the Pod Autoscaler, making it more robust and flexible for different scaling scenarios.

Related Issues

Resolves: part of #119

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines (Expand for Details)

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

@Jeffwan Jeffwan requested a review from kr11 October 16, 2024 05:54
@kr11
Collaborator

kr11 commented Oct 16, 2024

Great work!

I'd like to follow docs/tutorial/podautoscaler/README.md to test the new version of the autoscaler on my Docker Desktop, but I hit an error when running docker-build:

make docker-build IMG=aibrix/aibrix-controller-manager:v0.1.0-rc.3
make deploy IMG=aibrix/aibrix-controller-manager:v0.1.0-rc.3

The output was:

make docker-build IMG=aibrix/aibrix-controller-manager:v0.1.0-rc.3

make: *** No rule to make target `docker-build'.  Stop.

What are the up-to-date make build and make deploy commands?

@Jeffwan
Collaborator Author

Jeffwan commented Oct 16, 2024

@kr11 Here are the new commands to use:

.PHONY: docker-build-all
docker-build-all: docker-build-controller-manager docker-build-plugins docker-build-runtime docker-build-users ## Build all docker images

.PHONY: docker-build-controller-manager
docker-build-controller-manager: ## Build docker image with the manager.
        $(call build_and_tag,controller-manager,Dockerfile)

.PHONY: docker-build-plugins
docker-build-plugins: ## Build docker image with the plugins.
        $(call build_and_tag,plugins,Dockerfile.gateway)

.PHONY: docker-build-runtime
docker-build-runtime: ## Build docker image with the AI Runtime.
        $(call build_and_tag,runtime,Dockerfile.runtime)

.PHONY: docker-build-users
docker-build-users: ## Build docker image with the users.
        $(call build_and_tag,users,Dockerfile.users)

We have changed the target to docker-build-controller-manager.

@Jeffwan Jeffwan force-pushed the jiaxin/autoscaling-refactor-changes branch from 7d8fcbe to 5201235 Compare October 16, 2024 23:37
@Jeffwan Jeffwan force-pushed the jiaxin/autoscaling-refactor-changes branch from 5201235 to 6dbe0df Compare October 16, 2024 23:40
@kr11
Collaborator

kr11 commented Oct 17, 2024

This PR LGTM, let's merge it.

@Jeffwan Jeffwan merged commit f19f5d8 into main Oct 17, 2024
9 checks passed
@Jeffwan Jeffwan deleted the jiaxin/autoscaling-refactor-changes branch October 17, 2024 16:48
Jeffwan added a commit that referenced this pull request Oct 22, 2024
* Update manifests version to v0.1.0-rc.3 (#287)

* [Misc] Add sync images step and scripts in release process (#283)

Add sync images step and scripts in release process

* [batch] E2E works with driver and request proxy  (#272)

* e2e driver and test

* comment functions

* check job status in test

* format update

* update copyright

* add examples with instructions and interfaces

* move batch tutorial

---------

Co-authored-by: xin.chen <[email protected]>

* Fix address already in use when AIRuntime start in pod (#289)

add uvicorn startup into file entrypoint

* Read model  name from request body (#290)

* Use model name from request body

* rename dummy to reserved router

* Fix redis bootstrap flaky connection issue (#293)

* skip docs CI if no changes in /docs dir (#294)

* skip docs CI if no changes in /docs dir

* test docs build

* Improve Rayclusterreplicaset Status (#295)

* improve rayclusterreplicaset status
* nit
* fix lint error
* improve isClusterActive logic
* fix lint error
* remove redundant isRayPodCreateOrDeleteFailed check
---------

Signed-off-by: Yicheng-Lu-llll <[email protected]>

* Add request trace for profiling (#291)

* Add request trace for profiling

* add to redis at 10 second interval

* nit

* round to nearest 10s interval

* round timestamp to nearest 10s interval and aggregate data by model

* add go routine to add request trace

* Update the crd definition due to runtime upgrade (#298)

#295 introduced the latest kuberay api, and the dependency bump moved sigs.k8s.io/controller-runtime from v0.17.3 to v0.17.5. Because of that change, make manifest updates the CRD definitions.

* Push images to Github registry in release pipeline (#301)

* Disable docker build github workflow to cut CI cost

* Push images to Github registry in release pipeline

* Build autoscaler abstractions like fetcher, client and scaler (#300)

* minor clean up on the autoscaler controller

* Extract the algorithm package

The algorithm is extracted to distinguish it from the scaler.

* Refactor scaler interface

1. Split the Scaler interface and BaseAutoscaler implementation
2. Create APA/KPA scaler separately and adopt the corresponding algorithms

* Introduce the scalingContext in algorithm

* Introduce k8s.io/metrics for resource & custom metrics fetching

* Extract metric fetcher to cover the fetching logic

* Optimize the scaler workflow to adopt fetch and client interface

* Further refactor the code structure

* Support pod autoscaler periodically check (#306)

* Support pod autoscaler periodically check

* Fix the error case

* Add timeout in nc check for redis bootstrap (#309)

* Refactor AutoScaler: metricClient, context, reconcile (#308)

* Refactor AutoScaler: optimize metric client, context, and reconcile processes.

* fix make lint-all

* fix typos

---------

Signed-off-by: Yicheng-Lu-llll <[email protected]>
Co-authored-by: xinchen384 <[email protected]>
Co-authored-by: xin.chen <[email protected]>
Co-authored-by: brosoul <[email protected]>
Co-authored-by: Varun Gupta <[email protected]>
Co-authored-by: Yicheng-Lu-llll <[email protected]>
Co-authored-by: Rong-Kang <[email protected]>
gangmuk pushed a commit that referenced this pull request Jan 25, 2025
* minor clean up on the autoscaler controller

* Extract the algorithm package

The algorithm is extracted to distinguish it from the scaler.

* Refactor scaler interface

1. Split the Scaler interface and BaseAutoscaler implementation
2. Create APA/KPA scaler separately and adopt the corresponding algorithms

* Introduce the scalingContext in algorithm

* Introduce k8s.io/metrics for resource & custom metrics fetching

* Extract metric fetcher to cover the fetching logic

* Optimize the scaler workflow to adopt fetch and client interface

* Further refactor the code structure

2 participants