Refactor Scaler: Resolve Issues with Metric Parameter Updates in Multiple KPAs #437

kr11 · 2024-11-27T05:25:24Z

Pull Request Description

Related Issues

Description

When multiple KPAAutoScaler instances are created simultaneously, or when the same KPA is updated multiple times, parameter values, particularly stateful metric parameters such as window length, may fail to update. This PR addresses issue #406 and may also help resolve issue #436.

Existing Issues

The PodAutoscalerReconciler's AutoscalerMap is structured as {KPA: KPAAutoScaler, APA: APAAutoScaler}, meaning all KPAs share the same KPAAutoScaler object. On every periodic call to the reconcile function, all PA objects of a given type update the same Autoscaler.
All KPAs share the same KpaMetricClient, which maintains dictionaries mapping NamespaceNameMetric to PanicWindow and StableWindow. As a result, the per-PA settings panicWindowDuration (the length of the panic window) and stableWindowDuration do not take effect.
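
The first issue can be reproduced in miniature. The sketch below is not the aibrix code: `fakeScaler`, `scalerByType`, and `applySpec` are hypothetical names, and the scaler is reduced to a single window-length field. It only shows why a map keyed by strategy type lets the last reconciled PA overwrite the settings of every other PA of that type.

```go
package main

import (
	"fmt"
	"time"
)

// fakeScaler is a hypothetical stand-in for KPAAutoScaler. It holds only
// the stable window length so the overwrite is easy to see.
type fakeScaler struct {
	stableWindow time.Duration
}

// scalerByType mimics the old layout: one scaler object per strategy type.
type scalerByType map[string]*fakeScaler

// applySpec mimics a reconcile pass that writes a PA's setting into the
// scaler selected by strategy type, regardless of which PA is reconciled.
func (m scalerByType) applySpec(strategy string, stableWindow time.Duration) {
	m[strategy].stableWindow = stableWindow
}

func main() {
	scalers := scalerByType{"KPA": {}, "APA": {}}

	scalers.applySpec("KPA", 60*time.Second)  // first KPA asks for 60s
	scalers.applySpec("KPA", 120*time.Second) // second KPA asks for 120s

	// Both KPAs read the same object, so both see the value written last.
	fmt.Println(scalers["KPA"].stableWindow)
}
```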

Modification

The PodAutoscalerReconciler's AutoscalerMap is changed from {KPA: KPAAutoScaler, APA: APAAutoScaler} to NamespaceNameMetric -> KPAAutoScaler. Each PA has its own independent Context and metricClient.
KPAAutoScaler instances are no longer created in newReconciler. Instead, the reconcile function creates the scaler if it does not exist, or updates it if it already exists.
Each KpaMetricClient maintains only one panicWindow and one stableWindow, instead of maintaining metrics for multiple PAs.
Fixed the issue where settings for panicWindowDuration and stableWindowDuration were not taking effect.
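
The create-or-update flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the fields of `NamespaceNameMetric` are guessed, and `paSpec`, `scaler`, `reconciler`, and `ensureScaler` are hypothetical, reduced types. Window lengths are set only at creation, matching the restriction listed in the TODO.

```go
package main

import (
	"fmt"
	"time"
)

// NamespaceNameMetric mirrors the key type named in the PR. The fields
// are assumed for illustration.
type NamespaceNameMetric struct {
	Namespace, Name, Metric string
}

// paSpec and scaler are hypothetical, reduced versions of a PA spec and
// a KPAAutoScaler.
type paSpec struct {
	targetValue  float64
	stableWindow time.Duration
}

type scaler struct {
	targetValue  float64       // stateless setting, may be updated
	stableWindow time.Duration // fixed at creation in this sketch
}

type reconciler struct {
	autoscalers map[NamespaceNameMetric]*scaler
}

// ensureScaler creates the scaler the first time a key is seen and
// otherwise updates only the stateless settings of the existing one.
func (r *reconciler) ensureScaler(key NamespaceNameMetric, spec paSpec) *scaler {
	s, ok := r.autoscalers[key]
	if !ok {
		s = &scaler{targetValue: spec.targetValue, stableWindow: spec.stableWindow}
		r.autoscalers[key] = s
		return s
	}
	s.targetValue = spec.targetValue
	return s
}

func main() {
	r := &reconciler{autoscalers: make(map[NamespaceNameMetric]*scaler)}
	a := NamespaceNameMetric{Namespace: "default", Name: "pa-a", Metric: "qps"}
	b := NamespaceNameMetric{Namespace: "default", Name: "pa-b", Metric: "qps"}

	r.ensureScaler(a, paSpec{targetValue: 10, stableWindow: 60 * time.Second})
	r.ensureScaler(b, paSpec{targetValue: 20, stableWindow: 120 * time.Second})

	// Each PA keeps its own window length.
	for _, k := range []NamespaceNameMetric{a, b} {
		fmt.Println(k.Name, r.autoscalers[k].stableWindow)
	}
}
```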

TODO:

Updating panicWindowDuration or stableWindowDuration on an existing scaler is currently not allowed, because it raises the question of how metric data in the old window should be transferred to the new one. We will design this carefully in the future.
How should we support the deletion of PAs?
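
For the deletion question, one possible direction (not something this PR implements) is to drop the map entry when the PA is observed as deleted. The sketch below assumes a hypothetical `registry` type with a `forget` method and a guessed `key` layout. It leaves open how the real controller would detect deletion.

```go
package main

import (
	"fmt"
	"sync"
)

// key is a hypothetical stand-in for NamespaceNameMetric.
type key struct{ Namespace, Name, Metric string }

// scaler is an empty placeholder for a per-PA autoscaler.
type scaler struct{}

// registry guards the per-PA scaler map with a mutex, on the assumption
// that it may be touched from more than one goroutine.
type registry struct {
	mu      sync.Mutex
	scalers map[key]*scaler
}

// forget drops the scaler for a deleted PA and reports whether an entry
// existed. Calling it twice for the same key is harmless.
func (r *registry) forget(k key) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	_, ok := r.scalers[k]
	delete(r.scalers, k)
	return ok
}

func main() {
	r := &registry{scalers: map[key]*scaler{}}
	k := key{"default", "pa-a", "qps"}
	r.scalers[k] = &scaler{}

	fmt.Println(r.forget(k), len(r.scalers))
	fmt.Println(r.forget(k), len(r.scalers))
}
```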

Jeffwan

I am curious whether we can make the scaling context a map instead of the entire autoscaler.

Jeffwan · 2024-11-27T18:46:28Z

pkg/controller/podautoscaler/metrics/client.go

@@ -54,42 +54,28 @@ type KPAMetricsClient struct {
 	// are collected and processed within the sliding window.
 	granularity time.Duration
 	// the difference between stable and panic metrics is the time window range
-	panicWindowDict  map[NamespaceNameMetric]*aggregation.TimeWindow
-	stableWindowDict map[NamespaceNameMetric]*aggregation.TimeWindow
+	panicWindow  *aggregation.TimeWindow


Is the client no longer a singleton? Is it now responsible for only one metric?

KNative employs a global metricCollector that maintains a map of metric windows, one per Scaler. This global metricCollector is passed into NewAutoscaler, so that every scaler.metricClient refers to the same metricCollector object.
This PR mainly fixes the bug where the time window length could not be set correctly from each scaler's own context.
Since we already plan to revamp and streamline metrics management across the autoscaler, model adapter, and other components, I recommend deferring the move from per-scaler metric clients to centralized handling to upcoming PRs.
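
For comparison, the centralized shape described in this reply could look roughly like the sketch below. It is not KNative's API and not aibrix code: `collector`, `window`, `scalerKey`, `register`, and `record` are hypothetical names. The point is only that a shared collector can still give each scaler a window with its own length, as long as the windows are keyed per scaler.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// scalerKey is a hypothetical identifier for one scaler.
type scalerKey struct{ Namespace, Name string }

// window is a toy stand-in for a time window: it only remembers its
// length and the values recorded into it.
type window struct {
	length time.Duration
	values []float64
}

// collector is the single shared object. Each scaler registers its own
// window in it.
type collector struct {
	mu      sync.Mutex
	windows map[scalerKey]*window
}

func newCollector() *collector {
	return &collector{windows: make(map[scalerKey]*window)}
}

// register creates the window for a scaler with that scaler's own
// length. An existing window is left unchanged.
func (c *collector) register(k scalerKey, length time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.windows[k]; !ok {
		c.windows[k] = &window{length: length}
	}
}

// record appends a value to one scaler's window and reports whether
// that scaler was registered.
func (c *collector) record(k scalerKey, v float64) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	w, ok := c.windows[k]
	if !ok {
		return false
	}
	w.values = append(w.values, v)
	return true
}

func main() {
	c := newCollector()
	a, b := scalerKey{"default", "pa-a"}, scalerKey{"default", "pa-b"}
	c.register(a, 60*time.Second)
	c.register(b, 120*time.Second)
	c.record(a, 3)

	fmt.Println(c.windows[a].length, c.windows[b].length, len(c.windows[b].values))
}
```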

Jeffwan · 2024-11-27T18:46:51Z

pkg/controller/podautoscaler/metrics/client.go

@@ -112,7 +98,7 @@ func (c *KPAMetricsClient) UpdateMetrics(now time.Time, metricKey NamespaceNameM
 	defer c.collectionsMutex.Unlock()

 	// Update metrics into the window for tracking
-	err := c.UpdateMetricIntoWindow(metricKey, now, sumMetricValue)
+	err := c.UpdateMetricIntoWindow(now, sumMetricValue)


How does it know which metric to update now?

The metric client is now responsible for a single scaler.
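
A rough sketch of what a single-scaler client implies: with one panic window and one stable window per client, the update call no longer needs a metric key. The types below (`timeWindow`, `metricsClient`, `sample`) and the averaging logic are simplified assumptions for illustration, not the real `aggregation.TimeWindow` or `KPAMetricsClient`. The two windows receive the same samples and differ only in length.

```go
package main

import (
	"fmt"
	"time"
)

type sample struct {
	at    time.Time
	value float64
}

// timeWindow is a simplified, hypothetical stand-in for
// aggregation.TimeWindow.
type timeWindow struct {
	length  time.Duration
	samples []sample
}

func (w *timeWindow) record(now time.Time, v float64) {
	w.samples = append(w.samples, sample{at: now, value: v})
}

// avg averages the samples whose age relative to now is below the
// window length. It returns 0 when no sample qualifies.
func (w *timeWindow) avg(now time.Time) float64 {
	var sum float64
	var n int
	for _, s := range w.samples {
		if now.Sub(s.at) < w.length {
			sum += s.value
			n++
		}
	}
	if n == 0 {
		return 0
	}
	return sum / float64(n)
}

// metricsClient serves exactly one scaler, so it owns exactly one panic
// window and one stable window.
type metricsClient struct {
	panicWindow  *timeWindow
	stableWindow *timeWindow
}

// UpdateMetricIntoWindow needs no metric key: both windows belong to
// the one scaler this client serves.
func (c *metricsClient) UpdateMetricIntoWindow(now time.Time, v float64) {
	c.panicWindow.record(now, v)
	c.stableWindow.record(now, v)
}

// demo feeds three samples and returns the panic and stable averages.
func demo() (panicAvg, stableAvg float64) {
	c := &metricsClient{
		panicWindow:  &timeWindow{length: 10 * time.Second},
		stableWindow: &timeWindow{length: 60 * time.Second},
	}
	t0 := time.Date(2024, 11, 27, 0, 0, 0, 0, time.UTC)
	c.UpdateMetricIntoWindow(t0, 10)
	c.UpdateMetricIntoWindow(t0.Add(30*time.Second), 20)
	now := t0.Add(50 * time.Second)
	c.UpdateMetricIntoWindow(now, 30)
	return c.panicWindow.avg(now), c.stableWindow.avg(now)
}

func main() {
	p, s := demo()
	fmt.Println(p, s)
}
```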

Jeffwan · 2024-11-27T18:48:47Z

pkg/controller/podautoscaler/podautoscaler_controller.go

@@ -74,36 +74,9 @@ func newReconciler(mgr manager.Manager) (reconcile.Reconciler, error) {
 		Mapper:         mgr.GetRESTMapper(),
 		resyncInterval: 10 * time.Second, // TODO: this should be override by an environment variable
 		eventCh:        make(chan event.GenericEvent),
+		AutoscalerMap:  make(map[metrics.NamespaceNameMetric]scaler.Scaler),


Hmm, in this case, does it mean that multiple HPA, KPA, or APA scalers are flattened? We used to have three singleton autoscalers.

I prefer to flatten them, because we need to manage one object per scaler rather than one per PA type. When we create multiple scalers with the same PA type (e.g. KPA), each one has a different context, metric window, and other stateful attributes, so we cannot merge them into one scaler object.

kr11 · 2024-11-29T05:49:42Z

I am curious whether we can make the scaling context a map instead of the entire autoscaler.

Besides the context (stateless configuration), KPA and APA also possess stateful attributes such as panicTime, metrics, and DelayWindow.

I believe it's necessary to retain the entire scaler object, not just the context, even though we plan to centralize the metric window into a metricCollector in the future.

Jeffwan

/lgtm

Jeffwan · 2024-11-30T17:59:21Z

I had a brief offline discussion with @kr11 and we reached consensus on the flat structure to overcome the existing challenges. This can be merged.

…iple KPAs (#437) refactor scaler init and update

refactor scaler init and update

9bedaf3

kr11 changed the title ~~[WIP] refactor scaler: init and update~~ [WIP] Refactor Scaler: Resolve Issues with Metric Parameter Updates in Multiple KPAs Nov 27, 2024

kr11 added the area/autoscaling label Nov 27, 2024

kr11 changed the title ~~[WIP] Refactor Scaler: Resolve Issues with Metric Parameter Updates in Multiple KPAs~~ Refactor Scaler: Resolve Issues with Metric Parameter Updates in Multiple KPAs Nov 27, 2024

Jeffwan reviewed Nov 27, 2024

Jeffwan approved these changes Nov 29, 2024

Jeffwan merged commit 9c85745 into main Nov 30, 2024
10 checks passed

Jeffwan deleted the kangrong/fix/multi_pa_metric_windows branch November 30, 2024 17:59

Jeffwan mentioned this pull request Nov 30, 2024

KPA shares same replica numbers for multiple autoscaler configration. #436

Closed

gangmuk pushed a commit that referenced this pull request Jan 25, 2025

Refactor Scaler: Resolve Issues with Metric Parameter Updates in Mult…

25eb982

…iple KPAs (#437) refactor scaler init and update
