-
Notifications
You must be signed in to change notification settings - Fork 267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Scaler: Resolve Issues with Metric Parameter Updates in Multiple KPAs #437
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am curious whether we can make scaling context a map instead of entire autoscaler.
@@ -54,42 +54,28 @@ type KPAMetricsClient struct { | |||
// are collected and processed within the sliding window. | |||
granularity time.Duration | |||
// the difference between stable and panic metrics is the time window range | |||
panicWindowDict map[NamespaceNameMetric]*aggregation.TimeWindow | |||
stableWindowDict map[NamespaceNameMetric]*aggregation.TimeWindow | |||
panicWindow *aggregation.TimeWindow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
client is not singleton now? it only response for one metric?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
client is not singleton now? it only response for one metric?
According to KNative, it employs a global metricCollector that maintains a map of metric windows for each Scaler. This global metricCollector is passed into NewAutoscaler so that all scaler.metricClient refer to the same object metricCollector.
In this PR, we mainly fix the bugs that we cannot assign time window length correctly according to individual scaler context.
Considering we have planned to revamp and streamline metrics management across the autoscaler, model adapter, and other components, I recommend we defer the transition of the metric client from individual to centralized handling to the upcoming PRs.
@@ -112,7 +98,7 @@ func (c *KPAMetricsClient) UpdateMetrics(now time.Time, metricKey NamespaceNameM | |||
defer c.collectionsMutex.Unlock() | |||
|
|||
// Update metrics into the window for tracking | |||
err := c.UpdateMetricIntoWindow(metricKey, now, sumMetricValue) | |||
err := c.UpdateMetricIntoWindow(now, sumMetricValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does it know the metrics now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now metric client is response for single scaler.
@@ -74,36 +74,9 @@ func newReconciler(mgr manager.Manager) (reconcile.Reconciler, error) { | |||
Mapper: mgr.GetRESTMapper(), | |||
resyncInterval: 10 * time.Second, // TODO: this should be override by an environment variable | |||
eventCh: make(chan event.GenericEvent), | |||
AutoscalerMap: make(map[metrics.NamespaceNameMetric]scaler.Scaler), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
em. in this case, does it mean multiple HPA, KPA or APA scalers are flatten?
We used to have 3 singleton autoscaler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
em. in this case, does it mean multiple HPA, KPA or APA scalers are flatten? We used to have 3 singleton autoscaler
I prefer to flatten them, as we need to manage individual object per scaler rather than one for each PA type. Considering we create multiple scalers with the same PA-Type (e.g. KPA), each one has a different context, metric window, and other stateful attributes, so we cannot merge them into one scaler object
Besides the context (stateless configuration), KPA and APA also possess stateful attributes such as panicTime, metrics, and DelayWindow. I believe it's necessary to retain the entire scaler object, not just the context, even though we plan to centralize the metric window into a metricCollector in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
I had a brief discussion with @kr11 offline and get concensus on the flat structure to overcome the existing challenges. This can be merged. |
…iple KPAs (#437) refactor scaler init and update
Pull Request Description
[Please provide a clear and concise description of your changes here]
Related Issues
Resolves: #410 #436
Description
When multiple KPAAutoScaler instances are created simultaneously, or when a same KPA is updated multiple times, there is a possibility that parameter values, particularly stateful metric parameters (like window length), may not update successfully. This PR addresses issue #406 and may also help resolve issue #436.
Existing Issues
PodAutoscalerReconciler
'sAutoscalerMap
is structured as{KPA: KPAAutoScaler, APA: APAAutoScaler}
, meaning all KPAs share the sameKPAAutoScaler
object. During periodic calls to thereconcile
function, thePA
objects always update the sameAutoscaler
.KpaMetricClient
, which maintains a dictionary mapping fromNamespaceNameMetric
toPanicWindow
andStableWindow
. The setting ofpanicWindowDuration
(length of panic window) andstableWindowDuration
does not take effect.Modification
PodAutoscalerReconciler
'sautoscalerMap
is changed from{KPA: KPAAutoScaler, APA: APAAutoScaler}
toNamespaceNameMetric
->KPAAutoScaler
. EachPA
has its own independentContext
andmetricClient
.KPAAutoScaler
instances innewReconciler
., but waits until thereconcile
function to create (if it does not exist) or update (if it already exists).KpaMetricClient
maintains only onepanicWindow
and onestableWindow
, instead of maintaining metrics for multiple PAs.panicWindowDuration
andstableWindowDuration
were not taking effect.TODO:
panicWindowDuration
andstableWindowDuration
are not allowed, as this involves the problem of how metric data in the old window can be transferred to the new window. We will design this carefully in the future.Contribution Guidelines (Expand for Details)
We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:
Pull Request Title Format
Your PR title should start with one of these prefixes to indicate the nature of the change:
[Bug]
: Corrections to existing functionality[CI]
: Changes to build process or CI pipeline[Docs]
: Updates or additions to documentation[API]
: Modifications to aibrix's API or interface[CLI]
: Changes or additions to the Command Line Interface[Misc]
: For changes not covered above (use sparingly)Note: For changes spanning multiple categories, use multiple prefixes in order of importance.
Submission Checklist
By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.