Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add v1alpha2 grpc api #427

Merged
merged 8 commits into from
Mar 26, 2019
Merged

Conversation

YujiOshima
Copy link
Contributor

@YujiOshima YujiOshima commented Mar 12, 2019

#423
I added v1alpha2 gRPC API complies with the v1alpha2 CRD.
https://github.com/kubeflow/katib/pull/427/files#diff-c0f6760a929f1056527071edf1f57db8

I will add a new DB schema soon.

@richardsliu @johnugeorge @hougangliu @gaocegege

Signed-off-by: YujiOshima [email protected]


This change is Reviewable

Signed-off-by: YujiOshima <[email protected]>
Signed-off-by: YujiOshima <[email protected]>
Signed-off-by: YujiOshima <[email protected]>
@YujiOshima
Copy link
Contributor Author

I added a new version of DB IF
1a6d4ca

ObjectiveSpec objective = 2;
AlgorithmSpec algorithm = 3;
string trial_template = 4;
int32 pallarel_trial_count = 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo - parallel

string value = 2;
}

message MetricLog {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this used for?

repeated Metric metrics = 1;
}

message ObservationLog {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document what this is used for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added doc here 2e3444d#diff-c0f6760a929f1056527071edf1f57db8R117
Katib will store every log of metrics. Users can look at the history of training. And the logs are used by suggestion algorithm and early stopping.

}

message GetExperimentListReply {
repeated ExperimentSummary experiment_summarys = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

summaries

objective TEXT,
algorithm TEXT,
trial_template TEXT,
pallarel_trial_count INT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parallel

*/
message GraphConfig {
int32 num_layers = 1; /// Number of layers
repeated int32 input_size = 2; /// Dimenstions of input size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dimenstions -> Dimensions


message ExperimentSummary {
string experiment_name = 1;
ExperimentStatus.ExperimentConditionType condition = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to change 4 to 2 ?

start_time DATETIME(6),
metrics_collector_type TEXT,
completion_time DATETIME(6),
last_reconcile_time DATETIME(6))`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we add NasConfig in experiments table in the future?

@YujiOshima
Copy link
Contributor Author

YujiOshima commented Mar 15, 2019

@richardsliu @andreyvelich Thanks! I applied your comments.

ObjectiveType type = 1;
float goal = 2;
string objective_metric_name = 3;
repeated string additional_metrics_name = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

additional_metrics_names?

string value = 2;
}
string name = 1;
repeated ValueLog valuelog = 2;
Copy link
Member

@johnugeorge johnugeorge Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would it be a list of [timestamp-value] pair for a particular key? Isn't it, a [key-value] pair for a particular timestamp?

Copy link
Member

@johnugeorge johnugeorge Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the metric name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/valuelog/value_log

Experiment experiment = 1;
}

message GetStudyListRequest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this used for?


message GetTrialListRequest {
string experiment_name = 1;
string start_time = 2; ///The start of the time range. RFC3339 format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these filter options?

message GetSuggestionsRequest {
string experiment_name = 1;
string algorithm_name = 2;
int32 request_number = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this field used for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you set 3 to request_number, you can get three Suggestions at one time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this same as the parallel_trial_count which is already in the experiment Spec ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a little bit different.
When parallel_trial_count is 4 and 2 trials are completed at the same time, the operator needs to set 2 to request_number in GetSuggesitonRequest.

Copy link
Member

@johnugeorge johnugeorge Mar 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Thanks


message ExperimentSummary {
string experiment_name = 1;
ExperimentStatus.ExperimentConditionType condition = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add more fields as in ExperimentStatus https://github.com/kubeflow/katib/blob/master/pkg/api/operators/apis/experiment/v1alpha2/types.go#L75
OptimalTrial ,TrialsCompleted,TrialsFailed ,TrialsKilled ,TrialsPending

}
string start_time = 1; /// RFC3339 format
string completion_time = 2; /// RFC3339 format
string last_reconcile_time = 3; /// RFC3339 format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need last_reconcile_time in DB?

}
string start_time = 1; /// RFC3339 format
string completion_time = 2; /// RFC3339 format
string last_reconcile_time = 3; /// RFC3339 format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need last_reconcile_time in DB?

}

message Trial {
string id = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ID or name(like in experiment)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the trials are generated by the controller, users won't name them.
I think ID is more suitable here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is true. But, I feel that it might be better to keep the same terminology as in the Kubernetes config(matadata.name)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense. "name" would be more user friendly.

}

message RegisterTrialReply {
string trial_id = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DB interface can have multiple storage backend(Mysql, ModelDb etc). Is the controller creating Id? What is the return trial_id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My assumption, it is created by DB backend.
But it is OK the controller create the ID. It will probably be more consistent behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if every storage backend supports id creation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. let's make the name of trial is generated by controller.

/**
* Update Status of a experiment.
*/
rpc UpdateExperimentStatus(UpdateExperimentStatusRequest) returns (UpdateExperimentStatusReply){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, Suggestion algorithms can modify the algorithm parameters. https://github.com/kubeflow/katib/blob/master/pkg/suggestion/grid_service.go#L195

How is this handled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create UpdateAlgorithmParameter API.

* Config for operations in DAG
*/
message Operation {
string operationType = 1; /// Type of operation in DAG
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/operationType/operation_type to keep same name style as others

string value = 2;
}
string name = 1;
repeated ValueLog valuelog = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/valuelog/value_log

}

message ObservationLog {
repeated MetricLog metriclogs = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/metriclogs/metric_logs

Signed-off-by: YujiOshima <[email protected]>
@YujiOshima
Copy link
Contributor Author

@johnugeorge @hougangliu Thank you for your comments. I updated.
PTAL.

id INT AUTO_INCREMENT PRIMARY KEY,
time DATETIME(6),
name VARCHAR(255),
metric_name VARCHAR(255),
value TEXT,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this metric_value field?

@johnugeorge
Copy link
Member

@YujiOshima I see that new API UpdateAlgorithmSpec is added. How is experiment Spec update handled in general? Say If user updates fields like MaxTrials, ParallelTrials etc in K8s config , how is DB updated?

@YujiOshima
Copy link
Contributor Author

@johnugeorge You have a point.
The problem is how to sync the AlgorithmSpec between k8s CR and DB.
One idea, the AlgorithmSpec would be an initial config of an algorithm for the experiment and it will not change.
And adding a new table AlgorithmStatus to DB. SuggestionService will save the status of the algorithm with UpdateAlgorithmStatus to DB.
WDYT?

@YujiOshima YujiOshima mentioned this pull request Mar 22, 2019
@johnugeorge
Copy link
Member

@YujiOshima How would AlgorithmStatus structure look ? Is it a key-value pair(similar to AlgorithmSetting) that would be only configured by Algorithms internally?

@YujiOshima
Copy link
Contributor Author

@johnugeorge Yes. Since the parameter that each algorithm need is different, the AlgorithmStatus should have a general struct like key-value pair.

that would be only configured by Algorithms internally?

Yes.

@johnugeorge
Copy link
Member

@YujiOshima
Few thoughts

  1. What does status mean in `AlgorithmStatus' call? I think, term is slightly confusing.

  2. From what i understood, this rpc call can be used to set extra key-value pairs (Algorithm specific parameters, in addition to the ones set by user in ExperimentSpec). Is it right?

3 . Do we need UpdateAlgorithmSpec call in such a case?

@YujiOshima
Copy link
Contributor Author

@johnugeorge Sorry for confusing you.

From what i understood, this rpc call can be used to set extra key-value pairs (Algorithm specific parameters, in addition to the ones set by user in ExperimentSpec). Is it right?

Right.
We need to store the extra key-value pairs to DB.
As you pointed, we should not use UpdateAlgorithmSpec in this case (I will delete this API since I think we don't need it now).
I called The extra parameters AlgorithmStatus in the previous comment.
My idea is the extra parameters are handled independently from AlgorithmSpec of ExperimentSpec.
When a Suggestion service calls UpdateAlgorithmStatus gRpc API to save the parameters, the Katib-manager save the parameter to DB and it doesn't any effect on Experiments CR.

make sense?

@johnugeorge
Copy link
Member

@johnugeorge Sorry for confusing you.

From what i understood, this rpc call can be used to set extra key-value pairs (Algorithm specific parameters, in addition to the ones set by user in ExperimentSpec). Is it right?

Right.
We need to store the extra key-value pairs to DB.
As you pointed, we should not use UpdateAlgorithmSpec in this case (I will delete this API since I think we don't need it now).
I called The extra parameters AlgorithmStatus in the previous comment.
My idea is the extra parameters are handled independently from AlgorithmSpec of ExperimentSpec.
When a Suggestion service calls UpdateAlgorithmStatus gRpc API to save the parameters, the Katib-manager save the parameter to DB and it doesn't any effect on Experiments CR.

make sense?

Yes. Should we rename the api UpdateAlgorithmStatus to a better one? Status looks misleading since we are just updating algorithm specific parameters

/**
* List of ParameterAssignment
*/
message ParameterAssignments {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YujiOshima @richardsliu I think we need ParameterSpec instead of ParameterAssignments, that I told you. We need Feasible space inside ParameterConfigs in Operations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will update.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@YujiOshima
Copy link
Contributor Author

@johnugeorge How about

  • AlgorithmVariables
  • AdditionalAlgorithmSetting

Signed-off-by: YujiOshima <[email protected]>
Signed-off-by: YujiOshima <[email protected]>
@YujiOshima YujiOshima changed the title [WIP] add v1alpha2 grpc api add v1alpha2 grpc api Mar 26, 2019
@YujiOshima
Copy link
Contributor Author

@johnugeorge @andreyvelich @richardsliu @hougangliu I think I applied all comments. PTAL.

@hougangliu
Copy link
Member

/lgtm

@johnugeorge
Copy link
Member

/retest

4 similar comments
@YujiOshima
Copy link
Contributor Author

/retest

@johnugeorge
Copy link
Member

/retest

@johnugeorge
Copy link
Member

/retest

@johnugeorge
Copy link
Member

/retest

@johnugeorge
Copy link
Member

/lgtm
/approve

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnugeorge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 1316bad into kubeflow:master Mar 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants