Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP fetch abort_recipe_on_fail attribute #295

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

janjurca
Copy link

@janjurca janjurca commented May 3, 2023

Hello
I've added feature to http fetch tag. Now attribute abort_recipe_on_fail="on/off" can be placed in there.
If fetch of this task fail then the taks and all subsequent tasks in recipe are aborted.

@janjurca janjurca changed the title [WIP] HTTP fetch abort_recipeset_on_fail attribute [WIP] HTTP fetch abort_recipe_on_fail attribute May 16, 2023
@jbastian jbastian self-requested a review June 28, 2023 14:32
@jbastian
Copy link
Contributor

LGTM

@janjurca janjurca marked this pull request as ready for review July 3, 2023 07:45
@janjurca
Copy link
Author

@jbastian

LGTM

Thanks for review. I've removed the draft status from the pull request. Can you aprove it? Thanks!

@janjurca janjurca changed the title [WIP] HTTP fetch abort_recipe_on_fail attribute HTTP fetch abort_recipe_on_fail attribute Aug 2, 2023
src/task.c Outdated
@@ -116,6 +116,7 @@ fetch_finish_callback (GError *error, guint32 match_cnt,
} else {
g_propagate_error (&task->error, error);
task->state = TASK_COMPLETE;
task->fetch_succeeded = false;
Copy link
Contributor

@cbouchar cbouchar Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal 1 out of 2:
What if you did,
if (task->abort_recipe_on_fail) {
task->fetch_succeeded = false; # You do need this var to differentiate from rstrnt-abort
app_data->aborted = ABORTED_TASK;
g_cancellable_cancel(app_data->cancellable);
}
THEN
Further down the code in the state case TASK_COMPLETE:
// Set task finished
if (g_cancellable_is_cancelled(app_data->cancellable) &&
app_data->aborted != ABORTED_NONE) {
g_clear_error(&task->error);
if {task->fetch_succeeded) {
g_set_error(&task->error, RESTRAINT_ERROR,
RESTRAINT_TASK_RUNNER_ABORTED,
"Aborted by rstrnt-abort");
}else {
g_set_error(&task->error, RESTRAINT_ERROR,
RESTRAINT_TASK_RUNNER_ABORTED,
"Aborted due to fetch failure");
}
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @cbouchar And thanks a lot for comment.

Your are right this solution would be better and in fact it was the first solution I have tried. But it turned out not to be working. In this solution the task is corectly aborted with the correct abort message.
I've reimplemented this solution in this commit (3258337) and tried to run it beaker. And even though the task is aborted correctly the whole recipe just continue to run and the following tasks are executed as normal.
I haven't been able to solve this issue in the past so I've come up with the solution you've been reviewing.

So the actual problem with this solution is that the recipe is not aborted and conitues to run even though the RESTRAINT_ERROR is properly set.
I'll look at this aborting issue once again and try to solve it, but If you have any idea why the recipe is not aborted I'd like to hear your insights.

Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure you saw my 2nd proposal. This sounds like an extreme corner case for a specific area in your recipe. It seems more reasonable to create a task to check if your requested test got installed. if not just call rstrnt-abort? Messing with restraint will affect all users and you'd be surprised the bugs that can surface.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I saw your second proposal. If there won't be any possibility to push this update upstream, we would need to implement some kind of wrapper. However, even this wrapper could fail to fetch data, which would result in the same outcome as if there were no wrapper at all.

Please take a quick look at my latest changes. I've refactored the code to minimize disruption and make the new feature as optional as possible for the end user.

Regarding this feature, you may see it as an extreme corner case. But is it? All we want to do is abort the recipe if there is a fetch error in some task. As it stands, the user currently has no way to do that in restraint. This provides the user with an option—not a necessity—to abort the recipe, which could result in saving computation time and stopping tasks that would produce irrelevant data.

And yes, I know there's always the possibility that some bugs could surface, but as I've said, this feature is designed to be as optional as possible.

Thanks a lot for your time!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion wasn't to retry a fetch but to check if the directory is present. if not, abort. I'm concerned about disrupting other engineers. BTW I'll be on PTO next week.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion, let postpone this pull request for Q1. Thanks!

src/task.c Outdated
@@ -1281,6 +1294,10 @@ task_handler (gpointer user_data)
break;
}
case TASK_NEXT:
if (!task->fetch_succeeded && task->abort_recipe_on_fail) {
Copy link
Contributor

@cbouchar cbouchar Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, it will proceed to get the next task but I believe your intent is to quit the current task after failing to perform the fetch operation for it. Putting this change here does not make sense. You should quit on the TASK that's failing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I'd like to end on the aborted task. But I thought it should be handled by this:
g_set_error(&task->error, RESTRAINT_ERROR, RESTRAINT_TASK_RUNNER_ABORTED, "Aborted due to fetch failure");
But it just aborts curent task not the whole recipe. So this was the mechanism to abort all the following tasks manualy with coresponding log message.

Is there a mechanizm that could abort whole recipe? I tried to skip all the following tasks in a list in the recipe. But it did not work with Beaker properly. Beaker was stuck wating for results from those skiped tasks.

@cbouchar
Copy link
Contributor

cbouchar commented Aug 21, 2023

I'm concerned about your changes. You're injected another path for error handling which may circumvent existing path. FYI the documentation https://restraint.readthedocs.io/en/latest/results.html will be affected by your changes. Please take the time understand this doc since my suggestion is trying to work with how result handling is currently performed.

src/task.c Outdated
g_string_printf(message, "** Fetching task: %s [%s]\n", task->task_id, task->path);
app_data->fetch_retries = 0;
task->state = TASK_FETCH;
if(app_data->state == RECIPE_ABORT){
Copy link
Contributor

@cbouchar cbouchar Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need these changes with what I proposed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes if the whole recipe abort would work than it wouldnt be necesary to have this.

src/recipe.h Outdated
@@ -37,6 +37,7 @@ typedef enum {
RECIPE_RUNNING,
RECIPE_FAIL,
RECIPE_COMPLETE,
RECIPE_ABORT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you shouldn't need this

@cbouchar
Copy link
Contributor

cbouchar commented Aug 22, 2023

Proposal 2: I don't have your full picture of the problem you are trying to solve but I suspect this problem is with
one task that's giving you an issue with fetch retry. Instead of changing Restraint, couldn't you write a task following your task which checks whether your fetch operation is on disk. If not, you could run rstrnt-abort(). This seems like the most least invasive solution.
This is my preferred solution. Restraint should be riddled with customer corner cases. Doing so will break down the reliability of the code.
@janjurca Any reason you can't create a task to handle this problem?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants