Extend task_fetch retries #293
More Beaker jobs now rely on fetching tasks from git repos, and intermittent connectivity issues cause unnecessary problems when restraint retries only a couple of times within half a minute. It is a much bigger issue when a core task like kernelinstall is aborted for that reason: the remaining tests in the recipe become pointless and even misleading, because running on the wrong kernel produces false positives/negatives. This minor change increases the number and interval of task_fetch retries to make future jobs more tolerant of intermittent connectivity issues.
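Hi @lulinqing, can we let the user specify env TASK_FETCH_RETRIES and get its value from the restraint client? If that needs more effort, your patch increasing the macros TASK_FETCH_RETRIES and TASK_FETCH_INTERVAL looks good to me :-)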
Not sure how this works in a Beaker job definition, but I’m okay with
parameterizing them. Sounds like a nice feature.
Meanwhile I’d still like to have the default number/interval increased asap
- for reasons described earlier. And this should not conflict with your
future plan.
In case helpful, Jeff has more details from our conversation with Jirka.
Thanks!
On Wed, Apr 19, 2023 at 22:44 Vector Li ***@***.***> wrote:
This minor change is increasing the number and interval of task_fetch
retries - to make future jobs more tolerant of intermittent connectivity
issues ...
Hi @lulinqing <https://github.com/lulinqing>, can we let the user specify env
TASK_FETCH_RETRIES and get its value from the restraint client? If that needs
more effort, your patch increasing the macros TASK_FETCH_RETRIES and
TASK_FETCH_INTERVAL looks good to me :-)
--
Linqing
I don't think a task parameter (i.e., environment variable) will work here: We could, however, consider adding an environment variable to |
That's more tangible~ |
so that we can implement the Restraint RFE[1] in a more elegant way: by adding an optional attribute to the fetch element: <fetch url="http://my.download.host/path" retry="8" /> ref[1]: restraint-harness/restraint#293
so that we can implement the Restraint RFE[1] in a more elegant way: by adding an optional attribute to the fetch element: <fetch url="http://my.download.host/path" options="retry=8 timeo=8" /> ref[1]: restraint-harness/restraint#293
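To make the proposed `options="retry=8 timeo=8"` attribute concrete, here is a minimal sketch in C of how such a string could be split into per-fetch settings. It is not code from restraint or from the referenced proposal: the `fetch_opts` struct, the `parse_fetch_options` function, and the reading of `timeo` as a plain non-negative integer are all assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for per-fetch settings (not a restraint type). */
struct fetch_opts {
    unsigned long retry;  /* number of retries */
    unsigned long timeo;  /* assumed to be a plain integer value */
};

/* Parse space-separated key=value pairs such as "retry=8 timeo=8".
 * Keys that are absent keep whatever defaults the caller put in *out.
 * Returns false on an unknown key or a non-numeric value. */
bool parse_fetch_options(const char *options, struct fetch_opts *out)
{
    assert(out != NULL);
    const char *p = options;

    while (p != NULL && *p != '\0') {
        while (*p == ' ')
            p++;                       /* skip separators */
        if (*p == '\0')
            break;

        const char *eq = strchr(p, '=');
        if (eq == NULL || !isdigit((unsigned char)eq[1]))
            return false;              /* missing '=' or non-numeric value */

        char *end;
        unsigned long value = strtoul(eq + 1, &end, 10);
        if (*end != ' ' && *end != '\0')
            return false;              /* trailing junk, e.g. "retry=8x" */

        size_t key_len = (size_t)(eq - p);
        if (key_len == 5 && strncmp(p, "retry", 5) == 0)
            out->retry = value;
        else if (key_len == 5 && strncmp(p, "timeo", 5) == 0)
            out->timeo = value;
        else
            return false;              /* unknown key */

        p = end;
    }
    return true;
}
```

A caller would fill `fetch_opts` with the compiled-in defaults first and then apply the attribute, so a job that omits `options` behaves exactly as before.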
I was made aware this week that the internal Gerrit mirror sites which support the git:// protocol will be gone soon (along with Gerrit itself), just as happened with dist-git earlier. While we are pulling together a restraint volunteer group to properly build/test/validate other, more sophisticated proposals like #295, can we have a quick review/approval of this 2-liner patch to improve fault tolerance over networking issues?
@StykMartin @cbouchar @p3ck @jbastian |
More Beaker jobs now rely on fetching tasks from git repos, and intermittent connectivity issues cause unnecessary problems when restraint retries only a couple of times within half a minute.
It is a much bigger issue when a core task like kernelinstall is aborted for that reason: the remaining tests in the recipe become pointless and even misleading, because running on the wrong kernel produces false positives/negatives.
This minor change increases the number and interval of task_fetch retries to make future jobs more tolerant of intermittent connectivity issues, and to mitigate false negatives until we have a better solution for #288.
(BTW I'm okay with larger numbers, even if it may trigger watchdog timeout and abort the whole recipeset - which works in our favor.)
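The effect of the two constants can be sketched as a fixed-interval retry loop in C. This is an illustration of the general technique, not restraint's actual fetch code: the macro values are made up (the description above does not state the old or new numbers), and `fetch_with_retries`, `flaky_fetch`, and `fake_sleep` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; not taken from the patch. */
#define TASK_FETCH_RETRIES  8   /* assumed: extra attempts after the first */
#define TASK_FETCH_INTERVAL 10  /* assumed: seconds to wait between attempts */

typedef bool (*fetch_fn)(void *ctx);         /* performs one fetch attempt */
typedef void (*sleep_fn)(unsigned seconds);  /* injectable so tests run fast */

/* Try once, then retry up to `retries` more times, waiting `interval`
 * seconds before each retry. Returns the number of attempts used on
 * success, or 0 if every attempt failed. */
unsigned fetch_with_retries(fetch_fn fetch, void *ctx,
                            unsigned retries, unsigned interval,
                            sleep_fn do_sleep)
{
    assert(fetch != NULL);
    for (unsigned attempt = 1; attempt <= retries + 1; attempt++) {
        if (fetch(ctx))
            return attempt;
        /* No point in sleeping after the final attempt. */
        if (attempt <= retries && do_sleep != NULL)
            do_sleep(interval);
    }
    return 0;
}

/* Demo helper: fails until the counter behind ctx reaches zero. */
bool flaky_fetch(void *ctx)
{
    unsigned *remaining_failures = ctx;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return false;
    }
    return true;
}

/* Demo helper: records the requested sleep time instead of sleeping. */
unsigned total_slept;
void fake_sleep(unsigned seconds)
{
    total_slept += seconds;
}
```

With this shape, the longest time spent waiting before giving up is `retries * interval` seconds (80 with the assumed values), which is the number to weigh against the watchdog timeout mentioned above.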