Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue where new task was not returned in sticky-disabled scenarios #589

Merged
merged 3 commits into from
Oct 18, 2021

Conversation

cretz
Copy link
Member

@cretz cretz commented Oct 14, 2021

What was changed

Allow ReturnNewWorkflowTask to be true in RespondWorkflowTaskCompletedRequest even in sticky-disabled scenarios (e.g. integration tests with WORKFLOW_CACHE_SIZE=0) when the force flag is set.

Why?

Currently this single test w/ no-cache fails on master:

WORKFLOW_CACHE_SIZE=0 go test -v --count 1 ./test -run 'TestIntegrationSuite/TestLocalActivityRetryBehavior$'

However with cache or when run as a full suite (i.e. make integration-test-zero-cache) it succeeds presumably due to other tests' side effects. In #579 there was a considerable speedup to integration tests (speeding up the stopping of a worker) which then showed this single test failure. The activity attempt count was getting reset because we were always setting ReturnNewWorkflowTask as false in no-cache situations.

EDIT: Forgot to mention, thanks to @Sushisource for helping me find the issue and suggest a solution.

Copy link
Member

@Sushisource Sushisource left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense ofc since I wrote it, but let's get someone else to take a peek.

@yiminc Does this make sense to you?

// If we are forcing creation of a new workflow task because we have a long running local activity,
// it's desirable to also ensure we get that new task returned to ourselves immediately, otherwise
// if we aren't caching the workflow we can end up forgetting the number of attempts on said local
// activity.
request.ReturnNewWorkflowTask = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I think even when we disable sticky cache, we should still be able to handle new workflow task, meaning this value should always be true. Could you try to see if that works?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have confirmed that if we just remove this entire else block (thereby always leaving it true if it was there), it does pass all tests in all scenarios and does properly return the new task. @Sushisource - thoughts on removing this else block entirely?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll still work. My thought here is that if we have sticky cached disabled it is better to allow load to be distributed to other workers which may have it enabled, or not, but at least then this worker isn't only running it's own tasks, but still incurring full replay cost every time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The ReturnNewWorkflowTask only apply if there is a task for this current workflow. This not very common.
  2. It is not likely to have some worker disable sticky, but others enable sticky.
    The perf difference would be very minimal with or without this flag.
    The benefit of removing this else is to simplify code logic. So reader don't even need to know all these tedious details that does not really matters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed this conditional altogether.

@cretz cretz merged commit 6bbdd9e into temporalio:master Oct 18, 2021
@cretz cretz deleted the no-cache-retry-fail branch October 18, 2021 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants