Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix races in GetVariableValue and login #24599

Merged
merged 4 commits into from
Apr 8, 2020
Merged

Fix races in GetVariableValue and login #24599

merged 4 commits into from
Apr 8, 2020

Conversation

jbardin
Copy link
Member

@jbardin jbardin commented Apr 8, 2020

Fix a few races that had crept into the codebase. Now that we've moved to a new CI system, we can probably start testing for these automatically, but there will still be some test fixtures that need updating.

  • GetVariableValue was missing the variable mutex calls.
  • The login command had a race around a closed-over err value.
  • The testHook mock wasn't synchronized in the tests.
  • Missing call to wg.Wait() caused a race in the deferred cleanup function.

Fixes #24589

@jbardin jbardin requested a review from a team April 8, 2020 15:40
@ghost ghost added the sdkv1 [PRs only] Marks changes that may potentially need to be ported to the plugi nSDK label Apr 8, 2020
@codecov
Copy link

codecov bot commented Apr 8, 2020

Codecov Report

Merging #24599 into master will increase coverage by 0.00%.
The diff coverage is 75.00%.

Impacted Files Coverage Δ
command/login.go 48.27% <50.00%> (-0.13%) ⬇️
terraform/eval_context_builtin.go 77.35% <100.00%> (+0.28%) ⬆️
dag/marshal.go 53.40% <0.00%> (+1.13%) ⬆️

@jbardin jbardin changed the title Fix race in GetVariableValue and login Fix races in GetVariableValue and login Apr 8, 2020
Copy link
Contributor

@pselle pselle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I wouldn't object to another viewer

@corentone
Copy link

Thank you @jbardin !

For my education, --if you have time-- could you describe roughly how you were able to reproduce/identify the issue? I'm not too familiar with the TF code so it was tricky for me ( I swear I tried!)
I was looking at the DAG so I was pretty off from where you found the missing lock. The stacktrace, being in GetVariableValue, is not found again in the stacktrace but I only found in two other goroutines in the stacktrace that are in the Eval areas (but not explicitely GetVariableValue), so that would likely be them?

@jbardin
Copy link
Member Author

jbardin commented Apr 8, 2020

No problem @corentone,

The most relevant portion of a stack trace is always the first stack. Since this is a fatal error and not a normal panic, it's coming out of the runtime and we don't have the corresponding read/write locations like we would from the race-detector, but it will show a map access somewhere. If you step back through the frames of that first stack, the first instance of terraform code you encounter is the GetVariableValue call. From here it was a quick diagnosis, since the mutex for the variables map was missing from that method. (If that method was not supposed to be safe for concurrent access in the first place, that would be a larger issue to diagnose)

I actually don't have a test fixture to trigger this at the moment, as any test configuration I have doesn't end up triggering the race. I have an idea to create a larger config that will, but since this bug is simple to visually inspect we can leave that for later, and try to fish out more of these in the long term with some fuzzing.

Thanks for reporting it!

@jbardin jbardin merged commit 4f7d309 into master Apr 8, 2020
@jbardin jbardin deleted the jbardin/races branch April 8, 2020 21:13
@Ninir
Copy link
Contributor

Ninir commented May 5, 2020

@jbardin Do you think it would be posslbe to backport that to a patch version for the TF 0.12 branch somehow?

I've heard reports of this impacting some people, and this is biting us also.

Edit: It's done as per the commit below 🙇‍♂️

pselle pushed a commit that referenced this pull request May 5, 2020
[Cherry-pick]  Fix races in GetVariableValue and login #24599
@ghost
Copy link

ghost commented May 9, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators May 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
sdkv1 [PRs only] Marks changes that may potentially need to be ported to the plugi nSDK
Projects
None yet
Development

Successfully merging this pull request may close these issues.

concurrent map read and map write in GetVariableValue
4 participants