Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add full locking to workflow.Context to work around shutdown races #1139

Closed
wants to merge 4 commits into from

Conversation

Groxx
Copy link
Contributor

@Groxx Groxx commented Oct 9, 2021

Regrettable but pretty simple to add.
When goroutines are shut down correctly, these locks can (and should) be removed.

.... and then I discovered that Close has synchronous side effects.
I'm not super confident that this is stable, I'm not familiar enough with all the code to know for sure :|


Unfortunately it looks like #1124 is not sufficient in practice, there are other races in our users' workflows :
This is a (nearly) direct mimicry of go's context.go locking in 1.13 (our oldest supported version), but beyond that I have not really thought about the details. It fixes the new test's new race though, at least.

Regrettable but pretty simple to add.
When goroutines are shut down correctly, these locks can (and should) be removed.
@Groxx Groxx requested a review from a team October 9, 2021 04:10
@coveralls
Copy link

Pull Request Test Coverage Report for Build 957cbed2-c422-42c6-bf41-543d48cd82e2

  • 13 of 14 (92.86%) changed or added relevant lines in 1 file are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.04%) to 63.171%

Changes Missing Coverage Covered Lines Changed/Added Lines %
internal/context.go 13 14 92.86%
Files with Coverage Reduction New Missed Lines %
internal/internal_task_pollers.go 2 80.67%
Totals Coverage Status
Change from base Build fc712f9a-e71b-4295-8d09-ed2a531ba0d8: 0.04%
Covered Lines: 12180
Relevant Lines: 19281

💛 - Coveralls

@Groxx
Copy link
Contributor Author

Groxx commented Oct 9, 2021

:\ no, I think this may change execution order, given the order of propagation. gonna try something else.

@Groxx Groxx closed this Oct 9, 2021
@demirkayaender
Copy link
Contributor

Were you able to repro the issue in your test?

@Groxx
Copy link
Contributor Author

Groxx commented Oct 11, 2021

@demirkayaender not the one I was aiming for, unfortunately. I found yet another instead :|

❯ go test -test.run TestContext_RaceRegression_2 -count 1000 ./internal
fatal error: concurrent map writes

goroutine 648 [running]:
runtime.throw({0x1aa1d1f, 0x1000000017c9779})
	/usr/local/Cellar/go/1.17.1/libexec/src/runtime/panic.go:1198 +0x71 fp=0xc000071a58 sp=0xc000071a28 pc=0x1036e91
runtime.mapassign(0xc0005043c0, 0xc00048e230, 0x0)
	/usr/local/Cellar/go/1.17.1/libexec/src/runtime/map.go:585 +0x4d6 fp=0xc000071ad8 sp=0xc000071a58 pc=0x10105d6
go.uber.org/cadence/internal.propagateCancel({0x1bd7768, 0xc0005043c0}, {0x1bd0620, 0xc00048e230})
	/Users/stevenl/gocode/src/go.uber.org/cadence/internal/context.go:233 +0x11c fp=0xc000071b30 sp=0xc000071ad8 pc=0x169d1dc
go.uber.org/cadence/internal.WithCancel({0x1bd7768, 0xc0005043c0})
	/Users/stevenl/gocode/src/go.uber.org/cadence/internal/context.go:194 +0xbc fp=0xc000071b80 sp=0xc000071b30 pc=0x169ce5c
go.uber.org/cadence/internal.TestContext_RaceRegression_2.func1.1.1()
	/Users/stevenl/gocode/src/go.uber.org/cadence/internal/context_test.go:76 +0x2a fp=0xc000071bb0 sp=0xc000071b80 pc=0x17c0daa
runtime.deferCallSave(0xc000071c78, 0xc000071ef8)
	/usr/local/Cellar/go/1.17.1/libexec/src/runtime/panic.go:950 +0x82 fp=0xc000071bc0 sp=0xc000071bb0 pc=0x1036622
runtime.runOpenDeferFrame(0x100000001006e2d, 0xc00059ee10)
	/usr/local/Cellar/go/1.17.1/libexec/src/runtime/panic.go:889 +0x27b fp=0xc000071c40 sp=0xc000071bc0 pc=0x1035fbb
runtime.Goexit()

I haven't yet reproduced their mapiternext crash.

demirkayaender added a commit to demirkayaender/cadence-client that referenced this pull request Oct 11, 2021
demirkayaender added a commit to demirkayaender/cadence-client that referenced this pull request Oct 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants