Skip to content

fix: Race condition when closing SDK #5523

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

noahsmartin
Copy link
Contributor

We have a race condition in the SentryClient property of SentryHub that is causing flaky unit tests. When the tests are torn down we call bindClient:nil (from [SentrySDK close]) This sets _client to nil, but the client property can also be accessed by background threads (in particular from [SentryHub captureReplayEvent]). This can lead to messaging a dangling pointer. This fix makes the client property atomic, since it is accessed/read from multiple threads.

Fixes #5506

#skip-changelog

Copy link

codecov bot commented Jun 30, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.131%. Comparing base (409a607) to head (d352b0f).
Report is 1 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@              Coverage Diff              @@
##              main     #5523       +/-   ##
=============================================
- Coverage   86.140%   86.131%   -0.009%     
=============================================
  Files          407       406        -1     
  Lines        35067     34929      -138     
  Branches     15017     15025        +8     
=============================================
- Hits         30207     30085      -122     
+ Misses        4817      4799       -18     
- Partials        43        45        +2     
Files with missing lines Coverage Δ
Sources/Sentry/SentryHub.m 99.108% <100.000%> (-0.005%) ⬇️

... and 44 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 409a607...d352b0f. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

github-actions bot commented Jun 30, 2025

Performance metrics 🚀

  Plain With Sentry Diff
Startup time 1231.96 ms 1256.47 ms 24.51 ms
Size 23.74 KiB 874.15 KiB 850.41 KiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
b0e13a7 1227.71 ms 1245.88 ms 18.16 ms
018037b 1209.31 ms 1228.33 ms 19.03 ms
61414e8 1225.49 ms 1254.28 ms 28.79 ms
f92cfa9 1217.94 ms 1240.06 ms 22.12 ms
63ac649 1192.10 ms 1216.78 ms 24.68 ms
9be5373 1215.92 ms 1239.44 ms 23.52 ms
2609f7a 1218.17 ms 1241.34 ms 23.17 ms
db9572a 1200.27 ms 1234.80 ms 34.53 ms
f92cfa9 1228.45 ms 1251.33 ms 22.88 ms
fc0757d 1231.83 ms 1248.98 ms 17.15 ms

App size

Revision Plain With Sentry Diff
b0e13a7 23.75 KiB 860.98 KiB 837.23 KiB
018037b 23.75 KiB 867.16 KiB 843.41 KiB
61414e8 23.75 KiB 867.69 KiB 843.94 KiB
f92cfa9 23.75 KiB 855.38 KiB 831.63 KiB
63ac649 23.75 KiB 855.38 KiB 831.63 KiB
9be5373 23.75 KiB 866.50 KiB 842.75 KiB
2609f7a 23.75 KiB 867.04 KiB 843.29 KiB
db9572a 23.75 KiB 858.69 KiB 834.93 KiB
f92cfa9 23.75 KiB 855.38 KiB 831.62 KiB
fc0757d 23.75 KiB 850.73 KiB 826.98 KiB

Previous results on branch: fixFlakyUnitTestExit

Startup times

Revision Plain With Sentry Diff
104162b 1234.48 ms 1253.45 ms 18.97 ms
5d1ac9b 1205.76 ms 1245.63 ms 39.88 ms

App size

Revision Plain With Sentry Diff
104162b 23.74 KiB 874.16 KiB 850.42 KiB
5d1ac9b 23.75 KiB 866.76 KiB 843.01 KiB

Copy link
Member

@philipphofmann philipphofmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this issue could also happen in prod, this PR should be a bugfix with the positive side effect of also fixing flaky tests. Furthermore, I think we should add some tests ensuring calling bindClient from a different thread while calling capture methods is thread safe.

@noahsmartin noahsmartin changed the title Fix flaky unit test exit fix: Race condition when closing SDK Jul 1, 2025
@noahsmartin noahsmartin force-pushed the fixFlakyUnitTestExit branch from e8a4e8b to 5698364 Compare July 2, 2025 18:18
@noahsmartin noahsmartin force-pushed the fixFlakyUnitTestExit branch from 5698364 to d352b0f Compare July 2, 2025 20:04
Copy link
Member

@philipphofmann philipphofmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test, but it takes quite some time to run locally.

Comment on lines +27 to +30
wait(for: [warmupExpectation])
SentrySDK.currentHub().bindClient(nil)

wait(for: [exp])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h: Please always specify a timeout, cause otherwise this test could run for a long time in CI and also slow down our test suite.


final class SentrySDKThreadTests: XCTestCase {
func testRaceWhenBindingClient() {
for _ in 0..<10_000 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m: This test requires over a minute or so to run on my local computer. This will significantly increase the test duration. We don't need to use SentrySDK.start , which can be a heavy operation with side effects. Instead, we should only interact with the hub, cause that's where the problem lies.

import XCTest

final class SentrySDKThreadTests: XCTestCase {
func testRaceWhenBindingClient() {
Copy link
Member

@philipphofmann philipphofmann Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I couldn't stop thinking about this test, and I played a round with it. Iff you change the atomic for the property in the client back to nonatomic this test should fail, which it didn't for me locally. So here's my proposal, which is a bit faster and leads to a crash when the client property is nonatomic. I think it would be great to add a comment explaining the problem it reproduces.

func testRaceWhenBindingClient() {

    let options = Options()
    let sut = SentryHub(client: SentryClient(options: options), andScope: nil)

    for _ in 0..<100 {

        let exp = expectation(description: "wait")
        exp.expectedFulfillmentCount = 100

        let queue = DispatchQueue(label: "com.sentry.test_client", qos: .userInteractive, attributes: [.concurrent, .initiallyInactive])

        for _ in 0..<100 {
            queue.async {
                sut.bindClient(SentryClient(options: options))
                sut.capture(message: "Test message")
                exp.fulfill()
            }
        }

        queue.activate()

        for _ in 0..<100 {
            sut.bindClient(nil)
        }

        wait(for: [exp], timeout: 1.0)
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky unit tests when no tests failed
2 participants