Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make TestLogger thread-safe (introduce a lock) #54497

Merged
merged 4 commits into from
May 22, 2024
Merged

Conversation

NHDaly
Copy link
Member

@NHDaly NHDaly commented May 16, 2024

Fixes #54439.

  • Lock around concurrent accesses to .logs, .message_limits, and .shouldlog_args
  • Copy the vector out of the logger at the end, to shield against dangling Tasks.

Before:

julia> Threads.nthreads()
8

julia> function foo(n)
           @info "Doing foo with n=$n"
           @sync for i=1:n
               Threads.@spawn @info "Iteration $i"
           end
           42
       end
foo (generic function with 1 method)

julia> for _ in 1:1000
           @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000)
       end
julia+RAI(56155,0x1f5157ac0) malloc: double free for ptr 0x128248000
julia+RAI(56155,0x1f5157ac0) malloc: *** set a breakpoint in malloc_error_break to debug

[56155] signal (6): Abort trap: 6
in expression starting at REPL[8]:1

signal (6) thread (1) __pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 54370881 (Pool: 54363911; Big: 6970); GC: 119
[1]    56154 abort      julia -tauto

After:

julia> Threads.nthreads()
8

julia> function foo(n)
           @info "Doing foo with n=$n"
           @sync for i=1:n
               Threads.@spawn @info "Iteration $i"
           end
           42
       end
foo (generic function with 1 method)

julia> for _ in 1:1000
           @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000)
       end

(no crash) :)

- Lock around concurrent accesses to .logs and .message_limits
- Copy the vector out of the logger at the end, to shield against
  dangling Tasks.
@NHDaly NHDaly added domain:multithreading Base.Threads and related functionality testsystem The unit testing framework and Test stdlib labels May 16, 2024
Copy link

@charnik charnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @NHDaly !

min_level::LogLevel
catch_exceptions::Bool
shouldlog_args
message_limits::Dict{Any,Int}
shouldlog_args # Guarded by lock.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no, I just realized that this field is also mutated in Logging.shouldlog. I've added a lock around it, but i'm not sure that it makes logical sense, since another thread may overwrite the value right after it's set...

@c42f: Is there any chance you remember what this field is used for? I don't see that it's used anywhere. I think this is the only reason the struct is mutable, is so that we can set that from shouldlog, but I don't see why it's being set.

It's tested here:

julia/test/corelogging.jl

Lines 130 to 134 in 449c7a2

# Test consistency with shouldlog() function arguments
@test record.level == logger.shouldlog_args[1]
@test record._module == logger.shouldlog_args[2]
@test record.group == logger.shouldlog_args[3]
@test record.id == logger.shouldlog_args[4]

But that test doesn't really explain it either. It seems like it would only be retaining the shouldlog args for the last invocation.... So what should this even mean with multiple threads logging?

Could we remove this field entirely?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I have just added a comment that this field is not logically thread-safe, which is no worse than how things were before this PR, while the rest of the data-races are fixed.

So this PR is a strict improvement.

@jishnub
Copy link
Contributor

jishnub commented May 22, 2024

Should this be backported?

@NHDaly NHDaly added backport 1.10 Change should be backported to the 1.10 release backport 1.11 Change should be backported to release-1.11 backport 1.6 Change should be backported to release-1.6 labels May 22, 2024
@NHDaly
Copy link
Member Author

NHDaly commented May 22, 2024

Okay, I plan to merge this when tests pass.

As of now, this PR is a strict improvement. Perhaps in the future, if we determine that the shouldlog_args field is unused and could be removed, we could mark the entire struct immutable instead of mutable, which would let us shrink the locked regions. But that is a pretty small optimization at this point and most likely isn't super important.

Thanks!

@NHDaly NHDaly merged commit 0437210 into master May 22, 2024
8 checks passed
@NHDaly NHDaly deleted the nhd-test-logs-thread-safe branch May 22, 2024 18:24
KristofferC pushed a commit that referenced this pull request May 27, 2024
Fixes #54439.

- Lock around concurrent accesses to .logs, .message_limits, and
   .shouldlog_args.
- Copy the vector out of the logger at the end, to shield against
   dangling Tasks.

Before:
```julia
julia> Threads.nthreads()
8

julia> function foo(n)
           @info "Doing foo with n=$n"
           @sync for i=1:n
               Threads.@Spawn @info "Iteration $i"
           end
           42
       end
foo (generic function with 1 method)

julia> for _ in 1:1000
           @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000)
       end
julia+RAI(56155,0x1f5157ac0) malloc: double free for ptr 0x128248000
julia+RAI(56155,0x1f5157ac0) malloc: *** set a breakpoint in malloc_error_break to debug

[56155] signal (6): Abort trap: 6
in expression starting at REPL[8]:1

signal (6) thread (1) __pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 54370881 (Pool: 54363911; Big: 6970); GC: 119
[1]    56154 abort      julia -tauto
```
After:
```julia
julia> Threads.nthreads()
8

julia> function foo(n)
           @info "Doing foo with n=$n"
           @sync for i=1:n
               Threads.@Spawn @info "Iteration $i"
           end
           42
       end
foo (generic function with 1 method)

julia> for _ in 1:1000
           @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000)
       end
```
(no crash) :)

(cherry picked from commit 0437210)
@KristofferC KristofferC mentioned this pull request May 27, 2024
23 tasks
KristofferC pushed a commit that referenced this pull request May 27, 2024
Fixes #54439.

- Lock around concurrent accesses to .logs, .message_limits, and
   .shouldlog_args.
- Copy the vector out of the logger at the end, to shield against
   dangling Tasks.

Before:
```julia
julia> Threads.nthreads()
8

julia> function foo(n)
           @info "Doing foo with n=$n"
           @sync for i=1:n
               Threads.@Spawn @info "Iteration $i"
           end
           42
       end
foo (generic function with 1 method)

julia> for _ in 1:1000
           @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000)
       end
julia+RAI(56155,0x1f5157ac0) malloc: double free for ptr 0x128248000
julia+RAI(56155,0x1f5157ac0) malloc: *** set a breakpoint in malloc_error_break to debug

[56155] signal (6): Abort trap: 6
in expression starting at REPL[8]:1

signal (6) thread (1) __pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 54370881 (Pool: 54363911; Big: 6970); GC: 119
[1]    56154 abort      julia -tauto
```
After:
```julia
julia> Threads.nthreads()
8

julia> function foo(n)
           @info "Doing foo with n=$n"
           @sync for i=1:n
               Threads.@Spawn @info "Iteration $i"
           end
           42
       end
foo (generic function with 1 method)

julia> for _ in 1:1000
           @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000)
       end
```
(no crash) :)

(cherry picked from commit 0437210)
@KristofferC KristofferC removed the backport 1.11 Change should be backported to release-1.11 label May 28, 2024
KristofferC added a commit that referenced this pull request May 30, 2024
Backported PRs:
- [x] #54010 <!-- Overload `Base.literal_pow` for `AbstractQ` -->
- [x] #54143 <!-- Fix `make install` from tarballs -->
- [x] #54151 <!-- LinearAlgebra: Correct zero element in
`_generic_matvecmul!` for block adj/trans -->
- [x] #54233 <!-- set MAX_OS_WRITE on unix -->
- [x] #54251 <!-- fix typo in gc_mark_memory8 when chunking a large
array -->
- [x] #54363 <!-- typeintersect: fix another stack overflow caused by
circular constraints -->
- [x] #54497 <!-- Make TestLogger thread-safe (introduce a lock) -->
- [x] #53796 <!-- Add a missing doc -->
- [x] #54465 <!-- typeintersect: conservative typevar subtitution during
`finish_unionall` -->
- [x] #54514 <!-- typeintersect: followup cleanup for the nothrow path
of type instantiation -->

Need manual backport:
- [ ] #52505 <!-- fix alignment of emit_unbox_store copy -->
- [ ] #53373 <!-- fix sysimage-native-code=no option with pkgimages -->
- [ ] #53815 <!-- create phantom task for GC threads -->
- [ ] #53984 <!-- Profile: fix heap snapshot is valid char check -->
- [ ] #54276 <!-- Fix solve for complex `Hermitian` with non-vanishing
imaginary part on diagonal -->

Contains multiple commits, manual intervention needed:
- [ ] #52854 <!-- Change to streaming out the heap snapshot data -->
- [ ] #53218 <!-- Fix interpreter_exec.jl test -->
- [ ] #53833 <!-- Profile: make heap snapshots viewable in vscode viewer
-->
- [ ] #54303 <!-- LinearAlgebra: improve type-inference in
Symmetric/Hermitian matmul -->
- [ ] #52694 <!-- Reinstate similar for AbstractQ for backward
compatibility -->

Non-merged PRs with backport label:
- [ ] #54471 <!-- Actually setup jit targets when compiling
packageimages instead of targeting only one -->
- [ ] #53452 <!-- RFC: allow Tuple{Union{}}, returning Union{} -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
@KristofferC KristofferC removed the backport 1.10 Change should be backported to the 1.10 release label May 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.6 Change should be backported to release-1.6 domain:multithreading Base.Threads and related functionality testsystem The unit testing framework and Test stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race Condition: Test.@test_logs is not thread safe.
4 participants