Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete global refs in an attached thread #150

Merged
merged 11 commits into from
Sep 1, 2020

Conversation

facundominguez
Copy link
Member

No description provided.

Comment on lines 560 to 561
fptr <- newForeignPtr ptr $ finalize jobj ptr
`finally` deleteGlobalRefNonFinalized jobj
`finally` runInAttachedThread (deleteGlobalRefNonFinalized jobj)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is well spotted. Note though, that finalize jobj ptr might do jni calls, so it needs to be wrapped as well.

newForeignPtr ptr $ runInAttachedThread $
  finalize jobj ptr `finally` deleteGlobalRefNonFinalized jobj

Also, this finalizer is only used for vectors. The finalizer of global refs in Foreign/JNI/Unsafe.hs hasn't been updated.

return ()
, bench "delete global reference (no finalizer, safe)" $ nfIO $ do
ref <- newGlobalRefNonFinalized jobj
_ <- runInAttachedThread (deleteGlobalRefNonFinalized ref)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runInAttachedThread does not attach the thread if the thread is already attached. You would have to try running this benchmark on a different thread.

, bench "delete global reference (not attached)" $ nfIO $ forkWait $ do
ref <- newGlobalRefNonFinalized jobj
_ <- deleteGlobalRefNonFinalized ref
return ()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect this code to fail to run, as it is running in a thread which is not attached. It doesn't look like forkWait would propagate any failures so the benchmark would silently pass.

Comment on lines 53 to 58
-- | Execute the given IO in a separate thread, then wait for it to finish
forkWait :: IO a -> IO ()
forkWait io = do
handle <- newEmptyMVar
_ <- forkFinally (runInBoundThread io) (\_ -> putMVar handle ())
takeMVar handle >>= return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure what the reason for spawning a thread then immediately waiting for it would be. I'd imagine this has exactly the same semantics as just doing io. But probably I'm missing something.

At any rate, you shouldn't implement this sort of logic yourself. The async implements high level primitives which do the MVar thing, and error handling, and such. You should reach for these.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At any rate, you shouldn't implement this sort of logic yourself. The async implements high level primitives which do the MVar thing, and error handling, and such. You should reach for these.

That's good advice in general. In this particular case, I think it is better to not spawn any threads. The need to spawn the thread comes from trying to measure how much time attaching and detaching the thread takes. The main thread is already attached to the JVM so we can't measure it there.

I think the best solution is to have the benchmark setup detach the main thread before running the benchmark.

@nbacquey
Copy link
Member

nbacquey commented Jul 31, 2020

The naive solution of attaching a thread to the JVM before calling deleteGlobalRefNonFinalized, then detaching it afterward, has been rejecting due to adding too much overhead:

benchmarking References/delete global reference (attached)
time                 133.8 ns   (130.8 ns .. 139.3 ns)
                     0.947 R²   (0.898 R² .. 0.982 R²)
mean                 160.6 ns   (145.8 ns .. 184.8 ns)
std dev              60.46 ns   (42.23 ns .. 88.69 ns)
variance introduced by outliers: 100% (severely inflated)

benchmarking References/delete global reference (not attached)
time                 24.47 μs   (23.54 μs .. 25.75 μs)
                     0.953 R²   (0.913 R² .. 0.981 R²)
mean                 31.88 μs   (28.92 μs .. 37.38 μs)
std dev              13.50 μs   (6.982 μs .. 20.83 μs)
variance introduced by outliers: 99% (severely inflated)

The extra cost of 25 μs led us to try to have a thread, attached to the JVM startup, which would be dedicated to deleting global references. A preliminary, still nonfunctional version, is currently on commit 8c4a7f4

Copy link
Member Author

@facundominguez facundominguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments are for polishing, and some are for correctness. The polishing comments can be deferred until we are happy with the benchmarks.

jni/src/common/Foreign/JNI/Unsafe.hs Outdated Show resolved Hide resolved
jni/src/common/Foreign/JNI/Unsafe.hs Outdated Show resolved Hide resolved
jni/src/common/Foreign/JNI/Unsafe.hs Outdated Show resolved Hide resolved
jni/src/common/Foreign/JNI/Unsafe.hs Outdated Show resolved Hide resolved
jni/src/common/Foreign/JNI/Unsafe.hs Outdated Show resolved Hide resolved
stopGlobalRefCleaner :: GlobalRefCleaner -> IO ()
stopGlobalRefCleaner cleaner = do
putMVar (nextAction cleaner) Terminate
wait (cleanerAsync cleaner)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would just cancel the thread here. Is it necessary to delete the pending global references before terminating the JVM?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, probably we should use uninterruptibleMask_ to make this code safe to asynchronous exceptions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably not necessary to delete pending global references before terminating the JVM.
In any case, we cannot guarantee that all pending references will be deleted in the current state of the code anyway.

Shall we keep the CleanerAction type then ? It looks like we won't need Terminate

@nbacquey
Copy link
Member

nbacquey commented Aug 21, 2020

Here are the results of the benchmarks:

benchmarking References/delete global reference (attached)                                      
time                 5.822 μs   (704.3 ns .. 12.74 μs)                                          
                     0.066 R²   (0.002 R² .. 0.209 R²)                                          
mean                 150.9 μs   (70.16 μs .. 385.0 μs)                                          
std dev              176.3 μs   (58.25 μs .. 257.1 μs)                                          
variance introduced by outliers: 99% (severely inflated)                                        
                                                                                                
benchmarking References/delete global reference (not attached)                                  
time                 155.5 μs   (153.9 μs .. 157.0 μs)                                          
                     0.997 R²   (0.996 R² .. 0.998 R²)                                          
mean                 154.4 μs   (152.2 μs .. 156.3 μs)                                          
std dev              6.982 μs   (5.515 μs .. 10.29 μs)                                          
variance introduced by outliers: 45% (moderately inflated)                                      
                                                                                                
benchmarking References/delete global reference (separate thread)                               
time                 305.5 ns   (299.5 ns .. 312.2 ns)                                          
                     0.993 R²   (0.988 R² .. 0.996 R²)                                          
mean                 311.7 ns   (300.1 ns .. 342.1 ns)                                          
std dev              57.76 ns   (22.54 ns .. 115.9 ns)                                          
variance introduced by outliers: 97% (severely inflated)

It looks to me that having a separate thread do the deletion is the way to go.

@nbacquey
Copy link
Member

Here are the results of the benchmarks:

benchmarking References/delete global reference (attached)                                      
time                 5.822 μs   (704.3 ns .. 12.74 μs)                                          
                     0.066 R²   (0.002 R² .. 0.209 R²)                                          
mean                 150.9 μs   (70.16 μs .. 385.0 μs)                                          
std dev              176.3 μs   (58.25 μs .. 257.1 μs)                                          
variance introduced by outliers: 99% (severely inflated)                                        
                                                                                                
benchmarking References/delete global reference (not attached)                                  
time                 155.5 μs   (153.9 μs .. 157.0 μs)                                          
                     0.997 R²   (0.996 R² .. 0.998 R²)                                          
mean                 154.4 μs   (152.2 μs .. 156.3 μs)                                          
std dev              6.982 μs   (5.515 μs .. 10.29 μs)                                          
variance introduced by outliers: 45% (moderately inflated)                                      
                                                                                                
benchmarking References/delete global reference (separate thread)                               
time                 305.5 ns   (299.5 ns .. 312.2 ns)                                          
                     0.993 R²   (0.988 R² .. 0.996 R²)                                          
mean                 311.7 ns   (300.1 ns .. 342.1 ns)                                          
std dev              57.76 ns   (22.54 ns .. 115.9 ns)                                          
variance introduced by outliers: 97% (severely inflated)

It looks to me that having a separate thread do the deletion is the way to go.

Note that the results of the third benchmark do not measure the time needed to delete a reference, but only the overhead of sending the references to a dedicated thread, which does the actual deleting.
It still is a way more efficient process than attaching the thread to the jvm, then detaching it (+5% vs. +3000%).

@nbacquey
Copy link
Member

commit b250314 fix: use main thread class loader when attaching thread as daemon
is not necessarily related to this PR, or the underlying issue, but solves a bug that was dormant until now.
It may be relevant to move it to its own PR instead

@@ -376,6 +380,7 @@ attachCurrentThreadAsDaemon = do
[CU.exp| jint {
(*$(JavaVM* jvm))->AttachCurrentThreadAsDaemon($(JavaVM* jvm), (void**)&jniEnv, NULL)
} |]
readIORef mainThreadClassLoader >>= setCurrentThreadClassLoader
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mboes, I'm struggling with a design choice here. With the change to delete global references in a background thread we need to use the threaded runtime, and tests in jvm-streaming will fail in the threaded runtime because we need to set the context ClassLoader on some of the OS threads that we use.

I proposed to @nbacquey to set it here, but I'm noticing that this runs against the principle of keeping the bindings in jni simple.

Maybe the best place to put it would be in a new convenience package jvm-threads, where we can provide a version of runInAttachedThread and withJVM that deal with setting context class loaders for the user. Otherwise, the user would need to be responsible for doing this in each application.

Any thoughts?

@facundominguez
Copy link
Member Author

I squashed commits. The old tip of the branch is 42bc342.

I undid the provision to set the context class loader when attaching threads. I'm using loadJavaWrappers in test to load the classes ahead of calling JNI in other threads. Also, I added a troubleshooting section to the readme, to discuss this and other problems.

@facundominguez facundominguez marked this pull request as ready for review September 1, 2020 18:06
@facundominguez facundominguez merged commit 35f8c3d into master Sep 1, 2020
@facundominguez facundominguez deleted the safe-delete-global-ref branch September 1, 2020 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants