Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use FLS detach callback as a thread termination notification. Another try. #112809

Merged
merged 20 commits into from
Feb 25, 2025

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented Feb 22, 2025

Fixes: #110350

Switches the mechanism for OS notification about thread termination to use FLS (Fiber Local Storage) - the same mechanism as used on NativeAOT.

The advantage of FLS notification is that it does not run while holding the Loader Lock, thus it does not require that the termination callback never calls or waits for an OS call that may need to acquire that same Loader Lock. (It is hard for us to guarantee the latter)

This PR is similar to #110589 with additional mitigation for scenario where COM thread cleanups reinitialize the Thread object after we had our last chance to clean it up.
The mitigation is basically ensuring that COM is initialized and sets up its FLS slot before we set up ours.

@Copilot Copilot bot review requested due to automatic review settings February 22, 2025 01:55

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

@VSadov
Copy link
Member Author

VSadov commented Feb 22, 2025

@jkotas @mangod9 This is basically the same PR as before + mitigation for the COM reinitializing/attaching Thread objects or even running managed code after we had our last chance to detach Threads.

The idea is to call CoInitialize before setting up our FLS slot, which would make COM to set up its FLS slot, if not already initialized.

We call CoInitialize in most cases anyways, and fairly early, for example once the first method starts JIT-ing we launch the tiering helper, which is explicitly an MTA thread. Most user-launched threads are MTA as well. Alas, all that would be after we have our FLS slot set.

Why on the finalizer thread and not on the thread that initializes EE?
My concern is that we do not know the context of the thread that calls into EE initialization. In theory CoInitialize may be not allowed on some threads and initialize nothing. We control the state of the Finalizer thread though.

@VSadov
Copy link
Member Author

VSadov commented Feb 22, 2025

I have run quite a bit of winform tests with this + there is now a failfast in case we see an orphaned Thread in DLL_THREAD_DETACH, which is more deterministic than crashing on GC holes.

@jkotas
Copy link
Member

jkotas commented Feb 22, 2025

System.Threading.Tests.EtwTests.WaitHandleWaitEventTest failure looks related to this change.

Comment on lines 1416 to 1421
if (!IsFinalizerThread())
{
FinalizerThread::GetFinalizerThread()->SetRequiresCoInitialize();
// set the finalizer event
FinalizerThread::EnableFinalization();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this trying to do? Is it still needed now that we wait for the finalizer thread to CoInitialize COM during startup?

Copy link
Member Author

@VSadov VSadov Feb 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all the stuff that tries to CoInitialize the current and Finalizer threads lazily can be removed if this change goes in.
I think quite a bit of cleanup might be possible even without this change.
Compared to NativeAOT there is a lot of extra complexity in Thread init/deinit. Some of that may still be needed (i.e. debugger/COM/syncblocks/threadstatics related), but some of that seems to be leftovers since we had hosting/fibers/appdomains and the like.

I did not want to go too far beyond what is needed for the fix - jut to minimize the diff, in case we backport this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that there is a lot of cruft that can be cleaned up that is better left for a different PR.

I am not able to reason about whether this if (!IsFinalizerThread()) { ... } block is just a useless code or whether it can be actively harmful in some corner cases.

For this PR, can we delete the if (!IsFinalizerThread()) block, fCoInitCurrentThread argument for EnsureComStarted and the one call to EnsureComStarted(FALSE)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added g_pFinalizerThread->SetApartmentOfUnstartedThread(Thread::AS_InMTA) to match manual call into CoInitialize, but that results in "lazy coinitialize" pattern forcing finalizer to set itself to MTA in response to setting finalizer thread to MTA.

I think I can remove both this change and g_pFinalizerThread->SetApartmentOfUnstartedThread(Thread::AS_InMTA) to raise fewer questions.
It is ok to use native API to set apartment state before Thread is attached. The rest is unnecessary and perhaps confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking about performance overhead of this change. We are going to do extra thread switch during startup critical path. It is unavoidable.

However, EnsureComStarted that is going to be executed a bit later is going to spin the finalizer thread again by setting FinalizerThread::EnableFinalization.

Can we reduce the startup overhead by setting g_fComStarted on finalizer thread once we call CoInitialize and reduce EnsureComStarted to just:

    _ASSERTE(g_fComStarted); // COM is expected to be started on finalizer thread during startup

?

Copy link
Member Author

@VSadov VSadov Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking about performance overhead of this change. We are going to do extra thread switch during startup critical path. It is unavoidable.

I've moved launching of finalizer thread a bit earlier. EE Initializing thread will do some other work before it needs to check if FLS/COM initialization is finished. Hopefully it will be at least partially done, so we'd not need to wait much, if at all.

Can we reduce the startup overhead by setting g_fComStarted on finalizer thread

Good point. Since we will be initializing anyways, why not get some benefit from it.

@jkotas jkotas requested a review from kouvel February 22, 2025 12:52
@VSadov
Copy link
Member Author

VSadov commented Feb 22, 2025

System.Threading.Tests.EtwTests.WaitHandleWaitEventTest failure looks related to this change.

That is on alpine-3.21-helix-arm32v7. I did not realize we run ceemain/DLL_THREAD_DETACH on linux as well, nor I am certain if the same invariant applies in such case.

It may be worth to investigate and add similar guards on other platforms (unix, NativeAOT), if possible and appropriate, but it should not be in this PR.

I will scope the failfast to Windows only.

_ASSERTE_ALL_BUILDS(!"Detaching a thread from the wrong fiber");
}

flsState = FLS_STATE_CLEAR;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right? I do not think that we can handle re-initialization of the Thread after this point gracefully.

Copy link
Member Author

@VSadov VSadov Feb 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This happens in winforms. We currently rely on re-initialization in COM cleanup scenarios -

A thread returns from its entry point and we call DestroyThread, which comes here and detaches/destroys the Thread.

Then OS goes through FLS callbacks and COM callback may need to run something that needs Thread, so we attach a new instance of Thread (because the former is gone by then), and the new Thread gets later deinitialized in our FLS callback.
As long as our deinitialization happens last, things work ok. Seems like it has been like this for a long time.

Ideally, we should not call DestroyThread explicitly when thread exits, just rely on callback - so we would never need to reinitialize. Naively, making such change breaks other things, so probably not in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I agree that we want to fix this one in separate PR. The reinitialization sounds like a great source of mysterious corner-case bugs.

@VSadov
Copy link
Member Author

VSadov commented Feb 25, 2025

I think the new failures in SslStream_ServerUntrustedCaWithCustomTrust_OK are unrelated.
(The remote certificate is invalid because of errors in the certificate chain: PartialChain)

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise. Thank you!

#define FLS_STATE_ARMED 1
#define FLS_STATE_INVOKED 2

static __declspec(thread) byte flsState;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static __declspec(thread) byte flsState;
static __declspec(thread) byte t_flsState;

Naming nit

@VSadov
Copy link
Member Author

VSadov commented Feb 25, 2025

/ba-g remaining 2 osx arm64 test legs just take very long time and seem to eventually time out in most runs.

@VSadov VSadov merged commit fa3beb9 into dotnet:main Feb 25, 2025
91 of 94 checks passed
@VSadov VSadov deleted the FLS2 branch February 25, 2025 23:05
@u-less
Copy link

u-less commented Feb 27, 2025

@VSadov Does this fix work for Linux platform ? We also encountered the same problem on the Linux platform after upgrading to. net9.

@jkotas
Copy link
Member

jkotas commented Feb 27, 2025

This is fixing Windows-specific problem. If you are seeing issues on Linux, it is going to be something else. You may want to open a new issue about it.

@VSadov
Copy link
Member Author

VSadov commented Mar 2, 2025

/backport to release/9.0-staging

Copy link
Contributor

github-actions bot commented Mar 2, 2025

Started backporting to release/9.0-staging: https://github.com/dotnet/runtime/actions/runs/13617313559

Copy link
Contributor

github-actions bot commented Mar 2, 2025

@VSadov backporting to "release/9.0-staging" failed, the patch most likely resulted in conflicts:

$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: Use FLS detach as thread termination notification on windows.
Using index info to reconstruct a base tree...
M	src/coreclr/nativeaot/Runtime/threadstore.cpp
M	src/coreclr/vm/ceemain.cpp
M	src/coreclr/vm/threads.cpp
Falling back to patching base and 3-way merge...
Auto-merging src/coreclr/vm/threads.cpp
Auto-merging src/coreclr/vm/ceemain.cpp
Auto-merging src/coreclr/nativeaot/Runtime/threadstore.cpp
Applying: use _ASSERTE_ALL_BUILDS
Applying: one more case
Applying: InitFlsSlot throws per convention used in threading initialization
Applying: OsDetachThread could be void
Applying: Update src/coreclr/vm/ceemain.cpp
Applying: ensure CoInitialize
error: sha1 information is lacking or useless (src/coreclr/vm/ceemain.cpp).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0007 ensure CoInitialize
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

VSadov added a commit to VSadov/runtime that referenced this pull request Mar 3, 2025
… try. (dotnet#112809)

* Use FLS detach as thread termination notification on windows.

* use _ASSERTE_ALL_BUILDS

* one more case

* InitFlsSlot throws per convention used in threading initialization

* OsDetachThread could be void

* Update src/coreclr/vm/ceemain.cpp

* ensure CoInitialize

* Asserts to fail deterministically.

* comments

* scope the failfast to Windows and tweak the comments a bit.

* better assert

* Apply suggestions from code review

Co-authored-by: Jan Kotas <jkotas@microsoft.com>

* Undo unnecessary finalizer thread changes

* reverse the failfast condition

* handle destruction of unattached threads

* PR feedback

Co-authored-by: Jan Kotas <jkotas@microsoft.com>

* Update src/coreclr/vm/ceemain.cpp

* set g_fComStarted as soon as we call CoInitialize

* Naming nit

* fix a typpo + better comment

---------

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Process not responsive dump indicates garbage collection
3 participants