
Reduce blocking operations in eventloop thread #1342

Merged
merged 19 commits into master from 4068-reduce-cpu-in-eventloop-thread
Oct 20, 2023

Conversation

itamarst
Collaborator

This is likely only a subset, but it's a start.

Fixes https://tahoe-lafs.org/trac/tahoe-lafs/ticket/4068, will probably file follow-up for round 2.

@coveralls
Collaborator

coveralls commented Oct 16, 2023

Coverage Status

Coverage: 94.627%. First build when pulling 20cfe70 on 4068-reduce-cpu-in-eventloop-thread into a08a622 on master.

@exarkun exarkun self-assigned this Oct 18, 2023
Member

@exarkun exarkun left a comment


Thanks. It looks like many of these implementation changes should improve responsiveness during bulk operations. Maybe some of them will even improve throughput for concurrent bulk operations? I didn't try running the benchmarks and comparing against master. Did you try that? Is there a responsiveness benchmark?

Overall the new global state in the implementation is a bit bothersome. I can think of extensive refactorings that would let us avoid that but I'm not sure how realistic those are. Maybe we could try to brainstorm together and see if we can come up with something feasible that avoids it?

Review thread on src/allmydata/client.py (outdated, resolved)
@@ -53,9 +53,9 @@ def encode(self, inshares, desired_share_ids=None):

for inshare in inshares:
assert len(inshare) == self.share_size, (len(inshare), self.share_size, self.data_size, self.required_shares)
shares = self.encoder.encode(inshares, desired_share_ids)
Member

As I understand the code, the Python ZFEC bindings don't release the GIL (which is a good thing, since until very recently the C library initialization was not thread-safe - and now requires an explicit single-threaded initialization step). Were you able to observe any throughput increase with these encode/decode changes?

Collaborator Author

I mostly just put stuff in threads that the blocking-detector code flagged. I am happy to go and make ZFEC release the GIL as a relevant follow-up.
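The pattern the PR applies in these spots can be sketched with the stdlib alone (the pool name and `block_hash` helper here are illustrative stand-ins, not the actual `allmydata.util.cputhreadpool` code):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the shared CPU thread pool the PR introduces.
_pool = ThreadPoolExecutor(max_workers=4)

def block_hash(data: bytes) -> bytes:
    # CPU-bound work that would otherwise run on the event loop thread.
    return hashlib.sha256(data).digest()

# Instead of calling block_hash(data) inline on the reactor thread,
# submit it to the pool and collect the result.
future = _pool.submit(block_hash, b"salt" + b"block-data")
digest = future.result()
assert len(digest) == 32
```

Note that for pure-Python or GIL-holding C work this only improves event loop responsiveness, not throughput; throughput gains require the C extension (here, ZFEC) to actually release the GIL.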

Review thread on src/allmydata/mutable/publish.py (outdated, resolved)
@@ -767,9 +769,9 @@ def _validate_block(self, results, segnum, reader, server, started):
                 "block hash tree failure: %s" % e)

         if self._version == MDMF_VERSION:
-            blockhash = hashutil.block_hash(salt + block)
+            blockhash = await defer_to_thread(hashutil.block_hash, salt + block)
Member

Shame about that salt + block, that's presumably a lot of time spent allocating and copying and freeing in the main thread. Possibly a spot for future improvement?

Collaborator Author

Copying memory is typically quite fast, but maybe.
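A quick way to check that claim is to time the concatenation itself (a rough sketch; the 128 KiB size is an assumption about typical block sizes, and absolute numbers depend on hardware and CPython version):

```python
import timeit

# Approximate the `salt + block` allocation-and-copy cost.
salt = b"s" * 16
block = b"b" * (128 * 1024)  # assumed block size, for illustration only

n = 10_000
per_call = timeit.timeit(lambda: salt + block, number=n) / n
print(f"{per_call * 1e6:.1f} microseconds per copy")  # usually well under a millisecond
```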

Review thread on src/allmydata/storage/http_client.py (resolved)
Review thread on src/allmydata/util/cputhreadpool.py (outdated, resolved)
Review thread on src/allmydata/util/cputhreadpool.py (resolved)
@itamarst
Collaborator Author

Here is my thought on the reactor as state: it shouldn't be an attribute or parameter, it should be accessible via a function get_reactor() that internally retrieves it from a contextvars.ContextVar. And this should really be in Twisted itself...
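A minimal sketch of that idea (the names `_reactor` and `get_reactor()` are hypothetical; neither is an existing Twisted API):

```python
from contextvars import ContextVar

# Module-level ContextVar holding the reactor for the current context.
_reactor: ContextVar = ContextVar("reactor")

def get_reactor():
    """Return the reactor for the current context."""
    return _reactor.get()

class FakeReactor:
    """Stand-in for a real Twisted reactor object."""

_reactor.set(FakeReactor())
assert isinstance(get_reactor(), FakeReactor)
```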

@itamarst
Collaborator Author

I just looked, and asyncio does something similar with a thread-local. ContextVar is a bit nicer in that you can override it for the current context in a stack-based way, which is useful for tests, but it's the same basic idea.

@meejah
Contributor

meejah commented Oct 19, 2023

"In general" trio seems to have better-thought-out async ideas (vs. asyncio) -- although I don't know what it does about the reactor.

The asyncio things I have looked at unfortunately have problems similar to many Twisted-internal (and third-party) libraries when dealing with the global / implicit reactor (a "thread-local context var" still seems global-adjacent to me? but maybe I don't understand ContextVar well enough)

@itamarst
Collaborator Author

  1. Decided to punt on the reactor; just leave the API as is.
  2. For the testing flag, the worry with swapping in a synchronous version is that you don't immediately await the value, so a synchronous test doesn't catch bad parallelism. Switching the defer_to_thread() API to a coroutine makes this less likely: you get a warning if you don't await it, and it's not a thing you pass around, unlike a Deferred.
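A minimal stdlib sketch of that coroutine-shaped API (the pool and signature are illustrative assumptions, not the actual cputhreadpool implementation): because `defer_to_thread` is a coroutine, forgetting to `await` it raises a "coroutine was never awaited" RuntimeWarning rather than silently handing back an unfired Deferred.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the shared CPU thread pool.
_POOL = ThreadPoolExecutor(max_workers=4)

async def defer_to_thread(func, *args):
    """Run func(*args) in the CPU pool and await its result."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_POOL, func, *args)

async def main():
    # Must be awaited; dropping the await triggers a RuntimeWarning.
    return await defer_to_thread(sum, range(1000))

print(asyncio.run(main()))  # 499500
```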

@itamarst itamarst requested a review from exarkun October 19, 2023 18:06
Member

@exarkun exarkun left a comment


Thanks. Looks good to me.

@itamarst itamarst merged commit 4fbf31b into master Oct 20, 2023
30 checks passed
@itamarst itamarst deleted the 4068-reduce-cpu-in-eventloop-thread branch October 20, 2023 14:48