Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel pull from Share #3153

Merged
merged 11 commits into from Jun 30, 2022
Merged

Parallel pull from Share #3153

merged 11 commits into from Jun 30, 2022

Conversation

mitchellwrosen
Copy link
Member

@mitchellwrosen mitchellwrosen commented Jun 24, 2022

Screenshot from 2022-06-28 14-06-54

Overview

This PR implements a concurrent pull procedure involving a few different threads:

  • One thread that inserts entities fetched from Share into sqlite. Because writing to sqlite requires a lock, we can't do better than that.
  • One thread that elaborates subsets of those hashes and adds the elaborated hashes to the work queue (the set of hashes we need to download).
  • One thread to coordinate pulling batches of work off of the work queue and spawning one-shot workers, and deciding when there's no more work left to do.

@mitchellwrosen mitchellwrosen mentioned this pull request Jun 28, 2022
1 task

workerCount <- newWorkerCount

Ki.scoped \scope -> do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works really nicely 👍🏼 Good work on this lib

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

elaborator hashesVar uninsertedHashesVar newTempEntitiesQueue workerCount =
connect \conn ->
forever do
(join . atomically) do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a benefit to the (join . atomically) do pattern here over something like:

forever do
  mayNewTempEntities <- atomically $ do ...
  case mayNewTempEntities of
    Nothing -> ...
    Just _ -> ...

?

I find interleaving STM and IO makes it tougher to see what's being accomplished transactionally 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can switch it to not do the join . atomically thing. I think instead of a Maybe (Set Hash) (or w/e) I'll go with a one-off data type, so it's more clear what going down one branch versus another means.

also reduce number of downloaders from 10 to 5, because 5 has performed
better in a few tests
@mitchellwrosen mitchellwrosen marked this pull request as ready for review June 30, 2022 02:43
@mitchellwrosen mitchellwrosen added the ready-to-merge Apply this to a PR and it will get merged automatically once CI passes and 1 reviewer has approved label Jun 30, 2022
@mergify mergify bot merged commit 8baee64 into trunk Jun 30, 2022
@mergify mergify bot deleted the travis/pull-conc branch June 30, 2022 03:40
@mergify mergify bot removed the ready-to-merge Apply this to a PR and it will get merged automatically once CI passes and 1 reviewer has approved label Jun 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants