Vastly speed up ancestry check by not re-crawling duplicate histories. #4753
Overview
Noticed when doing fast-forward detection in PG that base only has ~500 history nodes at the top level, so I was curious why it was infeasible to do full ancestor detection; turns out that with `UNION ALL` instead of `UNION` we don't cull duplicates, meaning every time we would merge it'd crawl the entire history again 🙃.

It's unintuitive, because you'd think `UNION ALL` is correct since the root causal shouldn't ever appear in the recursive tail, but it appears that's just not how UNIONs work in a `WITH RECURSIVE` CTE 🤷🏼‍♂️

Should speed up merges since this affects `lca`, but we can also make `lca` much faster; I split that off into its own PR since it's a bigger change that I don't have time to test thoroughly right now: #4754
Implementation notes
`UNION ALL` -> `UNION`; speeds this up by infinity percent (I never waited for it to finish before, now it finishes instantly).
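To see why this one-word change matters so much, here's a minimal, self-contained sketch (Python + sqlite3, not the actual Unison schema or query): in a `WITH RECURSIVE` CTE, `UNION` deduplicates each new row against everything produced so far, so each ancestor is visited once; `UNION ALL` visits each ancestor once *per path*, which blows up exponentially in a history full of merge "diamonds". The `causal_parent` table and column names here are hypothetical.

```python
import sqlite3

def count_ancestor_rows(compound_op: str, diamonds: int) -> int:
    """Crawl all ancestors of node 0 with a recursive CTE and return how
    many rows the traversal produced (a proxy for work done)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE causal_parent (causal_id INTEGER, parent_id INTEGER)")
    # Build a chain of "diamonds": each merge node has two parents that
    # share a single grandparent, so the number of *paths* to the root
    # doubles per diamond while the node count grows only linearly.
    node, edges = 0, []
    for _ in range(diamonds):
        a, b, grandparent = node + 1, node + 2, node + 3
        edges += [(node, a), (node, b), (a, grandparent), (b, grandparent)]
        node = grandparent
    db.executemany("INSERT INTO causal_parent VALUES (?, ?)", edges)
    (n,) = db.execute(f"""
        WITH RECURSIVE ancestor(id) AS (
            SELECT 0
            {compound_op}
            SELECT parent_id FROM causal_parent, ancestor
             WHERE causal_id = ancestor.id
        )
        SELECT count(*) FROM ancestor
    """).fetchone()
    return n

print(count_ancestor_rows("UNION", 15))      # → 46 (one row per distinct node)
print(count_ancestor_rows("UNION ALL", 15))  # → 131069 (one row per path)
```

With only 46 nodes, `UNION ALL` already does ~131k rows of work; every extra diamond doubles it, which matches "crawl the entire history again on every merge".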
Test coverage
Did some SQLite tests and it should work.
Loose ends
While digging around I noticed that in https://github.com/unisonweb/unison/blob/trunk/parser-typechecker/src/Unison/Codebase/Causal.hs there are many functions computing predecessors and 'before' checks on in-memory Haskell objects; these would be easier on memory and probably much faster if they just used SQLite directly. Probably something for @mitchellwrosen and @tstat to look at as part of the merge rewrite 😄
See #4754