Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-device editing/syncing with CRDTs #250

20 of 35 tasks
trishume opened this issue May 4, 2017 · 7 comments
20 of 35 tasks

Multi-device editing/syncing with CRDTs #250

trishume opened this issue May 4, 2017 · 7 comments


Copy link

trishume commented May 4, 2017

This is a tracking issue/roadmap/plan for how I'm going to proceed with adding the ability to synchronize documents. The tasks below correspond approximately to PRs I plan on writing and the order I plan on doing them.

Edit 2017/08/04: 馃摑 馃憗 I wrote a detailed document describing the CRDT

Edit 2017/05/04: Add more detail.
Edit 2017/05/04: Add ideas for optimizing
Edit 2017/07/12: Heavily refactored list to reflect what actually happened, include links to more PRs, and update plan

cc @raphlinus

xi-rope refactoring

Prior to starting this project, the rope and mini-CRDT data structures were mostly undocumented, lightly tested and used some representations that didn't have good time complexity or didn't fit well with turning the mini-CRDT into a full CRDT. The first step of the project was to refactor and replace some of the existing data structures and algorithms and comment and test them.

Fuchsia last-write-wins prototype

As an initial use case of this functionality, implement syncing using ledger for fuchsia/xi.

CRDT Merge operation

Implement the merge function on Engine for histories not involving undo.

Integrate CRDT into Fuchsia

You can now open Xi on two Fuchsia devices and type into both of them concurrently and they will synchronize and reach a consistent state shortly afterward.


Improve CRDT robustness and functionality

  • Write randomized tests for the merge operation. Possibly using quickcheck tests or fuzz tests. [WIP] Fuzz testing for xi-rope聽#300
  • Support Undo operations in the CRDT
  • Use test generator to generate large tests and characterize performance.

Make it fast

The above will have terrible awful scaling properties and will slow to a crawl in many use cases. The next step is to figure out ways to improve the CRDT representation and operations to have better time and space complexity. Some of these ideas are well thought out and are clear wins, others are more speculative half-baked ideas (marked by a "馃 ").

  • Properly compute the common base for merges, currently it always starts from the very first commit. This should be both easy and a huge improvement to merge performance.
  • Optimize merge for common simple cases like no new edits on one side, or no edits in common after the base. These won't be that common with conflicts but will be super frequent in the Ledger PageWatcher update case.
  • Talk to Ledger team about having side with more revisions be on the left of the merge.
  • Store larger Subsets in a rope with an aggregation on number of inserted and deleted characters so that calculations can do things in the middle of the text without being O(n).
  • Switch to a multi-key ledger representation, instead of storing everything in a single key. This will improve both running time due to not loading in large changes from ledger, as well as storage use in Ledger.
    • Refactor to make this possible, for example this requires that new changes be merged into the Ledger instead of the current Engine being put in, that will guarantee things are only appended.
    • Optimize the ledger storage serialization of revision history to store segments of history in separate keys, use changed keys in merge to determine common prefix.
    • Store text in ledger as a doubly-linked list of blocks. If a block grows too large it splits, and if too small it merges, but both of these are local operations affecting a constant small number of keys.
  • 馃 Store the revision history in a rope/tree/skip-list to make computing the coordinate transforms O(log n) instead of O(n) for merges and undos. Where n is the size of the history after the undo/common prefix. This would definitely be useful for undo because it would allow an arbitrarily long undo history without significant performance degradation, and would effectively allow us to remove the garbage collector.
  • 馃 In merge, combine transforms that are spatially separate into one Subset so that there are less Subsets to rebase and transform repeatedly in merge. Like Combine sequential edits with same priority in merge聽#351 but more extreme.
Copy link
Collaborator Author

trishume commented May 5, 2017

I discussed this a bunch with @raphlinus today and now have a much better understanding of the whole CRDT model and how to proceed. I've included some (very high-res page-bloaty) pictures of the whiteboard from our discussion along with some quick notes. I've edited the main issue body based on the discussion with new items.

cc @cmyr @rkusa since both of you expressed interest in CRDT stuff

Example of a revision history, right column is the deletion set at every revision, identical concurrent deletions (last two) necessitate multi-set (each deleted index also has a count) for reversibility so that undoing/reversing one doesn't un-delete a multi-deleted character. Things along the bottom are state in the engine for the head revision:
top right is the contents of a revision currently:
example of a non-trivial merge (bot-left) and half of a diagram showing correspondence between ops (which include the inserted or deleted string) and revs (which only include index sets and rely on text and tombstone ropes stored in the engine for the head revision):
other half of diagram, plus drawings of how one might efficiently represent histories and ropes as key-values so that only a limited amount of keys/data are changed on update:
example of correspondence between deletion set and function from index to element, with set of non-primes corresponding to function of nth prime:

Copy link

rkusa commented May 5, 2017

Thanks for the update!

I have a question to the first image. Why is the deletion set at line three (ab ins={1} {1}) {1} and not {2}?

Another question regarding the tombstones. Since they are not part of the revision, we only have the most reason tomstone string, right?

I've added the tombstone state to each line of your first example (and also added an additional starting line for the sake of my following question).

(1) 123   del={0, 1, 2}   {0, 1, 2}
      text = "" tombstones = "123"
(2) a     ins={0}         {1, 2}
      text = "a" tombstones = "23"
(3) ab    ins={1}         {1}
      text = "ab" tombstones = "3"
(4) axb   ins={1}         {}
      text = "axb" tombstones = ""
(5) ab    del={1}         {1}
      text = "ab" tombstones = "x"

Is this the correct way of how tombstones are going to work? If yes, when I undo all of these revision ((5), (4), (3), (2) and (1)), how do we know that we have to restore 123 (because 123 it is not in the tombstone string anymore)?

Copy link
Collaborator Author

trishume commented May 5, 2017

The leftmost deletion set, also known as from_union or the interleaving, is the characters deleted to get the final string. To be clear in the whiteboard diagram, the rightmost column is things that can be computed at the current time after all the revisions based on information at the bottom rather than stuff stored in a revision or computed at that time. So since the union string at the bottom is "axb", the deletion set to get "ab" is {1}.

Let me rewrite your example to be correct. Below the first line is in the revision, and the next line is the contents of Engine just after that revision is made. I included both the text,tombstones representation and the union representation, but in reality only one would be included. For the union, from_union is just the characters to delete to get text, and for text,tombstones you can get the union by using from_union to determine an interleaving. You also need from_union to figure out indexes to put in revisions since the indexes are based on the union string. The text,tombstones representation is an optimization of union that is more complex but also more efficient. I also included back_computed_deletions_from_6_union which is not something known at each time but something that can be computed at the very end, possibly for undo processing or something, it corresponds to the leftmost column of the drawing, I don't know if it is actually useful, it's just something we talked about to make sure I was understanding it right.

()   before any revisions are made
      text = "" tombstones = "" union="" from_union={}
      back_computed_deletions_from_6_union = {0,1,2,3,4,5}
(1) ins={0, 1, 2}
      text = "123" tombstones = "" union="123" from_union={}
      back_computed_deletions_from_6_union = {0,1,2}
(2)  del={0, 1, 2}
      text = "" tombstones = "123" union="123" from_union={0,1,2}
      back_computed_deletions_from_6_union = {0,1,2,3,4,5}
(3) ins={0}
      text = "a" tombstones = "123" union="a123" from_union={1,2,3}
      back_computed_deletions_from_6_union = {1,2,3,4,5}
(4) ins={1}
      text = "ab" tombstones = "123" union="ab123" from_union={2,3,4}
      back_computed_deletions_from_6_union = {1,3,4,5}
(5) ins={1}
      text = "axb" tombstones = "123" union="axb123" from_union={3,4,5}
      back_computed_deletions_from_6_union = {3,4,5}
(6) del={1}
      text = "ab" tombstones = "x123" union="axb123" from_union={1,3,4,5}

@cmyr cmyr mentioned this issue May 5, 2017
28 tasks
Copy link

rkusa commented May 5, 2017

Now, it makes sense. Thanks a lot for the detailed answer!

raphlinus added a commit that referenced this issue May 8, 2017
Fixes merge conflicts from #250
raphlinus added a commit that referenced this issue May 8, 2017
Fixes conflicts from having merged #250 first
@trishume trishume self-assigned this Jul 12, 2017
Copy link

Might this become compatible with the CRDT version control system Memo that @as-cii and @nathansobo are working on in xray?

Copy link

cmyr commented Nov 2, 2018

@erlend-sh These two projects aren't particularly related, so I don't think it makes too much sense to think of compatibility; however my understanding is that Memo is designed to be a standalone system that exposes an API that can be used by any consumer, so supporting it in xi at some point in the future should definitely be possibly. It's certainly an interesting project, and I've been following along with interest. :)

Copy link

ul commented May 2, 2020

Is there any ongoing work for this project? I'm especially interested in "Support Undo operations in the CRDT". Is anyone having a deep Engine understanding keen to work on it with me? I also could offer a bounty for this implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet

No branches or pull requests

5 participants