This WIP library intends to implement synchronization using the trimerge algorithm. It is an iteration on top of my original collabodux proof-of-concept.
Trimerge-sync is a client-first declarative/functional approach to synchronizing application state across devices/users.
It “steals” ideas from a number of projects:
- The entire state is represented as an immutable data structure (as in Redux)
- Each change is represented by a base revision and new revision (as in Git)
- Changes are applied by diffing data structures (as in React's virtual DOM)
- Conflicts are resolved on the client side (as in @mweststrate's “Distributing state changes using snapshots, patches and actions”)
- Data structure design can limit conflicts (as in CRDTs)
- Easy to reason about application state because it's just immutable JS objects
- Conflict resolution is data-oriented and declarative
- Unlike Operational Transform, which becomes increasingly complex as more types of operations are added
- Scales with data type complexity, not schema size
- Easy to write unit tests against
- Offline-first
- Integrated multi-user undo
- Can easily roll back specific edits
- Can capture undo state
- Server or peer-to-peer (theoretically)
- Server is schema-agnostic
- Focused on networking, authentication, and persistence
- Assumes application is built on immutable data structures
- Does not scale to a high number of concurrent edits (conflict thrashing)
- Requires the full document model to be in all clients' memory
The state history is a directed acyclic append-only graph. Each node represents an edit.
All nodes are immutable: you can only add new nodes that point to them.
Each graph node represents an edit with zero, one, or two parents:
- 0: an initial edit has no parent, generally a blank document
- 1: a single-parent node is a simple edit
- 2: a dual-parent node is a merge of the two parents
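One way to picture the three node kinds is a small record with up to two parent references. The sketch below is only an illustration: `EditNode`, `baseRef`, `mergeRef`, `kindOf`, and `EditGraph` are hypothetical names, not this library's API. The store is append-only and requires parents to exist before a child is added, which is also what keeps the graph acyclic.

```typescript
// Hypothetical node shape; the field names are assumptions for illustration.
interface EditNode {
  readonly ref: string; // unique id of this node
  readonly baseRef?: string; // first parent; absent on an initial edit
  readonly mergeRef?: string; // second parent; present only on a merge
}

type NodeKind = "initial" | "edit" | "merge";

// Classify a node by how many parents it has (0, 1, or 2).
function kindOf(node: EditNode): NodeKind {
  if (node.baseRef === undefined) return "initial";
  return node.mergeRef === undefined ? "edit" : "merge";
}

class EditGraph {
  private readonly nodes = new Map<string, EditNode>();

  // Append-only: an existing node is never replaced, and every parent
  // must already be in the graph.
  add(node: EditNode): void {
    if (this.nodes.has(node.ref)) {
      throw new Error(`node ${node.ref} already exists`);
    }
    if (node.baseRef === undefined && node.mergeRef !== undefined) {
      throw new Error("a merge node needs two parents");
    }
    for (const parent of [node.baseRef, node.mergeRef]) {
      if (parent !== undefined && !this.nodes.has(parent)) {
        throw new Error(`unknown parent ${parent}`);
      }
    }
    // Freeze a copy so the stored node cannot be mutated afterwards.
    this.nodes.set(node.ref, Object.freeze({ ...node }));
  }

  get(ref: string): EditNode | undefined {
    return this.nodes.get(ref);
  }
}
```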
A node is a "head" node if it has no child nodes (i.e. no nodes that reference it as a parent).
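Finding the heads only needs one pass over the nodes: collect every ref that is used as a parent, then keep the nodes that never appear in that set. A minimal sketch, assuming the same hypothetical `ref` / `baseRef` / `mergeRef` field names (they are not taken from this library):

```typescript
// Hypothetical node shape; the field names are assumptions for illustration.
interface EditNode {
  ref: string;
  baseRef?: string;
  mergeRef?: string;
}

// A head is a node that no other node references as a parent.
function findHeads(nodes: readonly EditNode[]): string[] {
  const referenced = new Set<string>();
  for (const node of nodes) {
    if (node.baseRef !== undefined) referenced.add(node.baseRef);
    if (node.mergeRef !== undefined) referenced.add(node.mergeRef);
  }
  return nodes.filter((node) => !referenced.has(node.ref)).map((node) => node.ref);
}

// Two clients edited the same root, so there are two heads: "a" and "b".
const heads = findHeads([
  { ref: "root" },
  { ref: "a", baseRef: "root" },
  { ref: "b", baseRef: "root" },
]);
```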
Whenever a client has more than one head node, it attempts to trimerge all the head nodes into one node.
This is done by repeatedly applying trimerge:
- find the two head nodes with the closest common ancestor
- trimerge those two nodes against their common base
- create a merge node
- repeat until there is one head node
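The loop above can be sketched roughly as follows. Several things here are assumptions made to keep the example small: documents are flat string records, every node carries its full document, "closest" means the smallest combined number of steps from the two heads to a shared ancestor, `trimergeDocs` is a toy three-way merge in which the left side wins real conflicts (it stands in for trimerge, not the real thing), and merge ids are derived from the sorted parent refs so that every client produces the same merge node.

```typescript
type Doc = Record<string, string>;
interface DocNode { ref: string; baseRef?: string; mergeRef?: string; doc: Doc }
type Graph = Map<string, DocNode>;
interface Candidate { left: string; right: string; base: string; distance: number }

function mustGet(graph: Graph, ref: string): DocNode {
  const node = graph.get(ref);
  if (!node) throw new Error(`missing node ${ref}`);
  return node;
}

// Heads are nodes nobody references; sorted so all clients agree on the order.
function findHeads(graph: Graph): string[] {
  const referenced = new Set<string>();
  for (const node of graph.values()) {
    if (node.baseRef !== undefined) referenced.add(node.baseRef);
    if (node.mergeRef !== undefined) referenced.add(node.mergeRef);
  }
  return [...graph.keys()].filter((ref) => !referenced.has(ref)).sort();
}

// Breadth-first distance from `start` to itself and each of its ancestors.
function ancestorDistances(graph: Graph, start: string): Map<string, number> {
  const distances = new Map<string, number>([[start, 0]]);
  const queue: Array<[string, number]> = [[start, 0]];
  for (const [ref, distance] of queue) { // the queue grows while we iterate
    const node = mustGet(graph, ref);
    for (const parent of [node.baseRef, node.mergeRef]) {
      if (parent !== undefined && !distances.has(parent)) {
        distances.set(parent, distance + 1);
        queue.push([parent, distance + 1]);
      }
    }
  }
  return distances;
}

// Find the two heads with the closest common ancestor.
function closestPair(graph: Graph, heads: string[]): Candidate {
  let best: Candidate | undefined;
  for (const [i, left] of heads.entries()) {
    const fromLeft = ancestorDistances(graph, left);
    for (const right of heads.slice(i + 1)) {
      for (const [base, fromRight] of ancestorDistances(graph, right)) {
        const distance = (fromLeft.get(base) ?? Infinity) + fromRight;
        if (distance < (best?.distance ?? Infinity)) best = { left, right, base, distance };
      }
    }
  }
  if (!best) throw new Error("heads share no common ancestor");
  return best;
}

// Toy three-way merge: take whichever side changed; left wins real conflicts.
function trimergeDocs(base: Doc, left: Doc, right: Doc): Doc {
  const merged: Doc = {};
  const keys = new Set([...Object.keys(base), ...Object.keys(left), ...Object.keys(right)]);
  for (const key of keys) {
    const value = left[key] === base[key] ? right[key] : left[key];
    if (value !== undefined) merged[key] = value;
  }
  return merged;
}

// Add merge nodes until one head remains, then return its ref.
function mergeHeads(graph: Graph): string {
  for (;;) {
    const heads = findHeads(graph);
    const [first, ...rest] = heads;
    if (first === undefined) throw new Error("empty graph");
    if (rest.length === 0) return first;
    const { left, right, base } = closestPair(graph, heads);
    const doc = trimergeDocs(mustGet(graph, base).doc, mustGet(graph, left).doc, mustGet(graph, right).doc);
    const ref = `merge(${left},${right})`; // hypothetical id scheme
    graph.set(ref, { ref, baseRef: left, mergeRef: right, doc });
  }
}
```

Each pass removes two heads and adds one, so the loop ends after `heads - 1` merges. It recomputes ancestor distances on every pass, which is fine for a sketch but would want caching on a large history.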
First, let's look at the four levels of possible synchronization:
- LEVEL 1: Local process sync
- LEVEL 2: Persisted local process sync
- LEVEL 3: Persisted remote sync
- LEVEL 4: Persisted p2p sync
All levels assume some kind of two-way communication between processes. This could be broadcast-channel, WebSockets, or something custom.
In order to synchronize local processes (e.g. browser tabs or between web workers), we need the following:
- Start listening on a shared message channel
- Make a “hello” request to see if anyone is out there
- Get a snapshot from another process, or start a new one
- On local change, send diff nodes
- On receive, send an acknowledgment and trimerge as needed
- On receiving an acknowledgment, delete old nodes
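To make the message flow concrete, here is a rough sketch of those steps. Everything in it is an assumption for illustration: `Bus` is an in-memory stand-in for the shared channel (it delivers synchronously to everyone except the sender), the four message shapes are made up, and "delete old nodes" is read here as dropping acknowledged refs from a `pending` buffer. The trimerge step is only marked with a comment.

```typescript
interface DiffNode { ref: string; baseRef?: string }

// Hypothetical wire format for the steps listed above.
type Message =
  | { type: "hello"; from: string }
  | { type: "snapshot"; from: string; to: string; nodes: DiffNode[] }
  | { type: "nodes"; from: string; nodes: DiffNode[] }
  | { type: "ack"; from: string; to: string; refs: string[] };

// In-memory stand-in for a shared message channel.
class Bus {
  private readonly listeners = new Map<string, (message: Message) => void>();

  join(id: string, listener: (message: Message) => void): void {
    this.listeners.set(id, listener);
  }

  // Deliver to every process except the sender.
  post(message: Message): void {
    for (const [id, listener] of this.listeners) {
      if (id !== message.from) listener(message);
    }
  }
}

class Peer {
  readonly id: string;
  readonly nodes = new Map<string, DiffNode>();
  readonly pending = new Set<string>(); // sent, not yet acknowledged
  private readonly bus: Bus;

  constructor(id: string, bus: Bus) {
    this.id = id;
    this.bus = bus;
    bus.join(id, (message) => this.receive(message)); // start listening
    bus.post({ type: "hello", from: id }); // is anyone out there?
    // If nobody answers with a snapshot, we simply start from an empty graph.
  }

  // On local change: store the node and send it to the other processes.
  edit(node: DiffNode): void {
    this.nodes.set(node.ref, node);
    this.pending.add(node.ref);
    this.bus.post({ type: "nodes", from: this.id, nodes: [node] });
  }

  private receive(message: Message): void {
    switch (message.type) {
      case "hello":
        this.bus.post({ type: "snapshot", from: this.id, to: message.from, nodes: [...this.nodes.values()] });
        break;
      case "snapshot":
        if (message.to === this.id) this.store(message.nodes);
        break;
      case "nodes":
        this.store(message.nodes);
        // A real client would trimerge here if this left it with several heads.
        this.bus.post({ type: "ack", from: this.id, to: message.from, refs: message.nodes.map((n) => n.ref) });
        break;
      case "ack":
        if (message.to === this.id) message.refs.forEach((ref) => this.pending.delete(ref));
        break;
    }
  }

  private store(nodes: DiffNode[]): void {
    for (const node of nodes) this.nodes.set(node.ref, node);
  }
}
```

In this sketch a single acknowledgment is enough to clear a pending node. Waiting for every known process to acknowledge would be a stricter variant.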
Persisted local process sync is similar, but assumes all processes can access a central data store (such as IndexedDB or SQLite).
So how do you synchronize these graphs across clients?
The first thing is to have a simpler version of each node:
- Only store a diff of a node and its parent (pick one for merges)
- To avoid reading an entire path of the graph, you can store snapshots at arbitrary nodes
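As a rough illustration of the two points above, the sketch below stores a patch against one parent on each node and lets any node optionally carry a full snapshot. Rebuilding a document walks back through `baseRef` until it reaches a snapshot or an initial edit, then replays the patches forward. The flat string documents, the `Patch` format, and the names `StoredNode`, `diff`, `applyPatch`, and `materialize` are all assumptions made for the example.

```typescript
type Doc = Record<string, string>;
interface Patch { set: Doc; remove: string[] }

// Hypothetical stored node: a patch against baseRef, plus an optional snapshot.
interface StoredNode {
  ref: string;
  baseRef?: string; // the parent the patch was computed against
  mergeRef?: string; // second parent of a merge; not needed for replay
  patch: Patch;
  snapshot?: Doc; // full document at this node, if one was stored
}

// Toy diff for flat string records.
function diff(prev: Doc, next: Doc): Patch {
  const patch: Patch = { set: {}, remove: [] };
  for (const key of Object.keys(next)) {
    const value = next[key];
    if (value !== undefined && prev[key] !== value) patch.set[key] = value;
  }
  for (const key of Object.keys(prev)) {
    if (!(key in next)) patch.remove.push(key);
  }
  return patch;
}

function applyPatch(doc: Doc, patch: Patch): Doc {
  const out: Doc = { ...doc, ...patch.set };
  for (const key of patch.remove) delete out[key];
  return out;
}

// Rebuild the document at `ref` from the nearest snapshot (or from empty).
function materialize(nodes: Map<string, StoredNode>, ref: string): Doc {
  const path: StoredNode[] = [];
  let doc: Doc = {};
  let current: string | undefined = ref;
  while (current !== undefined) {
    const node: StoredNode | undefined = nodes.get(current);
    if (!node) throw new Error(`missing node ${current}`);
    if (node.snapshot) {
      doc = { ...node.snapshot }; // stop here: nothing older needs to be read
      break;
    }
    path.push(node);
    current = node.baseRef;
  }
  for (const node of path.reverse()) doc = applyPatch(doc, node.patch);
  return doc;
}
```

With a snapshot in place, nodes older than it are never read during replay, so where to put snapshots is a trade-off between storage and read cost.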
Then you need a way to know whether a node should be sent to another client. I haven't quite figured out the best way to do this.
Zlib license