Skip to content

Latest commit

 

History

History
89 lines (57 loc) · 4.95 KB

data-structure.md

File metadata and controls

89 lines (57 loc) · 4.95 KB
title target-version
data-structure
0.4.6

Data Structures

This document covers the data structures of the SDK.

Summary

The json and crdt package has data structures for representing the contents of JSON-like documents edited by the user. This document explains what data structures are used for and how they refer to each other.

Goals

This document is to help new SDK contributors understand the overall data structures of the JSON-like document.

Non-Goals

This document does not describe algorithms in distributed systems such as CRDTs or logical clocks.

Proposal Details

The json and crdt package has data structures for representing the contents of JSON-like documents edited by the user.

Overview

Below is the dependency graph of data structures used in a JSON-like document.

data-structure

The data structures can be divided into three groups:

  • JSON-like: Data structures used directly in JSON-like documents.
  • CRDT: Data structures used by JSON-like group to resolve conflicts.
  • Common: Data structures used for general purposes.

The data structures of each group have the dependencies shown in the figure above; the data structure on the left side of an arrow use the data structure on the right.

JSON-like Group

JSON-like data structures are used when editing JSON-like documents.

  • Primitive: represents primitive data like string, number, boolean, null, etc.
  • Object: represents object type of JavaScript. Just like JavaScript, you can use Object as hash table.
  • Array: represents array type of JavaScript. You can also use Array as list.
  • Text: represents text with style attributes in rich text editors such as Quill. Users can express styles such as bold, italic, and underline to text content. Of course, it can represent just a plain text in text-based editors such as CodeMirror. It supports collaborative editing; multiple users can modify parts of the contents without conflict.
  • Counter: represents a counter in the document. As a proxy for the CRDT counter, it is used when the user manipulates the counter from the outside.
  • Tree: represents CRDT-based tree structure that is used to represent the document tree of text-based editor such as ProseMirror.

JSON-like data structures can be edited through proxies. For example:

doc.update((root) => {
  // set a `Primitive<string>` "world" to the root `object` at key "hello".
  root.hello = 'world'; // { "hello": "world" }

  // set an `array` [1, 2, 3] to the root `object` at key "array".
  root.array = [1, 2, 3]; // { "hello": "world", "array": [1, 2, 3] }

  // push a `Primitive<number>` 4 to the `array` at the end.
  root.array.push(4); // { "hello": "world", "array": [1, 2, 3, 4] }
});

The code above uses Primitive, Object and Array in JSON-like group.

CRDT Group

CRDT data structures are used by JSON-like group to resolve conflicts in concurrent editing.

  • RHT(Replicated Hash Table): similar to hash table, but resolves concurrent-editing conflicts.
  • ElementRHT: similar to RHT, but has elements as values.
  • RGATreeList: extended RGA(Replicated Growable Array) with an additional index tree. The index tree manages the indices of elements and provides faster access to elements at the int-based index.
  • RGATreeSplit: extended RGATreeList allowing characters to be represented as blocks rather than each single character.
  • CRDTTree: represents the CRDT tree with an index tree structure'. It resolves conflicts arising from concurrent editing.

Common Group

Common data structures can be used for general purposes.

  • SplayTree: A tree that moves nodes to the root by splaying. This is effective when user frequently access the same location, such as text editing. We use SplayTree as an index tree to give each node a weight, and to quickly access the node based on the index.
  • LLRBTree: A tree simpler than Red-Black Tree. Newly added floor method finds the node of the largest key less than or equal to the given key.
  • IndexTree: A tree implementation to represent a document of text-based editors.

Risks and Mitigation

We can replace the data structures with better ones for some reason, such as performance. For example, SplayTree used in RGATreeList can be replaced with TreeList.