Improve version #2

aravindet · 2019-08-09T02:32:18Z

Currently, every node has a version value. In practice, in most (but not all) graphs and queries, all nodes have the same version. There is some redundancy here.

In subscription caches, we need to update the version number of the entire cache whenever there is a new update. With the current data structure, this takes O(size of cache) time, while other operations only take O(size of change).

It might be beneficial to rethink how version is stored and manipulated in the internal representation.

aravindet · 2020-02-23T15:23:09Z

We need two versions per node; let's call them readVersion and writeVersion. Both are defined with respect to a monotonically increasing, global, positive integer clock. The semantics are similar, but distinct, based on the node type.

1. Results from read and watch, cached values

Example: { key: 'k', value: 'v', readVersion: 44, writeVersion: 32 }.

Here, the node was set to the value v at time 32, and it retained the value v until at least clock 44. It may or may not have the same value at times after 44 - we don't know.

In the watch change stream, readVersion and writeVersion will be identical.

2. Changes passed to write

Example: { key: 'k', value: 'v', readVersion: 44, writeVersion: 56 }

Here, the write is being performed at time 56, but the writer's most recent knowledge of this node (typically with a different value) is from time 44. If the node has been updated since 44, the write should be rejected.

This is a first step towards atomic writes (although this is not sufficient, we may need to introduce a two-phase commit protocol to the onWrite contract).

3. Queries passed to read and watch

Example: { key: 'k', value: 'v', readVersion: 3, writeVersion: 4 }

Here, the query requests nodes with a readVersion of at least 3 AND a writeVersion of at least 4. Specifying a non-zero writeVersion may result in the result skipping that node, if that node was not written since version 4.

Adding a writeVersion to queries allows resume-able watch and watch-by-polling.

aravindet · 2020-04-18T16:20:14Z

In this case, there are three situations with a range of keys:

A range of keys is unknown. We don't know anything about them. This is represented by the absence of any nodes within that range.
A range of keys is known to be empty. This is represented by a range key, which may have a readVersion and a writeVersion; That range then had no keys in the period between the writeVersion (earliest) and readVersion (latest).
A range of keys is excluded as they were not written to since the query readVersion. They may or may not exist; in this way it is similar to unknown.

aravindet added this to To do in Graffy Aug 27, 2019

aravindet changed the title ~~Improve version in graffy-fill~~ Improve version Feb 23, 2020

aravindet mentioned this issue Apr 11, 2020

MVCC: "Multi-layer" graphs #39

Closed

aravindet closed this as completed Dec 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve version #2

Improve version #2

aravindet commented Aug 9, 2019

aravindet commented Feb 23, 2020

aravindet commented Apr 18, 2020 •

edited

Improve version #2

Improve version #2

Comments

aravindet commented Aug 9, 2019

aravindet commented Feb 23, 2020

1. Results from read and watch, cached values

2. Changes passed to write

3. Queries passed to read and watch

aravindet commented Apr 18, 2020 • edited

aravindet commented Apr 18, 2020 •

edited