Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve version #2

Closed
aravindet opened this issue Aug 9, 2019 · 2 comments
Closed

Improve version #2

aravindet opened this issue Aug 9, 2019 · 2 comments
Projects

Comments

@aravindet
Copy link
Collaborator

Currently, every node has a version value. In practice, in most (but not all) graphs and queries, all nodes have the same version. There is some redundancy here.

In subscription caches, we need to update the version number of the entire cache whenever there is a new update. With the current data structure, this takes O(size of cache) time, while other operations only take O(size of change).

It might be beneficial to rethink how version is stored and manipulated in the internal representation.

@aravindet aravindet added this to To do in Graffy Aug 27, 2019
@aravindet aravindet changed the title Improve version in graffy-fill Improve version Feb 23, 2020
@aravindet
Copy link
Collaborator Author

We need two versions per node; let's call them readVersion and writeVersion. Both are defined with respect to a monotonically increasing, global, positive integer clock. The semantics are similar, but distinct, based on the node type.

1. Results from read and watch, cached values

Example: { key: 'k', value: 'v', readVersion: 44, writeVersion: 32 }.

Here, the node was set to the value v at time 32, and it retained the value v until at least clock 44. It may or may not have the same value at times after 44 - we don't know.

In the watch change stream, readVersion and writeVersion will be identical.

2. Changes passed to write

Example: { key: 'k', value: 'v', readVersion: 44, writeVersion: 56 }

Here, the write is being performed at time 56, but the writer's most recent knowledge of this node (typically with a different value) is from time 44. If the node has been updated since 44, the write should be rejected.

This is a first step towards atomic writes (although this is not sufficient, we may need to introduce a two-phase commit protocol to the onWrite contract).

3. Queries passed to read and watch

Example: { key: 'k', value: 'v', readVersion: 3, writeVersion: 4 }

Here, the query requests nodes with a readVersion of at least 3 AND a writeVersion of at least 4. Specifying a non-zero writeVersion may result in the result skipping that node, if that node was not written since version 4.

Adding a writeVersion to queries allows resume-able watch and watch-by-polling.

@aravindet
Copy link
Collaborator Author

aravindet commented Apr 18, 2020

In this case, there are three situations with a range of keys:

  1. A range of keys is unknown. We don't know anything about them. This is represented by the absence of any nodes within that range.

  2. A range of keys is known to be empty. This is represented by a range key, which may have a readVersion and a writeVersion; That range then had no keys in the period between the writeVersion (earliest) and readVersion (latest).

  3. A range of keys is excluded as they were not written to since the query readVersion. They may or may not exist; in this way it is similar to unknown.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Graffy
  
To do
Development

No branches or pull requests

1 participant