SQLite Collaboration Server #86

Closed
seveibar opened this issue Apr 15, 2020 · 1 comment

Comments

@seveibar
Collaborator

seveibar commented Apr 15, 2020

First reported in #16. The collaboration server currently uses a scalable serverless architecture hosted on ZEIT Now. We want a separate codebase for the local (on-premise) server. The ZEIT Now code was built for a commercial project, so we can't open-source it. We can, however, build a new version that implements the same API.

Here is the full specification:

Universal Data Tool Collaborative Editing Server

Goals

  • Users should be able to collaborate with other users to complete the labeling of a dataset together
  • Users should receive notifications as work is completed or started by other users
  • Users should receive "updates" from other users in less than 500ms
  • The "Settings" should be able to be edited by any user
  • Any user should be able to upload new data
  • Collaborative links should be shareable
  • The first time someone enters collaboration mode, a dialog should explain how collaboration works, including how to share the link

Out of Scope

  • Logins (collaboration should not require any login)
  • Collaborative editing on a per-sample basis
    • Collisions are resolved in favor of the last person to submit an edit
  • Completion time estimate

Key Technologies

  • fast-json-patch is used to send patches
  • object-hash is used to hash objects to produce hashOfLatestState
  • micro is used for endpoints
  • ava is used for testing
  • SQLite is used as the database
  • better-sqlite3, an npm module that provides a fast and simple interface to SQLite, is used to connect to the database
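The spec does not say how hashOfLatestState is consumed. Presumably it lets a client confirm that its local copy matches the server's latest state, which only works if the hash is deterministic regardless of the order of an object's keys. Below is a minimal, dependency-free sketch of that property. It is not object-hash's API: it serializes the state to canonical JSON (object keys sorted) and hashes the result with 32-bit FNV-1a. The names canonicalJson, fnv1a32, and hashState are hypothetical, and the output will not match object-hash's hashes (object-hash uses SHA-1 by default).

```typescript
// Hypothetical stand-in for object-hash: canonical JSON + 32-bit FNV-1a.
// Illustrates determinism only; the hashes will not match object-hash's output.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted, so {a, b} and {b, a} produce the same string.
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalJson).join(",") + "]";
  }
  const obj = value as { [key: string]: Json };
  const keys = Object.keys(obj).sort();
  const parts: string[] = [];
  for (let i = 0; i < keys.length; i++) {
    parts.push(JSON.stringify(keys[i]) + ":" + canonicalJson(obj[keys[i]]));
  }
  return "{" + parts.join(",") + "}";
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex digits.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // hash * 16777619 (the FNV prime) mod 2^32, written with shifts.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Hash of a whole UDT state, independent of key order.
function hashState(state: Json): string {
  return fnv1a32(canonicalJson(state));
}
```

The server and every client must hash the same way for the comparison to mean anything. A 32-bit hash is enough to illustrate the idea, but collisions are far more likely than with SHA-1.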

Architecture

The server exposes the following endpoints:

  • POST /udt/session: Creates a UDT session and a shareable link to it. The user who initiates collaboration mode calls this exactly once to start the session. A session lasts indefinitely. Returns the URL of the session.
  • GET /udt/session/<session_id>: Gets the latest version of the UDT JSON file by getting the latest session_state (see DB Architecture)
  • GET /udt/session/<session_id>/diffs: Gets recent diffs for the JSON file
    • The requestor must provide the query string parameter since=<ISODATE>, indicating that they want the diffs created since the last time they polled.
    • The UDT will poll this every 250-500ms. Most of the time it'll return an empty array of patches.
    • Responds with { patches: Array<JSONDiffPatch>, hashOfLatestState, latestVersion }
  • PATCH /udt/session/<session_id>: Sends a JSONDiffPatch object with changes
    • Request contains { patch, mySessionStateId }
      • patch is applied against the latest session state to generate a new session state.
      • mySessionStateId isn't used (for now)
    • Should return { hashOfLatestState, latestVersion }
  • PATCH /udt/session/<session_id>/sample: Creates, modifies, or deletes a sample
    • Use this endpoint instead of PATCH /udt/session/<session_id> when creating, updating, or deleting samples. It handles certain edge cases better, such as a requestor's sampleIndex that is stale because of another user's edits.
    • A request contains { operation, sampleIndex, [newInput], [newOutput], [previousInput] }
      • operation can be "DELETE", "CREATE", "UPDATE"
      • newInput is the taskData[sampleIndex] that the UDT observes when it sends the request
        • If "UPDATE" or "DELETE", use previousInput to find the true sample index. (i.e. do a deep comparison to find the sampleIndex using the latest version of the state).
      • newOutput is the new output for "UPDATE" operations. It is optional because the user may not want
      • The sampleIndex provided by the requestor should not be used.
    • Should return { hashOfLatestState, latestVersion }
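The index-resolution rule for the sample endpoint can be sketched as a pure function over the UDT JSON. This sketch makes several assumptions. The file keeps samples in parallel taskData and taskOutput arrays of equal length, as the workflow example suggests. "Deep comparison" means structural JSON equality. newInput is the replacement taskData entry for "CREATE" and "UPDATE". "CREATE" appends to the end. All type and function names are hypothetical.

```typescript
// Hypothetical sketch of the PATCH /udt/session/<session_id>/sample logic.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface UdtJson {
  taskData: Json[]; // assumed to be the same length as taskOutput
  taskOutput: Json[];
  [key: string]: unknown; // other fields (e.g. settings) are carried over untouched
}

interface SampleRequest {
  operation: "CREATE" | "UPDATE" | "DELETE";
  sampleIndex: number; // sent by the client, deliberately ignored
  newInput?: Json;
  newOutput?: Json;
  previousInput?: Json;
}

// Structural equality for JSON values: the "deep comparison" from the spec.
function deepEqual(a: Json, b: Json): boolean {
  if (a === b) return true;
  if (a === null || b === null || typeof a !== "object" || typeof b !== "object") return false;
  if (Array.isArray(a) || Array.isArray(b)) {
    if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length) return false;
    for (let i = 0; i < a.length; i++) {
      if (!deepEqual(a[i], b[i])) return false;
    }
    return true;
  }
  const ao = a as { [key: string]: Json };
  const bo = b as { [key: string]: Json };
  const keys = Object.keys(ao);
  if (keys.length !== Object.keys(bo).length) return false;
  for (let i = 0; i < keys.length; i++) {
    const k = keys[i];
    if (!Object.prototype.hasOwnProperty.call(bo, k) || !deepEqual(ao[k], bo[k])) return false;
  }
  return true;
}

// Find the sample in the *latest* state by content rather than by the client's index.
function findSampleIndex(state: UdtJson, previousInput: Json): number {
  for (let i = 0; i < state.taskData.length; i++) {
    if (deepEqual(state.taskData[i], previousInput)) return i;
  }
  return -1;
}

// Returns the next state; `state` itself is not mutated.
function applySampleOperation(state: UdtJson, req: SampleRequest): UdtJson {
  const taskData = state.taskData.slice();
  const taskOutput = state.taskOutput.slice();
  if (req.operation === "CREATE") {
    // Assumption: new samples are appended; sampleIndex is ignored here too.
    taskData.push(req.newInput === undefined ? null : req.newInput);
    taskOutput.push(req.newOutput === undefined ? null : req.newOutput);
  } else {
    if (req.previousInput === undefined) {
      throw new Error("previousInput is required for " + req.operation);
    }
    const index = findSampleIndex(state, req.previousInput);
    if (index === -1) throw new Error("sample not found in the latest state");
    if (req.operation === "DELETE") {
      taskData.splice(index, 1);
      taskOutput.splice(index, 1);
    } else {
      if (req.newInput !== undefined) taskData[index] = req.newInput;
      if (req.newOutput !== undefined) taskOutput[index] = req.newOutput;
    }
  }
  return { ...state, taskData, taskOutput };
}
```

The index is recovered from previousInput, so a stale sampleIndex still updates the right sample (for example, after another user deleted an earlier sample). If two samples have identical inputs, this sketch picks the first match.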

Example

Let's look at a typical collaborative workflow to see how these endpoints work:

  1. After User1 enters collaboration mode, their editor sends a request to POST /udt/session, parses the response, and creates a link for them to share.
  2. User1 shares the link with their team (here, just User2) and begins editing.
  3. User2 uses the link to join the session. They get the latest version of the UDT JSON by calling GET /udt/session/<session_id>. They know the session_id because it's embedded in the link.
  4. User2 edits something in the settings. The UDT sends a request to PATCH /udt/session/<session_id> with a JSONDiffPatch containing their changes.
  5. User1's editor polls GET /udt/session/<session_id>/diffs?since=<last_version> for the latest patches. It finds the patch from User2, applies it, and displays a notification.
  6. User1 begins to edit a sample. This triggers a request to PATCH /udt/session/<session_id>/sample that sets taskData[sampleIndex].isBeingEdited to true.
  7. User1 finishes editing the sample. This triggers a request to PATCH /udt/session/<session_id>/sample that sets taskData[sampleIndex].isBeingEdited to false and taskOutput[sampleIndex] to their newOutput.
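The workflow above can be simulated with an in-memory store standing in for the SQLite table. The sketch makes these assumptions. Patches are RFC 6902 JSON Patch operations, the format fast-json-patch works with. Only add, replace, and remove are supported. Diffs are filtered by created_at, as the GET .../diffs description specifies. hashOfLatestState is omitted. InMemorySessionStore and its methods are hypothetical names, and applyPatch is a simplified hand-rolled stand-in, not fast-json-patch's API.

```typescript
// Hypothetical in-memory simulation of the session endpoints (stands in for SQLite).
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Simplified subset of RFC 6902 JSON Patch operations.
type PatchOp =
  | { op: "add"; path: string; value: Json }
  | { op: "replace"; path: string; value: Json }
  | { op: "remove"; path: string };

interface SessionState {
  shortId: string;
  udtJson: Json;
  patch: PatchOp[] | null; // null for the initial state
  version: number;
  createdAt: string; // ISO 8601
}

// Decode a JSON Pointer (RFC 6901) such as "/taskData/0" into path segments.
function parsePointer(path: string): string[] {
  return path.split("/").slice(1).map(s => s.replace(/~1/g, "/").replace(/~0/g, "~"));
}

// Apply patch operations to a deep copy of doc; no validation, unlike a real library.
function applyPatch(doc: Json, patch: PatchOp[]): Json {
  const root: Json = JSON.parse(JSON.stringify(doc));
  for (const op of patch) {
    const keys = parsePointer(op.path);
    if (keys.length === 0) throw new Error("whole-document operations are not supported");
    let parent: any = root; // walk to the container that holds the target
    for (let i = 0; i < keys.length - 1; i++) parent = parent[keys[i]];
    const last = keys[keys.length - 1];
    if (Array.isArray(parent)) {
      const index = last === "-" ? parent.length : Number(last);
      if (op.op === "add") parent.splice(index, 0, op.value);
      else if (op.op === "replace") parent[index] = op.value;
      else parent.splice(index, 1);
    } else if (op.op === "remove") {
      delete parent[last];
    } else {
      parent[last] = op.value;
    }
  }
  return root;
}

class InMemorySessionStore {
  private states: SessionState[] = [];

  // POST /udt/session: store version 0 of a new session.
  createSession(shortId: string, udtJson: Json, now: Date): void {
    this.states.push({ shortId, udtJson, patch: null, version: 0, createdAt: now.toISOString() });
  }

  // GET /udt/session/<session_id>: the highest-version state of the session.
  latest(shortId: string): SessionState {
    let best: SessionState | null = null;
    for (const s of this.states) {
      if (s.shortId === shortId && (best === null || s.version > best.version)) best = s;
    }
    if (best === null) throw new Error("unknown session " + shortId);
    return best;
  }

  // PATCH /udt/session/<session_id>: apply the patch to the latest state and bump the version.
  applySessionPatch(shortId: string, patch: PatchOp[], now: Date): { latestVersion: number } {
    const prev = this.latest(shortId);
    const version = prev.version + 1;
    this.states.push({ shortId, udtJson: applyPatch(prev.udtJson, patch), patch, version, createdAt: now.toISOString() });
    return { latestVersion: version };
  }

  // GET /udt/session/<session_id>/diffs?since=<ISODATE>: patches created after `since`, oldest first.
  diffsSince(shortId: string, since: string): { patches: PatchOp[][]; latestVersion: number } {
    const sinceMs = Date.parse(since);
    const newer = this.states
      .filter(s => s.shortId === shortId && s.patch !== null && Date.parse(s.createdAt) > sinceMs)
      .sort((a, b) => a.version - b.version);
    const patches: PatchOp[][] = [];
    for (const s of newer) if (s.patch !== null) patches.push(s.patch);
    return { patches, latestVersion: this.latest(shortId).version };
  }
}
```

The sketch compares timestamps with a strict >, so it can miss a patch created in the same millisecond as the client's since value. Filtering on version instead would avoid that.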

Database Architecture

A single table, session_state, stores each state of the JSON file. It contains the following columns:

  • session_state_id uuid: randomly generated
  • short_id text: randomly generated; represents the session id
  • udt_json jsonb: The state of the UDT file
  • patch jsonb: The patch that created this version from the previous version
  • previous_session_state_id uuid: Identifier for previous state
  • version integer: Integer identifying the revision number
  • created_at timestamptz: Timestamp on creation
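SQLite has no native uuid, jsonb, or timestamptz column types (it uses type affinity instead), so a SQLite implementation would most likely store those columns as TEXT. The sketch below assumes exactly that. It shows how each new row links back to its predecessor through previous_session_state_id and version. SessionStateRow and buildNextRow are hypothetical names, and starting versions at 0 is an assumption.

```typescript
// Hypothetical TypeScript mirror of one session_state row as stored in SQLite.
// Assumption: uuid, jsonb and timestamptz columns are all stored as TEXT.
interface SessionStateRow {
  session_state_id: string;                 // uuid, randomly generated
  short_id: string;                         // session id embedded in the shared link
  udt_json: string;                         // serialized UDT JSON for this version
  patch: string | null;                     // serialized patch; null for the first state
  previous_session_state_id: string | null; // null for the first state
  version: number;                          // revision number (starting at 0 is an assumption)
  created_at: string;                       // ISO 8601 timestamp
}

// Build the row that follows `prev`. The caller passes in the new id, the patch, the
// already-patched state, and the clock, so the function itself is deterministic.
function buildNextRow(
  prev: SessionStateRow,
  newId: string,
  patch: unknown,
  newUdtJson: unknown,
  now: Date
): SessionStateRow {
  return {
    session_state_id: newId,
    short_id: prev.short_id, // a session keeps its short id across versions
    udt_json: JSON.stringify(newUdtJson),
    patch: JSON.stringify(patch),
    previous_session_state_id: prev.session_state_id,
    version: prev.version + 1,
    created_at: now.toISOString(),
  };
}
```

In the server, newId would come from a UUID generator, and the row would be written with a prepared INSERT statement.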

The database will have the following constraints applied:

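  • UNIQUE previous_session_state_id
    • Each session state can have at most one subsequent state. This prevents race conditions in which two concurrent patches are both applied to the same previous state and one of the edits is lost.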
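The UNIQUE constraint turns a race into an error that the losing writer can handle. The in-memory sketch below models it. Two writers both build on version 0. The first insert succeeds. The second violates the constraint, re-reads the latest state, and retries. SQLite treats NULLs in a UNIQUE column as distinct, so the first state of every session can have a NULL previous_session_state_id; the model mirrors that by skipping the check for null. SessionStateTable and insertWithRetry are hypothetical names, and a single session is assumed.

```typescript
// Hypothetical in-memory model of UNIQUE(previous_session_state_id) for a single session.
interface StateNode {
  id: string;
  previousId: string | null; // null for a session's first state
  version: number;
}

class SessionStateTable {
  private rows: StateNode[] = [];

  // Mirrors an INSERT: rejects a second successor of the same state, as the UNIQUE constraint would.
  // NULLs are skipped because SQLite treats NULLs in a UNIQUE column as distinct.
  insert(row: StateNode): void {
    for (let i = 0; i < this.rows.length; i++) {
      if (row.previousId !== null && this.rows[i].previousId === row.previousId) {
        throw new Error("UNIQUE constraint failed: session_state.previous_session_state_id");
      }
    }
    this.rows.push(row);
  }

  // Highest-version row; assumes the table is non-empty.
  latest(): StateNode {
    let best = this.rows[0];
    for (let i = 1; i < this.rows.length; i++) {
      if (this.rows[i].version > best.version) best = this.rows[i];
    }
    return best;
  }
}

// A writer that loses the race re-reads the latest state and retries once.
// (The real server would also re-apply its patch to that newer state.)
function insertWithRetry(table: SessionStateTable, id: string, basedOn: StateNode): StateNode {
  try {
    const row: StateNode = { id, previousId: basedOn.id, version: basedOn.version + 1 };
    table.insert(row);
    return row;
  } catch {
    const latest = table.latest();
    const row: StateNode = { id, previousId: latest.id, version: latest.version + 1 };
    table.insert(row);
    return row;
  }
}
```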

The database will have the following SQL triggers:

  • Delete session_state rows that are older than 1 hour AND are not the latest state of their session
    • Triggered when a session state is inserted.
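The trigger's rule is to delete a row when it is older than one hour and is not the latest state of its session. Below, that rule is sketched as a pure TypeScript function, with a roughly equivalent SQLite trigger in a comment. Both assume that created_at is stored as ISO 8601 UTC text and that "latest" means the highest version per short_id. PrunableState, pruneStates, and the trigger name are hypothetical, and the SQL is a sketch rather than a tested migration.

```typescript
// Hypothetical model of the pruning trigger, run after each insert.
interface PrunableState {
  sessionStateId: string;
  shortId: string;
  version: number;
  createdAt: string; // ISO 8601, UTC
}

const ONE_HOUR_MS = 60 * 60 * 1000;

// Keep a row if it is the latest version of its session OR is at most one hour old.
function pruneStates(rows: PrunableState[], now: Date): PrunableState[] {
  const latestVersion: { [shortId: string]: number } = {};
  for (let i = 0; i < rows.length; i++) {
    const r = rows[i];
    const seen = Object.prototype.hasOwnProperty.call(latestVersion, r.shortId);
    if (!seen || r.version > latestVersion[r.shortId]) latestVersion[r.shortId] = r.version;
  }
  const cutoff = now.getTime() - ONE_HOUR_MS;
  return rows.filter(r => r.version === latestVersion[r.shortId] || Date.parse(r.createdAt) >= cutoff);
}

// Roughly equivalent SQLite trigger (sketch; assumes created_at holds text in the same
// format as Date.prototype.toISOString, e.g. 2020-04-15T12:00:00.000Z):
//
//   CREATE TRIGGER prune_session_state AFTER INSERT ON session_state
//   BEGIN
//     DELETE FROM session_state
//     WHERE created_at < strftime('%Y-%m-%dT%H:%M:%fZ', 'now', '-1 hour')
//       AND version < (SELECT MAX(s.version) FROM session_state AS s
//                      WHERE s.short_id = session_state.short_id);
//   END;
```

Keeping the latest state regardless of age is what lets a session "last indefinitely" while its history is trimmed.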
@seveibar seveibar added this to the On-Premise Enhancements milestone Apr 15, 2020
@seveibar seveibar modified the milestones: On-Premise Enhancements, Version 1 May 15, 2020
@seveibar
Collaborator Author

fixed
