Cassandra table storage backend for RESTBase

RESTBase table storage on Cassandra

This project provides a high-level table storage service abstraction, similar to Amazon DynamoDB or Google Datastore, on top of Cassandra. As the production table storage backend for RESTBase, it powers the Wikimedia REST APIs, such as the one for the English Wikipedia.

For testing and small installs, there is also a sqlite backend implementing the same interfaces.

Issue tracking

We use Phabricator to track issues. See the list of current issues in restbase-mod-table-cassandra.

Status

In production since March 2015.

Features

  • basic table storage service with REST interface, backed by Cassandra, implementing the RESTBase table storage interface
  • multi-tenant design: domain creation, prepared for per-domain ACLs
  • table creation with declarative JSON schemas
  • global secondary indexes
    • index entries written in batch with main data write, superseded entries removed from indexes asynchronously / eventually consistent
      • support for strongly consistent reads at the cost of extra cross-checks with the main data table (not implemented yet)
    • range queries
    • projections
  • limited automatic schema migrations
  • multiple retention policies for limiting the MVCC history
  • paging

TODO

Configuration

This module is configured from within an x-modules stanza in the YAML-formatted RESTBase configuration file. Complete configuration of RESTBase is beyond the scope of this document (see the RESTBase docs for that); this section covers the restbase-mod-table-cassandra specifics.

    - name: restbase-mod-table-cassandra
      version: 1.0.0
      type: npm
      options: # Passed to the module constructor
        conf:
          version: 1
          hosts: [localhost]
          username: cassandra
          password: cassandra
          defaultConsistency: localOne
          localDc: datacenter1
          datacenters:
            - datacenter1
          storage_groups:
            - name: default.group.local
              domains: /./

Version

The version of this configuration. Each edit of the module configuration must correspond to a new, unique version.

Note: Versions must be monotonically increasing.

    version: 1
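
The rule above (new, unique, and increasing) amounts to requiring that each version be strictly greater than the last one applied. A minimal sketch of such a check follows; `isValidNextVersion` is a hypothetical helper, and treating versions as integers is an assumption based on the sample.

```javascript
'use strict';

// Hypothetical helper, not part of the module's API: checks a new
// configuration version against the one that was previously applied.
// Assumes versions are integers, as in the sample configuration.
function isValidNextVersion(previous, next) {
    return Number.isInteger(next) && next > previous;
}

console.log(isValidNextVersion(1, 2)); // a bumped version is accepted
console.log(isValidNextVersion(2, 2)); // reusing a version is rejected
```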

Hosts

A list of Cassandra nodes to use as contact points.

    hosts:
      - cassandra-01.sample.org
      - cassandra-02.sample.org
      - cassandra-03.sample.org

Credentials

Password credentials to use in authenticating with Cassandra.

Note: Optional; leave unconfigured if Cassandra authentication is not enabled.

    username: someuser
    password: somepass

Default Consistency

The Cassandra consistency level to use when not otherwise specified. Valid values are the consistency level names defined by the Node.js driver for Cassandra. Defaults to localOne.

    defaultConsistency: localOne
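
To catch a misspelled level before it reaches the driver, the configured name can be checked up front. The sketch below hard-codes the names that the Node.js driver exposes under `types.consistencies`, to the best of our knowledge; treat the list as an assumption and verify it against the driver version in use. `resolveConsistency` is a hypothetical helper.

```javascript
'use strict';

// Names believed to match the keys of the driver's `types.consistencies`;
// this list is an assumption and should be checked against the driver.
const CONSISTENCY_NAMES = [
    'any', 'one', 'two', 'three', 'quorum', 'all',
    'localQuorum', 'eachQuorum', 'serial', 'localSerial', 'localOne'
];

// Hypothetical helper: returns the configured level, or the documented
// default (localOne) when none is set. Throws on unknown names.
function resolveConsistency(name) {
    if (name === undefined) {
        return 'localOne';
    }
    if (!CONSISTENCY_NAMES.includes(name)) {
        throw new Error(`Unknown consistency level: ${name}`);
    }
    return name;
}

console.log(resolveConsistency('localQuorum'));
```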

TLS

Key and certificate information for use in TLS-encrypted environments. See the Node.js documentation on tls.connect for the meaning of these directives.

Note: Optional; leave unconfigured if Cassandra client encryption is not enabled.

    tls:
      cert: /etc/restbase/tls/cert.pem
      key: /etc/restbase/tls/key.pem
      ca:
        - /etc/restbase/tls/root.pem
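
Node's `tls.connect` expects PEM contents for `cert`, `key`, and `ca` rather than file names, so a path-based stanza like the one above has to be read from disk at some point. The sketch below shows one way to do that, assuming the configured values are file paths as the sample suggests. `loadTlsOptions` and its injectable `readFile` parameter are hypothetical and not taken from the module.

```javascript
'use strict';

const fs = require('fs');

// Hypothetical helper: turns the path-based `tls` stanza into an options
// object holding file contents, in the shape tls.connect() accepts.
// `readFile` is injectable so the function can be exercised without files.
function loadTlsOptions(tlsConf, readFile = fs.readFileSync) {
    if (!tlsConf) {
        return undefined; // client encryption not configured
    }
    const opts = {};
    if (tlsConf.cert) { opts.cert = readFile(tlsConf.cert); }
    if (tlsConf.key) { opts.key = readFile(tlsConf.key); }
    // Wrap in an arrow function so map() does not pass the index along.
    if (tlsConf.ca) { opts.ca = tlsConf.ca.map((path) => readFile(path)); }
    return opts;
}

// Usage with a stand-in reader instead of real certificate files.
const fakeRead = (path) => `contents of ${path}`;
console.log(loadTlsOptions({ cert: 'cert.pem', ca: ['root.pem'] }, fakeRead));
```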

Local Datacenter

restbase-mod-table-cassandra uses a datacenter-aware connection pool. The localDc directive tells the module which datacenter to treat as 'local' to this instance. Cassandra nodes in the local datacenter are used for queries, and nodes in any other datacenter serve as a fallback. Defaults to datacenter1 (the Cassandra default).

Note: the localDc must be in the list of configured datacenters (see below).

    localDc: datacenter1
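
The constraint in the note above is easy to verify before starting the service. A minimal sketch, using the documented defaults for both directives; `checkLocalDc` is a hypothetical helper, not part of the module's API.

```javascript
'use strict';

// Hypothetical check: enforces the documented rule that localDc must
// appear in the configured datacenters list. Falls back to the documented
// defaults when a directive is missing.
function checkLocalDc(conf) {
    const datacenters = conf.datacenters || ['datacenter1'];
    const localDc = conf.localDc || 'datacenter1';
    if (!datacenters.includes(localDc)) {
        throw new Error(
            `localDc "${localDc}" is not in datacenters [${datacenters.join(', ')}]`
        );
    }
    return localDc;
}

console.log(checkLocalDc({ localDc: 'datacenter1', datacenters: ['datacenter1'] }));
```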

Datacenters

The list of datacenters this Cassandra cluster belongs to. Data will be replicated across these datacenters accordingly. Defaults to [ datacenter1 ].

Note: Changing this list alters the underlying Cassandra keyspaces in order to add or remove datacenter replicas accordingly, but replication is NOT made retroactive. You MUST perform a Cassandra repair after adding a new datacenter to realize the added redundancy. Likewise, you must perform a cleanup to reclaim space if a datacenter is removed.

    datacenters:
      - datacenter1
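
Because replication is not retroactive, it helps to know which manual step a given change to this list calls for. The sketch below compares the old and new lists and reports the follow-up described in the note above: a repair after adding a datacenter, a cleanup after removing one. `datacenterChanges` and the shape of its result are hypothetical.

```javascript
'use strict';

// Hypothetical helper: compares two datacenter lists and names the manual
// Cassandra maintenance steps that the change calls for.
function datacenterChanges(oldDcs, newDcs) {
    const added = newDcs.filter((dc) => !oldDcs.includes(dc));
    const removed = oldDcs.filter((dc) => !newDcs.includes(dc));
    const followUp = [];
    if (added.length > 0) { followUp.push('repair'); }    // realize new replicas
    if (removed.length > 0) { followUp.push('cleanup'); } // reclaim space
    return { added, removed, followUp };
}

console.log(datacenterChanges(['datacenter1'], ['datacenter1', 'datacenter2']));
```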

Storage Groups

Storage groups are used to map tables to one or more hosts/domains.

    storage_groups:
      - name: default.group.local
        domains: /./
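
As an illustration of how a domain might be matched to a group, the sketch below treats a `domains` value written as `/.../` as a regular expression and anything else as a literal domain name, and picks the first group that matches. These matching rules, and the `findStorageGroup` helper itself, are assumptions made for the example; the module's actual resolution logic may differ.

```javascript
'use strict';

// Assumption: a pattern of the form /.../ is a regular expression,
// anything else is compared literally.
function toMatcher(pattern) {
    const m = /^\/(.*)\/$/.exec(pattern);
    if (m) {
        const re = new RegExp(m[1]);
        return (domain) => re.test(domain);
    }
    return (domain) => domain === pattern;
}

// Hypothetical helper: returns the name of the first group whose
// `domains` entry matches, or undefined if none does.
function findStorageGroup(groups, domain) {
    for (const group of groups) {
        const patterns = Array.isArray(group.domains) ? group.domains : [group.domains];
        if (patterns.some((p) => toMatcher(p)(domain))) {
            return group.name;
        }
    }
    return undefined;
}

const groups = [{ name: 'default.group.local', domains: '/./' }];
console.log(findStorageGroup(groups, 'en.wikipedia.org'));
```

Under these assumptions, the `/./` pattern from the sample matches any non-empty domain, which makes `default.group.local` a catch-all group.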