RESTBase table storage on Cassandra
This project provides a high-level table storage service abstraction, similar to Amazon DynamoDB or Google DataStore, on top of Cassandra. As the production table storage backend for RESTBase, it powers the Wikimedia REST APIs, such as the one for the English Wikipedia.
For testing and small installs, there is also a sqlite backend implementing the same interfaces.
In production since March 2015.
- basic table storage service with REST interface, backed by Cassandra, implementing the RESTBase table storage interface
- multi-tenant design: domain creation, prepared for per-domain ACLs
- table creation with declarative JSON schemas
- global secondary indexes
  - index entries written in batch with main data write, superseded entries removed from indexes asynchronously / eventually consistent
  - support for strongly consistent reads at the cost of extra cross-checks with the main data table (not implemented yet)
- range queries
- limited automatic schema migrations
- multiple retention policies for limiting the MVCC history

Planned next steps:

- Secondary index refinements:
  - queries for columns not projected into the secondary index
  - full index rebuilds
  - sharded ordered range indexes
- Possibly, some amount of transaction support
- Leverage Cassandra 3 materialized views where it makes sense, once those have stabilized.
Configuration of this module takes place from within an
x-modules stanza in the YAML-formatted
RESTBase configuration file.
Complete configuration of RESTBase is beyond the scope of this document (see the RESTBase docs for that); this section covers the options specific to this module.

    - name: restbase-mod-table-cassandra
      version: 1.0.0
      type: npm
      options: # Passed to the module constructor
        conf:
          version: 1
          hosts: [localhost]
          username: cassandra
          password: cassandra
          defaultConsistency: localOne
          localDc: datacenter1
          datacenters:
            - datacenter1
          storage_groups:
            - name: default.group.local
              domains: /./

The version of this configuration. Each edit of the module configuration must correspond to a new, unique version.
Note: Versions must be monotonically increasing.
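
The monotonicity rule can be checked before an edited configuration is deployed. The following is a minimal sketch, not the module's own logic: `assertNewerVersion` is a hypothetical helper, and it assumes that `version` is an integer, as in the example configuration.

```javascript
'use strict';

// Hypothetical helper (not part of restbase-mod-table-cassandra):
// verifies that an edited configuration carries a strictly higher
// version than the configuration it replaces.
// Assumption: conf.version is an integer, as in the example above.
function assertNewerVersion(previousConf, nextConf) {
    const prev = previousConf.version;
    const next = nextConf.version;
    if (!Number.isInteger(prev) || !Number.isInteger(next)) {
        throw new TypeError('conf.version must be an integer');
    }
    if (next <= prev) {
        throw new RangeError(
            `conf.version must increase: got ${next}, previous was ${prev}`);
    }
    return next;
}
```

A check like this fits naturally into a deployment script or a configuration review step, where both the old and the new configuration are available.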
A list of Cassandra nodes to use as contact points.

    hosts:
      - cassandra-01.sample.org
      - cassandra-02.sample.org
      - cassandra-03.sample.org

Password credentials to use in authenticating with Cassandra.
Note: Optional; leave unconfigured if Cassandra authentication is not enabled.

    username: someuser
    password: somepass

The Cassandra consistency level to use when not otherwise specified. Valid
values are those from the nodejs driver for Cassandra.
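
As an illustration of which values are accepted, the sketch below validates a `defaultConsistency` setting against the level names exposed by the DataStax Node.js driver (`types.consistencies` in the `cassandra-driver` package). The list is transcribed by hand so that the example runs without the driver installed, and `checkConsistency` is a hypothetical helper; verify the names against the driver version in use.

```javascript
'use strict';

// Consistency level names as exposed by the DataStax Node.js driver
// (types.consistencies). Transcribed by hand for illustration; the
// driver itself remains the authoritative source.
const CONSISTENCY_NAMES = [
    'any', 'one', 'two', 'three', 'quorum', 'all',
    'localQuorum', 'eachQuorum', 'serial', 'localSerial', 'localOne'
];

// Hypothetical helper: returns the name unchanged if it is a known
// consistency level, and throws otherwise.
function checkConsistency(name) {
    if (!CONSISTENCY_NAMES.includes(name)) {
        throw new Error(
            `Unknown consistency level "${name}"; expected one of: ` +
            CONSISTENCY_NAMES.join(', '));
    }
    return name;
}
```

Note that the names are camel-cased (`localOne`), not the upper-case CQL spelling (`LOCAL_ONE`).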
Key and certificate information for use in TLS-encrypted environments. See the
Node.js TLS documentation for the meaning of these directives.
Note: Optional; leave unconfigured if Cassandra client encryption is not enabled.

    tls:
      cert: /etc/restbase/tls/cert.pem
      key: /etc/restbase/tls/key.pem
      ca:
        - /etc/restbase/tls/root.pem

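
Since the `tls` stanza holds file paths, the files have to be read before a TLS connection can be made. The sketch below shows one way this could be done; it is an assumption about the general approach, not a description of the module's internals. `loadTlsOptions` is a hypothetical helper that produces an object with the `cert`, `key` and `ca` keys understood by Node.js `tls.connect()`; the DataStax Node.js driver accepts such an object as its `sslOptions` client option.

```javascript
'use strict';

const fs = require('fs');

// Hypothetical helper: turns the `tls` stanza (file paths) into an
// options object holding the file contents, keyed the way Node.js
// tls.connect() expects (cert, key, ca).
// `readFile` is injectable so the function can be exercised without
// real certificate files.
function loadTlsOptions(tlsConf, readFile = fs.readFileSync) {
    if (!tlsConf) {
        return undefined; // client encryption not configured
    }
    const options = {};
    if (tlsConf.cert) {
        options.cert = readFile(tlsConf.cert);
    }
    if (tlsConf.key) {
        options.key = readFile(tlsConf.key);
    }
    if (tlsConf.ca) {
        // Accept a single path or a list of paths.
        options.ca = [].concat(tlsConf.ca).map((path) => readFile(path));
    }
    return options;
}
```

Returning `undefined` when the stanza is absent mirrors the note above: an unconfigured `tls` section means no client encryption.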
This module uses a datacenter-aware connection pool. The
localDc directive instructs the module
which datacenter to treat as 'local' to this instance. Cassandra nodes in the local
datacenter will be used for queries, and any others serve as a fallback. Defaults to
datacenter1 (the Cassandra default).
localDc must be in the list of configured datacenters (see below).
The list of datacenters this Cassandra cluster belongs to. Data will be replicated
across these datacenters accordingly. Defaults to
[ datacenter1 ].
Note: Changing this list alters the underlying Cassandra keyspaces in order to add or remove datacenter replicas accordingly, but replication is NOT made retroactive. You MUST perform a Cassandra repair after adding a new datacenter to realize the added redundancy. Likewise, you must perform a cleanup to reclaim space if a datacenter is removed.

    datacenters:
      - datacenter1

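
The documented defaults and the requirement that `localDc` appear in `datacenters` can be summarized in a few lines. This is a minimal sketch under the rules stated above; `resolveDatacenters` is a hypothetical helper and the datacenter names in the usage comment are made up.

```javascript
'use strict';

const DEFAULT_DC = 'datacenter1'; // the Cassandra default datacenter name

// Hypothetical helper: fills in the documented defaults and enforces
// that localDc is one of the configured datacenters.
function resolveDatacenters(conf) {
    const localDc = conf.localDc || DEFAULT_DC;
    const datacenters = conf.datacenters && conf.datacenters.length
        ? conf.datacenters.slice()
        : [DEFAULT_DC];
    if (!datacenters.includes(localDc)) {
        throw new Error(
            `localDc "${localDc}" is not in datacenters ` +
            `[${datacenters.join(', ')}]`);
    }
    return { localDc, datacenters };
}

// Example (made-up names): two datacenters, with "dc-east" local.
// resolveDatacenters({ localDc: 'dc-east', datacenters: ['dc-east', 'dc-west'] });
```

A check like this only validates the configuration; it does not replace the repair or cleanup that must follow a change to the datacenter list.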
Storage groups are used to map tables to one or more hosts/domains.

    storage_groups:
      - name: default.group.local
        domains: /./

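
To make the mapping concrete, the sketch below resolves a domain to a storage group name. It rests on several assumptions not stated in this document: that a `domains` value wrapped in slashes (such as `/./`) is a regular expression, that any other string is an exact domain name, that `domains` may be a single value or a list, and that the first matching group wins. `toMatcher` and `findStorageGroup` are hypothetical helpers, not the module's API.

```javascript
'use strict';

// Hypothetical helper: converts a pattern such as "/./" into a matcher.
// Assumption: strings wrapped in slashes are regular expressions, and
// anything else is compared as an exact domain name.
function toMatcher(pattern) {
    const m = /^\/(.+)\/$/.exec(pattern);
    if (m) {
        const re = new RegExp(m[1]);
        return (domain) => re.test(domain);
    }
    return (domain) => domain === pattern;
}

// Returns the name of the first storage group whose `domains` entry
// matches the given domain, or null when none does.
function findStorageGroup(storageGroups, domain) {
    for (const group of storageGroups) {
        const patterns = [].concat(group.domains); // single value or list
        if (patterns.some((p) => toMatcher(p)(domain))) {
            return group.name;
        }
    }
    return null;
}
```

Under these assumptions, the `/./` pattern in the example configuration matches every non-empty domain, which makes `default.group.local` a catch-all group.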