Artifact is a distributed key-value datastore inspired by Amazon’s Dynamo paper.
I’ve benchmarked Artifact at 1,200 queries per second on a 16-node system with 1GB assigned to each container (single machine). At peak load, 99% of requests completed in under 300ms.
- Consistent hashing load distribution
- Fair load balancing
- Secondary Indexes
- Custom storage backends (ETS is standard; DETS not yet ported)
- Quorum coordinator
- Memcached and TTL interfaces
- Gossip protocol membership
This is an Elixir rewrite of my original Erlang project and is not guaranteed to work the same way.
This database could have major problems: I’ve never submitted it to any sort of substantial peer review or audit. I’ve been working on it for some time and I’m still hashing things out. If you find any such problems, please file an issue!
The easiest way to get up and running with Artifact is to boot a standalone instance. Artifact’s configuration (`config/config.exs`) defaults to this use case.
```
iex(1)> Application.start(:artifact)
:ok
iex(2)> Artifact.RPC.check('kai2.local': 11011, 'kai1.local': 11012)
:ok
iex(3)> Artifact.RPC.check('kai2.local': 11011, 'kai1.local': 11012)
:ok
```
Artifact maintains a node manifest, including the members of its own cluster. Invoking `check` asks a node to manually retrieve the manifest from another node. Calling this function twice can bootstrap the process by mutually ensuring that each node has been added to the other’s manifest.
Synchronization will need to transfer `log2(STORAGE_CNT) * 32 + overhead` bytes, so give it some time if you’ve built up a large entry count.
To verify that a node has been added to the node manifest:
```
iex(5)> Artifact.RPC.node_manifest('kai1.local')
{:node_manifest, [{{'10.0.0.53', 11011}}, {{'10.0.0.80', 11011}}]}
```
Once you have a server started, you can read and write to it just as if it were a memcached daemon.
```
% nc localhost 11211
set foo 0 0 3
bar
STORED
get foo
VALUE foo 0 3
bar
```
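If you’d rather drive the same session from Elixir, a minimal sketch using `:gen_tcp` (assuming a standalone instance listening on localhost:11211) looks like:

```elixir
# Talk the memcached text protocol over a raw TCP socket.
{:ok, sock} = :gen_tcp.connect('localhost', 11211, [:binary, active: false])

# set foo -> "bar" (flags 0, expiry 0, 3 bytes)
:ok = :gen_tcp.send(sock, "set foo 0 0 3\r\nbar\r\n")
{:ok, _stored} = :gen_tcp.recv(sock, 0)

# Read it back.
:ok = :gen_tcp.send(sock, "get foo\r\n")
{:ok, reply} = :gen_tcp.recv(sock, 0)
IO.puts(reply)

:ok = :gen_tcp.close(sock)
```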
All configuration values are loaded at startup from `config/config.exs`. If you’d like to know what configuration a system is running, you can query it with:
```
iex> Artifact.config.node_info
{:node_info, {{127,0,0,1}, 10070}, [{:vnodes, 256}]}
```
```elixir
quorum: {
  # N: number of participants in the preference list
  1,
  # R: minimum nodes in a successful read
  1,
  # W: minimum nodes in a successful write
  1
},
```
Artifact doesn’t keep a strict quorum system, so the values in `quorum` refer to the minimum number of nodes that must participate in an operation for it to be successful. `N`, the first parameter, is the number of participants in the ‘preference list’; although more nodes than `N` can be added to the system, those excess nodes won’t verify replicas. `R`, the second, is the minimum number of nodes that must participate in a successful read operation. `W` is the minimum number of nodes that must participate in a successful write operation. The latency of a get (or put) operation is dictated by the slowest of the `R` (or `W`) replicas. For this reason, `R` and `W` are usually configured to be less than `N` to provide better latency.
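For example, a Dynamo-style setting that trades a little latency for read/write overlap might look like this (illustrative values, not Artifact’s defaults):

```elixir
quorum: {
  # N: each key is replicated to three nodes
  3,
  # R: a read succeeds once two replicas respond
  2,
  # W: a write succeeds once two replicas acknowledge
  2
},
```

With `R + W > N`, every successful read overlaps with at least one node that acknowledged the most recent successful write.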
```elixir
# Buckets are the atomic unit of inter-database partitioning. Adjusting
# this upward gives you more flexibility in how finely grained your
# data is split, at the cost of more expensive negotiation.
buckets: 2048,

# Individual physical nodes present themselves as a multitude of
# virtual nodes in order to address consistent hashing's problem of
# non-uniform data distribution; vnodes therefore represents an upper
# bound on how many physical nodes can exist in your datastore.
vnodes: 256,

tables: 512,
```
Instead of mapping nodes to a single point in the ring, each node gets assigned to multiple points. To this end, Artifact uses the Dynamo concept of “virtual nodes”. A virtual node looks like a single node in the system, but each node can be responsible for more than one virtual node. Effectively, when a new node is added to the system, it is assigned multiple positions (“tokens”) in the ring. Using virtual nodes has the following advantages:
- If a node becomes unavailable (due to failures or routine maintenance), the load handled by this node is evenly dispersed across the remaining available nodes.
- When a node becomes available again, or a new node is added to the system, the newly available node accepts a roughly equivalent amount of load from each of the other available nodes.
- The number of virtual nodes that a node is responsible for can be decided based on its capacity, accounting for heterogeneity in the physical infrastructure.
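To make the mechanism concrete, here is a toy hash ring with virtual nodes. It is only a sketch of the technique; the module name, `@vnodes` count, and hash choice are illustrative, not Artifact’s internal implementation:

```elixir
defmodule RingSketch do
  # Toy consistent-hash ring with virtual nodes; illustration only.
  @vnodes 8

  # Hash each physical node onto the ring at @vnodes points ("tokens").
  def build(nodes) do
    for node <- nodes, i <- 1..@vnodes do
      {:erlang.phash2({node, i}), node}
    end
    |> Enum.sort()
  end

  # A key belongs to the first vnode clockwise from its hash, wrapping
  # around to the start of the ring when no higher token exists.
  def lookup(ring, key) do
    hash = :erlang.phash2(key)

    case Enum.find(ring, fn {token, _node} -> token >= hash end) do
      {_token, node} -> node
      nil -> ring |> List.first() |> elem(1)
    end
  end
end
```

Dropping a node from `build/1`’s input reassigns only that node’s tokens, which is what makes redistribution under membership change cheap.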
```elixir
store: Artifact.Store.ETS,
```
I haven’t yet ported DETS over from my original Erlang implementation, so you’re stuck with ETS for now.
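For a rough sense of what a custom backend involves, a hypothetical store contract might look like the sketch below; the callback names and signatures are my own assumptions for illustration, not Artifact’s actual interface:

```elixir
# Hypothetical contract for a pluggable store; callback names are
# assumptions for illustration, not Artifact's real behaviour.
defmodule StoreSketch do
  @callback start_link(opts :: keyword()) :: {:ok, pid()} | {:error, term()}
  @callback get(key :: binary()) :: {:ok, binary()} | :undefined
  @callback put(key :: binary(), value :: binary()) :: :ok | {:error, term()}
  @callback delete(key :: binary()) :: :ok | {:error, term()}
end
```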