Artifact is a distributed key-value datastore inspired by Amazon’s Dynamo paper.
I’ve benchmarked Artifact at 1,200 queries per second on a 16-node system with 1GB assigned to each container (single machine). At peak load, 99% of requests completed in under 300ms.
- Consistent hashing load distribution
- Fair load balancing
- Secondary Indexes
- Custom storage backends (ETS is standard; DETS not yet ported)
- Quorum coordinator
- Memcached and TTL interfaces
- Gossip protocol membership
This is an Elixir rewrite of my original Erlang project and is not guaranteed to work the same way.
This database could have major problems: I’ve never submitted it to any sort of substantial peer review or audit. I’ve been working on it for some time and I’m still hashing things out. If you find any such problems, please file an issue!
The easiest way to get up and running with Artifact is to boot a standalone instance. Artifact’s configuration (`config/config.exs`) defaults to this use case.
```
iex(1)> Application.start(:artifact)
:ok
iex(2)> Artifact.RPC.check('kai2.local': 11011, 'kai1.local': 11012)
:ok
iex(3)> Artifact.RPC.check('kai2.local': 11011, 'kai1.local': 11012)
:ok
```
Artifact maintains a node manifest, including the members of its own cluster. Invoking `check` asks a node to manually retrieve the manifest from another node. Calling this function twice can bootstrap the process by mutually ensuring that each node has been added to the other’s manifest.
Synchronization will need to transfer `log2(STORAGE_CNT) * 32 + overhead` bytes, so give it some time if you’ve built up a large entry count.
To verify that a node has been added to the node manifest:
```
iex(5)> Artifact.RPC.node_manifest('kai1.local')
{:node_manifest, [{{'10.0.0.53', 11011}}, {{'10.0.0.80', 11011}}]}
```
Once you have a server started, you can read and write to it just as if it were a memcached daemon.
```
% nc localhost 11211
set foo 0 0 3
bar
STORED
get foo
VALUE foo 0 3
bar
```
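If you’d rather drive the same session from Elixir, a minimal sketch using `:gen_tcp` (assuming a standalone instance listening on localhost:11211) looks like:

```elixir
# Talk the memcached text protocol over a raw TCP socket.
{:ok, sock} = :gen_tcp.connect('localhost', 11211, [:binary, active: false])

# set foo -> "bar" (flags 0, expiry 0, 3 bytes)
:ok = :gen_tcp.send(sock, "set foo 0 0 3\r\nbar\r\n")
{:ok, _stored} = :gen_tcp.recv(sock, 0)

# Read it back.
:ok = :gen_tcp.send(sock, "get foo\r\n")
{:ok, reply} = :gen_tcp.recv(sock, 0)
IO.puts(reply)

:ok = :gen_tcp.close(sock)
```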
All configuration values are loaded at startup from `config/config.exs`. If you’d like to know what configuration a system is running, you can query it with:
```
iex> Artifact.config.node_info
{:node_info, {{127,0,0,1}, 10070}, [{:vnodes, 256}]}
```
```elixir
quorum: {
  # N: number of participants in the preference list
  1,
  # R: minimum nodes in a successful read
  1,
  # W: minimum nodes in a successful write
  1
},
```
Artifact doesn’t keep a strict quorum system, so the values in `quorum` refer to the minimum number of nodes that must participate in an operation for it to be successful. `N`, the first parameter, is the number of participants in the ‘preference list’; although more nodes than `N` can be added to the system, those excess nodes won’t verify replicas. `R`, the second, is the minimum number of nodes that must participate in a successful read operation. `W` is the minimum number of nodes that must participate in a successful write operation. The latency of a get (or put) operation is dictated by the slowest of the `R` (or `W`) replicas. For this reason, `R` and `W` are usually configured to be less than `N` to provide better latency.
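For example, a Dynamo-style setting that trades a little latency for read/write overlap might look like this (illustrative values, not Artifact’s defaults):

```elixir
quorum: {
  # N: each key is replicated to three nodes
  3,
  # R: a read succeeds once two replicas respond
  2,
  # W: a write succeeds once two replicas acknowledge
  2
},
```

With `R + W > N`, every successful read overlaps with at least one node that acknowledged the most recent successful write.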
```elixir
# Buckets are the atomic unit of inter-database partitioning. Adjusting
# this upward gives you more flexibility in how finely grained your
# data is split, at the cost of more expensive negotiation.
buckets: 2048,

# Individual physical nodes present themselves as a multitude of
# virtual nodes in order to address consistent hashing's problem of
# non-uniform data distribution; vnodes therefore represents an upper
# bound on how many physical nodes can exist in your datastore.
vnodes: 256,

tables: 512,
```
Instead of mapping nodes to a single point in the ring, each node gets assigned to multiple points. To this end, Artifact uses the Dynamo concept of “virtual nodes”. A virtual node looks like a single node in the system, but each node can be responsible for more than one virtual node. Effectively, when a new node is added to the system, it is assigned multiple positions (“tokens”) in the ring. Using virtual nodes has the following advantages:
- If a node becomes unavailable (due to failures or routine maintenance), the load handled by this node is evenly dispersed across the remaining available nodes.
- When a node becomes available again, or a new node is added to the system, the newly available node accepts a roughly equivalent amount of load from each of the other available nodes.
- The number of virtual nodes that a node is responsible for can be decided based on its capacity, accounting for heterogeneity in the physical infrastructure.
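To make the mechanism concrete, here is a toy hash ring with virtual nodes. It is only a sketch of the technique; the module name, `@vnodes` count, and hash choice are illustrative, not Artifact’s internal implementation:

```elixir
defmodule RingSketch do
  # Toy consistent-hash ring with virtual nodes; illustration only.
  @vnodes 8

  # Hash each physical node onto the ring at @vnodes points ("tokens").
  def build(nodes) do
    for node <- nodes, i <- 1..@vnodes do
      {:erlang.phash2({node, i}), node}
    end
    |> Enum.sort()
  end

  # A key belongs to the first vnode clockwise from its hash, wrapping
  # around to the start of the ring when no higher token exists.
  def lookup(ring, key) do
    hash = :erlang.phash2(key)

    case Enum.find(ring, fn {token, _node} -> token >= hash end) do
      {_token, node} -> node
      nil -> ring |> List.first() |> elem(1)
    end
  end
end
```

Dropping a node from `build/1`’s input reassigns only that node’s tokens, which is what makes redistribution under membership change cheap.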
```elixir
store: Artifact.Store.ETS,
```
I haven’t yet ported DETS over from my original Erlang implementation, so you’re stuck with ETS for now.
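For a rough sense of what a custom backend involves, a hypothetical store contract might look like the sketch below; the callback names and signatures are my own assumptions for illustration, not Artifact’s actual interface:

```elixir
# Hypothetical contract for a pluggable store; callback names are
# assumptions for illustration, not Artifact's real behaviour.
defmodule StoreSketch do
  @callback start_link(opts :: keyword()) :: {:ok, pid()} | {:error, term()}
  @callback get(key :: binary()) :: {:ok, binary()} | :undefined
  @callback put(key :: binary(), value :: binary()) :: :ok | {:error, term()}
  @callback delete(key :: binary()) :: :ok | {:error, term()}
end
```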