Merge 82cfa44 into 4bb090c
seelabs committed Sep 8, 2016
2 parents 4bb090c + 82cfa44 commit cbd0011
Showing 58 changed files with 2,099 additions and 784 deletions.
15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,18 @@
1.0.0-b5

* fail_file also fails on reads
* Fix bug in rekey where an error code wasn't checked
* Increase coverage
* Add buffer unit test
* Add is_File concept and checks
* Update documentation
* Add example program
* Demote exceptions to asserts in gentex
* Improved commit process
* Dynamic block size in custom allocator

---

1.0.0-b4

* Improved test coverage
1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -82,5 +82,6 @@ file (GLOB_RECURSE NUDB_INCLUDES
)

add_subdirectory (bench)
add_subdirectory (examples)
add_subdirectory (test)
add_subdirectory (tools)
1 change: 1 addition & 0 deletions Jamroot
@@ -88,5 +88,6 @@ project nudb
;

build-project bench ;
build-project examples ;
build-project test ;
build-project tools ;
119 changes: 96 additions & 23 deletions README.md
@@ -1,5 +1,5 @@
<img width="880" height = "80" alt = "NuDB"
src="https://raw.githubusercontent.com/vinniefalco/NuDB/master/doc/images/readme.png">
src="https://raw.githubusercontent.com/vinniefalco/NuDB/master/doc/images/readme2.png">

[![Join the chat at https://gitter.im/vinniefalco/NuDB](https://badges.gitter.im/vinniefalco/NuDB.svg)](https://gitter.im/vinniefalco/NuDB?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) [![Build Status]
(https://travis-ci.org/vinniefalco/NuDB.svg?branch=master)](https://travis-ci.org/vinniefalco/NuDB) [![codecov]
@@ -10,8 +10,14 @@

# A Key/Value Store For SSDs

---

## Contents

- [Introduction](#introduction)
- [Description](#description)
- [Requirements](#requirements)
- [Example](#example)
- [Building](#building)
- [Algorithm](#algorithm)
- [Licence](#licence)
@@ -21,32 +27,94 @@

## Introduction

NuDB is an append only, key/value store specifically optimized for random
read performance on modern SSDs or equivalent high-IOPS decices. The most
NuDB is an append-only, key/value store specifically optimized for random
read performance on modern SSDs or equivalent high-IOPS devices. The most
common application for NuDB is content-addressable storage where a
cryptographic digest of the data is used as the key. The read performance
and memory usage are independent of the size of the database. These are
some other features:

* Low memory footprint.
* Values are immutable.
* Value sizes from 1 to 2^32 bytes (4GB).
* All keys are the same size.
* Performance independent of growth.
* Optimized for concurrent fetch.
* Key file can be rebuilt if needed.
* Inserts are atomic and consistent.
* Data files may be efficiently iterated.
* Key and data files may be on different devices.
* Hardened against algorithmic complexity attacks.
* Header-only, no separate library to build.
* Low memory footprint
* Database size up to 281TB
* All keys are the same size
* Append-only, no update or delete
* Value sizes from 1 to 2^32 bytes (4GB)
* Performance independent of growth
* Optimized for concurrent fetch
* Key file can be rebuilt if needed
* Inserts are atomic and consistent
* Data file may be efficiently iterated
* Key and data files may be on different devices
* Hardened against algorithmic complexity attacks
* Header-only, no separate library to build
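
To illustrate the content-addressable use case, where the key is a fixed-size digest of the value, here is a minimal sketch that is not part of NuDB. `content_key`, `content_key_t`, and `key_size` are hypothetical names, and 64-bit FNV-1a stands in for a real cryptographic digest such as SHA-256 only to keep the example free of dependencies; FNV-1a is not collision-resistant and should not be used this way in practice. The property it shows is the one the feature list relies on: identical content always produces the identical key, and every key has the same size.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed key size shared by every record in the store.
constexpr std::size_t key_size = 8;
using content_key_t = std::array<unsigned char, key_size>;

// Hypothetical helper: hashes the value bytes with 64-bit FNV-1a and returns
// the hash as a fixed-size key. FNV-1a is NOT cryptographic; it is only a
// dependency-free stand-in for a real digest such as SHA-256.
content_key_t content_key(void const* data, std::size_t size)
{
    assert(size == 0 || data != nullptr);
    std::uint64_t h = 14695981039346656037ULL; // FNV-1a 64-bit offset basis
    auto const* p = static_cast<unsigned char const*>(data);
    for (std::size_t i = 0; i < size; ++i)
    {
        h ^= p[i];
        h *= 1099511628211ULL; // FNV-1a 64-bit prime
    }
    content_key_t key{};
    for (std::size_t i = 0; i < key_size; ++i)
        key[i] = static_cast<unsigned char>(h >> (8 * i)); // little-endian bytes
    return key;
}
```

In a setup like the example program in this README, the key bytes would presumably be passed as the key pointer on insert and fetch, with `key_size` given as the key size when the database is created.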

## Description

This software is close to final. Interfaces are stable.
For recent changes see the [CHANGELOG](CHANGELOG.md).

NuDB has been in use for over a year on production servers
running [rippled](https://github.com/ripple/rippled), with
database sizes over 3 terabytes.

* [Repository](https://github.com/vinniefalco/NuDB)
* [Documentation](http://vinniefalco.github.io/nudb/)

## Requirements

* Boost 1.58 or higher
* C++11 or greater
* SSD drive, or equivalent device with high IOPS

## Example

This complete program creates a database, opens the database,
inserts several key/value pairs, fetches the key/value pairs,
closes the database, then erases the database files. Source
code for this program is located in the examples directory.

```C++
#include <nudb/nudb.hpp>
#include <cstddef>
#include <cstdint>

int main()
{
    using namespace nudb;
    std::size_t constexpr N = 1000;
    using key_type = std::uint32_t;
    error_code ec; // error checking of ec is omitted for brevity
    auto const dat_path = "db.dat";
    auto const key_path = "db.key";
    auto const log_path = "db.log";
    // Create the data, key, and log files
    create<xxhasher>(
        dat_path, key_path, log_path,
        1,                  // application number
        make_salt(),        // random salt for the hash function
        sizeof(key_type),   // key size in bytes; all keys are this size
        block_size("."),    // block size of the device holding "."
        0.5f,               // load factor of the key file
        ec);
    store db;
    db.open(dat_path, key_path, log_path, ec);
    char data = 0;
    // Insert
    for(key_type i = 0; i < N; ++i)
        db.insert(&i, &data, sizeof(data), ec);
    // Fetch
    for(key_type i = 0; i < N; ++i)
        db.fetch(&i,
            [&](void const* buffer, std::size_t size)
            {
                // do something with buffer, size
            }, ec);
    db.close(ec);
    erase_file(dat_path);
    erase_file(key_path);
    erase_file(log_path);
}
```

## Building

NuDB is header-only so there are no libraries to build. To use it in your
@@ -74,9 +142,10 @@ git submodule init
git submodule update
```

For the examples and tests, NuDB provides build scripts for Boost.Build (bjam)
and CMake. Developers using Microsoft Visual Studio can generate Visual Studio
project files by executing these commands from the root of the repository:
For the examples and tests, NuDB provides build scripts for Boost.Build (b2)
and CMake. To generate build scripts using CMake, execute these commands at
the root of the repository (project and solution files will be generated
for Visual Studio users):

```
cd bin
@@ -87,27 +156,31 @@ cmake .. # for Linux/Mac builds, OR
cmake -G"Visual Studio 14 2015 Win64" .. # for 64-bit Windows builds
```

To build with Boost.Build, it is necessary to have the bjam executable
in your path. And bjam needs to know how to find the Boost sources. The
easiest way to do this is make sure that the version of bjam in your path
To build with Boost.Build, it is necessary to have the b2 executable
in your path, and b2 needs to know how to find the Boost sources. The
easiest way to do this is to make sure that the version of b2 in your path
is the one at the root of the Boost source tree, which is built when
running `bootstrap.sh` (or `bootstrap.bat` on Windows).

Once bjam is in your path, simply run bjam in the root of the Beast
Once b2 is in your path, simply run b2 in the root of the NuDB
repository to automatically build the required Boost libraries if they
are not already built, build the examples, then build and run the unit
tests.

On OS X it may be necessary to pass `toolset=clang` on the b2 command line.
Alternatively, this may be set in site-config.jam or user-config.jam.

The files in the repository are laid out thusly:

```
./
bench/ Holds the benchmark sources and scripts
bin/ Holds executables and project files
bin64/ Holds 64-bit Windows executables and project files
examples/ Holds example program source code
extras/ Additional APIs, may change
include/ Add this to your compiler includes
nudb/
extras/ Additional APIs, may change
test/ Unit tests and benchmarks
tools/ Holds the command line tool sources
```
50 changes: 31 additions & 19 deletions bench/README.md
@@ -15,21 +15,33 @@ tables have a row for each database size, and a column for each database (in
cases where NuDB is compared against other databases). A cell in the table is
the number of operations per second for that trial. For example, in the table
below NuDB had 340397 Ops/Sec when fetching from an existing database with
10,000,000 values.
10,000,000 values. This is a summary report, and it only reports samples at
database sizes that are powers of ten.

A sample output:

```
insert (per second)
inserts nudb rocksdb
1000000 387894.04 148233.29
5000000 348982.15 93376.19
10000000 279767.88 62597.36
num_db_keys nudb rocksdb
100000 406598 231937
1000000 374330 258519
10000000 NA NA
fetch (per second)
# db keys nudb rocksdb
1000000 455249.16 164997.45
5000000 291651.66 40969.44
10000000 340397.87 21596.47
num_db_keys nudb rocksdb
100000 325228 697158
1000000 333443 34557
10000000 337300 20835
```
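
The summary can be pictured as a filter over per-batch measurements: keep only the samples whose database size is an exact power of ten, and convert each batch's elapsed time into operations per second. This is a hypothetical sketch, not the bench source; `sample`, `summary_row`, `is_power_of_ten`, and `summarize` are illustrative names, and it assumes each sample records the batch size and the wall-clock time of one batch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-batch measurement (not the bench program's actual type).
struct sample
{
    std::uint64_t num_db_keys; // keys in the database when the batch ran
    std::size_t batch_size;    // operations performed in this batch
    double seconds;            // wall-clock time for the batch
};

// Hypothetical summary row: database size and throughput.
struct summary_row
{
    std::uint64_t num_db_keys;
    double ops_per_sec;
};

// True for 1, 10, 100, ... (exact powers of ten).
bool is_power_of_ten(std::uint64_t n)
{
    if (n == 0)
        return false;
    while (n % 10 == 0)
        n /= 10;
    return n == 1;
}

// Keeps only the samples taken at powers of ten, as in the summary report.
std::vector<summary_row> summarize(std::vector<sample> const& samples)
{
    std::vector<summary_row> rows;
    for (auto const& s : samples)
    {
        if (!is_power_of_ten(s.num_db_keys))
            continue;
        assert(s.seconds > 0);
        rows.push_back({s.num_db_keys, s.batch_size / s.seconds});
    }
    return rows;
}
```

With the default batch size of 20000, the sizes 100000, 1000000, and 10000000 in the sample output are each a whole number of batches (5, 50, and 500), so under this assumption every summary row corresponds to an actual sample.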

In addition to the summary report, the benchmark can collect detailed samples.
The `--raw_out` command line option specifies a file to which the raw samples
are written. The Python 3 script `plot_bench.py` may be used to plot the results.
For example, if bench was run as `bench --raw_out=samples.txt`, the Python
script can be run as `python plot_bench.py -i samples.txt`. The Python script
requires the `pandas` and `seaborn` packages (Anaconda Python is a good way to
install and manage Python if these packages are not already
installed: [Anaconda download](https://www.continuum.io/downloads)).

# Building

@@ -64,19 +76,19 @@ Note: Building with RocksDB is currently not supported on Windows.

## Test the build

Try running the benchmark with a small database: `./bench --inserts=1000
10000`. A report similar to sample should appear after a few seconds.
Try running the benchmark with a small database: `./bench --num_batches=10`. A
report similar to the sample should appear after a few seconds.
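
Assuming every batch adds `batch_size` keys to the database (an assumption about the benchmark's behavior, not something stated here), the final database size is simply `batch_size * num_batches`. With the defaults of 20000 and 500 that is 10,000,000 keys, matching the largest row of the sample output, while `--num_batches=10` gives a 200,000-key database, which would help explain why the test run finishes in seconds. `final_db_keys` is a hypothetical helper used only for this worked arithmetic.

```cpp
#include <cstdint>

// Default values documented for --batch_size and --num_batches.
constexpr std::uint64_t default_batch_size = 20000;
constexpr std::uint64_t default_num_batches = 500;

// Hypothetical helper: database size after all batches, assuming each
// batch inserts batch_size new keys.
constexpr std::uint64_t final_db_keys(
    std::uint64_t batch_size = default_batch_size,
    std::uint64_t num_batches = default_num_batches)
{
    return batch_size * num_batches;
}

static_assert(final_db_keys() == 10000000, "defaults: 20000 * 500");
static_assert(final_db_keys(default_batch_size, 10) == 200000,
              "--num_batches=10 with the default batch size");
```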

# Command Line Options

* `--inserts arg` : Number of values to insert. When timing fetches, the data
base will have this many values in it. The argument may be a list, so several
timing may be collected. For example, the sample output above was run with
`--inserts=1000000 5000000 10000000`. If `inserts` is not specified, it
defaults to `100000 1000000`
* `--fetches arg` : Number of values to fetch from the database. If `fetches`
is not specified, it defaults to `1000000`. Unlike `inserts`, `fetches` is
not a list. It takes a single value only.
* `--batch_size arg` : Number of elements to insert or fetch per batch. If not
specified, it defaults to 20000.
* `--num_batches arg` : Number of batches to run. If not specified, it defaults to
500.
* `--db_dir arg` : Directory in which to place the databases. If not specified, it defaults to
`boost::filesystem::temp_directory_path()` (likely `/tmp` on Linux).
* `--raw_out arg` : File to record the raw measurements. This is useful for plotting. If
not specified, the raw measurements will not be output.
* `--dbs arg` : Databases to run the benchmark on. Currently, only `nudb` and
`rocksdb` are supported. Building with `rocksdb` is optional on Linux, and
only `nudb` is supported on Windows. The argument may be a list. If `dbs` is