Merge 82cfa44 into 4bb090c

vinniefalco · Sep 8, 2016 · cbd0011
2 parents 4bb090c + 82cfa44
Showing 58 changed files with 2,099 additions and 784 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,3 +1,18 @@
+1.0.0-b5
+
+* fail_file also fails on reads
+* Fix bug in rekey where an error code wasn't checked
+* Increase coverage
+* Add buffer unit test
+* Add is_File concept and checks
+* Update documentation
+* Add example program
+* Demote exceptions to asserts in gentex
+* Improved commit process
+* Dynamic block size in custom allocator
+
+---
+
 1.0.0-b4
 
 * Improved test coverage

diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -82,5 +82,6 @@ file (GLOB_RECURSE NUDB_INCLUDES
 )
 
 add_subdirectory (bench)
+add_subdirectory (examples)
 add_subdirectory (test)
 add_subdirectory (tools)
diff --git a/Jamroot b/Jamroot
@@ -88,5 +88,6 @@ project nudb
   ;
 
 build-project bench ;
+build-project examples ;
 build-project test ;
 build-project tools ;
diff --git a/README.md b/README.md
@@ -1,5 +1,5 @@
 <img width="880" height = "80" alt = "NuDB"
-    src="https://raw.githubusercontent.com/vinniefalco/NuDB/master/doc/images/readme.png">
+    src="https://raw.githubusercontent.com/vinniefalco/NuDB/master/doc/images/readme2.png">
 
 [![Join the chat at https://gitter.im/vinniefalco/NuDB](https://badges.gitter.im/vinniefalco/NuDB.svg)](https://gitter.im/vinniefalco/NuDB?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) [![Build Status]
 (https://travis-ci.org/vinniefalco/NuDB.svg?branch=master)](https://travis-ci.org/vinniefalco/NuDB) [![codecov]
@@ -10,8 +10,14 @@
 
 # A Key/Value Store For SSDs
 
+---
+
+## Contents
+
 - [Introduction](#introduction)
+- [Description](#description)
 - [Requirements](#requirements)
+- [Example](#example)
 - [Building](#building)
 - [Algorithm](#algorithm)
 - [Licence](#licence)
@@ -21,32 +27,94 @@
 
 ## Introduction
 
-NuDB is an append only, key/value store specifically optimized for random
-read performance on modern SSDs or equivalent high-IOPS decices. The most
+NuDB is an append-only, key/value store specifically optimized for random
+read performance on modern SSDs or equivalent high-IOPS devices. The most
 common application for NuDB is content addressable storage where a
 cryptographic digest of the data is used as the key. The read performance
 and memory usage are independent of the size of the database. These are
 some other features:
 
-* Low memory footprint.
-* Values are immutable.
-* Value sizes from 1 to 2^32 bytes (4GB).
-* All keys are the same size.
-* Performance independent of growth.
-* Optimized for concurrent fetch.
-* Key file can be rebuilt if needed.
-* Inserts are atomic and consistent.
-* Data files may be efficiently iterated.
-* Key and data files may be on different devices.
-* Hardened against algorithmic complexity attacks.
-* Header-only, no separate library to build.
+* Low memory footprint
+* Database size up to 281TB
+* All keys are the same size
+* Append-only, no update or delete
+* Value sizes from 1 to 2^32 bytes (4GB)
+* Performance independent of growth
+* Optimized for concurrent fetch
+* Key file can be rebuilt if needed
+* Inserts are atomic and consistent
+* Data file may be efficiently iterated
+* Key and data files may be on different devices
+* Hardened against algorithmic complexity attacks
+* Header-only, no separate library to build
+
+## Description
+
+This software is close to final. Interfaces are stable.
+For recent changes see the [CHANGELOG](CHANGELOG.md).
+
+NuDB has been in use for over a year on production servers
+running [rippled](https://github.com/ripple/rippled), with
+database sizes over 3 terabytes.
+
+* [Repository](https://github.com/vinniefalco/Beast)
+* [Documentation](http://vinniefalco.github.io/nudb/)
 
 ## Requirements
 
 * Boost 1.58 or higher
 * C++11 or greater
 * SSD drive, or equivalent device with high IOPS
 
+## Example
+
+This complete program creates a database, opens the database,
+inserts several key/value pairs, fetches the key/value pairs,
+closes the database, then erases the database files. Source
+code for this program is located in the examples directory.
+
+```C++
+#include <nudb/nudb.hpp>
+#include <cstddef>
+#include <cstdint>
+
+int main()
+{
+    using namespace nudb;
+    std::size_t constexpr N = 1000;
+    using key_type = std::uint32_t;
+    error_code ec;
+    auto const dat_path = "db.dat";
+    auto const key_path = "db.key";
+    auto const log_path = "db.log";
+    create<xxhasher>(
+        dat_path, key_path, log_path,
+        1,
+        make_salt(),
+        sizeof(key_type),
+        block_size("."),
+        0.5f,
+        ec);
+    store db;
+    db.open(dat_path, key_path, log_path, ec);
+    char data = 0;
+    // Insert
+    for(key_type i = 0; i < N; ++i)
+        db.insert(&i, &data, sizeof(data), ec);
+    // Fetch
+    for(key_type i = 0; i < N; ++i)
+        db.fetch(&i,
+            [&](void const* buffer, std::size_t size)
+        {
+            // do something with buffer, size
+        }, ec);
+    db.close(ec);
+    erase_file(dat_path);
+    erase_file(key_path);
+    erase_file(log_path);
+}
+```
+
 ## Building
 
 NuDB is header-only so there are no libraries to build. To use it in your
@@ -74,9 +142,10 @@ git submodule init
 git submodule update
 ```
 
-For the examples and tests, NuDB provides build scripts for Boost.Build (bjam)
-and CMake. Developers using Microsoft Visual Studio can generate Visual Studio
-project files by executing these commands from the root of the repository:
+For the examples and tests, NuDB provides build scripts for Boost.Build (b2)
+and CMake. To generate build scripts using CMake, execute these commands at
+the root of the repository (project and solution files will be generated
+for Visual Studio users):
 
 ```
 cd bin
@@ -87,27 +156,31 @@ cmake ..                                    # for Linux/Mac builds, OR
 cmake -G"Visual Studio 14 2015 Win64" ..    # for 64-bit Windows builds
 ```
 
-To build with Boost.Build, it is necessary to have the bjam executable
-in your path. And bjam needs to know how to find the Boost sources. The
-easiest way to do this is make sure that the version of bjam in your path
+To build with Boost.Build, it is necessary to have the b2 executable
+in your path, and b2 needs to know how to find the Boost sources. The
+easiest way to do this is to make sure that the version of b2 in your path
 is the one at the root of the Boost source tree, which is built when
 running `bootstrap.sh` (or `bootstrap.bat` on Windows).
 
-Once bjam is in your path, simply run bjam in the root of the Beast
+Once b2 is in your path, simply run b2 in the root of the NuDB
 repository to automatically build the required Boost libraries if they
 are not already built, build the examples, then build and run the unit
 tests.
 
+On OSX it may be necessary to pass "toolset=clang" on the b2 command line.
+Alternatively, this may be set in site-config.jam or user-config.jam.
+
 The files in the repository are laid out thusly:
 
 ```
 ./
     bench/          Holds the benchmark sources and scripts
     bin/            Holds executables and project files
     bin64/          Holds 64-bit Windows executables and project files
+    examples/       Holds example program source code
+    extras/         Additional APIs, may change
     include/        Add this to your compiler includes
         nudb/
-    extras/         Additional APIs, may change
     test/           Unit tests and benchmarks
     tools/          Holds the command line tool sources
 ```
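
The size limits in the README's feature list (a database of up to 281TB, values of up to 2^32 bytes) can be checked with a little arithmetic. The sketch below assumes the 281TB figure comes from 48-bit file offsets; that width is inferred from the number itself and is not stated anywhere in this diff. The helper `addressable_bytes` and both constant names are hypothetical and are not part of NuDB.

```cpp
#include <cstdint>

// Number of distinct byte positions reachable with an offset of `bits` bits.
// Only valid for bits < 64.
constexpr std::uint64_t addressable_bytes(unsigned bits)
{
    return std::uint64_t{1} << bits;
}

// Assumption: 48-bit offsets, inferred from the "281TB" figure in the README.
constexpr std::uint64_t max_database_bytes = addressable_bytes(48);

// From the feature list: value sizes of up to 2^32 bytes.
constexpr std::uint64_t max_value_bytes = addressable_bytes(32);

// 2^48 = 281,474,976,710,656 bytes, about 281TB in decimal units.
static_assert(max_database_bytes == 281474976710656ULL, "2^48 bytes");

// 2^32 = 4,294,967,296 bytes, which is 4GB in binary units.
static_assert(max_value_bytes == 4294967296ULL, "2^32 bytes");
```

Note that the two figures use different unit conventions: 281TB is a decimal terabyte count, while 4GB is a binary gigabyte count.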

diff --git a/bench/README.md b/bench/README.md
@@ -15,21 +15,33 @@ tables have a row for each database size, and a column for each database (in
 cases where NuDB is compared against other databases). A cell in the table is
 the number of operations per second for that trial. For example, in the table
 below NuDB had 340397 Ops/Sec when fetching from an existing database with
-10,000,000 values.
+10,000,000 values. This is a summary report, and only reports samples at
+powers of ten.
 
 A sample output:
 
+```
 insert (per second)
-        inserts          nudb       rocksdb
-        1000000     387894.04     148233.29
-        5000000     348982.15      93376.19
-       10000000     279767.88      62597.36
+    num_db_keys          nudb       rocksdb
+         100000        406598        231937
+        1000000        374330        258519
+       10000000            NA            NA
 
 fetch (per second)
-      # db keys          nudb       rocksdb
-        1000000     455249.16     164997.45
-        5000000     291651.66      40969.44
-       10000000     340397.87      21596.47
+    num_db_keys          nudb       rocksdb
+         100000        325228        697158
+        1000000        333443         34557
+       10000000        337300         20835
+```
+
+In addition to the summary report, the benchmark can collect detailed samples.
+The `--raw_out` command line option specifies a file to which the raw samples
+are written. The Python 3 script `plot_bench.py` may be used to plot the result. For
+example, if bench was run as `bench --raw_out=samples.txt`, then the Python
+script can be run as `python plot_bench.py -i samples.txt`. The Python script
+requires the `pandas` and `seaborn` packages (Anaconda Python is a good way to
+install and manage Python if these packages are not already
+installed: [anaconda download](https://www.continuum.io/downloads)).
 
 # Building
 
@@ -64,19 +76,19 @@ Note: Building with RocksDB is currently not supported on Windows.
 
 ## Test the build
 
-Try running the benchmark with a small database: `./bench --inserts=1000
-10000`. A report similar to sample should appear after a few seconds.
+Try running the benchmark with a small database: `./bench --num_batches=10`. A
+report similar to the sample above should appear after a few seconds.
 
 # Command Line Options
 
-*  `--inserts arg` : Number of values to insert. When timing fetches, the data
-   base will have this many values in it. The argument may be a list, so several
-   timing may be collected. For example, the sample output above was run with
-   `--inserts=1000000 5000000 10000000`. If `inserts` is not specified, it
-   defaults to `100000 1000000`
-*  `--fetches arg` : Number of values to fetch from the database. If `fetches`
-   is not specified, it defaults to `1000000`. Unlike `inserts`, `fetches` is
-   not a list. It takes a single value only.
+* `--batch_size arg` : Number of elements to insert or fetch per batch. If not
+  specified, it defaults to 20000.
+* `--num_batches arg` : Number of batches to run. If not specified, it defaults to
+  500.
+* `--db_dir arg` : Directory in which to place the databases. If not specified, it
+  defaults to `boost::filesystem::temp_directory_path` (likely `/tmp` on Linux).
+* `--raw_out arg` : File to record the raw measurements. This is useful for plotting. If
+  not specified, the raw measurements will not be output.
 *  `--dbs arg` : Databases to run the benchmark on. Currently, only `nudb` and
    `rocksdb` are supported. Building with `rocksdb` is optional on Linux, and
    only `nudb` is supported on windows. The argument may be a list. If `dbs` is