Added documentation for concurrent validation
tomasvdw committed Nov 24, 2016
1 parent e5f5b84 commit db759e5
Showing 2 changed files with 44 additions and 13 deletions.
55 changes: 42 additions & 13 deletions bitcrust-lib/README.md
@@ -2,19 +2,19 @@
# Bitcrust-db

- [Introduction](#introduction)
- [Block content](#block_content)
- [Spent tree instead of a UTXO-set](#spent_tree)
- [Concurrent block validation](#spent_tree)
- [Block content](#block-content)
- [Spent tree instead of a UTXO-set](#spent-tree)
- [Concurrent block validation](#concurrent-validation)

## Introduction

Bitcoin-core uses a linearized model of the block tree. On-disk, only a main chain
is stored to ensure that there is always one authoritative UTXO set.

Bitcrust uses a tree-structure and stores spents instead of unspents. This has several key
advantages in terms of performance, simplicity and most importantly, concurrency.
Bitcrust uses a tree-structure and indexes spents instead of unspents. This has several key
advantages in terms of performance, memory requirements, simplicity and, most importantly, concurrency.

The first results are very positive and show that this approach addresses Core's major bottlenecks of
The first results are very positive and show that this approach addresses Core's major bottlenecks in
block verification.

## Block content
@@ -34,11 +34,11 @@ For more details, check the [store](src/store/) documentation

When transactions are stored, only their scripts are validated. When blocks come in,
we need to verify for each input of each transaction whether the referenced output exists and is unspent before
this one. For this we use the spent-tree.
this one. Instead of a UTXO-set we use the spent-tree.

This is a table (stored in a flatfileset) consisting of three types of records: blocks, transactions and spents.

Here we see a spent-tree with two blocks, where the 3rd transaction has one input referencing the 1st transaction (purple).
Here we see a spent-tree with two blocks, where the third transaction has one input referencing the output of the first transaction (purple).


![Spent tree example 1](https://cdn.rawgit.com/tomasvdw/bitcrust/master/doc/spent-tree1.svg "Spent-tree example")
@@ -47,12 +47,41 @@ If another block (2b) comes with the same block 1 as parent this can be simply a

![Spent tree example 2](https://cdn.rawgit.com/tomasvdw/bitcrust/master/doc/spent-tree2.svg "Spent-tree example 2")

The rule for verification is simple. A spent (purple) record can only be added if, when browser back through the
file, we will find the corresponding transaction before we find the same spent.
The rule for verification is simple: a spent (purple) record can only be added if, when browsing back through the
records, we find the corresponding transaction before we find the same spent. This ensures both that the
referenced transaction exists and that its output is still unspent.
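
The rule can be sketched as a plain backward walk. This is only an illustration: the `Record` layout, the `prev` index, and the names `push` and `can_spend` are assumptions made for the example, not Bitcrust's actual record format or API, and it uses the naive scan rather than the skip tree.

```rust
/// Hypothetical record kinds; the real on-disk format is not shown here.
#[derive(Debug, Clone, Copy)]
enum Kind {
    Block,
    Transaction(u64), // illustrative transaction id
    Spent(u64, u32),  // (transaction id, output index)
}

#[derive(Debug, Clone, Copy)]
struct Record {
    kind: Kind,
    prev: Option<usize>, // index of the preceding record on this branch
}

/// Append a record after `prev` and return its index.
fn push(records: &mut Vec<Record>, kind: Kind, prev: Option<usize>) -> usize {
    records.push(Record { kind, prev });
    records.len() - 1
}

/// Walk back from `tip`: the spend is allowed only if the referenced
/// transaction is found before an identical spent record.
fn can_spend(records: &[Record], tip: Option<usize>, tx: u64, output: u32) -> bool {
    let mut cursor = tip;
    while let Some(i) = cursor {
        match records[i].kind {
            Kind::Spent(t, o) if t == tx && o == output => return false, // already spent
            Kind::Transaction(t) if t == tx => return true,              // exists, unspent
            _ => {}
        }
        cursor = records[i].prev;
    }
    false // reached the root without finding the transaction
}
```

Because each record only points backwards, a second block with the same parent simply starts a new branch, and a spent recorded on one branch is never seen when walking back from the other.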

Obviously, with hundreds of millions of transactions, simply scanning won't do. This is where we
take advantage of the fact that these records are filepointers, and therefore *roughly* ordered. This allows us to create
a *loose skip tree*: every records contains a set of "highway" pointers that point skip over records depnding on the value searched for.
take advantage of the fact that these records are filepointers to the [block content](#block-content) fileset, and therefore *roughly* ordered. This allows us to create
a *loose skip tree*. Similarly to a skip list, each record contains a set of "highway" pointers that skip over records depending on the value searched for:

![Spent tree example 3](https://cdn.rawgit.com/tomasvdw/bitcrust/master/doc/spent-tree3.svg "Spent-tree example 3")


As the vast majority of spents refer to recent transactions, such a skip tree can reduce the average number of nodes traversed per lookup to about 100.

Developers with knowledge of B-Trees and hash-tables may start to giggle at such a high number of nodes per lookup, but they would be forgetting the major gains of the approach:

* Superior locality of reference. As the majority of lookups are near the end of the tree, the accessed memory usually fits in the CPU cache.
* The data structure is append-only, removing the need for transactional adding and removal of UTXO pointers. Adding to the tree
is done concurrently using CAS-semantics.
* The structure is a tree on disk. This removes the need for reorgs and for
writing undo-information. A reorg in Bitcrust is simply pointing to a different tip.
* Parallel block validation. As there is no "main chain" at the storage level, concurrent blocks can
be verified in parallel.
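
As a rough illustration of append-only writes with CAS-semantics, the sketch below lets several writers reserve disjoint ranges of slots by advancing a shared write position with compare-and-swap. The `AppendLog` type and its `reserve` method are hypothetical names for this example and do not reflect Bitcrust's actual storage code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical append-only log: only the write position is modelled here.
struct AppendLog {
    next: AtomicUsize, // index of the first free slot
    capacity: usize,
}

impl AppendLog {
    fn new(capacity: usize) -> Self {
        AppendLog { next: AtomicUsize::new(0), capacity }
    }

    /// Reserve `len` consecutive slots. Returns the start index,
    /// or `None` if the log does not have enough room left.
    fn reserve(&self, len: usize) -> Option<usize> {
        let mut current = self.next.load(Ordering::Relaxed);
        loop {
            let end = current.checked_add(len)?;
            if end > self.capacity {
                return None;
            }
            // CAS: succeeds only if no other writer moved `next` in the meantime.
            match self.next.compare_exchange_weak(
                current,
                end,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Some(current),
                Err(actual) => current = actual, // lost the race, retry from the new position
            }
        }
    }
}
```

Each successful call owns its range exclusively, so writers never need a lock and existing records are never modified.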

## Concurrent validation

One major cause of sleepless nights for nodes and miners is the idea of a _toxic block_ or transaction.
The flexibility of bitcoin allows one to create blocks that take a huge amount of time and effort to process and can thereby choke or even crash other
nodes and miners, especially smaller ones. A simple example is a non-segwit transaction with a huge number of inputs, which abuses quadratic hashing.

By its architecture, Bitcrust is insensitive to such malice; blocks and transactions can be processed fully in parallel:

![Parallel validation](https://cdn.rawgit.com/tomasvdw/bitcrust/master/doc/parallel-validation.svg "Parallel validation")

The long-lasting validation of block A does not at any point
block the validation of blocks B, C and D.

The actual orphaning and breaking of the connection (as well as deprioritizing)
can be implemented using the same cost/benefit analysis as other DoS protection.
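
A minimal sketch of the idea, assuming one thread per incoming block: the `Block` struct, its simulated `cost_ms`, and the functions `validate` and `validate_concurrently` are stand-ins invented for this example, and a real node would more likely use a bounded worker pool.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a block: a label and a simulated validation cost.
struct Block {
    name: &'static str,
    cost_ms: u64,
}

/// Placeholder validation: sleeps to simulate work, then accepts the block.
fn validate(block: &Block) -> bool {
    thread::sleep(Duration::from_millis(block.cost_ms));
    true
}

/// Validate every block on its own thread, so an expensive block
/// does not delay the others.
fn validate_concurrently(blocks: Vec<Block>) -> Vec<(&'static str, bool)> {
    let handles: Vec<_> = blocks
        .into_iter()
        .map(|b| thread::spawn(move || (b.name, validate(&b))))
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("validation thread panicked"))
        .collect()
}
```

With a slow block A and quick blocks B, C and D, the quick ones finish while A is still being validated; the function only waits for A when collecting the results.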

2 changes: 2 additions & 0 deletions doc/parallel-validation.svg
