Added documentation for concurrent validation

tomasvdw · Nov 24, 2016 · db759e5
1 parent e5f5b84

Showing 2 changed files with 44 additions and 13 deletions.
diff --git a/bitcrust-lib/README.md b/bitcrust-lib/README.md
@@ -2,19 +2,19 @@
 # Bitcrust-db
 
 - [Introduction](#introduction)
-- [Block content](#block_content)
-- [Spent tree instead of a UTXO-set](#spent_tree)
-- [Concurrent block validation](#spent_tree)
+- [Block content](#block-content)
+- [Spent tree instead of a UTXO-set](#spent-tree)
+- [Concurrent block validation](#concurrent-validation)
 
 ## Introduction
 
 Bitcoin-core uses a linearized model of the block tree. On disk, only a main chain 
 is stored to ensure that there is always one authoritative UTXO set.
 
-Bitcrust uses a tree-structure and stores spents instead of unspents. This has several key
-advantages in terms of performance, simplicity and most importantly, concurrency. 
+Bitcrust uses a tree structure and indexes spents instead of unspents. This has several key
+advantages in terms of performance, minimal memory requirements, simplicity and, most importantly, concurrency. 
 
-The first results are very positive and show that this approach addresses Core's major bottlenecks of
+The first results are very positive and show that this approach addresses Core's major bottlenecks in
 block verification.
 
 ## Block content
@@ -34,11 +34,11 @@ For more details, check the [store](src/store/) documentation
 
 When transactions are stored, only their scripts are validated. When blocks come in,
 we need to verify for each input of each transaction whether the referenced output exists and is unspent before
-this one. For this we use the spent-tree.
+this one. Instead of a UTXO set, we use the spent-tree for this.
 
 This is a table (stored in a flatfileset) consisting of three types of records: blocks, transactions and spents.
 
-Here we see a spent-tree with two blocks, where the 3rd transaction has one input referencing the 1st transaction (purple).
+Here we see a spent-tree with two blocks, where the third transaction has one input referencing the output of the first transaction (purple).
 
 
 ![Spent tree example 1](https://cdn.rawgit.com/tomasvdw/bitcrust/master/doc/spent-tree1.svg "Spent-tree example")
@@ -47,12 +47,41 @@ If another block (2b) comes with the same block 1 as parent this can be simply a
 
 ![Spent tree example 2](https://cdn.rawgit.com/tomasvdw/bitcrust/master/doc/spent-tree2.svg "Spent-tree example 2")
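
To make the record layout concrete, here is a minimal Rust sketch of a spent-tree kept in one flat, append-only table. It assumes a simplified layout that is not taken from the bitcrust source: `FilePtr` is a plain offset standing in for a pointer into the block-content fileset, each block is written as a block record followed by its transaction and spent records, and every record stores the index of its predecessor on its branch. Block 2b can then be appended after block 2 while linking back to block 1's last record, so both branches share block 1.

```rust
/// Hypothetical stand-in for a file pointer into the block-content fileset.
type FilePtr = u64;

/// The three record types of the spent-tree.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RecordKind {
    Block(FilePtr),
    Transaction(FilePtr),
    /// Spends output `index` of the transaction stored at `tx`.
    Spent { tx: FilePtr, index: u32 },
}

#[derive(Debug, Clone, Copy)]
struct Record {
    kind: RecordKind,
    /// Previous record on this branch; `None` for the very first record.
    prev: Option<usize>,
}

/// One flat, append-only table holding every branch of the tree.
#[derive(Default)]
struct SpentTree {
    records: Vec<Record>,
}

impl SpentTree {
    /// Appends a block record plus its body after `parent_tip`; returns the new tip.
    fn add_block(&mut self, parent_tip: Option<usize>, block: FilePtr, body: &[RecordKind]) -> usize {
        let mut prev = parent_tip;
        let all = std::iter::once(RecordKind::Block(block)).chain(body.iter().copied());
        for kind in all {
            self.records.push(Record { kind, prev });
            prev = Some(self.records.len() - 1);
        }
        prev.expect("a block always adds at least its block record")
    }

    /// Walks one branch from `tip` back to the first record.
    fn branch(&self, tip: usize) -> impl Iterator<Item = &Record> + '_ {
        std::iter::successors(Some(tip), move |&i| self.records[i].prev)
            .map(move |i| &self.records[i])
    }
}
```

Calling `add_block(Some(b1), ...)` once for block 2 and once for block 2b yields two tips whose `branch` walks both end in block 1's records, while neither branch sees the other's transactions.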
 
-The rule for verification is simple. A spent (purple) record can only be added if, when browser back through the 
-file, we will find the corresponding transaction before we find the same spent.
+The rule for verification is simple: a spent (purple) record can only be added if, when browsing back through the 
+records, we find the corresponding transaction before we find the same spent. This ensures both that the referenced 
+transaction exists and that its output is still unspent.
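
The rule can be sketched as a plain backwards walk. This is a simplified illustration, not bitcrust's code: `FilePtr` is a hypothetical offset standing in for a file pointer, records form a single branch linked by predecessor indices, block records are left out, and the sketch does not check that `index` actually exists in the referenced transaction (that would require reading the block content).

```rust
/// Hypothetical stand-in for a file pointer into the block-content fileset.
type FilePtr = u64;

#[derive(Debug, Clone, Copy, PartialEq)]
enum RecordKind {
    Transaction(FilePtr),
    /// Spends output `index` of the transaction stored at `tx`.
    Spent { tx: FilePtr, index: u32 },
}

struct Record {
    kind: RecordKind,
    prev: Option<usize>,
}

#[derive(Debug, PartialEq)]
enum SpendError {
    MissingTransaction,
    DoubleSpend,
}

/// Appends a record after the current last one (a single branch, for brevity).
fn append(records: &mut Vec<Record>, kind: RecordKind) -> usize {
    let prev = records.len().checked_sub(1);
    records.push(Record { kind, prev });
    records.len() - 1
}

/// A spend of (`tx`, `index`) is valid if, walking back from `tip`, the
/// transaction record is found before any spent record for the same output.
fn check_spend(records: &[Record], tip: Option<usize>, tx: FilePtr, index: u32) -> Result<(), SpendError> {
    let mut cur = tip;
    while let Some(i) = cur {
        match records[i].kind {
            RecordKind::Spent { tx: t, index: n } if t == tx && n == index => {
                return Err(SpendError::DoubleSpend)
            }
            RecordKind::Transaction(t) if t == tx => return Ok(()),
            _ => {}
        }
        cur = records[i].prev;
    }
    Err(SpendError::MissingTransaction)
}
```

Each lookup here is linear in the distance to the referenced transaction, which is what the next paragraph addresses.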
 
 Obviously, with hundreds of millions of transactions, simply scanning won't do. This is where we 
-take advantage of the fact that these records are filepointers, and therefore *roughly* ordered. This allows us to create
-a *loose skip tree*: every records contains a set of "highway" pointers that point skip over records depnding on the value searched for.
+take advantage of the fact that these records are file pointers to the [block content](#block-content) fileset, and therefore *roughly* ordered. This allows us to create
+a *loose skip tree*. Similar to a skip list, each record contains a set of "highway" pointers that skip over records depending on the value searched for:
 
 ![Spent tree example 3](https://cdn.rawgit.com/tomasvdw/bitcrust/master/doc/spent-tree3.svg "Spent-tree example 3")
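
The following Rust sketch shows how highway pointers can turn the backwards walk into a search that skips whole runs of records. It is a stand-in, not bitcrust's actual loose skip tree: here each record gets binary-lifting pointers that jump back 1, 2, 4, ... records, and each pointer stores the minimum and maximum transaction file pointer among the records it skips. Because records are roughly ordered, a search for an old transaction can leap over long runs of recent records whose key range does not contain it. `FilePtr`, `Skip` and `SkipTree` are hypothetical names, and block records are omitted.

```rust
/// Hypothetical stand-in for a file pointer into the block-content fileset.
type FilePtr = u64;

#[derive(Debug, Clone, Copy, PartialEq)]
enum RecordKind {
    Transaction(FilePtr),
    Spent { tx: FilePtr, index: u32 },
}

impl RecordKind {
    /// Both record kinds are searched by the transaction's file pointer.
    fn key(self) -> FilePtr {
        match self {
            RecordKind::Transaction(p) => p,
            RecordKind::Spent { tx, .. } => tx,
        }
    }
}

/// Highway pointer: jumping to `to` skips records whose keys all lie in `min..=max`.
#[derive(Clone, Copy)]
struct Skip {
    to: Option<usize>,
    min: FilePtr,
    max: FilePtr,
}

struct Record {
    kind: RecordKind,
    /// `skips[k]` jumps back over up to 2^k records, starting with this one.
    skips: Vec<Skip>,
}

#[derive(Debug, PartialEq)]
enum SpendError {
    MissingTransaction,
    DoubleSpend,
}

#[derive(Default)]
struct SkipTree {
    records: Vec<Record>,
}

impl SkipTree {
    /// Appends `kind` after the branch tip `parent`, building its highway pointers.
    fn append(&mut self, parent: Option<usize>, kind: RecordKind) -> usize {
        let key = kind.key();
        let mut skips = vec![Skip { to: parent, min: key, max: key }];
        loop {
            let last = *skips.last().unwrap();
            let level = skips.len() - 1;
            // Double the jump by chaining onto the landing record's pointer of the same level.
            let Some(next) = last.to.and_then(|j| self.records[j].skips.get(level).copied()) else {
                break;
            };
            skips.push(Skip { to: next.to, min: last.min.min(next.min), max: last.max.max(next.max) });
        }
        self.records.push(Record { kind, skips });
        self.records.len() - 1
    }

    /// Same rule as a linear scan, but skips any run whose key range excludes `tx`.
    fn check_spend(&self, tip: Option<usize>, tx: FilePtr, index: u32) -> Result<(), SpendError> {
        let mut cur = tip;
        while let Some(i) = cur {
            let rec = &self.records[i];
            // Ranges grow with the level, so try the longest jump first.
            if let Some(s) = rec.skips.iter().rev().find(|s| tx < s.min || tx > s.max) {
                cur = s.to;
                continue;
            }
            // No jump possible: this record's own key equals `tx`.
            match rec.kind {
                RecordKind::Transaction(_) => return Ok(()),
                RecordKind::Spent { index: n, .. } if n == index => return Err(SpendError::DoubleSpend),
                RecordKind::Spent { .. } => cur = rec.skips[0].to,
            }
        }
        Err(SpendError::MissingTransaction)
    }
}
```

Because `append` takes an arbitrary parent, the same pointers work on a branching tree: each new record only chains onto pointers of records already on its own branch.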
-
+
+As the vast majority of spents refer to recent transactions, such a skip tree can reduce the average number of nodes traversed per lookup to about 100.
+
+Developers familiar with B-trees and hash tables may giggle at such a high number of nodes per lookup, but they would be overlooking the major gains of this approach:
+
+* Superior locality of reference. As the majority of lookups are near the end of the tree, the accessed memory usually fits in the CPU cache.
+* The data structure is append-only, removing the need for transactional adding and removal of UTXO pointers. Adding to the tree 
+is done concurrently using CAS (compare-and-swap) semantics.
+* The structure is a tree on disk. This removes the need for reorg processing and for 
+writing undo information. A reorg in bitcrust is simply a matter of pointing to a different tip.
+* Parallel block validation. As there is no "main chain" at the storage level, concurrent blocks can
+be verified in parallel.
+
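
A minimal sketch of the append-only and CAS points, assuming a fixed-capacity in-memory array of `AtomicU64` slots as a stand-in for the flatfileset (`AppendOnly`, `switch_tip` and `fill_concurrently` are hypothetical names). Writers reserve disjoint slot ranges with a `compare_exchange_weak` loop, so no lock is needed, and a reorg is modelled as a single `compare_exchange` on the tip index. Detecting when another writer has finished filling its reserved slots, and persisting to disk, are left out.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical in-memory stand-in for the append-only flatfileset.
struct AppendOnly {
    slots: Vec<AtomicU64>,
    /// Number of slots handed out so far.
    len: AtomicUsize,
    /// Index of the current tip record; `usize::MAX` means "no tip yet".
    tip: AtomicUsize,
}

impl AppendOnly {
    fn with_capacity(cap: usize) -> Self {
        AppendOnly {
            slots: (0..cap).map(|_| AtomicU64::new(0)).collect(),
            len: AtomicUsize::new(0),
            tip: AtomicUsize::new(usize::MAX),
        }
    }

    /// Reserves `n` consecutive slots with a CAS loop; `None` when the store is full.
    fn reserve(&self, n: usize) -> Option<usize> {
        let mut cur = self.len.load(Ordering::Relaxed);
        loop {
            let end = cur.checked_add(n).filter(|&e| e <= self.slots.len())?;
            match self.len.compare_exchange_weak(cur, end, Ordering::AcqRel, Ordering::Relaxed) {
                Ok(_) => return Some(cur),
                Err(actual) => cur = actual, // lost the race: retry from the new length
            }
        }
    }

    /// Writes `records` into freshly reserved slots; no other writer touches them.
    fn append(&self, records: &[u64]) -> Option<usize> {
        let start = self.reserve(records.len())?;
        for (slot, &r) in self.slots[start..start + records.len()].iter().zip(records) {
            slot.store(r, Ordering::Release);
        }
        Some(start)
    }

    /// Reorg as a pointer swap: moves the tip only if it still equals `expected`.
    fn switch_tip(&self, expected: usize, new: usize) -> bool {
        self.tip
            .compare_exchange(expected, new, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }
}

/// Lets `threads` writers append `per_thread` records each, all at the same time.
fn fill_concurrently(store: &Arc<AppendOnly>, threads: u64, per_thread: u64) {
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let store = Arc::clone(store);
            thread::spawn(move || {
                for i in 0..per_thread {
                    store.append(&[t * 1_000_000 + i]).expect("store full");
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}
```

Nothing is ever overwritten or removed, so there is no undo log to maintain: switching back to an older branch is just another `switch_tip` call.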
+## Concurrent validation
+
+One major cause of sleepless nights for node operators and miners is the threat of a _toxic block_ or transaction. 
+The flexibility of bitcoin allows one to create blocks that take a huge amount of time and effort to process, and can thereby choke or even crash other
+nodes and miners, especially smaller ones. A simple example is a non-segwit transaction with a huge number of inputs that abuses quadratic hashing.
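
A rough worked estimate shows why such a transaction is quadratic. Under legacy (non-segwit) signature hashing, checking each input hashes a serialization that includes every input of the transaction. Assuming, as a simplification, one signature check per input, about $s$ bytes per input in each hashed serialization, and ignoring outputs and fixed overhead, a transaction with $n$ inputs costs roughly:

```math
H(n) \approx \underbrace{n}_{\text{signature checks}} \cdot \underbrace{n\,s}_{\text{bytes per hashed serialization}} = s\,n^{2}
```

Doubling the number of inputs therefore roughly quadruples the bytes hashed.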
+
+By its architecture, bitcrust is insensitive to such malice, as blocks and transactions can be processed fully in parallel: 
+
+![Parallel validation](https://cdn.rawgit.com/tomasvdw/bitcrust/master/doc/parallel-validation.svg "Parallel validation")
+
+The long-running validation of block A does not at any point
+block the validation of blocks B, C and D.
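
A minimal Rust sketch of the scheduling idea (not bitcrust's validator): because no shared main-chain state has to be locked, each incoming block can be validated on its own thread. `Block` is a hypothetical type and `cost` simulates validation time with a sleep; results arrive in completion order, so the slow block A finishes last instead of holding up B, C and D.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical block with a simulated validation cost.
struct Block {
    name: &'static str,
    cost: Duration,
}

/// Validates each block on its own thread; returns names in completion order.
fn validate_in_parallel(blocks: Vec<Block>) -> Vec<&'static str> {
    let (tx, rx) = mpsc::channel();
    for block in blocks {
        let tx = tx.clone();
        thread::spawn(move || {
            // Stand-in for script checks and spent-tree lookups.
            thread::sleep(block.cost);
            tx.send(block.name).expect("receiver alive");
        });
    }
    drop(tx); // the iterator below ends once every validator has sent
    rx.into_iter().collect()
}
```

With a single global lock around validation, the same input would instead finish in submission order, with B, C and D each waiting behind A.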
+
+The actual orphaning and breaking of the connection (as well as deprioritizing) 
+can be implemented using the same cost/benefit analysis as other DoS protection.
+
diff --git a/doc/parallel-validation.svg b/doc/parallel-validation.svg