add tree-chunking.md #3799

Closed · wants to merge 3 commits

Conversation

ebuchman (Contributor)

Wasn't sure where to put this but opening for review/discussion.

I think I convinced myself that if we're going to use some form of chunking, we'll want to sync the chunks in order.

@ancazamfir (Contributor) left a comment

I agree this is the best way to start. I assume there will be an ADR for it and that this PR is just a justification of why it needs to be done.
Also, what is the plan for the Binance warp sync?

docs/architecture/tree-chunking.md (outdated):
front, and could then receive the other chunks in any order. However to
generalize, we may be required to receive multiple chunks in order first, before
we get to a point where we can receive them in any order.

@cwgoes (Contributor), Jul 15, 2019

For a depth-first traversal, consider the following:

  • Chunkee & chunker negotiate a manifest m mapping chunk indices to contiguous subsets (ranges) of keyspace. They agree on n chunks, and they now both can map the integer range 0 to n - 1 to a subset of keyspace (start key, end key), where the end keys and start keys of contiguous chunks are identical (end is exclusive).

  • Chunkee requests chunk c_k from chunker. Chunker sends some c_l where k might or might not be equal to l.

  • Chunkee checks keys in c_l against known mapping from k. If keys are not within the range, chunkee rejects chunk & bans peer. Chunkee now knows that the keys do belong to chunk c_k, although it does not know if the values are correct or if all keys that should be in c_k were included.

  • Chunkee now chooses random key in the chunk and requests Merkle proof of key, value pair (against known state root from light client) from chunker, or chooses random key range not in the included keys but in the chunk range and requests range proof of non-inclusion from chunker. If Merkle proof is not provided, chunkee rejects chunk and bans peer. If Merkle proof does not validate, chunkee rejects chunk and bans peer.

  • Repeat random requests some r times with r a security parameter chosen according to expected peer behaviour, total number of chunks, etc.

  • Chunkee now knows (in expectation) that this chunk is valid - that it has the correct keys, the correct values, and all the keys within this range. Chunkee applies chunk.

  • Repeat for all chunks in range 0 to n - 1, in any order & in parallel if supported by the underlying tree.

  • At the end, verify whole tree. If the state root does not match, binary search by requesting Merkle proofs of subtrees from peers and fetch any incorrect chunks. This step could be taken earlier for partial subtrees if we're concerned about malicious peers.

For trees which can handle out-of-order insertion, that should work in parallel as far as I can tell. It does require a tree which supports range exclusion proofs, and it includes randomness to minimize the number of proofs required, which makes the security a bit more complex to reason about but I think will be more efficient in practice. This method should support trees and chunks of any size, with appropriate choices of the security parameter, and can translate into sequential reads if the underlying store stores contiguous keyspace sequentially.

Maybe I'm missing something though.
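
To make the flow above concrete, here is a rough Go sketch of the per-chunk verification described in the list (range check plus r random existence spot-checks against the trusted root). All of the types and the peer interface are hypothetical placeholders for illustration, not actual Tendermint or IAVL APIs; non-existence range proofs and the final whole-tree check are omitted.

```go
package chunksync

import (
	"bytes"
	"errors"
	"math/rand"
)

// KeyRange is a contiguous subset of keyspace; End is exclusive.
type KeyRange struct{ Start, End []byte }

// Manifest maps chunk index i to m.Ranges[i].
type Manifest struct{ Ranges []KeyRange }

// Chunk is the raw key/value payload for one manifest range.
type Chunk struct {
	Index int
	Keys  [][]byte
	Vals  [][]byte
}

// Proof stands in for a Merkle existence proof against the state root.
type Proof interface {
	Verify(root, key, value []byte) bool
}

// Chunker is the peer serving chunks and proofs.
type Chunker interface {
	RequestProof(key []byte) (Proof, error)
}

func inRange(key []byte, r KeyRange) bool {
	return bytes.Compare(key, r.Start) >= 0 && bytes.Compare(key, r.End) < 0
}

// verifyChunk checks that every key lies in the negotiated range, then
// spot-checks r random key/value pairs with Merkle proofs against the
// trusted state root obtained from the light client.
func verifyChunk(c Chunk, m Manifest, root []byte, peer Chunker, r int) error {
	rng := m.Ranges[c.Index]
	for _, k := range c.Keys {
		if !inRange(k, rng) {
			return errors.New("key outside negotiated range: reject chunk, ban peer")
		}
	}
	for i := 0; i < r && len(c.Keys) > 0; i++ {
		j := rand.Intn(len(c.Keys))
		proof, err := peer.RequestProof(c.Keys[j])
		if err != nil || !proof.Verify(root, c.Keys[j], c.Vals[j]) {
			return errors.New("missing or invalid Merkle proof: reject chunk, ban peer")
		}
	}
	return nil // in expectation, the chunk has the right keys and values
}
```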

Contributor

It looks like this still doesn't solve the case where all connected chunkers are malicious (but provide consistent chunks).

If, at first, the chunkee negotiates with 3 malicious peers, then it has no chance to connect to the correct network.
If, at first, 2 peers send the chunkee one chunk-to-keyspace mapping while the other 2 peers send a different mapping, how should the chunkee behave (i.e. decide which 2 peers are malicious)?

If this is not the case the Merkle proof is meant to resolve, I think negotiating a hash of each chunk (which provides more information than indices; the begin and end keys can still be kept) would be easier.
I cannot see what additional benefit a Merkle proof provides over the hash of each chunk. Hash verification is:

  1. much more efficient (no proof request-response round trip)
  2. able to reject unexpected chunks earlier
  3. not dependent on top-layer chunks arriving in order to achieve a verifiable tree; each chunk can be processed as soon as it arrives
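
For comparison, a minimal sketch of the hash-per-chunk check being suggested here, assuming each manifest entry carries a pre-negotiated SHA-256 of the serialized chunk (the types and serialization are illustrative, not the Binance wire format). Note this only moves the trust question to how the manifest and its hashes are obtained, which is the point debated below.

```go
package chunksync

import (
	"bytes"
	"crypto/sha256"
)

// ManifestEntry pairs a key range with the expected hash of its chunk.
type ManifestEntry struct {
	StartKey, EndKey []byte
	ChunkHash        [sha256.Size]byte // negotiated before chunks are fetched
}

// checkChunkHash accepts a chunk as soon as it arrives if its serialized
// bytes hash to the value in the manifest; no proof round trip is needed.
func checkChunkHash(entry ManifestEntry, serializedChunk []byte) bool {
	sum := sha256.Sum256(serializedChunk)
	return bytes.Equal(sum[:], entry.ChunkHash[:])
}
```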

Contributor

  • Chunkee now chooses random key in the chunk and requests Merkle proof of key, value pair (against known state root from light client) from chunker

Why choose a random key for the existence check? Wouldn't a single whole-range request be better (avoiding the case where c_l misses or adds a key that is still within the range)?

@cwgoes (Contributor), Jul 16, 2019

If, at first, 2 peers send the chunkee one chunk-to-keyspace mapping while the other 2 peers send a different mapping, how should the chunkee behave (i.e. decide which 2 peers are malicious)?

Different keyspace mappings could be valid; there doesn't need to be a single canonical one - if the chunkee and chunker can't agree on the manifest (maybe the chunkee wants smaller or larger chunks), the chunkee can disconnect.

If all connected chunkers are malicious, no strategy works: the chunkee can never obtain chunks if the chunkers don't want to send them, since only the chunkers have the data. The best the chunkee can do is ban the peers.

If this is not the case the Merkle proof is meant to resolve, I think negotiating a hash of each chunk (which provides more information than indices; the begin and end keys can still be kept) would be easier.
I cannot see what additional benefit a Merkle proof provides over the hash of each chunk.

The Merkle proof provides existence / non-existence proofs against a trusted state root - how would the chunkee know that a chunk with a particular hash was in fact the correct chunk for the keyspace?

Why choose a random key for the existence check? Wouldn't a single whole-range request be better (avoiding the case where c_l misses or adds a key that is still within the range)?

Just so the proof construction, bandwidth, and verification costs are lower. You could request proofs for the whole chunk; it would just be more expensive.
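
As a companion to the existence spot-check sketched earlier, here is a rough sketch of the cheaper probabilistic non-inclusion check: pick a candidate key inside the chunk's negotiated range but not among the received keys, and ask for an absence proof against the trusted root instead of a proof covering the whole range. The interfaces are hypothetical placeholders; receivedKeys is assumed to be sorted.

```go
package chunksync

import (
	"bytes"
	"errors"
	"sort"
)

// AbsenceProof stands in for a Merkle proof of non-inclusion.
type AbsenceProof interface {
	VerifyAbsence(root, key []byte) bool
}

// ProofPeer is the peer asked to prove that a key is absent.
type ProofPeer interface {
	RequestAbsenceProof(key []byte) (AbsenceProof, error)
}

// spotCheckAbsence probes a single candidate key. Probing one key (or a few)
// keeps proof size and verification cost low compared to a whole-range
// proof, at the price of probabilistic rather than absolute assurance.
func spotCheckAbsence(root []byte, receivedKeys [][]byte, candidate []byte, peer ProofPeer) error {
	i := sort.Search(len(receivedKeys), func(i int) bool {
		return bytes.Compare(receivedKeys[i], candidate) >= 0
	})
	if i < len(receivedKeys) && bytes.Equal(receivedKeys[i], candidate) {
		return nil // candidate is actually present in the chunk; nothing to prove
	}
	proof, err := peer.RequestAbsenceProof(candidate)
	if err != nil || !proof.VerifyAbsence(root, candidate) {
		return errors.New("absence proof missing or invalid: reject chunk, ban peer")
	}
	return nil
}
```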

Contributor

how would the chunkee know that a chunk with a particular hash was in fact the correct chunk for the keyspace?

Later, when the chunkee compares the root hash with the one in the block's header?

Contributor

How hard is it to construct a chunk with a particular hash but with invalid data inside?

@ackratos (Contributor), Jul 19, 2019

Different keyspace mappings could be valid; there doesn't need to be a single canonical one - if the chunkee and chunker can't agree on the manifest (maybe the chunkee wants smaller or larger chunks), the chunkee can disconnect.

Then how does the chunkee know the keyspace at first? Without fixed-size chunks, you cannot support eager state sync, right?

The Merkle proof provides existence / non-existence proofs against a trusted state root - how would the chunkee know that a chunk with a particular hash was in fact the correct chunk for the keyspace?

In the Binance implementation, the manifest is a single canonical thing.

Contributor

Then how does the chunkee know the keyspace at first? Without fixed-size chunks, you cannot support eager state sync, right?

Chunker and chunkee negotiate on a manifest mapping integer chunk indices to bounded subsets of (the entire) keyspace. What do fixed-size chunks have to do with eager state sync? The chunker and chunkee would also need to pick a block height of the state to sync, if that's what you mean.

@ackratos (Contributor), Jul 20, 2019

Chunker and chunkee negotiate on a manifest mapping integer chunk indices to bounded subsets of (the entire) keyspace.

I mean the chunkee is a brand new node; it doesn't have any information about how many keys there are in total. By "entire keyspace" did you mean the whole space of possible 32-byte SHA-256 values (hashes of IAVL tree nodes), i.e. 0x10..00 - 0xFF..FF?

What do fixed-size chunks have to do with eager state sync? The chunker and chunkee would also need to pick a block height of the state to sync, if that's what you mean.

I mean, think about a situation with 100 chunkee peers and 1 chunker. All 100 chunkees want to negotiate different chunk sizes with the chunker, i.e. the first chunkee wants the chunk size to be 1M, the second one wants 2M, the third one wants 3M, ..., and the 100th chunkee wants 100M. To serve 100 kinds of clients, if the chunker wants to eagerly prepare the chunks (without traversing the IAVL tree each time a request comes), does it need to prepare 100 kinds of chunks on disk?

Contributor

I mean the chunkee is a brand new node; it doesn't have any information about how many keys there are in total. By "entire keyspace" did you mean the whole space of possible 32-byte SHA-256 values (hashes of IAVL tree nodes), i.e. 0x10..00 - 0xFF..FF?

Yes; the manifest maps integer indices to contiguous subsets of the entire keyspace.

I mean, think about a situation with 100 chunkee peers and 1 chunker. All 100 chunkees want to negotiate different chunk sizes with the chunker, i.e. the first chunkee wants the chunk size to be 1M, the second one wants 2M, the third one wants 3M, ..., and the 100th chunkee wants 100M. To serve 100 kinds of clients, if the chunker wants to eagerly prepare the chunks (without traversing the IAVL tree each time a request comes), does it need to prepare 100 kinds of chunks on disk?

That's right, but there could be suggested chunk sizes (which could themselves change over time); it's all in the peer-to-peer protocol and altruism is required anyway.
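
One way a brand-new chunkee could compute the manifest's ranges without knowing how many keys exist is to split the full 32-byte keyspace into n contiguous slices of (almost) equal width. A sketch under that assumption, with illustrative names; it says nothing about how many actual keys land in each slice:

```go
package chunksync

import "math/big"

// keyRange256 bounds a slice of the 2^256 keyspace; keys compare as
// big-endian 256-bit integers. End is exclusive.
type keyRange256 struct {
	Start, End *big.Int
}

// chunkRanges256 partitions the entire 32-byte keyspace (0x00...00 through
// 0xFF...FF) into n contiguous ranges, computable from n alone.
func chunkRanges256(n int) []keyRange256 {
	total := new(big.Int).Lsh(big.NewInt(1), 256) // 2^256 possible keys
	step := new(big.Int).Div(total, big.NewInt(int64(n)))

	ranges := make([]keyRange256, n)
	start := big.NewInt(0)
	for i := 0; i < n; i++ {
		end := new(big.Int).Add(start, step)
		if i == n-1 {
			end.Set(total) // last range absorbs the division remainder
		}
		ranges[i] = keyRange256{Start: start, End: end}
		start = end
	}
	return ranges
}
```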


In each case, it appears that liveness is made much more difficult by the need
to figure out what went wrong with the applied chunks. However, these problems
can be eliminated by requiring chunks to be applied in-order.

Contributor

Applying chunks in order is not necessary, and it harms the performance of state sync.

What we do is calculate a mapping from the IAVL nodes in a chunk to their multistore (with the help of recording a map[storeKey]numOfNodesInThisStore and, in each chunk, a field giving the index within the whole store of the chunk's first node).
https://github.com/binance-chain/BEPs/blob/master/BEP18.md#541-app-state-chunk

Once we receive a complete chunk, we write all the nodes within it directly to the corresponding multistore.
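
A rough sketch of one plausible reading of that bookkeeping: with per-store node counts (in a canonical store order) and the global index of a chunk's first node, each node of an out-of-order chunk can be routed to its substore. The field names and layout here are illustrative, not the actual BEP-18 format.

```go
package chunksync

// storeCount records how many IAVL nodes a substore contributes, listed in
// the canonical multistore order.
type storeCount struct {
	StoreKey string
	NumNodes int64
}

// chunkHeader carries the bookkeeping needed to place a chunk's nodes.
type chunkHeader struct {
	StoreNodeCounts []storeCount // totals per substore, canonical order
	StartIndex      int64        // global index of this chunk's first node
}

// storeForNode returns the substore key for the i-th node of the chunk
// (0-based within the chunk), or "" if the header is inconsistent.
func storeForNode(h chunkHeader, i int64) string {
	global := h.StartIndex + i
	var seen int64
	for _, s := range h.StoreNodeCounts {
		seen += s.NumNodes
		if global < seen {
			return s.StoreKey
		}
	}
	return ""
}
```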

ebuchman (Contributor, Author)

What we do is calculate a mapping from the IAVL nodes in a chunk to their multistore.

Right, but isn't your mapping only verifiable according to the manifest, which depends on a majority of honest peers? This write-up is specifically about chunking in the context of full light client security.

docs/architecture/tree-chunking.md:
with its index. For instance, if we receive chunk 5, the root hash of the
sub-tree contained therein should correspond to the 5th node in layer 10
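
A minimal sketch of the check described in the quoted excerpt, assuming the hashes of some upper layer of the tree have already been obtained and verified; computeSubtreeRoot stands in for whatever the underlying tree implementation provides.

```go
package chunksync

import (
	"bytes"
	"errors"
)

// checkChunkAgainstLayer accepts chunk chunkIndex only if the root of the
// subtree it contains matches the chunkIndex-th hash in the verified layer.
func checkChunkAgainstLayer(
	layerHashes [][]byte,
	chunkIndex int,
	chunkNodes [][]byte,
	computeSubtreeRoot func(nodes [][]byte) []byte,
) error {
	if chunkIndex < 0 || chunkIndex >= len(layerHashes) {
		return errors.New("chunk index outside the verified layer")
	}
	if !bytes.Equal(computeSubtreeRoot(chunkNodes), layerHashes[chunkIndex]) {
		return errors.New("subtree root mismatch: reject chunk")
	}
	return nil
}
```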

While this example provides the intuition for how the design might work,

Contributor

I would like to propose https://github.com/binance-chain/BEPs/blob/master/BEP18.md#541-app-state-chunk again.

It handles arbitrary node sizes and has effective chunk space utilization. :P

@ackratos (Contributor)

Also, what is the plan for the Binance warp sync?

It was upgraded in production yesterday; you can try https://github.com/binance-chain/node-binary/tree/master/fullnode/prod/0.6.0

Just set state_sync_reactor to true and state_sync_height to 0: https://github.com/binance-chain/node-binary/blob/master/fullnode/prod/0.6.0/config/config.toml#L19-L27

@codecov-io

Codecov Report

Merging #3799 into master will decrease coverage by 0.14%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3799      +/-   ##
==========================================
- Coverage   64.97%   64.83%   -0.15%     
==========================================
  Files         216      216              
  Lines       17565    17565              
==========================================
- Hits        11413    11388      -25     
- Misses       5203     5222      +19     
- Partials      949      955       +6
| Impacted Files | Coverage Δ |
|---|---|
| privval/signer_validator_endpoint.go | 75.55% <0%> (-10%) ⬇️ |
| privval/signer_service_endpoint.go | 83.63% <0%> (-5.46%) ⬇️ |
| privval/socket_listeners.go | 86.2% <0%> (-3.45%) ⬇️ |
| p2p/pex/pex_reactor.go | 83.13% <0%> (-1.17%) ⬇️ |
| blockchain/reactor.go | 70.56% <0%> (-0.94%) ⬇️ |
| consensus/reactor.go | 70.8% <0%> (-0.47%) ⬇️ |
| blockchain/pool.go | 80.26% <0%> (-0.33%) ⬇️ |

@brapse mentioned this pull request, Jul 22, 2019
@melekes added the WIP label, Sep 3, 2019

@melekes (Contributor) commented Sep 3, 2019

If I am not mistaken, @ebuchman has promised to expand on this.

@ebuchman (Contributor, Author) commented Sep 7, 2019

If I am not mistaken, @ebuchman has promised to expand on this.

I think we probably want to consolidate with ADR 042. Maybe ADR-042 should actually have its own folder and we can put multiple files in there as we work this all out.

We still need to actually write up an ADR for the initial design - let's chat about it at the next team meeting.

@melekes closed this, Jan 22, 2020
@erikgrinaker mentioned this pull request, Jan 29, 2020