
[ADR - 43] blockchain riri-org #3753

Merged · 21 commits into tendermint:master · Jul 8, 2019

Conversation

@brapse (Contributor) commented Jun 25, 2019:

The blockchain reactor has multiple high-level responsibilities that would be easier to understand and test if separated. This ADR proposes splitting up block processing and request scheduling for fast-sync in order to achieve the API proposed in ADR-40.


  • Updated all relevant documentation in docs
  • Updated all code comments where relevant
  • Wrote tests
  • Updated CHANGELOG_PENDING.md

@codecov-io commented Jun 25, 2019:

Codecov Report

Merging #3753 into master will decrease coverage by 0.02%.
The diff coverage is n/a.

```
@@            Coverage Diff             @@
##           master    #3753      +/-   ##
==========================================
- Coverage   64.06%   64.03%   -0.03%
==========================================
  Files         242      241       -1
  Lines       19968    19966       -2
==========================================
- Hits        12792    12785       -7
- Misses       6137     6143       +6
+ Partials     1039     1038       -1
```
| Impacted Files | Coverage Δ |
| --- | --- |
| state/services.go | 25% <0%> (-25%) ⬇️ |
| state/validation.go | 66.38% <0%> (-18.1%) ⬇️ |
| privval/signer_service_endpoint.go | 85.45% <0%> (-3.64%) ⬇️ |
| blockchain/pool.go | 80.26% <0%> (-0.66%) ⬇️ |
| consensus/reactor.go | 71.94% <0%> (-0.62%) ⬇️ |
| rpc/core/mempool.go | 0% <0%> (ø) ⬆️ |
| privval/socket_listeners.go | 93.1% <0%> (ø) ⬆️ |
| consensus/state.go | 79.97% <0%> (ø) ⬆️ |
| tools/tm-bench/statistics.go | 0% <0%> (ø) ⬆️ |
| types/errors.go | |
| ... and 6 more | |

@tac0turtle (Contributor) left a comment:

👍

Three resolved review comments on docs/architecture/adr-042-blockchain-riri-org.md (outdated).
@tac0turtle tac0turtle changed the base branch from develop to master June 26, 2019 15:33
@brapse brapse marked this pull request as ready for review June 27, 2019 12:21
@melekes (Contributor) left a comment:

🌟

```go
type Processor struct {
	height ...
	state  ...
	blocks [height]*Block
```
Contributor:

how many blocks will be stored in the worst case?

@brapse (author):

Good point. The worst case would be unbounded. We can look to provide back pressure by counting unprocessed messages and sending stopScheduling messages to the scheduler when we exceed some threshold. Was this your concern?
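
A rough sketch of that counting idea (all names here are illustrative, not from the ADR):

```go
type Event interface{}

type stopScheduling struct{}
type resumeScheduling struct{}

// Processor is a hypothetical stand-in that tracks how many blocks
// have been received but not yet processed.
type Processor struct{ pending int }

// checkBackpressure asks the scheduler to pause once the backlog of
// unprocessed blocks crosses a threshold, and to resume once it has
// drained below half of that threshold.
func (p *Processor) checkBackpressure(schedulerCh chan<- Event, threshold int) {
	switch {
	case p.pending > threshold:
		schedulerCh <- stopScheduling{}
	case p.pending < threshold/2:
		schedulerCh <- resumeScheduling{}
	}
}
```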

Contributor:

Yeah. I was wondering whether there are any ways for a malicious peer (or group of peers) to exploit this.

@jaekwon (Contributor) commented Nov 14, 2019:

We shouldn't be using len() on channels to "provide backpressure". len(ch) should only be used in certain cases: (1) when you're the only reader routine and you want to ensure that there is something in the channel before trying something, but otherwise not pull from it (even then, len(ch) = 0 does not mean that a pull will fail); (2) for gathering statistics that do not affect the logic. Even if there were a safe way to ensure that there is space in a channel before pushing to it, such a method would not be safe in the face of concurrent pushes to that channel. In other words, even if it works today, it is brittle and can easily break for a variety of reasons in the future.

https://groups.google.com/forum/#!topic/golang-nuts/yQw1Wx6BoUU

The only good general way to deal with fullness of channels is to use select{default:}, but this implies that either (1) you end up losing the thing you are trying to push, or (2) you need some complicated method of dealing with that lost data. A common antipattern here is to launch a new goroutine to push the value asynchronously, which has two problems: (a) unlimited blowup of goroutines and (b) loss of order.

Seems like this ADR is incomplete... would like a complete idea of how to deal with full channels.
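
For reference, a minimal self-contained example of the select{default:} pattern described above:

```go
package main

import "fmt"

// trySend performs a non-blocking send: if the channel buffer is
// full, the value is rejected and the caller must decide what to do
// with it (drop it, log it, shed load, etc.).
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	ch := make(chan int, 1)
	fmt.Println(trySend(ch, 1)) // true: buffer had room
	fmt.Println(trySend(ch, 2)) // false: buffer full, value not sent
}
```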

@brapse (author):

This design has evolved with the implementation to resolve a number of these open questions. In particular, we now explicitly use a [priority queue](https://github.com/tendermint/tendermint/blob/master/blockchain/v2/routine.go#L22) within each routine, which is bounded.

Contributor:

I'm not seeing anything that is bounding the size of the queue. bufferSize is used as a hint in constructing a priority queue. The // XXX backpressure comment and issue still exist in reactor.go.

Also, #4142

```go
msg := <-ioMsgs
switch msg := msg.(type) {
case scBlockRequestMessage:
	r.sendBlockToPeer(...)
```
Contributor:

What if IO is slow? Won't this slow down the demuxRoutine, in effect?

@brapse (author):

IIRC this results in a peer.TrySend, where messages are queued and sent later until the queue fills up. Similar to the comment above, we can use the return value from TrySend to send stopScheduling messages to the scheduler until the queue flushes.
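
A sketch of what that could look like (stopScheduling and the scheduler channel are hypothetical names; TrySend returning false when the peer's send queue is full is the existing p2p behavior):

```go
// sendBlockToPeer attempts a non-blocking send to the peer; when the
// peer's send queue is full, it tells the scheduler to back off
// instead of blocking the IO routine.
func (r *BlockchainReactor) sendBlockToPeer(peer p2p.Peer, msgBytes []byte) {
	if !peer.TrySend(BlockchainChannel, msgBytes) {
		r.schedulerCh <- stopScheduling{peerID: peer.ID()}
	}
}
```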

@ancazamfir (Contributor) left a comment:

Looks good! Not sure what level of detail you want to have in this document wrt different scenarios, but in general I think it would help to clarify a bit more the responsibilities of the different modules (especially the processor and scheduler).
I have some comments on some of the code snippets; I know they are there to give hints, but just to make them correct :)

Five resolved review comments on docs/architecture/adr-042-blockchain-riri-org.md (outdated).
```go
	}
	return events
}
```

Contributor:

Not clear why the processor needs to keep track of peers. Maybe more details on responsibilities for the different modules could help.

@brapse (author):

From my understanding, the processor is responsible for removing/rescheduling all unprocessed blocks received from an errored peer. Is that true?

Contributor:

IMHO we need to clarify this. It looks like both the processor and the scheduler maintain some information about the fast-sync peers and blocks. They may both detect errors and trigger peer removal and, eventually, the rescheduling of requests. Crossing messages may cause their views of peers and blocks to get out of sync. At this point the scheduler struct is not filled in, and how the actual scheduling works is missing. I think we need more details on the scheduler before we can make this analysis.

@milosevic (Contributor) commented Jul 5, 2019:

> The processor is responsible for removing/rescheduling all unprocessed blocks received from an errored peer. Is that true?

One way to think about the Processor is that it should order, verify, and execute blocks. If it fails to do this for some block, it will notify the scheduler (and io) about the error so they can take the corresponding actions.


## Schedule
The scheduler (previously the pool) is responsible for squeding peer status requests and block request responses.
Contributor:

Suggested change:

```diff
- The scheduler (previously the pool) is responsible for squeding peer status requests and block request responses.
+ The scheduler (previously the pool) is responsible for handling status and block requests and responses.
```

(?) I think the scheduler is responsible for handling all fast-sync messages: statusRequest, statusResponse, blockRequest, blockResponse...
IMO we should add more detail here to clarify responsibilities.

```go
// clean peer list

events = []
for peerID := range schedule.peersTouchedSince(time) {
```
Contributor:

Suggested change:

```diff
- for peerID := range schedule.peersTouchedSince(time) {
+ for peerID := range schedule.peersNotTouchedSince(time) {
```

(?) Not sure I understand what this loop does.

@brapse (author):

This loop is intended to iterate over the peers that have timed out, change their state to timed out, and reschedule blocks pending from those timed-out peers to other peers.
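
Roughly this (a sketch; peersNotTouchedSince is from the suggestion above, while scPeerTimeout, pendingFrom, and setStateNew are made-up names):

```go
// Prune pass: mark unresponsive peers as timed out and make their
// pending heights eligible for rescheduling to other peers.
for _, peerID := range schedule.peersNotTouchedSince(now) {
	events = append(events, scPeerTimeout{peerID: peerID})
	for _, height := range schedule.pendingFrom(peerID) {
		schedule.setStateNew(height) // will be picked up by the next scheduling pass
	}
}
```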

```go
	}
}
```

Contributor:

Do we still have a monitor to detect slow peers?

@brapse (author):

Yes, I think the schedule.peersNotTouchedSince loop could be done with schedule.peersSlowerThan(minSpeed) to prune peers in a more sophisticated way. It could be as simple as that, or we may eventually need something more sophisticated: we could have a sort of PeerPruning strategy which could be applied to the schedule. Not sure if that's needed now, or if we need to emit events indicating why the peer was pruned 🤔.

Contributor:

Both v0 and v1 have a monitor and remove slow peers. Events are sent to the switch to remove the slow peer. It is probably ok if we want to leave this aside initially for this refactor.

@brapse (author):

👍 Yeah I think we can do this without much trouble.

```go
msg := <-input
switch msg := msg.(type) {
case bcBlockRequestMessage:
	output <- processor.handleBlockRequest(msg, time.Now())
```
Contributor:

Why is the current time not attached to the message by the demuxRoutine? This might be relevant, as we had a scenario where a msg was received by a reactor but its processing was delayed for quite some time. It might also be interesting for debugging and testing purposes.

@brapse (author):

This is a very good point! It seems like the demuxRoutine should be the sole component responsible for time, ensuring a consistent view across the routines. One consequence of this change is that we need to ensure the channels are never blocked: if the demuxRoutine is responsible for dispatching correctly timed messages to the other channels, a blocked channel would stall the demuxRoutine and starve the other channels of messages. We can avoid this by giving processes a soft limit that institutes back pressure before the hard limit of the channel size is reached. I've added some inline comments to ensure we don't miss this during implementation.

@brapse (author):

For some concrete hints about how we can handle this, one technique we can use (which we also used in v1) is an overflowing channel.
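
A minimal sketch of one possible overflowing-channel pattern (illustrative; not necessarily how v1 implemented it): when the buffer is full, the oldest element is dropped so the sender never blocks.

```go
type Message interface{}

// offer pushes v without blocking: if ch is full, the oldest queued
// element is dropped to make room. This trades message loss at the
// head of the queue for a sender that can never stall; it assumes a
// single sender, since concurrent senders can interleave.
func offer(ch chan Message, v Message) {
	for {
		select {
		case ch <- v:
			return
		default:
			select {
			case <-ch: // drop the oldest element
			default:
			}
		}
	}
}
```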


```go
func (r *BlockChainReactor) Start() {
	r.msgs = make(chan Message, maxInFlight)
```
Contributor:

What does this message buffer correspond to?

@brapse (author):

This will be the channel which funnels messages from the Receive method to the demuxRoutine.
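
Roughly like this (a sketch; decodeMsg and the error handling are stand-ins for whatever decoding the reactor actually does):

```go
// Receive funnels every wire message into the single input channel
// consumed by the demuxRoutine.
func (r *BlockchainReactor) Receive(chID byte, src p2p.Peer, msgBytes []byte) {
	msg, err := decodeMsg(msgBytes)
	if err != nil {
		r.Logger.Error("error decoding message", "src", src, "err", err)
		return
	}
	r.msgs <- msg
}
```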


```go
func (r *BlockchainReactor) AddPeer(peer p2p.Peer) {
	...
	r.msgs <- bcAddPeerEv{peer.ID}
```
Contributor:

Why do we write here to a buffer which is used as an outgoing buffer for the routines?

@brapse (author):

Sorry if that is not clear. This "buffer" would be used as the input for the routines. Out of curiosity, what would be the motivation for calling this a buffer here instead of a channel?

### Processor internals

The processor will be responsible for validating and processing blocks. The Processor will maintain an internal cursor `height` of the last processed block. As a set of blocks arrive unordered, the the Processor will check if it has `height+1` nessary to process the next block. The processor also maintains the map `blockPeers` of peers to height, to keep track of which peer provided the block at `height`. `blockPeers` can be used in`handleRemovePeer(...)` to reschedule all unprocessed blocks provided by a peer who has errored.
Contributor:

height seems to be the next block that should be processed, and height-1 the last processed block. Typo: double "the" in the second sentence.

```go
delete(blocks, height)
height++
lastTouch = time
return pcBlockProcessed{height}
```
Contributor:

What if there are several blocks that can be processed? Should we have a loop here, or an external mechanism to trigger this check?
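
For example, a loop along these lines (a sketch following the draft's pseudocode; verifyAndExec is a made-up name):

```go
// Drain every consecutive block that is already available, rather
// than processing a single block per trigger.
for blocks[height] != nil {
	if err := verifyAndExec(blocks[height]); err != nil {
		events = append(events, pcBlockProcessError{blockPeers[height], height})
		break
	}
	delete(blocks, height)
	height++
	events = append(events, pcBlockProcessed{height})
}
```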

@milosevic (Contributor):

In general this looks good to me. I think it gives sufficient detail on how to proceed. Some details or concerns can then be addressed along the way as we focus on the concrete routines.

@tac0turtle tac0turtle changed the title [ADR] blockchain riri-org [ADR - 43] blockchain riri-org Jul 6, 2019
@ancazamfir (Contributor) left a comment:

Could you add a section that covers, at a high level, a few scenarios where fast sync terminates and the switch to consensus happens?

```go
	return pcBlockProcessed{height}
} else {
	... // Delete all unprocessed blocks from the peer
	return pcBlockProcessError{peerID, height}
```
Contributor:

We should remove the peer for height+1 also. I was wondering if an error processing 'height' (H) is enough to send to the scheduler. Probably not; the processor should include the height and the peers for both H and H+1.
In one case the scheduler would still have the blocks at H and H+1 and the same peers as the processor. But there might be a race condition where the scheduler has:

  • detected errors from P1 after it received H but before the processor has processed H, and/or
  • detected errors from P2 after it received H+1 but before the processor has processed H
  • rescheduled H and H+1 with P1' and P2' respectively
  • received new blocks H and H+1 and sent them to the processor
  • now the processor tries to execute the block at H (from P1), detects an error and sends it to the scheduler.
  • the scheduler receives the error for H. It should not remove P1' and P2' and should not reschedule the blocks, therefore it needs the peerID for H and H+1.
    Alternatively maybe call handleRemovePeer(peerID) below twice, for H and H+1?

In general I still have concerns about the scheduler and processor getting out of sync and the accumulation of events on the processor side. In the example above the processor will have in its queue:
block(H, P1), block(H+1, P2), ..., error(P1), error(P2), block(H, P1'), block(H+1, P2'), ...

One resolved review comment on docs/architecture/adr-043-blockchain-riri-org.md (outdated).
### Processor internals

The processor will be responsible for validating and processing blocks. The Processor will maintain an internal cursor `height` of the last processed block. As a set of blocks arrive unordered, the the Processor will check if it has `height+1` nessary to process the next block. The processor also maintains the map `blockPeers` of peers to height, to keep track of which peer provided the block at `height`. `blockPeers` can be used in`handleRemovePeer(...)` to reschedule all unprocessed blocks provided by a peer who has errored.
Contributor:

Could you clarify if the processor receives the blocks from the scheduler (after scheduler validation) or directly from the demux routine? I thought it was the former, but the diagram shows they are coming from the demux (I think), and this paragraph is not clear.


```go
func (sc *schedule) numBlockInState(state blockState) uint32 {
	num := 0
	for i := sc.minHeight(); i <= sc.maxHeight(); i++ {
```
Contributor:

How is minHeight() computed? What about maxHeight()?

```go
func (sc *schedule) popSchedule(maxRequest int) []scBlockRequestMessage {
	// We only want to schedule requests such that we have less than sc.targetPending and sc.targetReceived
	// This ensures we don't saturate the network or flood the processor with unprocessed blocks
	todo := min(sc.targetPending-sc.numBlockInState(blockStatePending), sc.numBlockInState(blockStateReceived))
```
Contributor:

Not sure this works, as it looks like todo starts at 0 (there should be 0 received blocks initially) and the loop will break immediately. Also, it's not clear it protects against growing the block pool if we receive blocks faster than we process them (which is the case). But I think we can figure this out during coding, where any issues will become obvious.

```go
	...
}

func handlePeerError(peerID) {
```
Contributor:

When is this called? On errors from the processor?
Should we also have a handlePeerRemove()? It is not clear how the switch sending PeerRemove is handled; should it be handled in both the processor and the scheduler?

```go
if todo == 0 {
	break
}
if blockStates[i] == blockStateNew {
```
Contributor:

How is blockStates slice initialized?

```go
// Delete all unprocessed blocks from peerID
for i = height; i < len(blocks); i++ {
	if blockPeers[i] == peerID {
		events = append(events, pcBlockReschedule{height})
```
Contributor:

Should the event include the peerID for the height? It's for similar reasons as above, where the scheduler may have already rescheduled this (if it also processes RemovePeer events).

@brapse (author) commented Jul 8, 2019:

Documented the pending concern about processing invalidated blocks in #3777. Looks like this is ready to merge 👍

@melekes melekes merged commit 1d5fcc2 into tendermint:master Jul 8, 2019