blockchain: Reorg reactor #3561

Merged
merged 61 commits into from
Jul 23, 2019

ancazamfir
Contributor

This is the draft PR for a possible new implementation of the blockchain reactor. It is still a work in progress and parts of the design might change.
One missing piece is tests for reactor.go:poolRoutine(), which is currently still covered mainly in reactor_test.go.
I also need to write design notes and add more comments.

Anca Zamfir added 18 commits March 7, 2019 18:39
added block requests under peer

moved the request trigger in the reactor poolRoutine, triggered now by a ticker

in general moved everything required for making block requests smarter in the poolRoutine

added a simple map of heights to keep track of what will need to be requested next

added a few more tests
send errors (RemovePeer) from switch on a different channel than the
one receiving blocks
renamed channels
added more pool tests
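
The ticker-driven request trigger and the map of heights mentioned in these commits could look roughly like the sketch below. It is only an illustration: `requestPlanner`, `plan`, and `take` are hypothetical names, not identifiers from this PR, and the real poolRoutine selects on more channels than a single ticker.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// requestPlanner is a hypothetical helper, not the PR's pool type.
type requestPlanner struct {
	nextHeight int64              // next height that has not been planned yet
	planned    map[int64]struct{} // heights waiting to be requested
}

// plan adds heights up to maxPeerHeight until limit heights are planned.
func (p *requestPlanner) plan(maxPeerHeight int64, limit int) {
	for len(p.planned) < limit && p.nextHeight <= maxPeerHeight {
		p.planned[p.nextHeight] = struct{}{}
		p.nextHeight++
	}
}

// take removes and returns the planned heights in ascending order.
func (p *requestPlanner) take() []int64 {
	heights := make([]int64, 0, len(p.planned))
	for h := range p.planned {
		heights = append(heights, h)
		delete(p.planned, h)
	}
	sort.Slice(heights, func(i, j int) bool { return heights[i] < heights[j] })
	return heights
}

func main() {
	p := &requestPlanner{nextHeight: 1, planned: make(map[int64]struct{})}
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()

	// Each tick plans a batch and "sends" it by printing the heights.
	for i := 0; i < 3; i++ {
		<-ticker.C
		p.plan(5, 2)
		fmt.Println("requesting heights", p.take())
	}
}
```

Driving requests from a ticker keeps the decision of what to ask for next in one place instead of spreading it across message handlers.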
@ancazamfir ancazamfir self-assigned this Apr 15, 2019
@xla xla changed the title [WIP] blockchain reactor reorg blockchain: Reorg reactor Apr 15, 2019
@tac0turtle tac0turtle changed the base branch from develop to master June 26, 2019 15:45
@tac0turtle
Contributor

Can we merge this? If not, let's make a list of things that need to be completed prior to merging.

@ancazamfir
Contributor Author

> Can we merge this? If not, let's make a list of things that need to be completed prior to merging.

  • review for the blockchain version configuration flag changes and integration with behavior from @brapse
  • I plan to do more testing on gaia once SDK release for tendermint v0.32.0 is out

@@ -137,6 +141,7 @@ func (bcR *BlockchainReactor) SetLogger(l log.Logger) {

// OnStart implements cmn.Service.
func (bcR *BlockchainReactor) OnStart() error {
	bcR.swReporter = behaviour.NewSwitcReporter(bcR.BaseReactor.Switch)
Contributor

How do we know the switch has been setup by the time OnStart is called?

Contributor Author

Through code inspection :) I could add a check, but probably we don't reach here with nil Switch and crash much earlier.

The switch is created in NewNode() by createSwitch(), which adds the reactors and calls SetSwitch() for each reactor:

main() -> 
     NewRunNodeCmd(nodeProvider nm.NodeProvider) ->
        n, err := nodeProvider(config, logger) ->
           node.go:NewNode() -> 
              createSwitch(.., bcReactor, ...) -> 
                 sw.AddReactor("BLOCKCHAIN", bcReactor) ->
                     sw.reactors[name] = reactor
                     reactor.SetSwitch(sw)

Then the node is started and that causes the reactor to start:

main() -> 
     NewRunNodeCmd(nodeProvider nm.NodeProvider) ->
          n, err := nodeProvider(config, logger)
          err := n.Start() ->
              (n *Node) OnStart() -> 
                    n.sw.Start()
                          (sw *Switch) OnStart() -> 
                                 reactor.Start() -> 
                                     (bs *BaseService) Start() -> 
                                           bs.impl.OnStart()
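
The ordering argument in the two call chains above, plus the optional nil check mentioned in the reply, can be sketched with simplified stand-in types. `Switch` and `Reactor` here are minimal hypothetical versions written for this example; they are not the tendermint p2p types and omit almost everything those types do.

```go
package main

import (
	"errors"
	"fmt"
)

// Switch and Reactor are simplified stand-ins for the p2p types.
type Switch struct {
	reactors map[string]*Reactor
}

type Reactor struct {
	sw      *Switch
	started bool
}

func (r *Reactor) SetSwitch(sw *Switch) { r.sw = sw }

// OnStart includes the defensive nil check discussed in the thread.
func (r *Reactor) OnStart() error {
	if r.sw == nil {
		return errors.New("reactor started before SetSwitch was called")
	}
	r.started = true
	return nil
}

// AddReactor registers the reactor and hands it the switch, so every
// registered reactor has a non-nil switch before Start runs.
func (sw *Switch) AddReactor(name string, r *Reactor) {
	sw.reactors[name] = r
	r.SetSwitch(sw)
}

// Start starts every registered reactor.
func (sw *Switch) Start() error {
	for name, r := range sw.reactors {
		if err := r.OnStart(); err != nil {
			return fmt.Errorf("reactor %s: %w", name, err)
		}
	}
	return nil
}

func main() {
	sw := &Switch{reactors: make(map[string]*Reactor)}
	bcR := &Reactor{}
	sw.AddReactor("BLOCKCHAIN", bcR)
	fmt.Println("start error:", sw.Start())

	// A reactor that was never added to a switch fails the check.
	orphan := &Reactor{}
	fmt.Println("orphan error:", orphan.OnStart())
}
```

Because the only path to OnStart in this sketch goes through a switch that has already called SetSwitch, the check should never fire in normal operation; it would only turn a nil dereference into a readable error.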

	timeout time.Duration
	minRecvRate int64
	sampleRate time.Duration
	windowSize time.Duration
}

type bpPeer struct {
// BpPeer is the datastructure associated with a fast sync peer.
type BpPeer struct {
Contributor

Would it be possible to make BpPeer private?

@brapse
Contributor

brapse commented Jul 1, 2019

@ancazamfir Besides the questions above, the behaviour and config flags LGTM 👍

@@ -18,6 +18,8 @@ program](https://hackerone.com/tendermint).

### FEATURES:

- [blockchain] \#3561 Blockchain Reorg Refactor, the new reactor currently sits under a feature flag, see [here](https://github.com/tendermint/tendermint/blob/master/config/toml.go#L297) on how to use it, for further information about the versions please see: [ADR-40](https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-040-blockchain-reactor-refactor.md) & [ADR-43](https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-043-blockchain-riri-org.md)
Contributor

I'd write something like

- [blockchain] \#3561 Add early version of the new blockchain reactor, which is supposed to be more modular and testable compared to the old version. To try it, you'll have to turn on the `??` feature flag in the config file. NOTE: It's not ready for production yet. For further information, see [ADR-40](https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-040-blockchain-reactor-refactor.md) & [ADR-43](https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-043-blockchain-riri-org.md).

@tac0turtle tac0turtle marked this pull request as ready for review July 23, 2019 08:57
@tac0turtle tac0turtle requested a review from xla as a code owner July 23, 2019 08:57
@tac0turtle tac0turtle merged commit 4d7cd80 into master Jul 23, 2019
jackzampolin pushed a commit that referenced this pull request Aug 1, 2019
* go routines in blockchain reactor

* Added reference to the go routine diagram

* Initial commit

* cleanup

* Undo testing_logger change, committed by mistake

* Fix the test loggers

* pulled some fsm code into pool.go

* added pool tests

* changes to the design

added block requests under peer

moved the request trigger in the reactor poolRoutine, triggered now by a ticker

in general moved everything required for making block requests smarter in the poolRoutine

added a simple map of heights to keep track of what will need to be requested next

added a few more tests

* send errors to FSM in a different channel than blocks

send errors (RemovePeer) from switch on a different channel than the
one receiving blocks
renamed channels
added more pool tests
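
A minimal sketch of the idea in this commit, assuming two hypothetical channels: one for received blocks and one for peer errors such as RemovePeer. `drain` is an illustrative name, and the element types are simplified to a height and a peer ID.

```go
package main

import "fmt"

// drain reads from both channels until each one is closed. Because peer
// errors travel on their own channel, a backlog of blocks cannot delay them.
func drain(blocks <-chan int64, peerErrs <-chan string) ([]int64, []string) {
	var heights []int64
	var removed []string
	for blocks != nil || peerErrs != nil {
		select {
		case h, ok := <-blocks:
			if !ok {
				blocks = nil // stop selecting on a closed channel
				continue
			}
			heights = append(heights, h)
		case id, ok := <-peerErrs:
			if !ok {
				peerErrs = nil
				continue
			}
			removed = append(removed, id)
		}
	}
	return heights, removed
}

func main() {
	blocks := make(chan int64, 2)
	peerErrs := make(chan string, 1)
	blocks <- 5
	blocks <- 6
	peerErrs <- "peerA"
	close(blocks)
	close(peerErrs)

	heights, removed := drain(blocks, peerErrs)
	fmt.Println("blocks:", heights, "removed peers:", removed)
}
```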

* more pool tests

* lint errors

* more tests

* more tests

* switch fast sync to new implementation

* fixed data race in tests

* cleanup

* finished fsm tests

* address golangci comments :)

* address golangci comments :)

* Added timeout on next block needed to advance

* updating docs and cleanup

* fix issue in test from previous cleanup

* cleanup

* Added termination scenarios, tests and more cleanup

* small fixes to adr, comments and cleanup

* Fix bug in sendRequest()

If we tried to send a request to a peer not present in the switch, a
missing continue statement caused the request to be blackholed in a peer
that was removed and never retried.

While this bug was manifesting, the reactor kept asking for other
blocks that would be stored and never consumed. Added the number of
unconsumed blocks in the math for requesting blocks ahead of current
processing height so eventually there will be no more blocks requested
until the already received ones are consumed.
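
The two fixes described in this commit message can be illustrated with a small sketch. This is not the PR's sendRequest(): `assignRequests`, `requestBudget`, and the `connected` map (standing in for "peer is present in the switch") are hypothetical, and peers are picked round-robin purely for the example.

```go
package main

import "fmt"

// assignRequests maps heights to peers and returns the heights that must be
// retried because the chosen peer is no longer connected.
func assignRequests(heights []int64, peers []string, connected map[string]bool) (map[int64]string, []int64) {
	assigned := make(map[int64]string)
	var retry []int64
	if len(peers) == 0 {
		return assigned, append(retry, heights...)
	}
	for i, h := range heights {
		peer := peers[i%len(peers)]
		if !connected[peer] {
			// Without this continue, the height would be recorded against
			// a peer that can never answer and would never be retried.
			retry = append(retry, h)
			continue
		}
		assigned[h] = peer
	}
	return assigned, retry
}

// requestBudget returns how many new requests may be issued. Counting blocks
// that were received but not yet processed makes requesting stop once enough
// blocks are waiting to be consumed.
func requestBudget(maxAhead, pending, unconsumed int) int {
	budget := maxAhead - pending - unconsumed
	if budget < 0 {
		return 0
	}
	return budget
}

func main() {
	assigned, retry := assignRequests(
		[]int64{10, 11, 12},
		[]string{"p1", "p2"},
		map[string]bool{"p1": true}, // p2 was removed from the switch
	)
	fmt.Println("assigned:", assigned, "retry:", retry)
	fmt.Println("budget:", requestBudget(10, 3, 4))
}
```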

* remove bpPeer's didTimeout field

* Use distinct err codes for peer timeout and FSM timeouts

* Don't allow peers to update with lower height

* review comments from Ethan and Zarko

* some cleanup, renaming, comments

* Move block execution in separate goroutine

* Remove pool's numPending

* review comments

* fix lint, remove old blockchain reactor and duplicates in fsm tests

* small reorg around peer after review comments

* add the reactor spec

* verify block only once

* review comments

* change to int for max number of pending requests

* cleanup and godoc

* Add configuration flag fast sync version

* golangci fixes

* fix config template

* move both reactor versions under blockchain

* cleanup, golint, renaming stuff

* updated documentation, fixed more golint warnings

* integrate with behavior package

* sync with master

* gofmt

* add changelog_pending entry

* move to improvements

* suggestion to changelog entry
@tac0turtle tac0turtle deleted the ancaz/blockchain_reactor_reorg branch October 5, 2019 17:20
tessr pushed a commit that referenced this pull request Nov 19, 2019
This commit contains commit messages from the 52 commits from Tendermint 0.32.0 to 0.32.7. This is a result of creating releases from our security advisories, rather than merging these advisories back into the main repo before creating releases. In the future, we will adopt a git workflow that will reduce these commits to only the commits that make up (for example) RC2 for Tendermint 0.32.8.

* docs: fix consensus spec formatting (#3804)

* abci/server: recover from app panics in socket server (#3809)

fixes #3800

* abci/client: fix DATA RACE in gRPC client (#3798)

* Remove go func {}()

closes #357

- Remove go func(){}() that caused race condition

- To reproduce
	- add -race in make file to `install_abci`
	- Remove `CGO_ENABLED=0` & add -race to `install`

Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>

* remove -race

* fix data race

also, reorder callbacks similarly to socket client

* docs: "Writing a built-in Tendermint Core application in Go" guide (#3608)

* docs: go built-in guide

* fix package imports, add badger db, simplify Query

* newTendermint function

* working example

* finish the first guide

* add one more note

* add the second Golang guide - external ABCI app

* fix typos

* libs: Remove db from tendermint in favor of tendermint/tm-cmn (#3811)

* Remove db from tendermint in favor of tendermint/tm-cmn

- remove db from `libs`
- update dependency, there have been no breaking changes in the updated deps
	- https://github.com/grpc/grpc-go/releases
	- https://github.com/golang/protobuf/releases

Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>

* changelog add

* gofmt

* more gofmt

*  docs: add A TOC to the Readme.md of ADR Section (#3820)

* ADR TOC in readme.md

* Added A TOC to the Readme.md of ADR Section

- Added table of contents to the Readme of the architecture section.
	- Easier to traverse and when you know what is there.
	- If the Adr's become viewable online it would help guide the user

Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>

* add tm-cmn to subprojects

* normalize word

* rpc: make max_body_bytes and max_header_bytes configurable (#3818)

* rpc: make max_body_bytes and max_header_bytes configurable

* update changelog pending

* p2p/conn: Add Bufferpool (#3664)

* use byte buffer pool to decrease allocs

* wrap to put buffer in defer

* wrapper defer

* add dependency

* remove Gopkg,*

* add change log
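
For readers unfamiliar with the technique, a buffer pool with the "put in defer" pattern can be sketched with the standard library's `sync.Pool`. This is a stand-in chosen for the example: the commit messages do not say which pool implementation the change uses, and `encode` with its one-byte length prefix is a hypothetical function.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers so hot paths avoid fresh allocations.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// encode builds a length-prefixed frame using a pooled buffer.
func encode(payload []byte) []byte {
	buf := bufPool.Get().(*bytes.Buffer)
	// The deferred function returns the buffer on every exit path.
	defer func() {
		buf.Reset()
		bufPool.Put(buf)
	}()

	buf.WriteByte(byte(len(payload)))
	buf.Write(payload)

	// Copy out, because the buffer's memory is reused after Put.
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out
}

func main() {
	fmt.Println(encode([]byte("hi")))
	fmt.Println(encode([]byte("x")))
}
```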

* rpc: /broadcast_evidence (#3481)

* implement broadcast_duplicate_vote endpoint

* fix test_cover

* address comments

* address comments

* Update abci/example/kvstore/persistent_kvstore.go

Co-Authored-By: mossid <torecursedivine@gmail.com>

* Update rpc/client/main_test.go

Co-Authored-By: mossid <torecursedivine@gmail.com>

* address comments in progress

* reformat the code

* make linter happy

* make tests pass

* replace BroadcastDuplicateVote with BroadcastEvidence

* fix test

* fix endpoint name

* improve doc

* fix TestBroadcastEvidenceDuplicateVote

* Update rpc/core/evidence.go

Co-Authored-By: Thane Thomson <connect@thanethomson.com>

* add changelog entry

* fix TestBroadcastEvidenceDuplicateVote

* mempool: make max_msg_bytes configurable (#3826)

* mempool: make max_msg_bytes configurable

* apply suggestions from code review

* update changelog pending

* apply suggestions from code review again

* rpc: return err if page is incorrect (less than 0 or greater than tot… (#3825)

* rpc: return err if page is incorrect (less than 0 or greater than total pages)

Fixes #3813

* fix rpc_test
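
The page check described in this commit title (reject a page below 0 or above the total number of pages) could be sketched as follows. `validatePage` and its parameters are written for this example; treating page 0 as "use the first page" is an assumption, not something stated in the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// validatePage returns the page to serve or an error if it is out of range.
func validatePage(page, perPage, totalCount int) (int, error) {
	if perPage < 1 {
		return 0, errors.New("perPage must be at least 1")
	}
	pages := (totalCount-1)/perPage + 1
	if pages < 1 {
		pages = 1 // an empty result set still has one (empty) page
	}
	if page < 0 || page > pages {
		return 0, fmt.Errorf("page should be within [0, %d], given %d", pages, page)
	}
	if page == 0 {
		return 1, nil // assumption: 0 means "use the default first page"
	}
	return page, nil
}

func main() {
	for _, page := range []int{-1, 0, 4, 5} {
		p, err := validatePage(page, 30, 95)
		fmt.Println("requested:", page, "served:", p, "err:", err)
	}
}
```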

* blockchain: Reorg reactor (#3561)

* Renamed wire.go to codec.go (#3827)

* Renamed wire.go to codec.go

- Wire was the previous name of amino
- Codec describes the file better than `wire` & `amino`

Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>

* ide error

* rename amino.go to codec.go

* docs: add guides to docs (#3830)

* add staticcheck linting (#3828)

cleanup to add linter

    grpc change:
        https://godoc.org/google.golang.org/grpc#WithContextDialer
        https://godoc.org/google.golang.org/grpc#WithDialer
        grpc/grpc-go#2627
    prometheus change:
        due to UninstrumentedHandler, being deprecated in the future
    empty branch = empty if or else statement
        didn't delete them entirely but commented
        couldn't find a reason to have them
    could not replicate the issue #3406
        but if we want to keep it commented then we should comment out the if statement as well

* types: move MakeVote / MakeBlock functions (#3819)

to the types package

Partially Fixes #3584

* p2p: Fix error logging for connection stop (#3824)

* p2p: fix false-positive error logging when stopping connections

This changeset fixes two types of false-positive errors occurring during
connection shutdown.

The first occurs when the process invokes FlushStop() or Stop() on a
connection. While the previous behavior did properly wait for the sendRoutine
to finish, it did not notify the recvRoutine that the connection was shutting
down. This would cause the recvRoutine to receive an error when reading and
log this error. The changeset fixes this by notifying the recvRoutine that
the connection is shutting down.

The second occurs when the connection is terminated (gracefully) by the other side.
The recvRoutine would get an EOF error during the read, log it, and stop the connection
with an error. The changeset detects EOF and gracefully shuts down the connection.

* bring back the comment about flushing

* add changelog entry

* listen for quitRecvRoutine too

* we have to call stopForError

Otherwise peer won't be removed from the peer set and maybe readded
later.

* p2p: Do not write 'Couldn't connect to any seeds' if there are no seeds (#3834)

* Do not write 'Couldn't connect to any seeds' if there are no seeds

* changelog

* remove privValUpgrade

* Fix typo in changelog

* Update CHANGELOG_PENDING.md

Co-Authored-By: Marko <marbar3778@yahoo.com>

I'm setting up all peers dynamically by calling dial_peers, so p2p.seeds in configs is empty, and I'm seeing error log a lot in logs.

* docs: add a footer to guides (#3835)

* docs: "Writing a Tendermint Core application in Kotlin (gRPC)" guide (#3838)

* add abci grpc kotlin guide

* Update docs/guides/kotlin.md

Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>

* Update docs/guides/kotlin.md

Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>

* Update docs/guides/kotlin.md

Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>

* Update kotlin.md

*  node: allow replacing existing p2p.Reactor(s)  (#3846)

* node: allow replacing existing p2p.Reactor(s)

using [`CustomReactors`
option](https://godoc.org/github.com/tendermint/tendermint/node#CustomReactors).
Warning: beware of accidental name clashes. Here is the list of existing
reactors: MEMPOOL, BLOCKCHAIN, CONSENSUS, EVIDENCE, PEX.

* check the absence of "CUSTOM" prefix

* merge 2 tests

* add doc.go to node package

* gocritic (1/2) (#3836)

    Add gocritic as a linter

    The linting is not complete; should I complete it in this PR or in a follow-up?

    23 files have been touched so it may be better to do it in a follow-up PR


Commits:

* Add gocritic to linting

- Added gocritic to linting

Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>

* gocritic

* pr comments

* remove switch in cmdBatch

* tm-cmn to tm-db (#3850)

* tm-cmn to tm-db

* go.mod changes

* go.mod changes

* more go.mod

* fix tm-db

* ci fix, pending change

* version tmdb (#3854)

* txindexer: Refactor Tx Search Aggregation (#3851)

- Replace the previous intersect call, which was called at each query condition, with a map intersection.
- Replace fmt.Sprintf with string()
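
A rough sketch of what a map-based intersection across query conditions can look like. It assumes each condition has already produced a slice of matching transaction hashes; `intersectInto` and `search` are hypothetical names, not the functions in state/txindex/kv.

```go
package main

import (
	"fmt"
	"sort"
)

// intersectInto keeps only the hashes in acc that also appear in matches.
func intersectInto(acc map[string]struct{}, matches []string) {
	seen := make(map[string]struct{}, len(matches))
	for _, h := range matches {
		seen[h] = struct{}{}
	}
	for h := range acc {
		if _, ok := seen[h]; !ok {
			delete(acc, h)
		}
	}
}

// search returns the sorted hashes that satisfy every condition.
func search(perCondition [][]string) []string {
	if len(perCondition) == 0 {
		return nil
	}
	// Seed the accumulator with the first condition's matches.
	acc := make(map[string]struct{})
	for _, h := range perCondition[0] {
		acc[h] = struct{}{}
	}
	for _, matches := range perCondition[1:] {
		intersectInto(acc, matches)
	}
	out := make([]string, 0, len(acc))
	for h := range acc {
		out = append(out, h)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(search([][]string{
		{"aa", "bb", "cc"},
		{"bb", "cc", "dd"},
		{"cc", "bb"},
	}))
}
```

Map lookups make each membership test constant time on average, which is one plausible reason for the speedup reported in the benchmarks.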

closes: #3076

Benchmarks

```
Old
goos: darwin
goarch: amd64
pkg: github.com/tendermint/tendermint/state/txindex/kv
BenchmarkTxSearch-4   	     200	 103641206 ns/op	 7998416 B/op	   71171 allocs/op
PASS
ok  	github.com/tendermint/tendermint/state/txindex/kv	26.019s

New
goos: darwin
goarch: amd64
pkg: github.com/tendermint/tendermint/state/txindex/kv
BenchmarkTxSearch-4   	    1000	  38615024 ns/op	13515226 B/op	  166460 allocs/op
PASS
ok  	github.com/tendermint/tendermint/state/txindex/kv	53.618s
```

~62% improvement in time per operation (about 103.6 ms to 38.6 ms), although bytes and allocations per operation increased

Commits:

* Refactor tx search

* Add pending changelog entry

* Add tx search benchmarking

* remove intermediate hashes list

also reset timer in BenchmarkTxSearch
and fix other benchmark

* fix import

* Add test cases

* Fix searching

* Replace fmt.Sprintf with string

* Update state/txindex/kv/kv.go

Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>

* Rename params

* Cleanup

* Check error in benchmarks

* release for v0.32.2

* Merge PR #3860: Update log v0.32.2

* changelog updates

* pr comments

* Fix for panic in signature verification if a peer sends a nil public key.

* update version.go

* Changelog update

* Update CHANGELOG.md

Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>

* update changelog

* p2p: only allow ed25519 pubkeys when connecting

also, recover from any possible failures in acceptPeers

Refs #4030

* update changelog and bump version to v0.32.6

* set the date to today

* cs: panic only when WAL#WriteSync fails

- modify WAL#Write and WAL#WriteSync to return an error
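
The policy in this commit title can be sketched as follows, assuming a hypothetical `wal` interface whose `Write` and `WriteSync` both return an error. `record`, `failingWAL`, and the `logged` slice are illustrative only; the real consensus code decides for itself which messages need a synchronous write.

```go
package main

import (
	"errors"
	"fmt"
)

// wal is a hypothetical interface in which both write methods return errors.
type wal interface {
	Write(msg string) error
	WriteSync(msg string) error
}

// failingWAL always fails, to exercise both error paths.
type failingWAL struct{}

func (failingWAL) Write(string) error     { return errors.New("disk full") }
func (failingWAL) WriteSync(string) error { return errors.New("disk full") }

// record writes msg. A failed synchronous write panics; a failed regular
// write is only collected in logged, standing in for an error log line.
func record(w wal, msg string, critical bool, logged *[]string) {
	if critical {
		if err := w.WriteSync(msg); err != nil {
			panic(fmt.Sprintf("failed to write %q to WAL: %v", msg, err))
		}
		return
	}
	if err := w.Write(msg); err != nil {
		*logged = append(*logged, err.Error())
	}
}

// panics reports whether f panicked.
func panics(f func()) (p bool) {
	defer func() { p = recover() != nil }()
	f()
	return false
}

func main() {
	var logged []string
	record(failingWAL{}, "vote", false, &logged)
	fmt.Println("logged:", logged)
	fmt.Println("critical write panicked:", panics(func() {
		record(failingWAL{}, "end-height", true, &logged)
	}))
}
```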

* types: validate Part#Proof

add ValidateBasic to crypto/merkle/SimpleProof

* cs: limit max bit array size and block parts count

* cs: test new limits

* cs: only assert important stuff

* update changelog and bump version to 0.32.7

* fixes after Ethan's review

* align max wal msg and max consensus msg sizes

* fix tests

* fix test

* Rc2 v0.32.8

Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>

* move issue to big fix