v0.31.6 release preview #3637

Closed
wants to merge 46 commits into v0.31

Conversation

@melekes (Contributor) commented May 7, 2019

DO NOT MERGE

Not included:

Commits:

melekes and others added 30 commits April 17, 2019 16:44
…3573)

* use dialPeer function in a seed mode

Fixes #3532

by storing the number of connection attempts in memory and
removing the address from the addrbook once the number of attempts exceeds 16
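
Below is a minimal sketch of the in-memory attempt counting described above, using hypothetical names; the real change lives in the PEX reactor and address book, and the types here are illustrative only.

```go
package main

import (
	"fmt"
	"sync"
)

const maxAttempts = 16 // threshold mentioned above

// attemptTracker counts failed dial attempts per address in memory; names
// and types here are hypothetical, not the real PEX reactor code.
type attemptTracker struct {
	mtx      sync.Mutex
	attempts map[string]int // keyed by peer address
}

func newAttemptTracker() *attemptTracker {
	return &attemptTracker{attempts: make(map[string]int)}
}

// recordFailure bumps the in-memory counter and reports whether the address
// should now be removed from the address book.
func (t *attemptTracker) recordFailure(addr string) (remove bool) {
	t.mtx.Lock()
	defer t.mtx.Unlock()
	t.attempts[addr]++
	return t.attempts[addr] > maxAttempts
}

func main() {
	t := newAttemptTracker()
	for i := 0; i < 17; i++ {
		if t.recordFailure("1.2.3.4:26656") {
			fmt.Println("removing address from addrbook")
		}
	}
}
```
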
Continues from #3280 in building support for batched requests/responses in the JSON RPC (as per issue #3213).

* Add JSON RPC batching for client and server

As per #3213, this adds support for [JSON RPC batch requests and
responses](https://www.jsonrpc.org/specification#batch).

* Add additional checks to ensure client responses are the same as results

* Fix case where a notification is sent and no response is expected

* Add test to check that JSON RPC notifications in a batch are left out in responses

* Update CHANGELOG_PENDING.md

* Update PR number now that PR has been created

* Make errors start with lowercase letter

* Refactor batch functionality to be standalone

This refactors the batching functionality to act in a standalone way. To
support concurrent goroutines using the same client, it makes sense for each
goroutine to create its own batch of requests and send that batch without
interfering with batches from other goroutines.
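
The resulting usage pattern might look roughly like the sketch below; the `batch` type and its methods are simplified stand-ins, not the real rpc client API, and only illustrate the one-batch-per-goroutine idea.

```go
package main

import (
	"fmt"
	"sync"
)

// request and batch are simplified stand-ins for JSON-RPC request objects
// and a buffered batch; they are NOT the real rpc client API.
type request struct {
	Method string
	Params map[string]interface{}
}

type batch struct {
	reqs []request
}

// Call buffers a request instead of sending it immediately.
func (b *batch) Call(method string, params map[string]interface{}) {
	b.reqs = append(b.reqs, request{Method: method, Params: params})
}

// Send would serialize the buffered requests as one JSON-RPC batch and
// write them to the server; here it only reports the batch size.
func (b *batch) Send() {
	fmt.Printf("sending batch of %d requests\n", len(b.reqs))
	b.reqs = b.reqs[:0]
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(height int) {
			defer wg.Done()
			// Each goroutine owns its own batch, so batches never interfere.
			b := &batch{}
			b.Call("status", nil)
			b.Call("block", map[string]interface{}{"height": height})
			b.Send()
		}(i)
	}
	wg.Wait()
}
```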

* Add examples for simple and batch HTTP client usage

* Check errors from writer and remove nolinter directives

* Make error strings start with lowercase letter

* Refactor examples to make them testable

* Use safer deferred shutdown for example Tendermint test node

* Recompose rpcClient interface from pre-existing interface components

* Rename WaitGroup for brevity

* Replace empty ID string with request ID

* Remove extraneous test case

* Convert first letter of errors.Wrap() messages to lowercase

* Remove extraneous function parameter

* Make variable declaration terse

* Reorder WaitGroup.Done call to help prevent race conditions in the face of failure

* Swap mutex to value representation and remove initialization

* Restore empty JSONRPC string ID in response to prevent nil

* Make JSONRPCBufferedRequest private

* Revert PR hard link in CHANGELOG_PENDING

* Add client ID for JSONRPCClient

This adds code to automatically generate a randomized client ID for the
JSONRPCClient, and adds a check of the IDs in the responses (if one was
set in the requests).

* Extract response ID validation into separate function

* Remove extraneous comments

* Reorder fields to indicate clearly which are protected by the mutex

* Refactor for loop to remove indexing

* Restructure and combine loop

* Flatten conditional block for better readability

* Make multi-variable declaration slightly more readable

* Change for loop style

* Compress error check statements

* Make function description more generic to show that we support different protocols

* Preallocate memory for request and result objects
* add a deterministic timeout

Co-Authored-By: kevlubkcm <36485490+kevlubkcm@users.noreply.github.com>
…lices (#2611) (#3530)

(#2611) had suggested that an iterative version of
SimpleHashFromByteSlice would be faster, presumably because
 we can envision some overhead accumulating from stack
frames and function calls. Additionally, a recursive algorithm risks
hitting the stack limit and causing a stack overflow should the tree
be too large.

Provided here is an iterative alternative, a simple test to assert
correctness and a benchmark. On the performance side, there appears to
be no overall difference:

```
BenchmarkSimpleHashAlternatives/recursive-4                20000 77677 ns/op
BenchmarkSimpleHashAlternatives/iterative-4                20000 76802 ns/op
```

On the surface it might seem that the small difference is due to
the different allocation patterns of the implementations. The recursive
version uses a single `[][]byte` slice which it then re-slices at each level of the tree.
The iterative version allocates `[][]byte` once within the function and
then rewrites sub-slices of that array at each level of the tree.

Experimenting by modifying the code to simply calculate the
hash and not store the result shows little to no difference in performance.

These preliminary results suggest:
1. The performance of the current implementation is pretty good
2. Go has low overhead for recursive functions
3. The performance of the SimpleHashFromByteSlice routine is dominated
by the actual hashing of data

Although this work is in no way exhaustive, point #3 suggests that
optimizations of this routine would need to take an alternative
approach to make significant improvements on the current performance.

Finally, considering that the recursive implementation is easier to
read, it might not be worthwhile to switch to a less intuitive
implementation for so little benefit.
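
For illustration, here is a hedged sketch of the iterative, bottom-up structure described above. It uses plain SHA-256 over concatenated children, which is not the exact Tendermint hashing scheme; only the looping shape is the point.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// iterativeMerkleRoot hashes the leaves once, then repeatedly rewrites the
// front of the slice with parent hashes until a single root remains. It is
// an illustration of the iterative approach, not Tendermint's exact scheme.
func iterativeMerkleRoot(items [][]byte) []byte {
	if len(items) == 0 {
		return nil
	}
	// Hash the leaves.
	hashes := make([][]byte, len(items))
	for i, item := range items {
		h := sha256.Sum256(item)
		hashes[i] = h[:]
	}
	// Collapse one level per pass, reusing the same backing slice.
	for n := len(hashes); n > 1; {
		next := 0
		for i := 0; i < n; i += 2 {
			if i+1 < n {
				h := sha256.Sum256(append(hashes[i], hashes[i+1]...))
				hashes[next] = h[:]
			} else {
				hashes[next] = hashes[i] // odd node is carried up a level
			}
			next++
		}
		n = next
	}
	return hashes[0]
}

func main() {
	root := iterativeMerkleRoot([][]byte{[]byte("a"), []byte("b"), []byte("c")})
	fmt.Printf("%x\n", root)
}
```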

* re-add slice re-writing
* [crypto] Document SimpleHashFromByteSlicesIterative
…3580)

This should fix #3576 (I ran it many times locally, but only time will tell). The test actually only checked the error's opcode. Judging from the test's name, we actually want to test whether we see a timeout after a pre-defined time.

## Commits:

* increase readWrite timeout, as it is also used in the `Accept` of the TCP
listener:

 - before, this (rarely) caused the readWriteTimeout to kick in during Accept
 - as a side effect, remove the obsolete time.Sleep: in both listener cases
 the Accept will only return after successfully accepting, and the timeout
 that is supposed to be tested here will be triggered because there is a
 read without a write
 - see if we actually run into a timeout error (the whole purpose of this
 test, AFAIU)

Signed-off-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>

* make local test vars `const`

Signed-off-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>

## Additional comments:

@xla: Confusing how an accept could take longer than that, but assuming a noisy environment full of little docker whales will be slower than what 50 years of silicon are capable of. The only thing I'd be wary of is that we mask structural issues in the code by just bumping the timeout; if we are sensitive to that, it warrants investigation, but again this might only be true in the environment our CI runs in.
* check every block appHash

Fixes #3460

Not really needed, but it would detect if the app changed in a way it
shouldn't have.

* add a changelog entry

* no need to return an error if we panic

* rename methods

* rename methods once again

* add a test

* correct error msg

plus fix a few go-lint warnings

* better panic msg
Option to explicitly provide the -config in the testnet command, closing #3160.
Should fix #3451, #2723 and #3317.

The test TestResetTimeoutPrecommitUponNewHeight is simplified to reduce the risk of timeout failures. Furthermore, the timeout we wait for TimeoutEvents is increased, and the timeout value is computed more precisely. This should hopefully decrease the chance of non-deterministic test failures.

This assertion is hard to ensure consistently because it depends on the scheduler. On the other hand, if I am not wrong, the order in which messages are read from the channel respects the order in which they are written. Therefore, the process will receive 2f+1 precommits that are not all for v (one is for nil), so TriggeredTimeoutPrecommit will be set to true and we don't need to assert it. It would be better to still assert it, but I don't know how to do that without a sleep, which is ugly and causes us nondeterministic failures.
Minor updates to reflect squash merging and how to prepare releases
* Remove deprecated PanicXXX functions from codebase

As per discussion over
[here](#3456 (comment)),
we need to remove these `PanicXXX` functions and eliminate our
dependence on them. In this PR, each and every `PanicXXX` function call
is replaced with a simple `panic` call.

* add a changelog entry
The node.NewNode method is pretty complex at the moment, and in order to address issues like #3156, we need to simplify the interface for partial node instantiation. In some places, we don't need to build up a full node (like in the node.TestCreateProposalBlock test), but the complexity of such partial instantiation needs to be reduced.

This PR aims to eventually make this easier/simpler.

See also this gist https://gist.github.com/thanethomson/56e1640d057a26186e38ad678a1d114c for some background work done when starting to refactor here.

## Commits:

* [WIP] Refactor node.NewNode to simplify

The `node.NewNode` method is pretty complex at the moment, and in order
to address issues like #3156, we need to simplify the interface for
partial node instantiation. In some places, we don't need to build up a
full node (like in the `node.TestCreateProposalBlock` test), but the
complexity of such partial instantiation needs to be reduced.

This PR aims to eventually make this easier/simpler.

* Refactor state loading and genesis doc provider into state package

* Refactor for clarity of return parameters

* Fix incorrect capitalization of error messages

* Simplify extracted functions' names

* Document optionally-prefixed functions

* Refactor optionallyFastSync for clarity of separation of concerns

* Restructure function for early return

* Restructure function for early return

* Remove dependence on deprecated panic functions

* refactor code a bit more

plus, expose PEXReactor on node

* align logger names

* add a changelog entry

* align logger names 2

* add a note about PEXReactor returning nil
If we are low on addresses for peering, we need to discover more peers. The
previous behavior would query existing peers; however, if an existing peer
does not participate in peer exchange, then our node will not discover more peers.

This change consults both existing peers and seeds when there is a deficit
of address book addresses. This allows discovering peers through existing
channels as well as via seeds if existing peers do not share addresses.
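
A hypothetical outline of that decision logic, with illustrative names and signatures rather than the real PEX reactor code:

```go
package main

import "fmt"

// ensurePeersSketch: when the address book is short on addresses, request
// more from existing peers and also from the configured seeds, so discovery
// still works when existing peers do not participate in PEX.
func ensurePeersSketch(needed, known int, peers, seeds []string, requestAddrs func(string)) {
	if known >= needed {
		return // enough known addresses, nothing to do
	}
	for _, p := range peers {
		requestAddrs(p) // ask existing peers for addresses
	}
	for _, s := range seeds {
		requestAddrs(s) // also consult seeds
	}
}

func main() {
	ensurePeersSketch(10, 2,
		[]string{"peer1:26656"},
		[]string{"seed1:26656", "seed2:26656"},
		func(addr string) { fmt.Println("requesting addresses from", addr) },
	)
}
```
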
* p2p: merge switch cases

also improve the error msg in privval

* pex: refactor code plus update specification

follow-up to #3603

* Update docs/spec/reactors/pex/pex.md

Co-Authored-By: melekes <anton.kalyaev@gmail.com>
…3067)

* execCommitBlock should not read from state.lastValidators

* fix height 1

* fix blockchain/reactor_test

* fix consensus/mempool_test

* fix consensus/reactor_test

* fix consensus/replay_test

* add CHANGELOG

* fix consensus/reactor_test

* fix consensus/replay_test

* add a test for replay validators change

* fix mem_pool test

* fix byzantine test

* remove redundant code

* reduce validator change blocks to 6

* fix

* return peer0 config

* separate testName

* separate testName 1

* separate testName 2

* separate app db path

* separate app db path 1

* add a lock before startNet

* move the lock to reactor_test

* simulate just once

* try to find problem

* handshake only saveState when app version changed

* update gometalinter to 3.0.0 (#3233)

in the attempt to fix https://circleci.com/gh/tendermint/tendermint/43165

also

    code is simplified by running gofmt -s .
    remove unused vars
    enable linters we're currently passing
    remove deprecated linters
(cherry picked from commit d470945)

* gofmt code

* goimport code

* change the bool name to testValidatorsChange

* adjust receive kvstore.ProtocolVersion

* adjust receive kvstore.ProtocolVersion 1

* adjust receive kvstore.ProtocolVersion 3

* fix merge execution.go

* fix merge develop

* fix merge develop 1

* fix run cleanupFunc

* adjust code according to reviewers' opinion

* modify the func name match the convention

* simplify simulate a chain containing some validator change txs 1

* test CI error

* Merge remote-tracking branch 'upstream/develop' into fixReplay 1

* fix pubsub_test

* subscribeUnbuffered vote channel
## Description

Previously, only outbound peers could be persistent.

Now, even if the peer is inbound, if it's marked as persistent, when/if the connection is lost,
Tendermint will try to reconnect. This part is actually optional and can be reverted.

Plus, a seed won't disconnect from an inbound peer if it's marked as
persistent. Fixes #3362

## Commits

* make persistent prop independent of conn direction

Previously, only outbound peers could be persistent. Now, even if the peer
is inbound, if it's marked as persistent, when/if the connection is lost,
Tendermint will try to reconnect.

Plus, a seed won't disconnect from an inbound peer if it's marked as
persistent. Fixes #3362

* fix TestPEXReactorDialPeer test

* add a changelog entry

* update changelog

* add two tests

* reformat code

* test UnsafeDialPeers and UnsafeDialSeeds

* add TestSwitchDialPeersAsync

* spec: update p2p/config spec

* fixes after Ismail's review

* Apply suggestions from code review

Co-Authored-By: melekes <anton.kalyaev@gmail.com>

* fix merge conflict

* remove sleep from TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode

We don't need it actually.
## Description

Refs #2659

Breaking changes in the mempool package:

[mempool] #2659 Mempool now an interface
    - old Mempool renamed to CListMempool
    - NewMempool renamed to NewCListMempool
    - Option renamed to CListOption
    - MempoolReactor renamed to Reactor
    - NewMempoolReactor renamed to NewReactor
    - unexpose TxID method
    - TxInfo.PeerID renamed to SenderID
    - unexpose MempoolReactor.Mempool

Breaking changes in the state package:

[state] #2659 Mempool interface moved to mempool package
    - MockMempool moved to top-level mock package and renamed to Mempool

Non-breaking changes in the node package:

[node] #2659 Add Mempool method, which allows you to access the mempool
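
For orientation, here is a trimmed-down sketch of the resulting shape; the real interface in the mempool package has more methods and different signatures, and the stub implementation below only exists to mirror the "assert CListMempool impl Mempool" commit.

```go
package main

import "fmt"

// TxInfo carries metadata about where a tx came from; SenderID is the field
// formerly called PeerID. Types here are simplified stand-ins.
type TxInfo struct {
	SenderID uint16
}

// Mempool is a trimmed-down sketch of the extracted interface.
type Mempool interface {
	CheckTx(tx []byte, txInfo TxInfo) error
	Update(height int64, txs [][]byte) error
	Size() int
}

// CListMempool stands in for the old concrete Mempool implementation.
type CListMempool struct {
	txs [][]byte
}

// Compile-time assertion that CListMempool satisfies the interface,
// mirroring the "assert CListMempool impl Mempool" commit.
var _ Mempool = (*CListMempool)(nil)

func (m *CListMempool) CheckTx(tx []byte, txInfo TxInfo) error  { m.txs = append(m.txs, tx); return nil }
func (m *CListMempool) Update(height int64, txs [][]byte) error { return nil }
func (m *CListMempool) Size() int                               { return len(m.txs) }

func main() {
	var mp Mempool = &CListMempool{}
	mp.CheckTx([]byte("tx1"), TxInfo{SenderID: 1})
	fmt.Println("mempool size:", mp.Size())
}
```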

## Commits

* move Mempool interface into mempool package

Refs #2659

Breaking changes in the mempool package:

- Mempool now an interface
- old Mempool renamed to CListMempool

Breaking changes to state package:

- MockMempool moved to mempool/mock package and renamed to Mempool
- Mempool interface moved to mempool package

* assert CListMempool impl Mempool

* gofmt code

* rename MempoolReactor to Reactor

- combine everything into one interface
- rename TxInfo.PeerID to TxInfo.SenderID
- unexpose MempoolReactor.Mempool

* move mempool mock into top-level mock package

* add a fixme

TxsFront should not be a part of the Mempool interface
because it leaks implementation details. Instead, we need to come up
with a general interface for querying the mempool so the MempoolReactor
can fetch and broadcast txs to peers.

* change node#Mempool to return interface

* save commit = new reactor arch

* Revert "save commit = new reactor arch"

This reverts commit 1bfceac.

* require CListMempool in mempool.Reactor

* add two changelog entries

* fixes after my own review

* quote interfaces, structs and functions

* fixes after Ismail's review

* make node's mempool an interface

* make InitWAL/CloseWAL methods a part of Mempool interface

* fix merge conflicts

* make node's mempool an interface
* VoteSignBytes builds CanonicalVote

* CommitVotes implements VoteSetReader

- new CommitVotes struct holds both the Commit and the ValidatorSet and
  implements VoteSetReader
- ToVote takes a ValidatorSet

* fix TestCommit

* use CommitSig.BlockID

Commits may include votes for a different BlockID: it could be nil or another
block altogether. This means we can't use `commit.BlockID`
for reconstructing the sign bytes, since up to -1/3 of the commits
might be for independent BlockIDs. This means CommitSig will need to
include an indicator of which BlockID it signed: if it's not the
committed one or nil, it will need to be included fully in order to be
verified. This is unfortunate but unavoidable so long as we include
votes for non-committed BlockIDs (which we do to track validator
liveness).
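
An illustrative sketch of that idea (not the exact Tendermint type): each commit signature carries enough information to know which BlockID it signed.

```go
package main

import "fmt"

// blockIDKind records which BlockID a validator actually signed, so sign
// bytes can be reconstructed even though some votes are for nil or for a
// different block. Names are illustrative, not the real types.CommitSig.
type blockIDKind int

const (
	blockIDCommitted blockIDKind = iota // signed the committed BlockID
	blockIDNil                          // signed nil
	blockIDOther                        // signed some other BlockID, stored in full
)

type commitSigSketch struct {
	Kind      blockIDKind
	OtherID   []byte // set only when Kind == blockIDOther
	Address   []byte // validator address (CommitSig contains the address)
	Signature []byte
}

func main() {
	sig := commitSigSketch{Kind: blockIDNil, Address: []byte("val1")}
	fmt.Printf("%+v\n", sig)
}
```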

* fixes from review

* remove CommitVotes. CommitSig contains address

* remove commit.canonicalVote method

* toVote -> getVote, takes valIdx

* update adr-025

* commit.ToVoteSet -> CommitToVoteSet

* add test

* fix from review
* ADR 037 Initial draft

* Add link to initial conversation

* Update initial draft

* Fix indentations

* Change negative consequences

* Fix indentations

* Change ADR id

* Add one more positive consequence

* Rollback ADR number
Description:

    add a boltdb implementation of the db interface.

    boltdb is a safe, high-performance embedded KV database based on a B+tree and mmap.
    It is now maintained by the etcd-io org: https://github.com/etcd-io/bbolt
    It performs much better than an LSM tree (levelDB) for random and scan reads.

Replaced #3604
Refs #1835
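
For context, this is roughly how bbolt is used (a single bucket created on open, reads via View, writes via Update), assuming the go.etcd.io/bbolt dependency; the bucket name and file path are illustrative, and the wrapper in the PR is more involved.

```go
package main

import (
	"fmt"
	"log"

	bolt "go.etcd.io/bbolt"
)

// bucket is the single bucket all keys live in; the name is illustrative.
var bucket = []byte("tm")

func main() {
	db, err := bolt.Open("example.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Create the bucket once, when the DB is opened.
	if err := db.Update(func(tx *bolt.Tx) error {
		_, err := tx.CreateBucketIfNotExists(bucket)
		return err
	}); err != nil {
		log.Fatal(err)
	}

	// Set: writes go through a read-write transaction.
	if err := db.Update(func(tx *bolt.Tx) error {
		return tx.Bucket(bucket).Put([]byte("key"), []byte("value"))
	}); err != nil {
		log.Fatal(err)
	}

	// Get: reads go through a read-only View transaction.
	if err := db.View(func(tx *bolt.Tx) error {
		fmt.Printf("%s\n", tx.Bucket(bucket).Get([]byte("key")))
		return nil
	}); err != nil {
		// The PR panics if View returns an error; log.Fatal is close enough here.
		log.Fatal(err)
	}
}
```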

Commits:

* add boltdb for storage

* update boltdb in gopkg.toml

* update bbolt in Gopkg.lock

* dep update  Gopkg.lock

* add boltdb bucket when creating BoltDB

* some modifications

* address my own comments

* fix most of the Iterator tests for boltdb

* bolt does not support concurrent writes while iterating

* add WARNING word

* note that ReadOnly option is not supported

* fixes after Ismail's review

Co-Authored-By: melekes <anton.kalyaev@gmail.com>

* panic if View returns error

plus specify bucket in the top comment
Refs #3610 (comment)

If we do not close, other txs will be stuck forever.
Leo and others added 6 commits May 6, 2019 20:10
`[]byte` is unhashable and needs to be cast to a string.

See #3610 (comment)

Ref https://github.com/tendermint/tendermint/issues/3626
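
A tiny example of the point above: Go map keys must be comparable, so byte-slice keys are converted with string(key) before use.

```go
package main

import "fmt"

func main() {
	// Go map keys must be comparable; []byte is not, so byte-slice keys
	// are converted to string before being used as map keys.
	pending := make(map[string][]byte)

	key, value := []byte("k1"), []byte("v1")
	pending[string(key)] = value // string(key) copies the bytes, so later mutation of key is safe

	for k, v := range pending {
		fmt.Printf("%s => %s\n", k, v)
	}
}
```
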
## Description

also

- handle errors from DialPeersAsync
- remove nil addr from log msg
- fix TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode

This is a follow-up from
#3593 (review)

Fixes most of #3617, except #3593 (comment)

## Commits

* rpc: /dial_peers: only mark peers as persistent if flag is on

also

- handle errors from DialPeersAsync
- remove nil addr from log msg
- fix TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode

This is a follow-up from
#3593 (review)

* remove a call to AddPersistentPeers

TestDialFail will trigger a reconnect
* mempool: remove only valid (Code==0) txs on Update

so evil proposers can't drop valid txs in Commit stage.

Also remove invalid (Code!=0) txs from the cache so they can be
resubmitted.

Fixes #3322

@rickyyangz:

At the end of the commit stage, we update the mempool to remove all the txs
in the current block:

```go
// Update mempool.
err = blockExec.mempool.Update(
	block.Height,
	block.Txs,
	TxPreCheck(state),
	TxPostCheck(state),
)
```

Assume an account has 3 transactions in the mempool with sequences 100, 101,
and 102. An evil proposer can package only the 101 and 102 transactions into
its proposal block and leave 100 in the mempool; the two txs will then be
removed from all validators' mempools on commit, so the account loses the two
valid txs.
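
A hedged sketch of the updated rule (simplified types, not the real mempool code): only txs committed with Code == 0 are removed from the mempool, while invalid ones are dropped from the cache so they can be resubmitted.

```go
package main

import "fmt"

// txResult pairs a committed tx with its DeliverTx result code; it is a
// simplified stand-in for the real types.
type txResult struct {
	Tx   string
	Code uint32
}

// updateSketch applies the rule from the commit message: a valid result
// (Code == 0) removes the tx from the mempool, an invalid result removes it
// from the cache so it can be resubmitted later.
func updateSketch(blockTxs []txResult, mempool map[string]bool, cache map[string]bool) {
	for _, res := range blockTxs {
		if res.Code == 0 {
			delete(mempool, res.Tx) // valid and committed: no longer belongs in the mempool
		} else {
			delete(cache, res.Tx) // invalid: allow it to be checked again later
		}
	}
}

func main() {
	mempool := map[string]bool{"tx100": true, "tx101": true, "tx102": true}
	cache := map[string]bool{"tx100": true, "tx101": true, "tx102": true}
	// Suppose the proposer only included tx101 and tx102.
	updateSketch([]txResult{{"tx101", 0}, {"tx102", 0}}, mempool, cache)
	fmt.Println(mempool) // tx100 stays in the mempool instead of being dropped
	fmt.Println(cache)
}
```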

@ebuchman:

In the longer term we may want to do something like #2639 so we can
validate txs before we commit the block. But even in this case we'd only
want to run the equivalent of CheckTx, which means the DeliverTx could
still fail even if the CheckTx passes depending on how the app handles
the ABCI Code semantics. So more work will be required around the ABCI
code. See also #2185

* add changelog entry and tests

* improve changelog message

* reformat code
* libs/db: conditional compilation

For cleveldb: go build -tags cleveldb
For boltdb: go build -tags boltdb

Fixes #3611
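
As a reminder of how the tag-based guard works, a file like the following compiles only under `-tags boltdb`; the body is illustrative, not the real libs/db registration code.

```go
// +build boltdb

// This file compiles only when the boltdb build tag is supplied, e.g.
// `go build -tags boltdb`; the cleveldb backend is guarded the same way
// with `-tags cleveldb`.
package main

import "fmt"

func main() {
	fmt.Println("built with the boltdb tag")
}
```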

* document db_backend param better

* remove deprecated LevelDBBackend

* update changelog

* add missing lines

* add new line

* fix TestRemoteDB

* add a line about boltdb tag

* Revert "remove deprecated LevelDBBackend"

This reverts commit 1aa8545.

* make PR non breaking

* change DEPRECATED label format

https://stackoverflow.com/a/36360323/820520
for storing batch keys and values in boltDBBatch.

NOTE: batch does not have to be safe for concurrent access. Delete may
be slow, but given it should not be used often, it's ok.

Fixes #3631
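
A simplified sketch of such a batch: operations are recorded in order and replayed on Write against a plain map standing in for the bolt bucket; names are illustrative and, as noted above, the type is deliberately not safe for concurrent use.

```go
package main

import "fmt"

type opType int

const (
	opSet opType = iota
	opDelete
)

type operation struct {
	typ        opType
	key, value []byte
}

// batchSketch records operations in order; it is not thread-safe on purpose.
type batchSketch struct {
	ops []operation
}

func (b *batchSketch) Set(key, value []byte) {
	b.ops = append(b.ops, operation{opSet, key, value})
}

func (b *batchSketch) Delete(key []byte) {
	// A simple batch just appends a delete op and lets Write apply it in order;
	// deduplicating earlier sets of the same key would be slower and is not needed often.
	b.ops = append(b.ops, operation{opDelete, key, nil})
}

// Write replays the recorded operations against a store, here a plain map.
func (b *batchSketch) Write(store map[string][]byte) {
	for _, op := range b.ops {
		switch op.typ {
		case opSet:
			store[string(op.key)] = op.value
		case opDelete:
			delete(store, string(op.key))
		}
	}
	b.ops = nil
}

func main() {
	store := map[string][]byte{}
	b := &batchSketch{}
	b.Set([]byte("a"), []byte("1"))
	b.Set([]byte("b"), []byte("2"))
	b.Delete([]byte("a"))
	b.Write(store)
	fmt.Println(store)
}
```
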
@melekes requested a review from ebuchman May 7, 2019 17:47
milosevic and others added 2 commits May 10, 2019 13:37
* Add blockchain reactor refactor ADR

* docs: Correct markdown and go syntax in adr-036

* Update docs/architecture/adr-036-blockchain-reactor-refactor.md

Co-Authored-By: milosevic <zarko@interchain.io>

* Add comments and address reviewer comments

* Improve formatting

* Small improvements

* adr-036 -> adr-040
* p2p: initial implementation of peer behaviour

* [p2p] re-use newMockPeer

* p2p: add inline docs for peer_behaviour interface

* [p2p] Align PeerBehaviour interface (#3558)

* [p2p] make switchedPeerHebaviour private

* [p2p] make storePeerHebaviour private

* [p2p] Add CHANGELOG_PENDING entry for PeerBehaviour

* [p2p] Adjustment naming for PeerBehaviour

* [p2p] Add coarse lock around storedPeerBehaviour

* [p2p]: Fix non-pointer methods in storedPeerBehaviour

    + Structs with embedded locks must specify all methods with pointer
     receivers to avoid creating a copy of the embedded lock.

* [p2p] Thorough refactoring based on comments in #3552

    + Decouple PeerBehaviour interface from Peer by parametrizing
    methods with `p2p.ID` instead of `p2p.Peer`
    + Setter methods wrapped in a write lock
    + Getter methods wrapped in a read lock
    + Getter methods on storedPeerBehaviour now take a `p2p.ID`
    + Getter methods return a copy of underlying stored behaviours
    + Add doc strings to public types and methods

* [p2p] make structs public

* [p2p] Test empty StoredPeerBehaviour

* [p2p] typo fix

* [p2p] add TestStoredPeerBehaviourConcurrency

    + Add a test which uses StoredPeerBehaviour in multiple goroutines
      to ensure thread-safety.

* Update p2p/peer_behaviour.go

Co-Authored-By: brapse <brapse@gmail.com>

* Update p2p/peer_behaviour.go

Co-Authored-By: brapse <brapse@gmail.com>

* Update p2p/peer_behaviour.go

Co-Authored-By: brapse <brapse@gmail.com>

* Update p2p/peer_behaviour.go

Co-Authored-By: brapse <brapse@gmail.com>

* Update p2p/peer_behaviour.go

Co-Authored-By: brapse <brapse@gmail.com>

* Update p2p/peer_behaviour_test.go

Co-Authored-By: brapse <brapse@gmail.com>

* Update p2p/peer_behaviour.go

Co-Authored-By: brapse <brapse@gmail.com>

* Update p2p/peer_behaviour.go

Co-Authored-By: brapse <brapse@gmail.com>

* Update p2p/peer_behaviour.go

Co-Authored-By: brapse <brapse@gmail.com>

* Update p2p/peer_behaviour.go

Co-Authored-By: brapse <brapse@gmail.com>

* [p2p] field ordering convention

* p2p: peer behaviour refactor

    + Change naming of reporting behaviour to `Report`
    + Remove the responsibility of distinguishing between the categories
    of good and bad behaviour and instead focus on reporting behaviour.

* p2p: rename PeerReporter -> PeerBehaviourReporter
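
Condensed sketch of the lock-guarded store pattern described in these commits (simplified names, not the real p2p types): reports go through a write lock, reads through a read lock, and getters return a copy of the stored slice.

```go
package main

import (
	"fmt"
	"sync"
)

type peerID string

// behaviourStore keeps reported behaviours keyed by peer ID.
type behaviourStore struct {
	mtx        sync.RWMutex
	behaviours map[peerID][]string
}

func newBehaviourStore() *behaviourStore {
	return &behaviourStore{behaviours: make(map[peerID][]string)}
}

// Report uses a pointer receiver so the embedded lock is never copied.
func (s *behaviourStore) Report(id peerID, behaviour string) {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	s.behaviours[id] = append(s.behaviours[id], behaviour)
}

// GetBehaviours returns a copy so callers cannot mutate the stored slice.
func (s *behaviourStore) GetBehaviours(id peerID) []string {
	s.mtx.RLock()
	defer s.mtx.RUnlock()
	out := make([]string, len(s.behaviours[id]))
	copy(out, s.behaviours[id])
	return out
}

func main() {
	s := newBehaviourStore()
	s.Report("peer1", "bad_message")
	fmt.Println(s.GetBehaviours("peer1"))
}
```
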
@codecov-io commented May 10, 2019

Codecov Report

Merging #3637 into v0.31 will decrease coverage by 0.76%.
The diff coverage is 61.63%.

```
@@            Coverage Diff            @@
##            v0.31   #3637      +/-   ##
=========================================
- Coverage   64.06%   63.3%   -0.77%
=========================================
  Files         213     218       +5
  Lines       17381   18208     +827
=========================================
+ Hits        11136   11527     +391
- Misses       5315    5717     +402
- Partials      930     964      +34
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| libs/db/db.go | 38.88% <ø> (ø) | ⬆️ |
| mempool/mempool.go | 84.21% <ø> (+3.67%) | ⬆️ |
| state/services.go | 25% <ø> (-8.34%) | ⬇️ |
| libs/common/errors.go | 85.71% <ø> (+8.06%) | ⬆️ |
| rpc/core/pipe.go | 28.84% <0%> (ø) | ⬆️ |
| state/store.go | 59.77% <0%> (-11.47%) | ⬇️ |
| libs/common/service.go | 54.54% <0%> (+0.97%) | ⬆️ |
| p2p/pex/file.go | 78.04% <0%> (ø) | ⬆️ |
| consensus/reactor.go | 71.54% <0%> (+0.23%) | ⬆️ |
| consensus/replay_file.go | 0% <0%> (ø) | ⬆️ |
| ... and 55 more | | |

yutianwu and others added 8 commits May 21, 2019 14:12
* p2p/pex: consult seeds in crawlPeersRoutine

This changeset alters the startup behavior for crawlPeersRoutine. Previously
the routine would crawl a random selection of peers on startup. For a
new seed node, there are no peers. As a result, new seed nodes are unable
to bootstrap themselves with a list of peers until another node with a list
of peers connects to the seed. If this node relies on the seed node for peers,
then the two will not discover more peers.

This changeset makes crawlPeersRoutine connect to any configured seed nodes on
startup. Upon connecting, a request for peers is sent to the seed node,
thus helping bootstrap our seed node.

* p2p/pex: Adjust error message for no peers

Co-Authored-By: Ethan Buchman <ethan@coinculture.info>
…ureFromPubKey-error

Improve error message returned from AddSignatureFromPubKey
[R4R] Fix missing context parameter in proxy
@melekes closed this May 27, 2019