Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

p2p: blockchain dismiss request channel delay #3459

Merged
merged 2 commits into from
Apr 16, 2019

Conversation

unclezoro
Copy link
Contributor

@unclezoro unclezoro commented Mar 21, 2019

Fix issue #3457

  • Updated all relevant documentation in docs
  • Updated all code comments where relevant
  • Wrote tests
  • Updated CHANGELOG_PENDING.md

@unclezoro unclezoro changed the title p2p: blockchain dismiss request channel delay [WIP]p2p: blockchain dismiss request channel delay Mar 21, 2019
@unclezoro unclezoro changed the title [WIP]p2p: blockchain dismiss request channel delay p2p: blockchain dismiss request channel delay Mar 21, 2019
@unclezoro
Copy link
Contributor Author

@melekes @ebuchman may you help to review if you get time, thks

@melekes
Copy link
Contributor

melekes commented Mar 23, 2019

There's a data race

WARNING: DATA RACE
Write at 0x00c00051a3d0 by goroutine 48:
  runtime.closechan()
      /usr/local/go/src/runtime/chan.go:334 +0x0
  github.com/tendermint/tendermint/blockchain.(*BlockPool).OnStop()
      /go/src/github.com/tendermint/tendermint/blockchain/pool.go:107 +0x6e
  github.com/tendermint/tendermint/libs/common.(*BaseService).Stop()
      /go/src/github.com/tendermint/tendermint/libs/common/service.go:167 +0x4e3
  github.com/tendermint/tendermint/blockchain.TestBlockPoolBasic()
      /go/src/github.com/tendermint/tendermint/blockchain/pool_test.go:120 +0x5e5
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:865 +0x163

Previous read at 0x00c00051a3d0 by goroutine 322:
  runtime.chansend()
      /usr/local/go/src/runtime/chan.go:142 +0x0
  github.com/tendermint/tendermint/blockchain.(*BlockPool).sendRequest()
      /go/src/github.com/tendermint/tendermint/blockchain/pool.go:393 +0xdc
  github.com/tendermint/tendermint/blockchain.(*bpRequester).requestRoutine()
      /go/src/github.com/tendermint/tendermint/blockchain/pool.go:604 +0x2e3

Goroutine 48 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:916 +0x699
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:1157 +0xa8
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:865 +0x163
  testing.runTests()
      /usr/local/go/src/testing/testing.go:1155 +0x523
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:1072 +0x2eb
  github.com/tendermint/tendermint/blockchain.TestMain()
      /go/src/github.com/tendermint/tendermint/blockchain/store_test.go:111 +0x5cb
  main.main()
      _testmain.go:114 +0x330

Goroutine 322 (running) created at:
  github.com/tendermint/tendermint/blockchain.(*bpRequester).OnStart()
      /go/src/github.com/tendermint/tendermint/blockchain/pool.go:523 +0x64
  github.com/tendermint/tendermint/libs/common.(*BaseService).Start()
      /go/src/github.com/tendermint/tendermint/libs/common/service.go:139 +0x504
  github.com/tendermint/tendermint/blockchain.(*BlockPool).makeNextRequester()
      /go/src/github.com/tendermint/tendermint/blockchain/pool.go:379 +0x229
  github.com/tendermint/tendermint/blockchain.(*BlockPool).makeRequestersRoutine()
      /go/src/github.com/tendermint/tendermint/blockchain/pool.go:130 +0x174
==================

for {
select {
case request := <-bcR.requestsCh:
go func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how extracting requestsCh into its own goroutine solves anything... but I don't understand the issue well enough

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The topic of the issue is that : write a BlockRequest int requestsCh channel will create an timer at the same time that stop the peer 15s later if no block have been received . But pop a BlockRequest from requestsCh and send it out may delay more than 15s later. So that the peer will be stopped for error("send nothing to us").
Extracting requestsCh into its own goroutine can make sure that every BlockRequest been handled timely.

@melekes
Copy link
Contributor

melekes commented Mar 23, 2019

Should we instead forbid any ABCI queries while we're in fast sync mode?

@unclezoro
Copy link
Contributor Author

unclezoro commented Mar 23, 2019

Should we instead forbid any ABCI queries while we're in fast sync mode?

Heavy ABCI queries is a condition to make this problem easily to be reproduced. If the time to apply a block increase, this issue is more easily to be exposed, but we can't control the time in tendermint layer.
So the key is the speed of apply a block, not ABCI queries.

@unclezoro
Copy link
Contributor Author

I will try to fix data-race issue tomorrow. thks for remind.

@unclezoro
Copy link
Contributor Author

updated @melekes . Let us see the CI result then.

@codecov-io
Copy link

codecov-io commented Mar 24, 2019

Codecov Report

Merging #3459 into develop will increase coverage by 0.14%.
The diff coverage is 58.33%.

@@             Coverage Diff             @@
##           develop    #3459      +/-   ##
===========================================
+ Coverage     64.2%   64.34%   +0.14%     
===========================================
  Files          213      213              
  Lines        17362    17447      +85     
===========================================
+ Hits         11147    11227      +80     
+ Misses        5290     5287       -3     
- Partials       925      933       +8
Impacted Files Coverage Δ
libs/db/mem_batch.go 92.59% <ø> (ø) ⬆️
blockchain/reactor.go 71.49% <58.33%> (-0.97%) ⬇️
mempool/reactor.go 68.35% <0%> (-10.64%) ⬇️
libs/events/events.go 93.2% <0%> (-4.86%) ⬇️
privval/signer_remote.go 80% <0%> (-2%) ⬇️
blockchain/pool.go 80.59% <0%> (-1.65%) ⬇️
mempool/mempool.go 79.36% <0%> (-1.18%) ⬇️
rpc/client/localclient.go 63.8% <0%> (ø) ⬆️
config/toml.go 65.95% <0%> (ø) ⬆️
rpc/client/httpclient.go 66.51% <0%> (ø) ⬆️
... and 11 more

Copy link
Contributor

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍰

@melekes melekes requested a review from ancazamfir March 26, 2019 11:29
@ancazamfir
Copy link
Contributor

Instead of the requestsCh handling, we should probably pull the didProcessCh handling in a separate go routine since this is the one "starving" the other channel handlers. I believe the way it is right now, we still have issues with high delays in errorsCh handling that might cause sending requests to invalid/ disconnected peers.

@unclezoro
Copy link
Contributor Author

unclezoro commented Apr 1, 2019

Instead of the requestsCh handling, we should probably pull the didProcessCh handling in a separate go routine since this is the one "starving" the other channel handlers. I believe the way it is right now, we still have issues with high delays in errorsCh handling that might cause sending requests to invalid/ disconnected peers.

you are right. The code is updated now, I put equestsCh in main routine. @ancazamfir would you like review again.

@melekes
Copy link
Contributor

melekes commented Apr 11, 2019

@guagualvcha could you please add a changelog entry (CHANGELOG_PENDING.md)?

Copy link
Contributor

@ancazamfir ancazamfir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@melekes melekes merged commit 439312b into tendermint:develop Apr 16, 2019
brapse pushed a commit to brapse/tendermint that referenced this pull request Jun 5, 2019
Fixes tendermint#3457

The topic of the issue is that : write a BlockRequest int requestsCh channel will create an timer at the same time that stop the peer 15s later if no block have been received . But pop a BlockRequest from requestsCh and send it out may delay more than 15s later. So that the peer will be stopped for error("send nothing to us").
Extracting requestsCh into its own goroutine can make sure that every BlockRequest been handled timely.

Instead of the requestsCh handling, we should probably pull the didProcessCh handling in a separate go routine since this is the one "starving" the other channel handlers. I believe the way it is right now, we still have issues with high delays in errorsCh handling that might cause sending requests to invalid/ disconnected peers.
Stumble pushed a commit to lino-network/tendermint that referenced this pull request Jul 25, 2019
Fixes tendermint#3457

The topic of the issue is that : write a BlockRequest int requestsCh channel will create an timer at the same time that stop the peer 15s later if no block have been received . But pop a BlockRequest from requestsCh and send it out may delay more than 15s later. So that the peer will be stopped for error("send nothing to us").
Extracting requestsCh into its own goroutine can make sure that every BlockRequest been handled timely.

Instead of the requestsCh handling, we should probably pull the didProcessCh handling in a separate go routine since this is the one "starving" the other channel handlers. I believe the way it is right now, we still have issues with high delays in errorsCh handling that might cause sending requests to invalid/ disconnected peers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants