Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

p2p: re-check after sleeps #2664

Merged
merged 13 commits into from Nov 11, 2018
Merged

p2p: re-check after sleeps #2664

merged 13 commits into from Nov 11, 2018

Conversation

mgurevin
Copy link
Contributor

Hi there, I saw there are many re-connection attempts and trying to connect itself while ensuring peers in pex reactor. I think that issues appear for me because I'm testing three node setup with persistent peers. So I guess this patch is the solution. It works for me.

@mgurevin mgurevin changed the title Develop p2p: re-check after sleeps Oct 18, 2018
@codecov-io
Copy link

codecov-io commented Oct 18, 2018

Codecov Report

Merging #2664 into develop will decrease coverage by 0.04%.
The diff coverage is 0%.

@@             Coverage Diff             @@
##           develop    #2664      +/-   ##
===========================================
- Coverage    62.37%   62.32%   -0.05%     
===========================================
  Files          212      212              
  Lines        17210    17215       +5     
===========================================
- Hits         10734    10729       -5     
- Misses        5578     5585       +7     
- Partials       898      901       +3
Impacted Files Coverage Δ
p2p/switch.go 52.45% <0%> (+0.8%) ⬆️
p2p/netaddress.go 71.51% <0%> (-0.88%) ⬇️
consensus/reactor.go 67.13% <0%> (-1.17%) ⬇️
p2p/pex/pex_reactor.go 73.37% <0%> (+0.32%) ⬆️

Copy link
Contributor

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Would be great if you answer my questions and provide logs with reconnect attempts (if possible).

p2p/switch.go Outdated
@@ -325,6 +325,11 @@ func (sw *Switch) reconnectToPeer(addr *NetAddress) {
return
}

if sw.IsDialingOrExistingAddress(addr) {
sw.Logger.Info("Peer connection has been established or dialed while we waiting next try", "addr", addr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@@ -395,6 +394,9 @@ func (r *PEXReactor) ensurePeers() {
if r.Switch.IsDialingOrExistingAddress(try) {
continue
}
if r.Switch.NodeInfo().ID() == try.ID {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you decide to add this check? If we're picking our address for connecting in ensurePeers, then something is wrong. That's why we add code like

if !netAddr.Same(ourAddr) {

and

tendermint/node/node.go

Lines 458 to 459 in f94eb42

// Add ourselves to addrbook to prevent dialing ourselves
addrBook.AddOurAddress(nodeInfo.NetAddress())

it's a little messy, but the point is ensurePeers should only use external addresses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. The protocol part of the listen address definitions in the configuration caused my problem.

If the listen address looks like tcp://995f83b21e6b66b62883853cc77fe567da48986c@0.0.0.0:2101 the id part can not be resolved in node_info.

So it's caused to unable to check our address.

@melekes
Copy link
Contributor

melekes commented Nov 3, 2018

@mgurevin Hi! Could you follow up on my questions? We want to merge this, but #2664 (comment) needs to be addressed before.

@mgurevin
Copy link
Contributor Author

mgurevin commented Nov 4, 2018

Hi @melekes, I saw your comment. I will split my additions and send their detailed logs in a few days, sorry for delay.

@mgurevin
Copy link
Contributor Author

mgurevin commented Nov 8, 2018

Hi again,

I have split my additions into different commits and attach their detailed logs.

My setup is three node and they defined as persistent nodes in the configuration files.

Log files;

@mgurevin
Copy link
Contributor Author

mgurevin commented Nov 8, 2018

I have complete my pull request, if it looks good you can merge.

The main problem is the listen address definition because node ID cannot be resolved if it contains the protocol part.

In this pull request, we fixed this issue and add check the actual connection status when reconnecting after sleep.

Copy link
Contributor

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍓 🍌 🥛

p2p/netaddress.go Outdated Show resolved Hide resolved
Co-Authored-By: mgurevin <mehmet@gurevin.net>
Copy link
Contributor

@ebuchman ebuchman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. I'll merge develop in, fix the conflicts, and merge this PR

func IDAddressString(id ID, hostPort string) string {
// we respect the protocol definition in here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We merged a similar fix in https://github.com/tendermint/tendermint/pull/2797/files#diff-91e8069d5cb91a5b3067a01482559679R38. There we just drop the protocol prefix, since we shouldn't need it anymore. Sorry didn't see this earlier, it would have saved us some time.

@@ -328,6 +328,11 @@ func (sw *Switch) reconnectToPeer(addr *NetAddress) {
return
}

if sw.IsDialingOrExistingAddress(addr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. These changes are partial fixes for #2716.

Note the races are still possible, but this should help significantly. We'll likely have to rearchitect the code a bit to solve this fully.

Thanks!

@ebuchman ebuchman merged commit 905abf1 into tendermint:develop Nov 11, 2018
maxim-levy pushed a commit to maxim-levy/tendermint that referenced this pull request Nov 13, 2018
* 'master' of https://github.com/tendermint/tendermint: (330 commits)
  Release/v0.26.1 (tendermint#2803)
  fix amino overhead computation for Tx (tendermint#2792)
  p2p: re-check after sleeps (tendermint#2664)
  check the result of `ps.peer.Send` before calling `ps.setHasVote` (tendermint#2787)
  p2p: AddressBook requires addresses to have IDs; Do not close conn immediately after sending pex addrs in seed mode (tendermint#2797)
  test AutoFile#Size (happy path)
  [autofile/group] do not panic when checking size
  openFile creates a file if not exist => ErrNotExist is not possible
  use our logger in autofile/group
  Add tests for ValidateBasic methods (tendermint#2754)
  [docs] improve organization of ABCI docs & fix links (tendermint#2749)
  p2p: peer-id -> peer_id (tendermint#2771)
  mempool: print postCheck error (tendermint#2762)
  Fix crypto/merkle ProofOperators.Verify to check bounds on keypath pa… (tendermint#2756)
  Mempool WAL is still created by default in home directory, leads to permission errors (tendermint#2758)
  mempool: ErrPreCheck and more log info (tendermint#2724)
  Release/v0.26.0 (tendermint#2726)
  [ADR] [DRAFT] pubsub 2.0 (tendermint#2532)
  validate reactor messages (tendermint#2711)
  TMHASH is 32 bytes. Closes tendermint#1990 (tendermint#2732)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants