p2p: peer store and dialing changes #8737

tychoish · 2022-06-10T11:28:22Z

Closes: #8768

This needs some testing or validation, and maybe to be split into a
few batches, but I wanted to collect a number of changes that we'd
been discussing into one branch to facilitate testing and conversation.

The high level:

Peer scores can now be negative, so there is no (realistic) floor on how far being offline can lower a peer's score.
Reduced the maximum dialing sleep interval from 3s to 500ms.
Changed peer ranking to weight "peers we have successfully dialed recently" more heavily.
When selecting peers to return, find 2x the limit number of peers and shuffle them before returning the limit, so that some peers are not left never getting gossiped.
Added a notion of "inactive" peers, which replaces "deleted peers", but we only use it in the "wrong network" case. Inactivation functions like a banlist, so it's a relatively coarse tool. I'm not sure where else to use it, but we can.

Questions:

Persistent peers are going to sort high no matter what (and always have). I think they shouldn't be maxint (changing that in the first patch), but I don't know what's best there, so it's a bit arbitrary.
Are we worried about peer scores wrapping around? I don't think we should be, but it's a new risk that this introduces.
This pushes the weight toward "nodes that have successfully connected" in the score, which means nodes with lots of connections will tend to get more connections (until their connection slots fill up), but hopefully the shuffle
Should we use inactivate in more places?
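
On the wrap-around question, one way to rule it out is to saturate instead of overflowing. This is only a sketch: `peerScore` and `addClamped` are hypothetical names, and the `int16` width is an assumption made for illustration, not necessarily what this branch uses.

```go
package main

import (
	"fmt"
	"math"
)

// peerScore is a hypothetical signed score type for this sketch; the
// width used on the branch may differ.
type peerScore int16

// addClamped returns s+delta, saturating at the int16 bounds instead of
// wrapping around.
func addClamped(s peerScore, delta int) peerScore {
	sum := int(s) + delta // widen first so the addition itself cannot wrap
	switch {
	case sum > math.MaxInt16:
		return math.MaxInt16
	case sum < math.MinInt16:
		return math.MinInt16
	default:
		return peerScore(sum)
	}
}

func main() {
	fmt.Println(addClamped(math.MinInt16, -1)) // already at the minimum, stays there
	fmt.Println(addClamped(10, -25))           // ordinary negative result
}
```

With something like this, a peer that stays offline for a very long time bottoms out at the minimum rather than suddenly ranking near the top.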

internal/p2p/peermanager.go

cmwaters · 2022-06-10T12:04:11Z

Do we need the Seed field in peerInfo?

cmwaters · 2022-06-10T12:15:40Z

internal/p2p/peermanager.go

+		// sort peers who our most recent dialing attempt was
+		// successful ahead of peers with recent dialing
+		// failures
+		switch {


Score already incorporates this logic by subtracting one for every failed dial. Peers that have previously failed will naturally be scored lower.

But it doesn't increase for successful dials, and "I just dialed you and succeeded after failing for a while" is a problem.

I also think that, given the small range of possible scores (before this), we end up not really capturing (meaningfully) "this peer is real".

Does it need to increase for successful dials when we reset the counter after being successful? That should be enough to sufficiently offset your score.

This is commented out and we can revisit it later.
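
For anyone revisiting this later, the ordering under discussion could be sketched roughly as below. This is not the code from the branch: the `peer` struct, its field names, and the use of Unix timestamps are all assumptions made to keep the example self-contained.

```go
package main

import (
	"fmt"
	"sort"
)

// peer is a hypothetical stand-in for a peer store entry. Timestamps are
// Unix seconds, with 0 meaning "never".
type peer struct {
	ID              string
	Score           int
	LastDialSuccess int64
	LastDialFailure int64
}

// lastDialSucceeded reports whether the most recent dial attempt succeeded.
func lastDialSucceeded(p peer) bool {
	return p.LastDialSuccess != 0 && p.LastDialSuccess > p.LastDialFailure
}

// rank sorts peers in place: peers whose most recent dial succeeded come
// first, and ties are broken by descending score.
func rank(peers []peer) {
	sort.SliceStable(peers, func(i, j int) bool {
		si, sj := lastDialSucceeded(peers[i]), lastDialSucceeded(peers[j])
		if si != sj {
			return si
		}
		return peers[i].Score > peers[j].Score
	})
}

func main() {
	peers := []peer{
		{ID: "a", Score: 5, LastDialFailure: 200},
		{ID: "b", Score: 1, LastDialSuccess: 300, LastDialFailure: 100},
		{ID: "c", Score: 3},
	}
	rank(peers)
	for _, p := range peers {
		fmt.Println(p.ID, p.Score)
	}
}
```

The point of the extra comparison is that a peer with a low score but a fresh successful dial sorts ahead of a higher-scored peer whose latest attempt failed, which the score alone would not express.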

internal/p2p/router.go

tychoish · 2022-06-10T12:51:07Z

Do we need the Seed field in peerInfo?

Totally unused, removed.


williambanfield · 2022-06-16T20:22:24Z

internal/p2p/peermanager.go

+	// MaxOutgoingConnections specifies how many outgoing
+	// connections. It must be lower than MaxConnected. If it is
+	// 0, then all connections can be outgoing.
+	MaxOutgoingConnections uint16


OK, I think that does make sense.
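
To make the constraint in that field comment concrete, here is a minimal sketch of how it could be checked. The `options` struct, `validate`, and `outgoingLimit` are hypothetical names for illustration, and treating `MaxConnected == 0` as "no limit" is an assumption, not something stated in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// options is a hypothetical subset of the peer manager options.
type options struct {
	MaxConnected           uint16
	MaxOutgoingConnections uint16
}

var errTooManyOutgoing = errors.New("MaxOutgoingConnections must be lower than MaxConnected")

// validate enforces the constraint from the field comment.
// Assumption: MaxConnected == 0 means "no limit", so there is nothing
// to compare against in that case.
func (o options) validate() error {
	if o.MaxOutgoingConnections == 0 || o.MaxConnected == 0 {
		return nil
	}
	if o.MaxOutgoingConnections >= o.MaxConnected {
		return errTooManyOutgoing
	}
	return nil
}

// outgoingLimit returns the effective cap on outgoing connections:
// 0 falls back to MaxConnected, meaning every slot may be outgoing.
func (o options) outgoingLimit() uint16 {
	if o.MaxOutgoingConnections == 0 {
		return o.MaxConnected
	}
	return o.MaxOutgoingConnections
}

func main() {
	for _, o := range []options{
		{MaxConnected: 64, MaxOutgoingConnections: 12},
		{MaxConnected: 64, MaxOutgoingConnections: 0},
		{MaxConnected: 16, MaxOutgoingConnections: 16},
	} {
		fmt.Println(o.outgoingLimit(), o.validate())
	}
}
```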


williambanfield · 2022-06-16T22:55:26Z

internal/p2p/peermanager.go

+					// peer.
+
+					// nolint:gosec // G404: Use of weak random number generator
+					if numAddresses <= int(limit) || rand.Intn(totalScore+1) <= scores[peer.ID]+1 || rand.Intn((idx+1)*10) <= idx+1 {


This coin flip is:

Pick a random number between 0 and totalScore + 1. If this value is less than the score of the peer, select the peer.

I think we'll hit maxAttempts a lot doing it this way since most peers will have small scores. Either we need maxAttempts to grow with the limit or just switch to the simpler coinflip logic before. I'm fine with the simpler coinflip since I'd prefer this logic to be as dead simple as possible.
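
To put rough numbers on that concern, the per-attempt chance of the quoted condition accepting a peer can be worked out directly. This is a back-of-the-envelope sketch, not code from the branch: `intnAtMost` and `acceptProb` are hypothetical helpers, the `numAddresses <= limit` shortcut is ignored, and `totalScore` is assumed to be non-negative.

```go
package main

import "fmt"

// intnAtMost returns P(rand.Intn(n) <= k) for a uniform draw from [0, n).
func intnAtMost(n, k int) float64 {
	switch {
	case k < 0:
		return 0
	case k >= n-1:
		return 1
	default:
		return float64(k+1) / float64(n)
	}
}

// acceptProb returns the chance that one evaluation of the quoted
// condition selects a peer with the given score at position idx.
func acceptProb(score, totalScore, idx int) float64 {
	byScore := intnAtMost(totalScore+1, score+1)  // rand.Intn(totalScore+1) <= score+1
	byIndex := intnAtMost((idx+1)*10, idx+1)      // rand.Intn((idx+1)*10) <= idx+1
	return byScore + (1-byScore)*byIndex          // either flip may succeed
}

func main() {
	for _, score := range []int{0, 1, 10, 100} {
		fmt.Printf("score=%d p=%.3f\n", score, acceptProb(score, 1000, 50))
	}
}
```

When the total is large and most individual scores are small, the score term contributes very little, so nearly all of the acceptance chance comes from the index term, which tends toward 10% as idx grows.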

Perhaps my suggestion was too complex for now. The risk that this adds is frequently gossiping too few peers and slowing down PEX. I'd really prefer to keep any and all risks from complexity as low as possible if it means definitely shipping an improvement. To that end, I'm ultimately fine with the initial:

pick top limit * 2
shuffle
reslice to be of size limit

Because we add 1 to both sides (which we have to do to avoid passing 0 to rand.Intn), the +1 isn't notable.

The extra 10% of flip, I think, helps reduce the max attempts case. Anyway, I've made it == to limit, which means we'd have to get < 1 address per iteration (and the addedLastIteration check will let us continue past that as long as we make progress in each iteration).

Can we do 2 * limit? I think it's quite likely we'll get less than 1 per iteration.
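
For comparison, the simpler approach mentioned earlier in this thread (take the top limit * 2, shuffle, reslice to limit) might look something like the following. It is a sketch under assumptions: `scored`, `advertise`, and `newRNG` are hypothetical names, and the real peer store carries more state than an ID and a score.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// scored is a hypothetical (ID, score) pair standing in for a peer entry.
type scored struct {
	ID    string
	Score int
}

// newRNG returns a seeded generator so the example is reproducible.
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// advertise returns up to limit peers: it ranks by descending score,
// keeps the top 2*limit, shuffles that window, and reslices to limit.
// The input slice is not modified. limit is assumed to be non-negative.
func advertise(peers []scored, limit int, rng *rand.Rand) []scored {
	ranked := append([]scored(nil), peers...) // work on a copy
	sort.SliceStable(ranked, func(i, j int) bool {
		return ranked[i].Score > ranked[j].Score
	})
	if len(ranked) > limit*2 {
		ranked = ranked[:limit*2]
	}
	rng.Shuffle(len(ranked), func(i, j int) {
		ranked[i], ranked[j] = ranked[j], ranked[i]
	})
	if len(ranked) > limit {
		ranked = ranked[:limit]
	}
	return ranked
}

func main() {
	peers := []scored{{"a", 5}, {"b", 4}, {"c", 3}, {"d", 2}, {"e", 1}, {"f", 0}}
	fmt.Println(advertise(peers, 2, newRNG(1)))
}
```

There is no retry loop here, so there is no maxAttempts to tune; the trade-off is that peers ranked below the top limit * 2 are never returned by a single call.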

p2p: peer store and dialing changes

9dbb135

tychoish requested review from ebuchman, cmwaters, williambanfield, creachadair, sergio-mena, jmalicevic, thanethomson and ancazamfir as code owners June 10, 2022 11:28

tychoish added 3 commits June 10, 2022 07:31

reduce persistent peer max

b213a27

don't gossip inactive peers

cc28ce2

fix small case

56a9164

cmwaters reviewed Jun 10, 2022

cmwaters reviewed Jun 10, 2022

cmwaters reviewed Jun 10, 2022

tychoish added 2 commits June 10, 2022 08:49

fix error message

86db59f

remove seed flag

000aa05

cmwaters reviewed Jun 10, 2022

creachadair reviewed Jun 10, 2022

reduce logging level

4e2bc8f

cmwaters reviewed Jun 10, 2022

tychoish added 6 commits June 10, 2022 11:51

make const

e3068b5

update comment

31bd396

cleanup

eddb23b

Merge remote-tracking branch 'origin/master' into p2p-dialer-store-ch…

a12fa36

…ange

fix tests

d6e3cab

overflows

4c86510

tychoish added 4 commits June 16, 2022 15:49

Merge remote-tracking branch 'origin/master' into p2p-dialer-store-ch…

ddbeb36

…ange

fix rand

acd12b9

fix rand

3f5b12d

use numaddresses correctly

88b2536

williambanfield reviewed Jun 16, 2022

cmwaters reviewed Jun 16, 2022

cmwaters reviewed Jun 16, 2022

tychoish and others added 5 commits June 16, 2022 16:52

readd

6d3c623

Update internal/p2p/peermanager.go

3b08f1a

Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>

remove some things

bfdcb9a

cleanup comment

690a929

more fixes

1b24241

williambanfield reviewed Jun 16, 2022

williambanfield reviewed Jun 16, 2022

williambanfield reviewed Jun 16, 2022

tychoish added 4 commits June 16, 2022 18:12

toml

b24fcf8

fix port

e63dad0

fix comment

5bd378b

dec limit

2e46fe0

williambanfield reviewed Jun 16, 2022

williambanfield reviewed Jun 16, 2022

tychoish added 2 commits June 16, 2022 19:40

fixes

0ea605a

up the attmept max

50afbc2

williambanfield approved these changes Jun 16, 2022

tychoish merged commit 9e5b137 into tendermint:master Jun 17, 2022

mergify bot pushed a commit that referenced this pull request Jun 17, 2022

p2p: peer store and dialing changes (#8737)

84f1d0d

(cherry picked from commit 9e5b137)

mergify bot mentioned this pull request Jun 17, 2022

p2p: peer store and dialing changes (backport #8737) #8784

Merged

tychoish added a commit that referenced this pull request Jun 17, 2022

p2p: peer store and dialing changes (#8737) (#8784)

7e7a253

(cherry picked from commit 9e5b137) Co-authored-by: Sam Kleinman <garen@tychoish.com>

cason mentioned this pull request Jul 4, 2022

WIP: specify the operation of the p2p layer in v0.35 #8935

Closed
