p2p: peer store and dialing changes #8737

Merged: 70 commits merged on Jun 17, 2022

tychoish
Contributor

@tychoish tychoish commented Jun 10, 2022

Closes: #8768

This needs some testing or validation, and maybe to be split into a
few batches, but I wanted to collect a number of changes that we'd
been discussing into one branch to facilitate testing and conversation.

The high level:

  • peer scores can now be negative, so there's no (realistic) floor on how much being offline can impact your peer score
  • reduced the dialing sleep interval maximum from 3s to 500ms
  • changed peer ranking to more heavily weight "peers we have successfully dialed recently."
  • find 2x the limit number of peers and then shuffle them before returning the limit, to avoid leaving some peers that never get gossiped.
  • added a notion of "inactive" peers, which replaces "deleted peers", but we only use it in the "wrong network" case. Inactivation functions like a banlist, so it's a relatively coarse tool. I'm not sure where else to use it, but we can.

Questions:

  • persistent peers are going to sort high no matter what (and always have). I think they shouldn't be maxint (I'm changing that in the first patch), but I don't know what's best there, so it's a bit arbitrary
  • are we worried about peer scores wrapping around? I don't think we should be, but it's a new risk that this introduces.
  • this pushes the weight toward "nodes that have successfully connected" in the score, which means nodes with lots of connections will tend to get more connections (until their connection slots fill up), but hopefully the shuffle
  • should we use inactivate in more places?

@cmwaters
Contributor

Do we need the Seed field in peerInfo?

Comment on lines 1214 to 1217
// sort peers who our most recent dialing attempt was
// successful ahead of peers with recent dialing
// failures
switch {
Contributor

Score already incorporates this logic by subtracting one for every failed dial. Peers who have previously failed will naturally be scored lower.

Contributor Author

but it doesn't increase for successful dials, and "I just dialed you and succeeded after failing for a while" is a problem.

I also think that, given the small range of possible scores (before this), we end up not really capturing (meaningfully) "this peer is real".

Contributor

Does it need to increase for successful dials when we reset the counter after being successful? That should be enough to sufficiently offset your score.

Contributor Author

this is commented out and we can revisit it later.
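For readers following this thread, here is a rough sketch of the two ideas being debated: a score that drops by one per failed dial, and an extra ordering rule that puts peers whose most recent dial succeeded ahead of the rest. The `peer` struct, its fields, the `score` rule, and the persistent-peer bonus of 1000 are all hypothetical simplifications, not the code in `internal/p2p/peermanager.go`.

```go
package main

import (
	"fmt"
	"sort"
)

// peer is a hypothetical, simplified stand-in for a peer store record.
type peer struct {
	ID                string
	Persistent        bool
	DialFailures      int  // number of failed dials counted against the peer
	LastDialSucceeded bool // whether the most recent dial attempt succeeded
}

// score is an assumed scoring rule: persistent peers get a large fixed
// bonus (deliberately not the maximum integer), and every counted dial
// failure subtracts one. The result may be negative.
func score(p peer) int {
	s := 0
	if p.Persistent {
		s += 1000
	}
	return s - p.DialFailures
}

// rank sorts peers by descending score. Among equal scores, peers whose
// most recent dial succeeded sort ahead of peers that were never dialed
// successfully.
func rank(peers []peer) {
	sort.SliceStable(peers, func(i, j int) bool {
		si, sj := score(peers[i]), score(peers[j])
		if si != sj {
			return si > sj
		}
		return peers[i].LastDialSucceeded && !peers[j].LastDialSucceeded
	})
}

func main() {
	peers := []peer{
		{ID: "a"},                          // never dialed
		{ID: "b", LastDialSucceeded: true}, // known-good peer
		{ID: "c", DialFailures: 2},
		{ID: "d", Persistent: true, DialFailures: 3},
	}
	rank(peers)
	for _, p := range peers {
		fmt.Println(p.ID, score(p))
	}
}
```

In this sketch a never-dialed peer and a successfully dialed peer both score 0, so only the tie-break separates them, which is the "this peer is real" signal raised above.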

@tychoish
Contributor Author

Do we need the Seed field in peerInfo?

Totally unused, removed.

// MaxOutgoingConnections specifies how many outgoing
// connections. It must be lower than MaxConnected. If it is
// 0, then all connections can be outgoing.
MaxOutgoingConnections uint16
Contributor

OK, I think that does make sense.
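The constraint in that doc comment can be sketched as a small validation step. The `options` struct, `validate`, and `outgoingLimit` below are hypothetical names for illustration; only the two field names and the rule ("lower than MaxConnected; 0 means all connections can be outgoing") come from the snippet above. The sketch does not model any special meaning that `MaxConnected == 0` may have.

```go
package main

import (
	"errors"
	"fmt"
)

// options is a hypothetical container for the two limits discussed above.
type options struct {
	MaxConnected           uint16
	MaxOutgoingConnections uint16
}

// validate enforces that a non-zero outgoing limit is strictly lower
// than the overall connection limit.
func (o options) validate() error {
	if o.MaxOutgoingConnections != 0 && o.MaxOutgoingConnections >= o.MaxConnected {
		return errors.New("MaxOutgoingConnections must be lower than MaxConnected")
	}
	return nil
}

// outgoingLimit returns the effective cap on outgoing connections:
// a zero setting means every connection slot may be outgoing.
func (o options) outgoingLimit() uint16 {
	if o.MaxOutgoingConnections == 0 {
		return o.MaxConnected
	}
	return o.MaxOutgoingConnections
}

func main() {
	for _, o := range []options{{10, 0}, {10, 4}, {10, 10}} {
		fmt.Println(o, o.validate(), o.outgoingLimit())
	}
}
```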

tychoish and others added 5 commits June 16, 2022 16:52
Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
// peer.

// nolint:gosec // G404: Use of weak random number generator
if numAddresses <= int(limit) || rand.Intn(totalScore+1) <= scores[peer.ID]+1 || rand.Intn((idx+1)*10) <= idx+1 {
Contributor

@williambanfield williambanfield Jun 16, 2022

This coin flip is:

Pick a random number between 0 and totalScore +1
If this value is less than the score of the peer, select the peer. 

I think we'll hit maxAttempts a lot doing it this way since most peers will have small scores. Either we need maxAttempts to grow with the limit or just switch to the simpler coinflip logic before. I'm fine with the simpler coinflip since I'd prefer this logic to be as dead simple as possible.

Perhaps my suggestion was too complex for now. The risk that this adds is frequently gossiping too few peers and slowing down PEX. I'd really prefer to keep any and all risks from complexity as low as possible if it means definitely shipping an improvement. To that end, I'm ultimately fine with the initial:

pick top limit * 2
shuffle
reslice to be of size limit

Contributor Author

Because we add 1 to both sides (which we have to do to avoid passing 0 to rand.Intn), the +1 isn't notable.

The extra 10% of flip, I think, helps reduce the max attempts case. Anyway, I've made maxAttempts equal to limit, which means we'd have to get < 1 address per iteration (and the addedLastIteration check will let us continue past that as long as we make progress in each iteration).
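To put numbers on the coin flip in the quoted condition, the selection probability of a single peer can be computed directly, since `rand.Intn(n)` is uniform over `0..n-1`. This sketch ignores the `numAddresses <= int(limit)` clause and assumes `totalScore >= 0` (otherwise `rand.Intn` would panic); the function names are made up for illustration.

```go
package main

import "fmt"

// scoreFlipProb is P(rand.Intn(totalScore+1) <= score+1).
// It assumes totalScore >= 0.
func scoreFlipProb(score, totalScore int) float64 {
	n := totalScore + 1 // number of equally likely outcomes
	hits := score + 2   // outcomes 0..score+1 satisfy the comparison
	if hits < 0 {
		hits = 0
	}
	if hits > n {
		hits = n
	}
	return float64(hits) / float64(n)
}

// indexFlipProb is P(rand.Intn((idx+1)*10) <= idx+1), for idx >= 0.
func indexFlipProb(idx int) float64 {
	return float64(idx+2) / float64((idx+1)*10)
}

// selectProb combines the two independent flips joined by "||".
func selectProb(score, totalScore, idx int) float64 {
	return 1 - (1-scoreFlipProb(score, totalScore))*(1-indexFlipProb(idx))
}

func main() {
	// A peer with score 0 when all scores sum to 99.
	fmt.Printf("score flip: %.3f\n", scoreFlipProb(0, 99))
	fmt.Printf("index flip at idx 0: %.3f, at idx 9: %.3f\n", indexFlipProb(0), indexFlipProb(9))
	fmt.Printf("combined at idx 0: %.3f\n", selectProb(0, 99, 0))
}
```

Under these assumptions the index-based flip alone succeeds 20% of the time at `idx` 0 and approaches 10% as `idx` grows, which is why low-scoring peers are selected mostly through that second term.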

Contributor

Can we do 2 * limit? I think it's quite likely we'll get less than 1 per iteration.

Contributor Author

sure!
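For illustration, the termination rule agreed on here (`maxAttempts` of `2 * limit`, with the `addedLastIteration` check allowing further passes while progress continues) might be structured as below. This is only a guess at the loop shape: `sample`, its signature, and the injected `accept` callback standing in for the coin flip are all hypothetical.

```go
package main

import "fmt"

// sample makes repeated passes over candidates, adding each not-yet-chosen
// candidate for which accept(idx) returns true, until limit entries are
// collected. After maxAttempts = 2*limit passes it stops as soon as a
// pass adds nothing.
func sample(candidates []string, limit int, accept func(idx int) bool) []string {
	maxAttempts := 2 * limit
	chosen := make(map[string]bool)
	out := make([]string, 0, limit)
	addedLastIteration := true
	for attempt := 0; len(out) < limit && len(out) < len(candidates); attempt++ {
		if attempt >= maxAttempts && !addedLastIteration {
			break // out of attempts and the last pass made no progress
		}
		addedLastIteration = false
		for idx, c := range candidates {
			if len(out) >= limit {
				break
			}
			if chosen[c] || !accept(idx) {
				continue
			}
			chosen[c] = true
			out = append(out, c)
			addedLastIteration = true
		}
	}
	return out
}

func main() {
	candidates := []string{"c0", "c1", "c2", "c3", "c4"}
	calls := 0
	// A deterministic stand-in for the coin flip: accept every fourth call.
	everyFourth := func(int) bool { calls++; return calls%4 == 0 }
	fmt.Println(sample(candidates, 3, everyFourth))
}
```

With an `accept` that never succeeds, the loop gives up after `2 * limit` empty passes instead of spinning, which is the failure mode the larger bound is meant to make rare rather than impossible.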

@tychoish tychoish merged commit 9e5b137 into tendermint:master Jun 17, 2022
mergify bot pushed a commit that referenced this pull request Jun 17, 2022
tychoish added a commit that referenced this pull request Jun 17, 2022
(cherry picked from commit 9e5b137)

Co-authored-by: Sam Kleinman <garen@tychoish.com>
Development

Successfully merging this pull request may close these issues.

p2p: smarter handling of connection slots
4 participants