Pex reactor fixes #9 #10

melekes · 2017-01-11T20:23:57Z

Started working on the PEX reactor issues (refs #9).

What's done / left:

- pex_reactor_test
- removePeer
- prevent abuse
- improve ensurePeers logic
- update addrbook from https://github.com/btcsuite/btcd/blob/master/addrmgr/addrmanager.go

melekes · 2017-01-12T19:07:43Z

I've looked into addrmanager, and there have been no significant changes since mid-2015.

Still, I like that they choose addresses based on distance (https://github.com/btcsuite/btcd/blob/master/addrmgr/addrmanager.go#L1046), whereas we choose randomly.

melekes · 2017-01-12T19:26:30Z

pex_reactor.go


 	// Try to pick numToDial addresses to dial.
-	// TODO: improve logic.


@jaekwon, what did you have in mind? Something concrete? I've managed to get rid of the alreadyConnected flag by using the addrbook's old group. I'll try to think of something else in the meantime.

I don't understand how alreadyConnected can be removed. An address's membership in addrbook (old or new) has no bearing on whether the address is already connected.

If we mark the peer as good after a successful connection and pick addresses with bias = 100% (always from the new buckets), we don't need to check alreadyConnected, since all peers we already have a connection with will be in the old buckets (b898bc3).

Sure, but that's not what we want to do.

The old and new buckets are arbitrary categories that denote whether an address is vetted. That status needs to be determined over time via a heuristic we haven't perfected yet, or perhaps set manually by the node operator. The buckets should not be used to compute which addresses are already connected.

The old code may have been buggy, but these modifications are definitely bad.

> If we mark the peer good after successful connection...

Basically, we need to work harder on our good-peer/bad-peer marking. What we're currently doing in terms of marking good/bad peers is just a placeholder.
It should not be the case that an address becomes old/vetted upon a single successful connection. That's not the intent of the old/new system.

Thank you for explaining this; it wasn't clear to me before. I will revert the change and add a comment to the source code so we can refer to it in the future.
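A minimal sketch of the check jaekwon argues for: when choosing addresses to dial, connection state is tracked separately from the old/new classification, so an address is skipped only if it is already connected or already picked in this round. The `addrPicker` type stands in for `r.book.PickAddress(newBias)`, and `pickToDial`, `maxAttempts` and the `connected` set are hypothetical names for illustration, not the reactor's actual code.

```go
package main

import "fmt"

// addrPicker is a hypothetical stand-in for r.book.PickAddress(newBias);
// it returns "" when the book has nothing to offer.
type addrPicker func() string

// pickToDial tries up to maxAttempts picks to collect numToDial addresses that
// are neither already connected nor already selected this round. Whether an
// address sits in the old or new bucket plays no part in this check.
func pickToDial(pick addrPicker, connected map[string]bool, numToDial, maxAttempts int) []string {
	selected := make(map[string]bool)
	var out []string
	for i := 0; i < maxAttempts && len(out) < numToDial; i++ {
		addr := pick()
		if addr == "" {
			break // empty book
		}
		if connected[addr] || selected[addr] {
			continue // the alreadyConnected / already-picked case
		}
		selected[addr] = true
		out = append(out, addr)
	}
	return out
}

func main() {
	addrs := []string{"1.2.3.4:46656", "5.6.7.8:46656", "1.2.3.4:46656", "9.9.9.9:46656"}
	i := 0
	pick := func() string {
		if i >= len(addrs) {
			return ""
		}
		a := addrs[i]
		i++
		return a
	}
	connected := map[string]bool{"5.6.7.8:46656": true}
	fmt.Println(pickToDial(pick, connected, 2, 10)) // prints [1.2.3.4:46656 9.9.9.9:46656]
}
```

Keeping the connected set separate means the old/new buckets stay free to evolve into a real vetting heuristic without breaking dialing.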

ebuchman · 2017-01-17T18:04:05Z

pex_reactor.go

+func (r *PEXReactor) RemovePeer(p *Peer, reason interface{}) {
+	addr := NewNetAddressString(p.ListenAddr)
+	// addr will be ejected from the book
+	r.book.MarkBad(addr)


I think this will need to depend on the reason. If the peer just goes offline, we probably don't want to remove it.

Are you sure?

NOTE: The peer will be proposed to us by other peers (PexAddrsMessage) and we will add it again upon successful connection.

But if the peer actually went offline, won't all the other peers remove him too? Granted, PEX should enable everyone to find him when he next connects.

Yes. The peer will need to send the first requests to others itself (it will have an addrbook or the seeds). Is that bad?

No, I guess it's OK. Down the road I think we will want to preserve "MarkBad" for peers that actually misbehave.

Sounds right. I will add RemoveAddress to addrbook then.

ebuchman · 2017-01-17T18:10:02Z

Looks good. I think limiting to e.g. 1000 msgs/hour is fine for now. Maybe down the road we want to keep track of the quality of peer messages, so if peerA keeps telling us about peers we can't connect to, we should care less about peerA. But I don't think that kind of complexity is a priority right now.
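A minimal sketch of the per-peer limit discussed here: a counter per peer key that rejects messages past the limit until the window is reset (hourly, per the comment above). `msgLimiter`, `Allow` and `Reset` are hypothetical names; the `map[string]uint16` mirrors the `msgCountByPeer` field in the diff. This version is deliberately single-threaded and not safe for concurrent use.

```go
package main

import "fmt"

// msgLimiter counts PEX messages per peer within the current window.
// It is NOT safe for concurrent use.
type msgLimiter struct {
	limit  uint16
	counts map[string]uint16 // peer key -> messages seen this window
}

func newMsgLimiter(limit uint16) *msgLimiter {
	return &msgLimiter{limit: limit, counts: make(map[string]uint16)}
}

// Allow records one message from peer and reports whether it is within the limit.
func (l *msgLimiter) Allow(peer string) bool {
	if l.counts[peer] >= l.limit {
		return false
	}
	l.counts[peer]++
	return true
}

// Reset starts a new window; the caller would invoke it once per hour.
func (l *msgLimiter) Reset() { l.counts = make(map[string]uint16) }

func main() {
	l := newMsgLimiter(3) // small limit for the demo; the discussion suggests ~1000/hour
	for i := 0; i < 5; i++ {
		fmt.Println(l.Allow("peerA")) // prints true, true, true, false, false (one per line)
	}
}
```

Tracking message quality per peer, as suggested above, would replace the plain counter with a score, but the window-and-reset shape could stay the same.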

melekes · 2017-01-17T18:10:56Z

> Maybe down the road we want to keep track of the quality of peer messages so if peerA keeps telling us about peers we can't connect to then maybe we should care less about peerA. But I don't think that kind of complexity is priority right now

OK. I will add the comment.

jaekwon · 2017-01-20T17:04:23Z

pex_reactor.go

+// will remove him too. The peer will need to send first requests to others by
+// himself (he will have an addrbook or the seeds).
+func (r *PEXReactor) RemovePeer(p *Peer, reason interface{}) {
+	addr := NewNetAddressString(p.ListenAddr)


AddPeer/RemovePeer are just housekeeping methods, we shouldn't remove a peer from the addrbook just because they got disconnected.

We could rename AddPeer/RemovePeer to "OnPeerConnect" and "OnPeerDisconnect". If we aren't keeping track of local temp data for each peer here, then we don't have to do anything.
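A minimal sketch of the housekeeping-only hooks suggested here: `OnPeerConnect` allocates reactor-local, temporary per-peer data and `OnPeerDisconnect` frees it, leaving the address book untouched. The `pexReactor` struct, `peerInfo`, and keying both maps by address are assumptions for illustration.

```go
package main

import "fmt"

// peerInfo holds temporary, reactor-local data for a connected peer.
type peerInfo struct {
	msgCount uint16 // e.g. PEX messages received from this peer
}

type pexReactor struct {
	peerState map[string]*peerInfo // keyed by peer address for simplicity
	book      map[string]bool      // toy stand-in for the address book
}

func newPEXReactor() *pexReactor {
	return &pexReactor{peerState: make(map[string]*peerInfo), book: make(map[string]bool)}
}

// OnPeerConnect allocates local state for the peer.
func (r *pexReactor) OnPeerConnect(addr string) {
	r.peerState[addr] = &peerInfo{}
}

// OnPeerDisconnect frees local state only. The address book is left alone:
// a disconnect by itself says nothing about whether the address is good or bad.
func (r *pexReactor) OnPeerDisconnect(addr string, reason interface{}) {
	_ = reason // unused here; could later select a penalty for misbehaving peers
	delete(r.peerState, addr)
}

func main() {
	r := newPEXReactor()
	r.book["1.2.3.4:46656"] = true
	r.OnPeerConnect("1.2.3.4:46656")
	r.OnPeerDisconnect("1.2.3.4:46656", "connection reset")
	fmt.Println(len(r.peerState), r.book["1.2.3.4:46656"]) // prints 0 true
}
```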

I thought that if we add the peer to the book upon connection (AddPeer), we should correspondingly remove it upon losing the connection (RemovePeer). It just seems logical. In addition, the peer will reach us once it's up again (or once the network heals; I'm not sure about the network case though, I need to test it with iptables). See #10 (comment).

Ok, I will revert this too.

jaekwon · 2017-04-19T01:50:22Z

pex_reactor.go

-			try := pexR.book.PickAddress(newBias)
+			// NOTE always picking from the new group because old one stores already
+			// connected peers.
+			try := r.book.PickAddress(100)


This is bad. The purpose of newBias is to first prioritize old (more vetted) peers when we have few connections, but to allow for new (less vetted) peers if we already have many connections. This algorithm isn't perfect, but it somewhat ensures that we prioritize connecting to more-vetted peers. Please revert.

melekes · 2017-04-20T09:35:51Z

Rebased and reverted some commits as per Jae's comments. Done here.

ebuchman · 2017-04-20T16:20:51Z

pex_reactor.go

+	for {
+		select {
+		case <-ticker.C:
+			r.msgCountByPeer = make(map[string]uint16)


This doesn't seem thread-safe.

Does it have to be? A peer has only one MConn, and MConn calls onReceive in the same thread (blocking).

melekes force-pushed the pex-reactor-fixes-#9 branch 2 times, most recently from e106259 to c0ca201 (January 12, 2017 18:31)

melekes force-pushed the pex-reactor-fixes-#9 branch from c0ca201 to 51efd40 (January 12, 2017 19:21)

melekes commented Jan 12, 2017

ebuchman force-pushed the develop branch from 67963ab to e47722e (January 13, 2017 01:49)

melekes force-pushed the pex-reactor-fixes-#9 branch from 51efd40 to b898bc3 (January 16, 2017 15:26)

ebuchman reviewed Jan 17, 2017

melekes force-pushed the pex-reactor-fixes-#9 branch from 9481c05 to 4d04534 (January 17, 2017 19:25)

melekes added the enhancement label Jan 20, 2017

melekes added a commit that referenced this pull request Jan 20, 2017

add public RemoveAddress API

6562d9b

after discussion with @ebuchman (#10 (comment))

jaekwon reviewed Jan 20, 2017

melekes force-pushed the pex-reactor-fixes-#9 branch from 6562d9b to 24b8716 (April 14, 2017 20:00)

melekes added a commit that referenced this pull request Apr 14, 2017

add public RemoveAddress API

31c1b2f

after discussion with @ebuchman (#10 (comment))

jaekwon reviewed Apr 19, 2017

melekes force-pushed the pex-reactor-fixes-#9 branch from b489774 to 32661a0 (April 20, 2017 08:50)

melekes added a commit that referenced this pull request Apr 20, 2017

add public RemoveAddress API

de34eed

after discussion with @ebuchman (#10 (comment))

melekes added 10 commits April 20, 2017 13:36

remove unused error

057cfb3

prefer short names

26f661a

add Dockerfile

3af7c67

implement RemovePeer for PEXReactor

37d5a2c

test ensurePeers goroutine

0109f1e

replace repeat timer with simple ticker

1a59b6a

no need for repeat timer here (no need for goroutine safety)

test PEXReactor#Receive

47df1fb

prevent abuse from peers

873d341

improve ensurePeers routine

07e7b98

optimizations: - if we move peer to the old bucket as soon as connected and pick only from new group, we can skip alreadyConnected check

do not create file, just temp dir

5eeaffd

melekes added 10 commits April 20, 2017 13:36

call saveToFile OnStop

590efc1

This is better than waiting because while we wait, anything could happen (crash, timeout of the code that is using the addrbook, ...). If we save immediately, we have much greater chances of success.

make GoLint happy

52d9cf0

note on preventing abuse [ci skip]

324293f

add public RemoveAddress API

cf18bf2

after discussion with @ebuchman (#10 (comment))

fix merge

0277e52

return wg to addrbook

4c0d1d3

fix race

5ab8ca0

revert e448199

9ce7101

revert 2710873

17ec70f

it is non-deterministic (could fail sometimes)

8655e24

melekes force-pushed the pex-reactor-fixes-#9 branch from 1720c41 to 8655e24 (April 20, 2017 09:37)

ebuchman reviewed Apr 20, 2017

ebuchman added 2 commits April 20, 2017 12:21

update comment about outbound peers and addrbook

391c738

msgCountByPeer is a CMap

75bad13

ebuchman merged commit 1712498 into develop Apr 20, 2017

ebuchman deleted the pex-reactor-fixes-#9 branch April 20, 2017 21:32
