Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADR] ADR-037: Peer Behaviour #3539

Merged
merged 6 commits into from
Apr 12, 2019
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
145 changes: 145 additions & 0 deletions docs/architecture/adr-037-peer-behaviour.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
# ADR 037: Peer Behaviour Interface

## Changelog
* 07-03-2019: Initial draft

## Context

The responsibility for signaling and acting upon peer behaviour lacks a single
owning component and is heavily coupled with the network stack[<sup>1</sup>](#references). Reactors
maintain a reference to the `p2p.Switch` which they use to call
`switch.StopPeerForError(...)` when a peer misbehaves and
`switch.MarkAsGood(...)` when a peer contributes in some meaningful way.
While the switch handles `StopPeerForError` internally, the `MarkAsGood`
method delegates to another component, `p2p.AddrBook`. This scheme of delegation
across Switch obscures the responsibility for handling peer behaviour
and ties up the reactors in a larger dependency graph when testing.

## Decision

Introduce a `PeerBehaviour` interface and concrete implementations which
provide methods for reactors to signal peer behaviour without direct
coupling `p2p.Switch`. Introduce a ErrPeer to provide
concrete reasons for stopping peers.

### Implementation Changes

PeerBehaviour then becomes an interface for signaling peer errors as well
as for marking peers as `good`.

XXX: It might be better to pass p2p.ID instead of the whole peer but as
a first draft maintain the underlying implementation as much as
possible.

```go
type PeerBehaviour interface {
Errored(peer Peer, reason ErrPeer)
MarkPeerAsGood(peer Peer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first func now tells what happened (errored), the second one - what to do (mark). Maybe we should rename MarkPeerAsGood to BecomeGood?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, consistent would be sth like BehavedWell, WasUseful, Contributed? I think the actual name stems from the fact that the switch carries around an addrBook and the method there is MarkGood. Sound like we could improve the naming here @melekes 👍 Let's keep this in mind and wait for the implementation though.

Copy link
Contributor

@xla xla Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Let's amend the ADR to not have it fall through the cracks during implementation. What about:

type PeerBehaviour interface {
    Behaved(id ID, reason GoodBehaviourPeer)
    Errored(id ID, reason ErrorBehaviourPeer)
}

@brapse & @milosevic Thoughts on this?

Copy link
Contributor Author

@brapse brapse Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the Behaved, Errored dichotomy!

The change from peer p2p.Peer -> id p2p.ID also further decouples the reactor from the network concern heavy peer interface. 👍

The introduction of GoodBehaviourPeer also seems to complete the symmetry 👍

Happy to amend
image

}
```

Instead of signaling peers to stop with arbitrary reasons:
`reason interface{}`

We introduce a concrete error type ErrPeer:
```go
type ErrPeer int

const (
ErrPeerUnknown = iota
ErrPeerBadMessage
ErrPeerMessageOutofOrder
...
)
```

As a first iteration we provide a concrete implementation which wraps
the switch:
```go
type SwitchedPeerBehaviour struct {
sw *Switch
}

func (spb *SwitchedPeerBehaviour) Errored(peer Peer, reason ErrPeer) {
spb.sw.StopPeerForError(peer, reason)
}

func (spb *SwitchedPeerBehaviour) MarkPeerAsGood(peer Peer) {
spb.sw.MarkPeerAsGood(peer)
}

func NewSwitchedPeerBehaviour(sw *Switch) *SwitchedPeerBehaviour {
return &SwitchedPeerBehaviour{
sw: sw,
}
}
```

Reactors, which are often difficult to unit test[<sup>2</sup>](#references). could use an implementation which exposes the signals produced by the reactor in
manufactured scenarios:

```go
type PeerErrors map[Peer][]ErrPeer
type GoodPeers map[Peer]bool

type StorePeerBehaviour struct {
pe PeerErrors
gp GoodPeers
}

func NewStorePeerBehaviour() *StorePeerBehaviour{
return &StorePeerBehaviour{
pe: make(PeerErrors),
gp: GoodPeers{},
}
}

func (spb StorePeerBehaviour) Errored(peer Peer, reason ErrPeer) {
if _, ok := spb.pe[peer]; !ok {
spb.pe[peer] = []ErrPeer{reason}
} else {
spb.pe[peer] = append(spb.pe[peer], reason)
}
}

func (mpb *StorePeerBehaviour) GetPeerErrors() PeerErrors {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the Get methods used to inspect the peer topology from test code?

Copy link
Contributor Author

@brapse brapse Apr 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's correct. In particular errors emitted by reactors in manufactured scenarios, eg. a mock peer which only sends invalid messages.

return mpb.pe
}

func (spb *StorePeerBehaviour) MarkPeerAsGood(peer Peer) {
if _, ok := spb.gp[peer]; !ok {
spb.gp[peer] = true
}
}

func (spb *StorePeerBehaviour) GetGoodPeers() GoodPeers {
return spb.gp
}
```

## Status

Proposed

## Consequences

### Positive

* De-couple signaling from acting upon peer behaviour.
* Reduce the coupling of reactors and the Switch and the network
stack
* The responsibility of managing peer behaviour can be migrated to
a single component instead of split between the switch and the
address book.

### Negative

* The first iteration will simply wrap the Switch and introduce a
level of indirection.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could could also argue that this is not negative as you propose to improve the status quo incrementally (which is a good thing IMHO).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


### Neutral

## References

1. Issue [#2067](https://github.com/tendermint/tendermint/issues/2067): P2P Refactor
2. PR: [#3506](https://github.com/tendermint/tendermint/pull/3506): ADR 036: Blockchain Reactor Refactor