RFC: Nodes and their capabilities #87

Open
oskarth opened this issue Jan 7, 2020 · 8 comments

@oskarth oskarth commented Jan 7, 2020

Abstract

Currently we describe the various types of capabilities as different node types, such as bootstrap nodes, mailserver servers, mailserver clients, relay nodes, and light nodes. In some places we describe these as actual capabilities, i.e. the different things a node can do. This inconsistency and lack of clarity around what a node is and what it can do makes many things harder, among them reasoning about and extending specs, analysis, clients and simulations, as well as running a node and explaining things to end users.

A cleaner mental model is to consistently describe a node as having a set of capabilities. The "X node" then becomes more of a shortcut to refer to certain optimized client configurations. This is more in line with the adaptive nodes ethos that we want to practice, as well as future enhancements we want to make (libp2p, dagger). It is conceptually simpler and more accurate. If clients implement this correctly, it also simplifies the use of different capabilities for people who want to run their own node.

Specific proposal

Use the following terminology consistently throughout the spec:

  1. A node is defined by its set of capabilities.

  2. For Waku, these capabilities are:

  • send messages
  • receive messages, including historical messages
  • relay messages
  • bootstrap nodes
  • store historical messages and return these upon request
  3. We use the following shortcuts to refer to nodes with a certain set of common capabilities:
  • A light node only has the send and receive capabilities
  • A full node has all capabilities (send/receive/relay/bootstrap/store)
  • A bootstrap node only has the capability to bootstrap nodes
  4. These common configurations and their capabilities can be cleanly visualized as follows:
  • Light node [x] send [x] receive [ ] relay [ ] bootstrap [ ] store
  • Full node [x] send [x] receive [x] relay [x] bootstrap [x] store
  • Bootstrap node [ ] send [ ] receive [ ] relay [x] bootstrap [ ] store
  5. Each capability has a set of requirements. For example, to have the bootstrap capability you currently:
  • MUST be connectable through a static IP
  • SHOULD be a long-running process
  6. A client SHOULD be able to specify any (legal) set of capabilities through a simple configuration and the same executable.

  7. The previously used "Waku server node", "mailserver node" and "Whisper relay node" are supplanted by simply a "full node" as in "run a full node to make the network more robust".

  • This doesn't preclude nodes, e.g. in a specific cluster configuration, choosing to have fewer capabilities. In that case they'll simply be referred to as the set of their capabilities.
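
To make the "node = set of capabilities" model concrete, here is a minimal Python sketch. It is only an illustration: the names `Capability`, `SHORTCUTS`, `describe` and `shortcut_name` are assumptions for this example and are not taken from any Waku client or spec.

```python
from enum import Flag, auto


class Capability(Flag):
    """The five Waku capabilities listed in the proposal."""
    SEND = auto()
    RECEIVE = auto()
    RELAY = auto()
    BOOTSTRAP = auto()
    STORE = auto()


# Shortcut names for the common capability sets (hypothetical identifiers).
LIGHT_NODE = Capability.SEND | Capability.RECEIVE
FULL_NODE = (Capability.SEND | Capability.RECEIVE | Capability.RELAY
             | Capability.BOOTSTRAP | Capability.STORE)
BOOTSTRAP_NODE = Capability.BOOTSTRAP

SHORTCUTS = {"light": LIGHT_NODE, "full": FULL_NODE, "bootstrap": BOOTSTRAP_NODE}


def describe(caps):
    """Render a capability set as the checkbox row used in the proposal."""
    return " ".join(
        "[{}] {}".format("x" if cap in caps else " ", cap.name.lower())
        for cap in Capability
    )


def shortcut_name(caps):
    """Return the shortcut name for a capability set, or None if it has none."""
    for name, preset in SHORTCUTS.items():
        if caps == preset:
            return name
    return None


print(describe(LIGHT_NODE))
print(shortcut_name(Capability.RELAY | Capability.STORE))
```

A node with an uncommon combination, such as relay plus store only, has no shortcut name and is simply referred to by its set of capabilities, as the last bullet describes.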

Next steps

  • Consensus within protocol team
  • Update Waku specs with more precise terminology
  • Update Status client spec
  • Update client configuration and docs to reflect and enable this
  • Update infra configuration and docs to reflect and enable this
  • Update relevant places in core app

Notes

@oskarth oskarth added this to Triage in Waku project Jan 7, 2020
@oskarth oskarth closed this Jan 8, 2020
@oskarth oskarth reopened this Jan 8, 2020

@oskarth oskarth commented Jan 8, 2020

We should also distinguish between these capabilities as outlined above, and what are called "RLPx subprotocol capabilities" in devp2p. To me the naming above seems accurate, but open to other ideas to not confuse the two.

An important follow up question is in terms of how we communicate these capabilities and act on them. For devp2p, this is done in the Hello handshake as a list of protocols we handle.

Looking at libp2p, which has way better protocol negotiation / multiplexing support than devp2p, it works similarly conceptually but with more flexibility. Essentially we define a string such as vac/waku/0.2 and then this is communicated along with other protocols supported. This string is then matched with protocol id, optional fuzzy handler function, and version id. Multiple versions of the same protocol can be specified, and protocols can be namespaced. https://docs.libp2p.io/concepts/protocols/#protocol-negotiation

There appears to be no specific support for communicating configurations of a protocol other than as subprotocols or version numbers. Unless we want to do this as an "inner negotiation", it seems to me that specifying separate subprotocols is the simplest. Open to other ideas here.

What do we care about? If I'm sending a Waku message, I want to make sure the node I'm connecting to has the ability to relay messages. If I'm receiving historical messages I want to make sure the node I'm asking stores messages. To me this would suggest a protocol selection as follows: [vac/waku/0.2, vac/waku/relay/0.2, vac/waku/store/0.1]. As far as I can tell, sending and receiving messages capability doesn't impact who connects to whom outside of basic vac/waku support (other random nodes should probably be disconnected). Bootstrap appears to me to be one level up. This design would also allow routing to change to e.g. pss, introducing accounting protocol modules, etc.
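
As a rough illustration of the protocol selection above, the following Python sketch filters peers by the protocol strings they advertise. The protocol strings are the ones proposed in this comment. The helper `peers_supporting`, the peer IDs and the shape of the peer table are assumptions for the example, and the matching here is plain string equality rather than libp2p's actual negotiation.

```python
from typing import Iterable, List, Mapping

# Protocol strings as proposed in the comment above.
WAKU = "vac/waku/0.2"
WAKU_RELAY = "vac/waku/relay/0.2"
WAKU_STORE = "vac/waku/store/0.1"


def peers_supporting(peers: Mapping[str, Iterable[str]], needed: str) -> List[str]:
    """Return IDs of peers that advertise base Waku plus the needed subprotocol."""
    selected = []
    for peer_id, protocols in peers.items():
        advertised = set(protocols)
        if WAKU in advertised and needed in advertised:
            selected.append(peer_id)
    return selected


# Hypothetical peer table: peer ID -> advertised protocol strings.
known_peers = {
    "peer-a": [WAKU, WAKU_RELAY, WAKU_STORE],
    "peer-b": [WAKU, WAKU_RELAY],
    "peer-c": [WAKU],
}

print(peers_supporting(known_peers, WAKU_RELAY))  # candidates for sending
print(peers_supporting(known_peers, WAKU_STORE))  # candidates for history requests
```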

What do people think?

cc @decanus @cammellos @adambabik @kdeme @arnetheduck

@arnetheduck arnetheduck commented Jan 8, 2020

@kdeme kdeme commented Jan 8, 2020

@oskarth Great plan to define terminology on types of nodes and their capabilities!

I think ideally there should be a clear definition & name for each actual use case of a node. So starting from that would perhaps be a good approach.

Some changes I would make:

  • I'd split the current full node in full node and historical (full) node.
  • Add in some way the fact that a node (full, light, ...) can be "selective" based on bloom filter. And possibly in future (Waku/1) based on topics list: "selective node", "selective light node"?
  • bootstrap node: Not really a Waku capability, should be seen separate from Waku as it is to bootstrap discovery, but you mentioned that already at the end of your last comment.

If "selective" is not clear enough, specific names could be given to these nodes. It could also be simplified by specifying whether a given node type can be or must be selective, based on the use cases that exist. E.g. for a light node, it would typically only make sense to be selective, and probably selective based on a list of topics (in future). So that could be within the definition of light node to simplify things, but it doesn't have to be.

Some example use cases for the types of node:

  • Historical node: Status fleet (or any one who wants to run one) mail servers. Currently providing expired messages. Ideally handled in the future at a layer above and in a more decentralized way (like MVDS, remote log).
  • Full node:
    • Similar to a historical node: somebody who wants to run a node, but e.g. on their own small-footprint hardware, where storage and/or availability is not as guaranteed.
    • Desktop client user that wants maximum privacy
  • Full selective node:
    • Desktop client user willing to trade privacy for bandwidth.
    • Mobile client that is connected to WiFi + power outlet.
  • Light selective node: mobile client on mobile data plan.

Ideally (some of) these are adaptive capabilities that a user can change based on either their privacy needs or the current available power/bandwidth.

Most simple form could then be e.g.:

  • full historic node
  • full node
  • selective node (full node selective with bloom filter)
  • light node (light node selective with topics list)

Regarding the communication of configurations, I thought about this when discussing #41 and also briefly looked into the linked libp2p information before. While this is very useful for negotiating protocols and protocol versions, I believe announcing the actual "role" within the protocol will still be required through a typical hello/status message. E.g. for a mail server, the store protocol would still need to distinguish the type of client that requests the data from the type of client that responds with a set of data. But perhaps I'm not fully understanding the idea here or how this works in libp2p.

@zah zah commented Jan 8, 2020

In the latest discovery protocol (v5), there are two ways to discover nodes:

  1. By their capabilities (these are the kinds of messages and requests that a node can understand and answer)

  2. By "topics" (these are more akin to "resources" the node has access to, for example the history of a particular channel, the type of data being served, etc). It's up to the application to define what the topic names would be and what their significance is.

I'm sharing the above, just to highlight the subtle difference between "supported protocols" and "data being served" when one thinks about capabilities. The protocol negotiation mechanism in DevP2P and LibP2P covers only the aspect of supported protocols and it comes with one additional restriction - the capabilities are assumed to be symmetric. If you support the light client protocol for example, it's assumed that you'll be able to respond to server-side requests even when you are just a client.

Anyway, if you break the Waku protocol into smaller pieces, you'll be able to write code along these lines:

if peer.supports(WakuRelay):
  peer.sendMessage(...) # These are hypothetical protocol messages
                        # Sorry for not using the real ones

if peer.supports(WakuStore):
  peer.fetchFromStore(...) # These are hypothetical protocol messages
                           # Sorry for not using the real ones 

The alternative is to exchange the list of capabilities in the handshake message (status), to remember them in the PeerState object and to use conditional code like the following:

if peer.supports(Waku) and WakuStore in peer.protocolState(Waku).capabilities:
  ...

This approach will be able to support more arbitrary logic related to the capabilities, but you are likely going to lose the type 1) discovery mechanism.
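
For comparison, the handshake-based alternative can be sketched in Python as follows. Everything here is hypothetical: `Peer`, `WakuPeerState`, `on_status` and the capability names only mirror the pseudocode above and do not correspond to a real client API.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Set


@dataclass
class WakuPeerState:
    """Per-peer state remembered after the status (handshake) message."""
    capabilities: FrozenSet[str] = frozenset()


@dataclass
class Peer:
    protocols: Set[str] = field(default_factory=set)
    state: Dict[str, WakuPeerState] = field(default_factory=dict)

    def supports(self, protocol: str) -> bool:
        return protocol in self.protocols

    def protocol_state(self, protocol: str) -> WakuPeerState:
        # Before any status message arrives, assume no capabilities.
        return self.state.get(protocol, WakuPeerState())


def on_status(peer: Peer, advertised: Iterable[str]) -> None:
    """Remember the capability list the peer sent in its status message."""
    peer.state["waku"] = WakuPeerState(frozenset(advertised))


def can_fetch_history(peer: Peer) -> bool:
    return peer.supports("waku") and "store" in peer.protocol_state("waku").capabilities


peer = Peer(protocols={"waku"})
print(can_fetch_history(peer))   # False: no status received yet
on_status(peer, ["relay", "store"])
print(can_fetch_history(peer))   # True
```

Because the capability list lives inside the protocol's own state, it is only known after connecting, which is why this approach gives up discovery by capability.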

@dryajov dryajov commented Jan 8, 2020

I like the idea of capabilities, but I wouldn't try to associate names with groups of capabilities, except informally. A capability is some well-defined, self-contained functionality, for example bootstrap or relay; a node can have either one or both at any given time. I wouldn't go any more granular than that, because it complicates the mental model and makes little sense from a functionality perspective. This also pushes the protocols to be more modular.

> What do we care about? If I'm sending a Waku message, I want to make sure the node I'm connecting to has the ability to relay messages. If I'm receiving historical messages I want to make sure the node I'm asking stores messages. To me this would suggest a protocol selection as follows: [vac/waku/0.2, vac/waku/relay/0.2, vac/waku/store/0.1].

Yes, I think this makes sense, but there is more than one way of doing it.

My recommendation is to stick to well-defined protocol strings that identify a specific functionality, something like what @oskarth outlined above, as well as communicating capabilities either as part of the HELLO message and/or through a dedicated message. The latter might make sense if capabilities change during the lifetime of the node, as would be the case for an adaptive client. It's possible to use the stack's protocol negotiation mechanism for this as well, but in my experience this is not very flexible and it falls short at some point. As always, YMMV.

@adambabik adambabik commented Jan 9, 2020

Great discussion!

I like the idea of a well-defined protocol string which includes protocol name and version. This is a very simple model which can be supported by even more naive peer discovery protocols.

> To me this would suggest a protocol selection as follows: [vac/waku/0.2, vac/waku/relay/0.2, vac/waku/store/0.1]

This is also fine. Alternatively, [vac/waku/0.2, vac/waku/0.2/relay]. It depends on what is more important: capability or version. Traditionally, it has been version. Also, this is just a hint, because a node can advertise only vac/waku/0.2, so the capabilities should still be confirmed in the handshake.

Each peer also must share its capabilities/data being served, and it should be possible to traverse the peers list for a given protocol in order to select only those that support a given capability, just like @zah described it.

> it comes with one additional restriction - the capabilities are assumed to be symmetric. If you support the light client protocol for example, it's assumed that you'll be able to respond to server-side requests even when you are just a client.

It means that each type of node must support all packet codes within a given protocol, but the response might be error: not supported. In this example, the node supporting only the light client protocol might decide to disconnect from that peer, assuming it is a malicious one that does not respect the capabilities exchanged in the handshake.
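
One possible reading of this, sketched in Python under stated assumptions: every node accepts every packet code but answers "not supported" for capabilities it lacks, and a peer may be dropped when its behaviour contradicts what was exchanged in the handshake. The packet names, the capability mapping and both helper functions are hypothetical and not taken from the Waku or Whisper specs.

```python
from enum import Enum
from typing import Set


class Packet(Enum):
    """Hypothetical packet codes within a single protocol."""
    MESSAGES = 1
    HISTORY_REQUEST = 2


# Hypothetical mapping from packet code to the capability needed to serve it.
REQUIRED_CAPABILITY = {
    Packet.MESSAGES: "relay",
    Packet.HISTORY_REQUEST: "store",
}

NOT_SUPPORTED = "error: not supported"


def handle(packet: Packet, local_capabilities: Set[str]) -> str:
    """Accept every packet code, but refuse those we lack the capability for."""
    if REQUIRED_CAPABILITY[packet] not in local_capabilities:
        return NOT_SUPPORTED
    return "ok"


def contradicts_handshake(packet: Packet, response: str,
                          advertised_capabilities: Set[str]) -> bool:
    """True if a peer refuses a packet its handshake capabilities promised."""
    promised = REQUIRED_CAPABILITY[packet] in advertised_capabilities
    return promised and response == NOT_SUPPORTED


light_node_caps = {"send", "receive"}
print(handle(Packet.HISTORY_REQUEST, light_node_caps))
```

A caller could disconnect whenever `contradicts_handshake` returns true and otherwise treat "not supported" as a normal answer from a less capable peer.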

Exchanging capabilities in a handshake proved to be working fine but extra care needs to be taken to make sure the handshake is backward and forward compatible. A separate message to update the capabilities also sounds good but I would reserve a single packet to do that instead of one packet per capability like the current Whisper spec describes it.

In terms of the peer discovery mechanism, I think it's a separate problem. Even if we decide to match by protocol string and exchange capabilities in the handshake, a node should still be able to register itself in the discovery mechanism using capabilities and/or topics. If a peer discovery protocol is very generic, the spec must describe how to register in each of these discovery protocols.

@arnetheduck arnetheduck commented Jan 9, 2020

version

might want to keep these simple/opaque - the more complicated they get (ordering, semver), the harder it is to create a forward-compatible upgrade because other (older) clients in the wild will have complex behaviours.

a dumb opaque string that's matched exactly is generally the easiest to deal with - clients then generally implement a facade that simulates old versions they still support and have the logic of deciding protocol priorities locally in them if there are several similar protocols that do the job (instead of using the version string for this). in this world, versions don't exist really, only completely separate protocols that happen to have some similarities.
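
A small Python sketch of the opaque-string approach, under the assumption that each client keeps its own ordered preference list. `vac/waku/0.2` is the ID proposed earlier in the thread, while `vac/waku/0.1` and the function name are made up for the example.

```python
from typing import Iterable, Optional, Sequence

# Local priority order, most preferred first. The client decides this order
# itself; nothing is parsed out of the version part of the string.
LOCAL_PREFERENCE = ["vac/waku/0.2", "vac/waku/0.1"]


def choose_protocol(remote_ids: Iterable[str],
                    local_preference: Sequence[str] = LOCAL_PREFERENCE) -> Optional[str]:
    """Pick the first locally preferred ID that the remote advertises verbatim."""
    remote = set(remote_ids)
    for protocol_id in local_preference:
        if protocol_id in remote:  # exact match only, no semver or ordering logic
            return protocol_id
    return None


print(choose_protocol(["vac/waku/0.1", "vac/waku/0.2"]))
print(choose_protocol(["vac/waku/0.20"]))
```

Because IDs are compared only for equality, `vac/waku/0.20` is simply an unrelated protocol from this client's point of view, which is the property that keeps older clients' behaviour predictable.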

there's a lot of juicy info in https://github.com/ethereum/eth2.0-specs/blob/dev/specs/networking/p2p-interface.md#design-decision-rationale (edit: new link https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md) - the eth2 networking spec on top of libp2p - during the development of that spec, many of the same questions were posed.

bootstrap / discovery

the critical difference here is how much you think you know about a peer before connecting to them - rich discovery allows you to not connect to uninteresting peers, but carries a cost in terms of a more expensive peer discovery dht. you can generally get peer information from many sources: dht, lan, gossip between already connected peers, connect-and-negotiate etc etc - ideally, the information content from all these sources is the same but practicalities might make some richer than others requiring peers to act on incomplete information (ie dht's often operate over udp that has a datagram size limit)

symmetric

this is an interesting point, though one can usually deal with this in an in-protocol negotiation (again, it's all about how early you can discard a peer as uninteresting)

changing capabilities

the simple way is to disconnect and reconnect - because you are now effectively a different peer with different capabilities - generally, this is supposed to be fairly cheap in a p2p system. fancy reconnection messages and strategies sound like something to leave to a later version if it becomes an actual problem.

@oskarth oskarth moved this from Triage to In progress in Waku project Jan 9, 2020
@oskarth oskarth self-assigned this Jan 9, 2020
@oskarth oskarth mentioned this issue Jan 12, 2020
2 of 6 tasks complete

@decanus decanus commented Jan 15, 2020

How does this play in with the previous discussion we had on compatibility (#41)? It seems like version numbers need to be assigned to each capability rather than to Waku as a whole. Am I right in that assumption?
