
feat(peer_db)!: more accurate peer stats per address #5142

Merged

Conversation

@stringhandler (Collaborator) commented Jan 26, 2023

Description

Removes a lot of duplicate stats and online/offline fields on peers.

Motivation

Due to a recent bug, wallets would get a new address every time they restarted. There was some handling of peers with multiple addresses, but in general only the latest address signed by the peer was used. This prevented a node from acting as a bridge between Tor and IP, or even IP and DNS. This PR controversially (I'm sure) removes the signed address and instead allows a node to try an address to see if the correct public key is on the other end. If so, the node can choose the address with the best latency to communicate with the peer.

IMO, this is no less secure, since the worst case is that a node can force another node to contact an incorrect address, only to find that the node at the other end has a different public key. This can already be done by generating a new peer id, taking the address, and signing it with the new peer id (i.e. a Sybil address).

Previously, a peer may have been banned for answering at an address with the wrong public key, but this can also be manipulated. Now, only the address is marked as bad.

Also reworked the general sorting of peer addresses, although in most previous cases there would only be one address.

Also reworked the order in which a peer tries its neighbours. To make it more predictable and to prevent looping, the connectivity service will try to find its neighbours at a strict distance from itself. Previously it took last_attempted_at or last_seen_at into account, which I am not convinced was working.

You'll note that there are some tokio improvements as well.

During testing, I noticed that the peer query was, and still is, extremely heavy. In tokio console, queries took on the order of 10s to 100s. Moving this to a block_in_place frees up the tokio worker threads. In a future PR I will improve this even further by removing unnecessary calls.

Also:

  • Removed dead code (OptionalBoundedExecutor) and replaced the re-export of runtime from tokio.
  • Used standard tokio spawns for BoundedExecutor instead of holding a runtime reference.

BREAKING CHANGE: Because the Peer struct has changed so dramatically, a migration is not really pragmatic. The peer_db for wallets and base nodes must be deleted before starting up.

@stringhandler stringhandler changed the title WIP - better stats on peer addresses feat!: more accurate peer stats per address Jan 27, 2023
@stringhandler stringhandler marked this pull request as ready for review January 27, 2023 14:37
@stringhandler stringhandler added the P-controversial Process - This PR or Issue is controversial and/or requires more attention than simpler issues label Jan 27, 2023
@SWvheerden (Collaborator) left a comment

This looks good; it brings much-needed address functionality.
I tried to mark all the commented-out code.

I do however think we should not remove address signing. I think each address needs to be signed. The reason is that unsigned addresses would allow a peer to poison the address set of some other peer by spamming fabricated addresses to the network, obfuscating the peer's true address and making it harder to connect to.
If we force peers to sign each address, then peers can only poison their own addresses and not those of other peers.
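The per-address signing shape being argued for could look roughly like this toy sketch. To stay std-only, the "signature" here is just a hash over a shared secret and the address; a real implementation must use an actual public-key signature scheme (e.g. Schnorr or ed25519) verified against the peer's public key, so only the key holder can produce valid signatures.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// TOY stand-in for a real signature: a hash of a secret and the address.
// A real scheme signs with a private key and verifies with the public key;
// this only illustrates signing each address individually.
fn sign_address(secret: &str, address: &str) -> u64 {
    let mut h = DefaultHasher::new();
    (secret, address).hash(&mut h);
    h.finish()
}

// Accept an address for a peer only if its signature checks out, so third
// parties cannot poison the peer's address set with fabricated entries.
fn verify_address(secret: &str, address: &str, sig: u64) -> bool {
    sign_address(secret, address) == sig
}

fn main() {
    let addr = "/onion3/abcd:18141";
    let sig = sign_address("peer-secret", addr);
    assert!(verify_address("peer-secret", addr, sig));
    // A fabricated address fails verification and is rejected.
    assert!(!verify_address("peer-secret", "/ip4/9.9.9.9/tcp/18141", sig));
    println!("per-address signatures verified");
}
```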

@@ -48,7 +48,7 @@ message Peer{
/// Features supported by the peer
uint64 features = 9;
/// Connection statics for the peer
google.protobuf.Timestamp last_connected_at = 10; /// Protocols supported by the peer. This should not be considered a definitive list of supported protocols and is
// google.protobuf.Timestamp last_connected_at = 10; /// Protocols supported by the peer. This should not be considered a definitive list of supported protocols and is
Collaborator

remove?

Comment on lines 109 to 111
// peer.connection_stats.set_connection_success();
peer.addresses.update_addresses(addresses);
// peer.set_offline(false);
Collaborator

remove comments?
Or put these lines back?

address,
dial_state.peer().node_id.short_str()
);
for address in addresses {
Collaborator

Would it not be possible to make this a parallel loop so it executes faster? This could get quite slow if the peer has multiple old addresses.

Collaborator

Perhaps with something like: rayon::par_iter

Collaborator Author

Yeah, I did consider that. In this case tokio would be better than rayon, because it would wait on the network dial; rayon would be a better choice if it were CPU bound. The dials are already parallelized in a few cases during a call to each peer, but you are correct that if a peer had many addresses it would take a while.

Some things to note though:

  1. The addresses for a peer are always sorted by their quality score, so the best addresses will be at the front.
  2. Addresses that have failed previously are moved to the back of the list, so will be tried after all others.

So from these, it makes sense to keep trying addresses in sequence and to exit on the first successfully dialled address.
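Points 1 and 2 can be sketched as the following toy model. The field names and dial closure are illustrative, not the PR's actual types: addresses are kept with the best quality score first and previously-failed addresses at the back, then dialled one at a time, stopping at the first success.

```rust
use std::cmp::Reverse;

// Illustrative stand-in for the PR's per-address record.
#[derive(Debug, Clone)]
struct PeerAddress {
    address: String,
    quality_score: i32,
    last_failed: bool,
}

fn order_addresses(addresses: &mut [PeerAddress]) {
    // false sorts before true, so non-failed addresses come first;
    // within each group, higher quality scores come first.
    addresses.sort_by_key(|a| (a.last_failed, Reverse(a.quality_score)));
}

// Try each address in order, returning the first that dials successfully.
fn dial_first_reachable(
    addresses: &[PeerAddress],
    dial: impl Fn(&str) -> bool,
) -> Option<String> {
    addresses.iter().find(|a| dial(&a.address)).map(|a| a.address.clone())
}

fn main() {
    let mut addrs = vec![
        PeerAddress { address: "/ip4/1.2.3.4/tcp/18189".into(), quality_score: 90, last_failed: true },
        PeerAddress { address: "/onion3/abcd:18189".into(), quality_score: 80, last_failed: false },
        PeerAddress { address: "/ip4/5.6.7.8/tcp/18189".into(), quality_score: 40, last_failed: false },
    ];
    order_addresses(&mut addrs);
    // Pretend only the 5.6.7.8 address is reachable: the higher-scored
    // onion address is tried (and fails) first, then we stop at 5.6.7.8.
    let reached = dial_first_reachable(&addrs, |a| a.contains("5.6.7.8"));
    println!("{reached:?}");
}
```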

@@ -472,17 +480,20 @@ impl ConnectivityManagerActor {
fn mark_peer_succeeded(&mut self, node_id: NodeId) {
let entry = self.get_connection_stat_mut(node_id);
entry.set_connection_success();
// self.peer_manager.mark_last_seen(node_id.clone(), Some(addr_used));
Collaborator

remove?

}

fn mark_peer_failed(&mut self, node_id: NodeId) -> usize {
let entry = self.get_connection_stat_mut(node_id);
entry.set_connection_failed();
// self.peer_manager.mark_peer_failed(node_id, addr_tried);
Collaborator

remove?

return false;
}

let is_excluded = excluded.contains(&peer.node_id);
if is_excluded {
excluded_count += 1;
// excluded_count += 1;
Collaborator

remove?

.unwrap_or(false)
{
connect_ineligable_count += 1;
// connect_ineligable_count += 1;
Collaborator

remove?

return false;
}

if connected.contains(&&peer.node_id) {
already_connected += 1;
// already_connected += 1;
Collaborator

remove?

Comment on lines 83 to 110
// let can_update = can_update && {
// // Update/insert peer if newer
// // unreachable panic: can_update is true only is identity_signature is present and valid
// // let new_dt = new_peer
// // .identity_signature
// // .as_ref()
// // .map(|i| i.updated_at())
// // .expect("unreachable panic");
//
// // Update if new_peer has newer timestamp than current_peer, and if the newer timestamp is after
// the // added date
// current_peer
// .identity_signature
// .as_ref()
// .map(|i| i.updated_at() < new_dt && (
// !current_peer.is_seed() ||
// current_peer.added_at < new_dt.naive_utc()))
// // If None, update to peer with valid signature
// .unwrap_or(true)
// };

// if !can_update {
// debug!(
// target: LOG_TARGET,
// "Peer `{}` already exists or is up to date and will not be updated", new_peer.node_id
// );
// return Ok(false);
// }
Collaborator

remove?

// current_peer
// .update_addresses(new_peer.addresses.into_vec())
// .set_features(new_peer.features);
// if let Some(sig) = new_peer.identity_signature {
Collaborator

I don't think we should remove this.

Collaborator Author

See the notes in the motivation about this

@sdbondi (Member) left a comment

The goals of this PR are to:

  1. improve currently incomplete support for multiple peer addresses (perhaps to allow support later for WebRTC addresses)
    • This has been greatly improved in the PR. Although more work needs to happen, I like the addition of an address score and the improvements in the collection of address stats.
  2. improve the performance and selection strategy for maintaining connections to neighbouring peers
    • As far as I can see, no improvements were made in this area; they are TODO for another PR.
    • IMO one way to improve this is by maintaining a list of neighbouring and available peers in the peer manager DB. This list can be used and maintained by the DhtConnectivity component.
  3. Improve the propagation of updated address information across the network
    • This may have been made worse by removing the ability to validate peer sync information.
    • Was there a confirmed bug that caused wallet addresses not to eventually update over the network over time?

Other non-trivial changes:

  1. Removal of PeerIdentitySignature

Previously a peer may have been banned for answering an address with the wrong public key, but this can also be manipulated.

Is this not only true because the peer no longer signs its addresses?

since the worst case is that a node can force another node to contact an address that is not correct

This removes the invariant that each assigned address was in fact provided by the peer, allowing another node to provide arbitrary addresses for a peer on their behalf.

As you said, a peer may provide other self-generated identities. I'd argue this is a distinct case from allowing a peer to modify another peer's address set, even if that is only to add addresses to the set.

Perhaps the worst case is wasting network resources (memory, time, disk, bandwidth), perhaps there are other attacks/abuses not thought of (is a coordinated DDoS possible?).

Could you provide more details on the specific reasons for removing this invariant?

I took a look into what libp2p does
https://github.com/libp2p/rust-libp2p/blob/d344406235c779922e6ffec85f0a94181f3efecc/core/src/peer_record.rs#L62

, but in general only the latest address signed by the peer was used

Minor correction: the address set is signed by the peer, not individual addresses. The handling of multiple addresses was primitive and even incorrect; this PR greatly improves it. This improvement is not precluded by the removal of signed address sets.

comms/core/src/connection_manager/dialer.rs (outdated; resolved)
@@ -363,6 +364,10 @@ impl DhtConnectivity {
}

async fn refresh_neighbour_pool(&mut self) -> Result<(), DhtConnectivityError> {
if self.is_refreshing_neighbour_pool {
Member

is_refreshing_neighbour_pool will always be false here. There is no concurrency in DhtConnectivity (it runs on a single thread at a time) so this function has to exit before it can be called again.

Collaborator Author

Cool, I will remove this. It's easy to see in tokio-console whether this is in fact the case, but I believe you are correct and will confirm.

.sort_by(PeerQuerySortBy::DistanceFromLastConnected(node_id))
// Fetch double here so that there is a bigger closest peer set that can be ordered by last seen
.limit(n * 2);
// TODO: Check if this logic is correct. At the moment I want them in the same order every time
Member

This is fine for the purposes of evaluating performance. The previous method would include a larger number of peers so that there is a better chance of selecting a peer that was recently available. This call was notably heavy and not an adequate or good strategy, a symptom of not having a good way to maintain an optimised/cached list of highly available neighbours in the peer manager (or some similar solution e.g. k-buckets).

Collaborator Author

My problem with the previous approach was that it could end up in a stale loop if the peers came online again before new peers were tried. This approach is more predictable and ensures that all the peers are tried. It may be a symptom of the current network state, but the majority of these nodes are actually offline.

Given that the offline data is more accurate now, it may be much of a muchness. I think I would however lobby for selecting the same n (instead of n*2) peers from the DB and returning them directly.
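The strict-distance selection under discussion amounts to ordering candidates purely by the XOR metric from our own node id, with no last_seen/last_attempted weighting, so the same n neighbours come back every time. A sketch, with node ids shortened to u64 (real node ids are byte arrays):

```rust
// Order candidate peers by XOR distance from our node id and keep the
// closest n. Deterministic: no recency information is mixed in.
fn neighbours(own_id: u64, mut candidates: Vec<u64>, n: usize) -> Vec<u64> {
    candidates.sort_by_key(|id| id ^ own_id); // XOR metric: smaller is closer
    candidates.truncate(n);
    candidates
}

fn main() {
    // With own_id = 0 the XOR distance is the id itself, so the two
    // numerically smallest candidates are the two nearest neighbours.
    let nearest = neighbours(0, vec![8, 1, 3], 2);
    println!("{nearest:?}");
}
```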

Member

n == 8 by default.
The goal was to fetch peers that we recently connected to, ordered by distance. When a peer comes back online, it will send a join message, which will be propagated to neighbours, and/or directly connect. This will update the last_seen which will put them back on the top of the list for dialing if ordered by last seen.

This change will select the same 8 peers (unless a nearer one joins), which may never come online again. Currently, "culling" of offline peers is a slow process: I think we have to attempt and fail to connect to them at least once, and then have not connected to them for 24 hours, before we remove them from the peer list entirely.

I think a good solution would be to maintain a list that can be loaded that has available peers. Essentially just optimising this peer query from an LMDB perspective.

Collaborator Author

In this PR, as soon as an address fails it is immediately marked offline. If all addresses are offline, the peer is offline
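That rule can be sketched with illustrative types (not the PR's actual structs): a failed dial immediately marks that address offline, and the peer as a whole is offline only once every one of its addresses is offline.

```rust
#[derive(Clone, Copy, PartialEq)]
enum AddressStatus {
    Online,
    Offline,
}

// A peer is offline only when it has addresses and all of them are offline.
fn peer_is_offline(addresses: &[AddressStatus]) -> bool {
    !addresses.is_empty() && addresses.iter().all(|s| *s == AddressStatus::Offline)
}

fn main() {
    let mut statuses = vec![AddressStatus::Online, AddressStatus::Online];
    statuses[0] = AddressStatus::Offline; // first dial failed
    assert!(!peer_is_offline(&statuses)); // one address still online
    statuses[1] = AddressStatus::Offline; // second dial failed too
    assert!(peer_is_offline(&statuses));
    println!("peer marked offline once all addresses failed");
}
```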

// .set_features(new_peer.features);
// if let Some(sig) = new_peer.identity_signature {
// current_peer.set_valid_identity_signature(sig);
// }
self.peer_manager.add_peer(current_peer).await?;
@sdbondi (Member) commented Jan 30, 2023

Noting that add_peer is changed to only append new addresses, to mitigate the fact that addresses are no longer signed. It also merges stats for addresses; however, that does not apply here because the peer sync message cannot and should not contain this data.

Should a remote peer F be allowed to add any number (1000s? 10,000s?) of addresses for any peer? If not, perhaps we should set a maximum, e.g. 10. If the remote peer F provides 10 new addresses for peer A, which already has 5 in our peer manager, we have no choice but to discard 5 of the addresses (at random?) provided by peer F. If peer A already has 10, we cannot add any addresses, as all of them could be false.

Comparison of happy path case:

Setup: Peer A has an updated address set, peer B is made aware of the change via an existing authenticated mechanism (DHT join, DHT discovery, direct connection). There are no malicious peers.

This PR:
Peer X later connects to peer B and appends the new addresses given by peer B for peer A. Peer X dials peer A using out-of-date addresses that have a higher score first, but eventually, after some time, uses one of the updated addresses, which have no score.

Previous with signed peer record:
Peer X later connects to peer B and sets the new addresses given by peer B for peer A. Peer X can be sure that peer B is giving the correct address set for peer A and therefore can use that address set (discarding addresses that it knows peer A is not using). Peer X dials peer A using the correct address set without having to first discover that the previous addresses are out of date.
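The capped, append-only merge suggested above might look like this sketch; MAX_ADDRESSES is an assumed value, not taken from the PR.

```rust
// Append-only, capped merge of addresses reported by a remote peer:
// reported addresses can only be appended (never replace known ones),
// duplicates are skipped, and the total is capped so a remote peer cannot
// flood another peer's record with thousands of fabricated entries.
const MAX_ADDRESSES: usize = 10;

fn merge_addresses(known: &mut Vec<String>, reported: Vec<String>) {
    for addr in reported {
        if known.len() >= MAX_ADDRESSES {
            break; // discard the rest: any of them could be false
        }
        if !known.contains(&addr) {
            known.push(addr);
        }
    }
}

fn main() {
    let mut known: Vec<String> = (0..9).map(|i| format!("addr{i}")).collect();
    merge_addresses(&mut known, vec!["new1".into(), "new2".into(), "new3".into()]);
    // Only one slot was free, so only one reported address is kept.
    println!("{} addresses kept", known.len());
}
```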

@stringhandler (Collaborator Author)

I see there is a lot of contention about removing the address signature, and it's probably not unfounded. I will look into adding the signed address back, but the structure will need to change. Currently it is difficult to reassemble the challenge when multiple addresses are signed, so I will change the identity message to sign each address individually, in a way that makes it easy to re-verify and store with each address, and see how that looks.

@hansieodendaal (Contributor)

This PR touches many things and is a bit big to comprehend what the actual system-wide effects would be without proper stress testing.

I agree that it is an improvement to mark an address as bad rather than ban the peer straight away (although I think a peer should eventually be medium-term banned if none of its addresses is working); however, I do not agree that addresses should no longer be signed.

Maybe we can implement just one thing from this PR at a time and evaluate it, leaving the changes with the biggest system impact for last?

@stringhandler stringhandler changed the title feat!: more accurate peer stats per address feat(peer_db)!: more accurate peer stats per address Feb 22, 2023
@stringhandler (Collaborator Author) left a comment

Self-reviewed and removed all commented-out code.

@stringhandler stringhandler merged commit fdad1c6 into tari-project:development Mar 3, 2023
@stringhandler stringhandler deleted the st-comms-addresses-data branch March 3, 2023 13:59
stringhandler pushed a commit that referenced this pull request Mar 8, 2023
### ⚠ BREAKING CHANGES

* **peer_db:** more accurate peer stats per address (5142)
* use consensus hashing API for validator node MMR (5207)
* **consensus:** add balanced binary merkle tree (5189)

### Features

* add favourite flag to contact ([5217](#5217)) ([0371b60](0371b60))
* add indexer config ([5210](#5210)) ([cf95601](cf95601))
* add merge proof for balanced binary merkle tree ([5193](#5193)) ([8962909](8962909))
* **consensus:** add balanced binary merkle tree ([5189](#5189)) ([8d34e8a](8d34e8a))
* log to base dir ([#5197](#5197)) ([5147b5c](5147b5c))
* **peer_db:** more accurate peer stats per address ([5142](#5142)) ([fdad1c6](fdad1c6))


### Bug Fixes

* add grpc commitment signature proto type ([5200](#5200)) ([d523f1e](d523f1e))
* peer seeds for esme/igor ([5202](#5202)) ([1bc226c](1bc226c))
* remove panics from merged BBMT verification ([5221](#5221)) ([a4c5fce](a4c5fce))
* source coverage ci failure ([5209](#5209)) ([80294a1](80294a1))
* use consensus hashing API for validator node MMR ([5207](#5207)) ([de28115](de28115))
* wallet reuse existing tor address ([5092](#5092)) ([576f44e](576f44e))
* **wallet:** avoids empty addresses in node identity ([5224](#5224)) ([1a66312](1a66312))
@hnidoaht-101 (Contributor)

Hi team, I know this PR was already merged. Just curious how @stringhandler used tokio-console to troubleshoot the long-running queries. Would it be possible to do a demo of it?

@stringhandler (Collaborator Author)

Sure, will try put something together

@ghpbot-tari-project ghpbot-tari-project added the P-acks_required Process - Requires more ACKs or utACKs label Mar 28, 2023
@hnidoaht-101 (Contributor)

Sure, will try put something together

Thanks a lot. 💯 I'll be waiting for it.

@hnidoaht-101 (Contributor) commented Mar 31, 2023

Sure, will try put something together

@stringhandler Can you just tag my name once you have something? Thank you.

@stringhandler (Collaborator Author)

will do

Labels
P-acks_required Process - Requires more ACKs or utACKs P-controversial Process - This PR or Issue is controversial and/or requires more attention than simpler issues

6 participants