fix: preventing IP 0.0.0.0 from being published #1982

Merged
merged 1 commit into master from fixing-identify-message-on-local-runs on Sep 11, 2023

Conversation

gabrielmer
Contributor

@gabrielmer gabrielmer commented Sep 1, 2023

Description

Currently, the IP address 0.0.0.0 is published in the Identify message when no external addresses are found. This happens when Waku nodes are run locally.

This goes against RFC 1122, section 3.2.1.3, which states about the 0.0.0.0 address:

This host on this network. MUST NOT be sent, except as a source address as part of an initialization procedure by which the host learns its own IP address.

In that case, we now publish the loopback address 127.0.0.1 instead.

Changes

  • Added the function formatListenAddress, which checks whether the IP 0.0.0.0 is present in the node's published address and, if so, replaces it with 127.0.0.1 (a minimal sketch follows this list)
  • Added a unit test suite for this functionality
  • Allowed the function addPeer to update an existing peer entry whenever it is missing ENR data
  • Modified tests that used and expected the IP 0.0.0.0 to use 127.0.0.1 instead
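
A minimal Nim sketch of the idea behind formatListenAddress, simplified to plain strings (the real function operates on the node's announced multiaddresses, so the name reuse and signature here are illustrative only):

    import std/strutils

    # Illustrative sketch: swap the unspecified IP 0.0.0.0 for the loopback
    # address 127.0.0.1 before the address is published in Identify.
    proc formatListenAddress(maStr: string): string =
      maStr.replace("0.0.0.0", "127.0.0.1")

    when isMainModule:
      doAssert formatListenAddress("/ip4/0.0.0.0/tcp/60000") ==
        "/ip4/127.0.0.1/tcp/60000"
      doAssert formatListenAddress("/ip4/8.8.8.8/tcp/60000") ==
        "/ip4/8.8.8.8/tcp/60000"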

How to test

When running the node locally, if no IP address is specified (or if the 0 IP is provided as an input), the node should publish the 127.0.0.1 IP instead.

If a custom IP is provided, it should be published.

Issue

closes #1427

@github-actions

github-actions bot commented Sep 1, 2023

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:1982

Built from 6d34228

@kaiserd
Contributor

kaiserd commented Sep 4, 2023

Note: publishing 127.x.x.x IP addresses would also go against the same RFC, section 3.2.1.3 (g):

{ 127, <any> }

                 Internal host loopback address.  Addresses of this form
                 MUST NOT appear outside a host.

If there is no canonical libp2p way to handle this (not aware of any atm), I'd use /tcp/60000/... to indicate that this multiaddr does not have an (external) IP address associated with it.
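
To make the two RFC constraints concrete, here is a small self-contained Nim sketch (illustrative helper procs, not nwaku code) classifying the two address classes in question:

    import std/strutils

    # RFC 1122, section 3.2.1.3:
    #   { 0, 0 }       "this host" - MUST NOT be sent, except while a host
    #                  is still learning its own IP address
    #   { 127, <any> } loopback    - MUST NOT appear outside a host
    proc isUnspecified(ip: string): bool =
      ip == "0.0.0.0"

    proc isLoopback(ip: string): bool =
      ip.startsWith("127.")

    when isMainModule:
      doAssert isUnspecified("0.0.0.0")
      doAssert isLoopback("127.0.0.1")
      doAssert not isLoopback("192.168.1.10")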

@gabrielmer gabrielmer self-assigned this Sep 5, 2023
@github-actions

github-actions bot commented Sep 5, 2023

You can find the experimental image built from this PR at

quay.io/wakuorg/nwaku-pr:1982-experimental

Built from 6d34228

@gabrielmer gabrielmer force-pushed the fixing-identify-message-on-local-runs branch from f781669 to 7772e87 on September 5, 2023 16:55
@gabrielmer gabrielmer marked this pull request as ready for review September 6, 2023 06:56
@@ -121,6 +121,7 @@ proc addPeer*(pm: PeerManager, remotePeerInfo: RemotePeerInfo, origin = UnknownO
discard remotePeerInfo.peerId.extractPublicKey(publicKey)

if pm.peerStore[AddressBook][remotePeerInfo.peerId] == remotePeerInfo.addrs and
not ($remotePeerInfo.addrs).contains("127.0.0.1") and
Member

Can you explain why this is needed, please? I cannot wrap my head around it, sorry :D

Contributor Author

Yes! So we were having issues with the peer exchange tests, in which we weren't able to add peers when running locally after the initial change we made. pm.peerStore[AddressBook][remotePeerInfo.peerId] equals 127.0.0.1 in local runs, and remotePeerInfo.addrs used to equal 0.0.0.0, but after our fix to the Identify message it also equals 127.0.0.1.

Because this if statement was always yielding true, it considered the IP 127.0.0.1 a duplicate, returned early from the addPeer procedure, and we weren't able to add local peers. So we added a special case to bypass that check for localhost.

Member

But that is strange, isn't it? We are also checking the publicKey, so that would mean that not only is the address the loopback interface, but the pubKey is also the same - IOW this check should be false if you have multiple local node peers, as long as they have different peerIds, so the explicit check for 127.0.0.1 should not be necessary. Or am I still missing something? :D

Can you share which tests were failing?

Contributor Author

Now that I look further into it, I agree that there's something weird there.
Maybe the public key check is redundant/wrong? If pm.peerStore[KeyBook][remotePeerInfo.peerId] has to hold the value of the public key for a peerId, and publicKey is a variable with the public key of the peer we are checking - aren't they supposed to always be the same?

The test that fails without that line is 'retrieve and provide peer exchange peers from discv5' in nwaku/tests/test_waku_peer_exchange.

Member

Yeah, it would probably make sense to check that the remotePeerInfo.peerId is a key in the KeyBook, but checking the match with publicKey does not really add value, since the peerId is (AFAIU) directly derived from the publicKey, so it will always be true as long as the key exists (which we already verify by the first check for the AddressBook, IMO).

Now the question is why does it not work without the check for loopback..

Since the peerId should be unique we should get false from the AddressBook check when first seeing the new node regardless of which addresses it advertises.

Member

Weird, I commented out this line and ran the test you mentioned locally and everything passed :)

Contributor Author

Exactly! I'll take a deeper look at what's going on with the AddressBook check.
In my test runs pm.peerStore[AddressBook][remotePeerInfo.peerId] was always returning 127.0.0.1, but if we are adding a new peer there shouldn't be an address assigned to it in the AddressBook.

Contributor Author

> Weird, I commented out this line and ran the test you mentioned locally and everything passed :)

That's really weird - without that line I couldn't get the test to pass, either locally or in CI. I'll DM you to have a look :)

@alrevuelta
Contributor

> When running the node locally, if no IP address is specified (or if the 0 IP is provided as an input), the node should publish the 127.0.0.1 IP instead.

Unsure I understand why we want to advertise 127.0.0.1? If no nat=extip:<IP> is provided, perhaps we should default to the public IP of the node? Which, well, we would need a 3rd party to give to us. How is Nimbus doing this? Perhaps we can use it as inspiration.

@jm-clius
Contributor

jm-clius commented Sep 6, 2023

@alrevuelta just to give some background: this only addresses the marginal case where no external IP of the node was configured or could be determined in any other way (NAT-PMP, UPnP, 3rd party, etc.). In other words, it really only affects the case where you are running node(s) locally. The issue was that you would then wrongly advertise the default 0.0.0.0 (in identify), which causes unexpected behaviour and even crashes in e.g. js-waku. There are two possible solutions to this:

  1. The most correct thing to do is probably to not advertise any addresses. However, this makes local building and experiments a bit difficult (e.g. you won't have a usable address logged for Listening on...).
  2. Default to localhost/loopback when you really couldn't determine an external IP by any means. This should be the exception, but at least allows proper local setups, consistent logging/debugging.

The reason for going with (2) is that it fixes the bug - i.e. undefined behaviour when including a zero IP in Identify - without significantly impacting how local nodes have been running.

@gabrielmer gabrielmer marked this pull request as draft September 6, 2023 09:21
@gabrielmer
Contributor Author

gabrielmer commented Sep 6, 2023

Marked the PR as draft again, as it is still WIP.

We're investigating why the following test is not passing without the exception for the loopback address:

https://github.com/waku-org/nwaku/blob/master/tests/test_waku_peer_exchange.nim#L74-L194

@Menduist
Contributor

Menduist commented Sep 6, 2023

The proper way to do this is to use an addressMapper that would remove every address whose IP is 0.0.0.0, I think.
cc @diegomrsantos
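
A rough self-contained Nim sketch of the filtering such an addressMapper would perform - simplified to plain strings rather than the MultiAddress values a real mapper would receive, so the names and types here are illustrative only:

    import std/[sequtils, strutils]

    # Drop every listen address whose IP component is the unspecified 0.0.0.0.
    proc dropUnspecified(listenAddrs: seq[string]): seq[string] =
      listenAddrs.filterIt(not it.contains("/ip4/0.0.0.0/"))

    when isMainModule:
      doAssert dropUnspecified(@[
        "/ip4/0.0.0.0/tcp/60000",
        "/ip4/192.168.1.10/tcp/60000"
      ]) == @["/ip4/192.168.1.10/tcp/60000"]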

@gabrielmer
Contributor Author

We tried the approach of using an addressMapper to remove every 0.0.0.0 IP. However, the previous workaround still seems necessary, since the mapper doesn't affect Identify when we return an empty set of addresses: the address field is required in Identify, and if the set is empty, 0.0.0.0 is populated.
cc @diegomrsantos

@gabrielmer gabrielmer force-pushed the fixing-identify-message-on-local-runs branch from 96ab7e8 to 013dd5a on September 7, 2023 13:32
Contributor

@jm-clius jm-clius left a comment

Approach lgtm now. :) Feel free to mark as ready for review when you feel it's mergeable and I'll do a full review, but should be an approve.

pm.peerStore[KeyBook][remotePeerInfo.peerId] == publicKey:
# Peer already managed
pm.peerStore[KeyBook][remotePeerInfo.peerId] == publicKey and
pm.peerStore[ENRBook][remotePeerInfo.peerId].raw.len > 0:
Contributor

Ah, could you add a comment (to the issue, not the code) on why this is necessary? Is this because these peers were explicitly connected to in addition to being added via discovery?

@gabrielmer
Contributor Author

The issue found was the following:

We are calling addPeer multiple times for the same peerId. Only the first call adds the peer; subsequent calls for the same peer are dropped by the duplicate check.
The problem is that for every peer, the first call to addPeer doesn't contain the peer's ENR data, but subsequent calls do. Because we see the peer as a duplicate and return immediately, we never fill in the peer's ENR information, which is later needed for establishing connections.

With the new fix, if we call addPeer for a peerId that we already have saved but whose ENR information is still missing, we don't fail the duplicate check and instead continue updating the peer's data with the latest information. This means addPeer works as an addition function when the peer doesn't exist, and as an update function when it exists but its information is incomplete.

Why did it work before? Out of luck, it seems.
Before we changed the published IP from 0.0.0.0 to 127.0.0.1, the first call adding each peer assigned it the IP 0.0.0.0, and subsequent calls for the same peer were done with the IP 127.0.0.1.
As the calls that contained the ENR information used a different IP than the first call, the check didn't see them as duplicates and let them update the peer information correctly.

After changing the published IP to 127.0.0.1, the first call adding a peer uses 127.0.0.1, the same IP as the later calls. So those later calls were correctly flagged as duplicates and never let us update the ENR information once we had it.
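
A self-contained Nim sketch of that add-or-update behaviour (illustrative types and names, not nwaku's actual peer store): a peer only counts as a duplicate when its stored entry already carries ENR data.

    import std/tables

    type PeerEntry = object
      addrs: string
      enrRaw: seq[byte]

    var store = initTable[string, PeerEntry]()

    proc addPeer(peerId, addrs: string, enrRaw: seq[byte]) =
      # Duplicate with complete info: nothing to update, return early.
      if peerId in store and store[peerId].addrs == addrs and
          store[peerId].enrRaw.len > 0:
        return
      # New peer, or an existing entry still missing its ENR: (re)write it.
      store[peerId] = PeerEntry(addrs: addrs, enrRaw: enrRaw)

    when isMainModule:
      # The first call carries no ENR; the second does and must not be dropped.
      addPeer("peerA", "/ip4/127.0.0.1/tcp/60000", @[])
      addPeer("peerA", "/ip4/127.0.0.1/tcp/60000", @[1'u8, 2, 3])
      doAssert store["peerA"].enrRaw.len > 0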

@gabrielmer gabrielmer marked this pull request as ready for review September 7, 2023 17:43
Contributor

@SionoiS SionoiS left a comment

LGTM

The only question I have is unrelated: why do we call addPeer() twice?

I'm looking into it...

@gabrielmer
Contributor Author

gabrielmer commented Sep 7, 2023

> LGTM
>
> The only question I have is unrelated: why do we call addPeer() twice?
>
> I'm looking into it...

According to stack traces I added, the calls come from the functions connectRelay and searchLoop, so it seems to be exactly the reason Hanno pointed out - that nodes are being added both by discovery and explicitly in the test.

Collaborator

@Ivansete-status Ivansete-status left a comment

LGTM! Thanks for it!


import
stew/shims/net as stewNet,
std/[strutils],
Collaborator

Minor nitpick

Suggested change:
-  std/[strutils],
+  std/strutils,

Contributor Author

Done! Thank you! :)

@gabrielmer gabrielmer force-pushed the fixing-identify-message-on-local-runs branch from 013dd5a to 6788709 on September 8, 2023 15:24
@jm-clius
Contributor

jm-clius commented Sep 8, 2023

LGTM! Well done. :)

Member

@vpavlin vpavlin left a comment

Lgtm! Good job, sir!

@gabrielmer gabrielmer force-pushed the fixing-identify-message-on-local-runs branch from 6788709 to 33769b1 on September 8, 2023 17:22
@vpavlin vpavlin added the release-notes label on Sep 11, 2023
@gabrielmer gabrielmer merged commit 47ae19c into master Sep 11, 2023
29 of 30 checks passed
@gabrielmer gabrielmer deleted the fixing-identify-message-on-local-runs branch September 11, 2023 08:30
Successfully merging this pull request may close these issues.

bug: 0.0.0.0 included in listenAddrs of identify message