
Addnode fixes #10

Merged
lploom merged 5 commits into bft-snapshots from addnode-fixes on Apr 30, 2026

Conversation

Contributor

@lploom lploom commented Apr 28, 2026

No description provided.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements synchronous connection handling for the addnode RPC by introducing an optional completion callback to the network manager and using std::promise and std::future to wait for the result. The feedback suggests adding an explicit default case to the ConnectionResult switch statement for better maintainability and replacing the non-portable <unistd.h> header in unit tests with standard C++ chrono and thread utilities to ensure cross-platform compatibility.
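
For orientation, the synchronous-wait shape being reviewed is roughly the following. This is a minimal sketch only: the argument list of connect_to, the helper function name, and the plain-string return are assumptions, not the PR's actual code (the real handler returns a JSON RPC response).

  #include <chrono>
  #include <cstdint>
  #include <future>
  #include <memory>
  #include <string>

  // Sketch of the promise/future bridge: connect_to takes the new optional
  // completion callback, which fulfils the promise from the network thread.
  std::string AddNodeSync(network::NetworkManager& nm,
                          const std::string& address, uint16_t port) {
      auto promise = std::make_shared<std::promise<network::ConnectionResult>>();
      auto future = promise->get_future();

      auto sync_result = nm.connect_to(
          address, port,
          [promise](network::ConnectionResult r) { promise->set_value(r); });

      if (sync_result != network::ConnectionResult::Success) {
          return "rejected synchronously";        // no async callback will fire
      }
      if (future.wait_for(std::chrono::seconds(12)) != std::future_status::ready) {
          return "connection attempt timed out";  // bounded outer wait
      }
      return future.get() == network::ConnectionResult::Success
                 ? "connected"
                 : "transport connect failed";
  }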

Comment thread src/network/rpc_server.cpp Outdated
Comment on lines +66 to +82
  switch (r) {
    case network::ConnectionResult::Success:
      return "Connected";
    case network::ConnectionResult::NotRunning:
      return "Network is not running";
    case network::ConnectionResult::AddressBanned:
      return "Address is banned";
    case network::ConnectionResult::AddressDiscouraged:
      return "Address is discouraged";
    case network::ConnectionResult::AlreadyConnected:
      return "Already connected (or connection attempt already in flight)";
    case network::ConnectionResult::NoSlotsAvailable:
      return "No outbound connection slots available";
    case network::ConnectionResult::TransportFailed:
      return "Transport connect failed";
  }
  return "Unknown connection error";

medium

The switch statement on network::ConnectionResult handles all current enum members, but lacks a default case. The return statement after the switch acts as an implicit default, which can be less clear and may prevent compilers from warning about unhandled enum values if new members are added in the future. For improved maintainability and robustness, it's better to include an explicit default case.

  switch (r) {
    case network::ConnectionResult::Success:
      return "Connected";
    case network::ConnectionResult::NotRunning:
      return "Network is not running";
    case network::ConnectionResult::AddressBanned:
      return "Address is banned";
    case network::ConnectionResult::AddressDiscouraged:
      return "Address is discouraged";
    case network::ConnectionResult::AlreadyConnected:
      return "Already connected (or connection attempt already in flight)";
    case network::ConnectionResult::NoSlotsAvailable:
      return "No outbound connection slots available";
    case network::ConnectionResult::TransportFailed:
      return "Transport connect failed";
    default:
      return "Unknown connection error";
  }

Comment thread test/unit/chain/miner_tests.cpp Outdated
#include "util/uint.hpp"
#include "util/hash.hpp"
#include <memory>
#include <unistd.h>

medium

Including <unistd.h> can limit the portability of this test code, as it's specific to POSIX-compliant systems. If this header is being used for functions like sleep() or usleep(), consider using the standard C++ equivalent std::this_thread::sleep_for from the <thread> and <chrono> headers. This will ensure the tests can run on a wider range of platforms, including Windows.
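
If the header is there for sleep()/usleep() (an assumption), a portable drop-in looks like this:

  #include <chrono>
  #include <thread>

  void wait_briefly() {
      // Portable equivalents of usleep(100000) and sleep(1):
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
      std::this_thread::sleep_for(std::chrono::seconds(1));
  }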

Comment thread test/unit/chain/notifications_tests.cpp Outdated
#include <memory>
#include <vector>
#include <atomic>
#include <unistd.h>

medium

Including <unistd.h> can limit the portability of this test code, as it's specific to POSIX-compliant systems. If this header is being used for functions like sleep() or usleep(), consider using the standard C++ equivalent std::this_thread::sleep_for from the <thread> and <chrono> headers. This will ensure the tests can run on a wider range of platforms, including Windows.


Copilot AI left a comment


Pull request overview

This PR improves the addnode RPC behavior by waiting for an outbound manual connection attempt to resolve and returning a more accurate success/error response, enabled by threading a new optional completion callback through the network connection stack.

Changes:

  • Add a synchronous-wait wrapper in RPCServer::HandleAddNode that blocks (bounded) until the async transport connect outcome is known and returns a clearer RPC response.
  • Extend NetworkManager::connect_to / ConnectionManager::ConnectTo to accept an optional ConnectCompletion callback and invoke it when the async connect path completes.
  • Add <unistd.h> includes to a couple of unit test files.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Summary per file:

  • test/unit/chain/notifications_tests.cpp: Adds a POSIX header include for tests.
  • test/unit/chain/miner_tests.cpp: Adds a POSIX header include for tests.
  • src/network/rpc_server.cpp: Implements bounded synchronous waiting for addnode and improved error messaging.
  • src/network/network_manager.cpp: Plumbs a new optional completion callback through NetworkManager::connect_to.
  • src/network/connection_manager.cpp: Invokes the new completion callback from the async connect flow with specific ConnectionResults.
  • include/network/network_manager.hpp: Updates the connect_to signature to accept an optional completion callback.
  • include/network/connection_manager.hpp: Introduces ConnectCompletion and updates the ConnectTo signature accordingly.


Comment thread src/network/connection_manager.cpp Outdated
Comment on lines +1200 to +1204
    LOG_NET_WARN("outbound connect failed: {}:{} (conn_type={})", address, port,
                 ConnectionTypeAsString(effective_conn_type));
  }
  if (on_complete) {
    on_complete(ConnectionResult::TransportFailed);

Copilot AI Apr 28, 2026


on_complete is invoked while mutex_ is held (inside the asio::post handler). If a caller’s completion handler calls back into ConnectionManager/NetworkManager (or anything else that might take mutex_), this can deadlock or create hard-to-debug re-entrancy. Consider capturing the result under the lock, releasing the lock, and then invoking on_complete (or posting the callback separately) so user code never runs under mutex_.
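
A sketch of the suggested restructuring, with illustrative names only (the surrounding asio::post handler is not shown):

  // Decide the outcome and take the callback while holding the lock, then invoke
  // it only after the lock is released, so user code never runs under mutex_.
  ConnectCompletion pending;
  {
      std::lock_guard<std::mutex> lock(mutex_);
      // ... update connection bookkeeping, determine the result ...
      pending = std::move(on_complete);
  }
  if (pending) {
      pending(ConnectionResult::TransportFailed);  // mutex_ no longer held here
  }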

Comment thread src/network/rpc_server.cpp Outdated
Comment on lines +112 to +115
if (future.wait_for(addNodeWaitTimeout) != std::future_status::ready) {
return util::JsonError("Connection attempt timed out");
}
auto final_result = future.get();

Copilot AI Apr 28, 2026


future.get() can throw (e.g., std::future_error for a broken promise if the connect attempt is aborted and the completion callback is destroyed without setting the promise). Since this is on an RPC path, it would be better to catch std::future_error around wait_for/get and return a deterministic JsonError message instead of relying on the outer handler exception path.

Suggested change

  // Before:
  if (future.wait_for(addNodeWaitTimeout) != std::future_status::ready) {
    return util::JsonError("Connection attempt timed out");
  }
  auto final_result = future.get();

  // After:
  network::ConnectionResult final_result;
  try {
    if (future.wait_for(addNodeWaitTimeout) != std::future_status::ready) {
      return util::JsonError("Connection attempt timed out");
    }
    final_result = future.get();
  } catch (const std::future_error&) {
    return util::JsonError("Connection attempt failed");
  }

Member

@MastaP MastaP left a comment


Targeted review on top of the existing bot comments — focusing on the new bifurcated sync/async API surface, timeout coupling, and a few cosmetic/refactor notes. Big picture this is fine and useful; the issues are mostly around future-proofing the contract.

Substantive

  1. ConnectCompletion API contract is undocumented. The lambda callers must understand that on_complete is invoked iff ConnectTo returns Success synchronously — for NotRunning / AddressBanned / AlreadyConnected / NoSlotsAvailable / sync-TransportFailed, the early returns drop the callback on the floor. AddNodeSyncConnect happens to handle this correctly by branching on sync_result first, but the next caller will not. (Inline.)
  2. addNodeWaitTimeout is coupled-by-comment to DEFAULT_CONNECT_TIMEOUT. Both constants live in unrelated translation units; tests can also override the transport timeout via connect_timeout_override_ms_ to a value larger than 12s, which would make the RPC report a false-positive timeout while the transport is still working. (Inline.)
  3. Late-completion divergence. If the async transport callback lands after the 12s wait expires, the RPC user sees "Connection attempt timed out" but the peer has actually been added (peer registry, manual_addresses_, outbound count). A retry returns AlreadyConnected, which is confusing. Worth either cancelling the in-flight transport on RPC timeout or documenting the race. (Inline.)
  4. Shutdown amplification (pre-existing pattern, slightly worsened). RPCServer::Stop() drains active requests for at most 5s; addNodeWaitTimeout is 12s, so an in-flight addnode can outlive Stop() by ~7s. After Stop returns, Application::shutdown() proceeds to network_manager_->stop() and the surrounding state can tear down. Long-poll already has the same gap (30s vs 5s), so this isn't new, but the diff makes the entry point bigger. Consider raising the drain timeout to envelop all known handlers, or piping a cancellation token into wait_for (a minimal polling sketch follows this list).
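
A minimal polling sketch for the cancellation idea in item 4; the shutting_down_ flag and the ADDNODE_WAIT_TIMEOUT name are assumptions:

  // Wait in short slices so RPCServer::Stop() can interrupt an in-flight addnode
  // instead of letting it outlive the 5s drain window.
  const auto deadline = std::chrono::steady_clock::now() + ADDNODE_WAIT_TIMEOUT;
  while (future.wait_for(std::chrono::milliseconds(100)) != std::future_status::ready) {
      if (shutting_down_.load()) {
          return util::JsonError("Node is shutting down");
      }
      if (std::chrono::steady_clock::now() >= deadline) {
          return util::JsonError("Connection attempt timed out");
      }
  }
  auto final_result = future.get();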

Cosmetic / refactor
5. addNodeWaitTimeout uses lowerCamelCase; every other compile-time constant in rpc_server.cpp is UPPER_SNAKE_CASE (MAX_CONCURRENT_REQUESTS, MAX_SOCKET_PATH, SHUTDOWN_TIMEOUT_MS, POLL_INTERVAL_MS, STALE_TIP_THRESHOLD, DEFAULT_BANTIME_SEC, LONGPOLL_TIMEOUT_SECONDS). (Inline.)
6. Trailing em-dash with no continuation in the synchronous-rejection branch comment. (Inline.)
7. Taking connect_to(..., const ConnectCompletion& on_complete) and then capturing on_complete = on_complete forces an extra copy of the std::function. Idiomatic alternative: take ConnectCompletion by value and std::move it into the capture, saving one copy. (Picky; see the sketch after this list.)
8. The <unistd.h> adds in the two test files are correct on macOS for mkdtemp, but the underlying mkdtemp boilerplate already exists as unicity::test::TempDir in test/common/test_util.hpp (which itself includes <unistd.h>). Replacing the open-coded blocks with TempDir would obviate the include and remove duplication. Not in this PR's scope, but worth a follow-up. (Inline on test file.)
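
For item 7, the by-value-and-move pattern in generic form (sketch only, assuming the existing ConnectCompletion alias and asio headers are in scope; not the project's actual signatures):

  // Take the std::function by value; callers may move into the parameter, and the
  // function moves it again into the capture, so at most one copy happens at the call site.
  void start_connect(asio::io_context& io, ConnectCompletion on_complete) {
      asio::post(io, [on_complete = std::move(on_complete)] {
          // ... perform the connect attempt, compute the real ConnectionResult ...
          if (on_complete) on_complete(ConnectionResult::TransportFailed);
      });
  }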

Coverage
No new tests cover the synchronous-wait behavior (timeout path, success-after-handshake, NotRunning/Banned sync rejection). A small RPC-level integration test using a stub transport that defers the ConnectCallback would lock in the new contract and prevent regressions.

Comment thread include/network/connection_manager.hpp Outdated
enum class ConnectionResult; // From network_manager.hpp

// Optional connection completion callback fired from the io_context thread
using ConnectCompletion = std::function<void(ConnectionResult)>;
Member


Please document the contract here, since it's non-obvious from the type alone:

  • on_complete is invoked iff the corresponding ConnectTo call returned ConnectionResult::Success synchronously. On any synchronous-failure return (NotRunning, AddressBanned, AddressDiscouraged, AlreadyConnected, NoSlotsAvailable, sync TransportFailed), it is not invoked.
  • It runs on the io_context thread while mutex_ is held (also flagged elsewhere) — callbacks must not re-enter ConnectionManager/NetworkManager and should keep work to a minimum (e.g. setting a promise).
  • It fires at most once.

The current AddNodeSyncConnect happens to get this right because it branches on the sync result before touching the future, but a future caller without this context can easily wait on a promise that will never be set.
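
Something along these lines, adjacent to the alias (wording to taste):

  // Optional completion callback for ConnectionManager::ConnectTo / NetworkManager::connect_to.
  //
  // Contract:
  //  - Invoked iff the corresponding ConnectTo call returned ConnectionResult::Success
  //    synchronously; it is never invoked after a synchronous-failure return.
  //  - Fires at most once, on the io_context thread (currently while mutex_ is held):
  //    do not re-enter ConnectionManager/NetworkManager from the callback, and keep
  //    the work minimal (e.g. fulfil a std::promise).
  using ConnectCompletion = std::function<void(ConnectionResult)>;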

Comment thread src/network/rpc_server.cpp Outdated
// timeout (see DEFAULT_CONNECT_TIMEOUT in include/network/real_transport.hpp,
// currently 10s) so the transport always reports a real result before this
// fires; this is a defensive outer bound in case the callback never fires.
constexpr auto addNodeWaitTimeout = std::chrono::seconds(12);
Member


Two concerns about the timeout choice:

  1. Naming convention. Every other compile-time constant in this file is UPPER_SNAKE_CASE (MAX_CONCURRENT_REQUESTS, MAX_SOCKET_PATH, SHUTDOWN_TIMEOUT_MS, POLL_INTERVAL_MS, STALE_TIP_THRESHOLD, DEFAULT_BANTIME_SEC, LONGPOLL_TIMEOUT_SECONDS). Suggest ADDNODE_WAIT_TIMEOUT for consistency.

  2. Coupled-by-comment to RealTransportConnection::DEFAULT_CONNECT_TIMEOUT. The 12s value only "works" because it's > 10s, but that relationship lives only in this comment. Two failure modes:

    • If someone bumps DEFAULT_CONNECT_TIMEOUT in real_transport.hpp to e.g. 15s without grepping, the addnode RPC will start reporting false-positive timeouts.
    • Tests can already override the transport timeout via RealTransportConnection::connect_timeout_override_ms_ (a public static atomic), which silently breaks the relationship in test scenarios.

Consider deriving programmatically, e.g.:

constexpr auto ADDNODE_WAIT_TIMEOUT =
    network::RealTransportConnection::DEFAULT_CONNECT_TIMEOUT + std::chrono::seconds(2);

(Or expose a public accessor on the transport.) That removes the comment-only coupling and survives test overrides if connect_timeout_ms() is consulted instead.

Comment thread src/network/rpc_server.cpp Outdated
[promise](network::ConnectionResult r) { promise->set_value(r); });

if (sync_result != network::ConnectionResult::Success) {
// Synchronous rejection (NotRunning, AlreadyConnected, Banned etc) —
Member


Trailing em-dash with no continuation — looks like an unfinished thought. Either complete it (... — return immediately, no async path will fire) or drop the dash.

Comment thread src/network/rpc_server.cpp Outdated
}

  if (future.wait_for(addNodeWaitTimeout) != std::future_status::ready) {
    return util::JsonError("Connection attempt timed out");
Member


Late-completion race: if the transport callback lands between 12s and ~12s+ε, this branch returns "Connection attempt timed out" but the peer is actually being added (peer registry, manual_addresses_, outbound count). The user thinks the addnode failed; a retry now returns AlreadyConnected, which is the worst-of-both.

Two possible mitigations:

  • On timeout, walk the connection back: ask ConnectionManager to cancel the pending entry and disconnect the peer if it lands. Requires a small new API on ConnectionManager (a cancel handle keyed on AddressKey); a hypothetical shape is sketched below.
  • Document the race explicitly here and in the user-facing RPC docs, so operators know that a timeout error doesn't preclude a successful late connection.

Low severity in practice (you'd need transport timing > 10s and < 12s + scheduling slack), but it's a real divergence between RPC reply and node state.
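
For the first mitigation, a hypothetical shape (none of these names exist in the codebase yet):

  // Hypothetical cancel handle on ConnectionManager, keyed on the pending address.
  // Returns true if the attempt was still pending and has been abandoned; a late
  // transport success would then be disconnected instead of registered as a peer.
  bool CancelPendingConnect(const AddressKey& key);

  // RPC side, on timeout (AddressKey construction is assumed):
  if (future.wait_for(ADDNODE_WAIT_TIMEOUT) != std::future_status::ready) {
      connection_manager.CancelPendingConnect(AddressKey{address, port});
      return util::JsonError("Connection attempt timed out");
  }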

Comment thread test/unit/chain/miner_tests.cpp Outdated
#include "util/uint.hpp"
#include "util/hash.hpp"
#include <memory>
#include <unistd.h>
Member


Follow-up suggestion (out of scope for this PR's intent, but it would obviate the macOS portability concern raised by the bots): the mkdtemp block at L41–46 (and the matching block in notifications_tests.cpp) duplicates unicity::test::TempDir in test/common/test_util.hpp, which already encapsulates the call and already includes <unistd.h>. Replacing the open-coded boilerplate with TempDir test_dir{"unicity_miner_test"} would let you drop this <unistd.h> add entirely and centralize the POSIX dependency.

Also fixes open PR review feedback:
- default-case in switch
- em-dash
- programmatic timeout derivation
- <unistd.h> removal via TempDir
- ConnectionManager contract clarified
- integration test for addnode
@lploom lploom merged commit 4e94471 into bft-snapshots Apr 30, 2026


Development

Successfully merging this pull request may close these issues.

Silent outbound connection failure — addnode returns success but connection never established

3 participants