Skip to content

Add preliminary harmony-crypto & harmony-identity crates#1

Merged
jenglund merged 5 commits into
mainfrom
jake-harmony-id-crypto
Mar 1, 2026
Merged

Add preliminary harmony-crypto & harmony-identity crates#1
jenglund merged 5 commits into
mainfrom
jake-harmony-id-crypto

Conversation

@jenglund
Copy link
Copy Markdown
Contributor

@jenglund jenglund commented Mar 1, 2026

Note

High Risk
Introduces new cryptographic and identity primitives (AEAD, Fernet, HKDF, key handling) where subtle correctness issues can have security implications; although largely additive, it’s security-sensitive code that will be reused broadly.

Overview
Adds a new Rust workspace (Cargo.toml, Cargo.lock) with two new crates: harmony-crypto (hashing, HKDF, Reticulum-compatible fernet token encryption, and Harmony-native ChaCha20-Poly1305 aead, plus shared CryptoError types) and harmony-identity (Ed25519/X25519 identity generation, serialization, signing/verification, and encrypt/decrypt using ephemeral ECDH + HKDF + Fernet).

Includes extensive unit tests and a reticulum_interop test suite with Python-derived vectors to validate byte-level compatibility. Also updates .gitignore to ignore Dolt .beads/* runtime files and removes those tracked artifacts.

Written by Cursor Bugbot for commit 5a63244. This will update automatically on new commits. Configure here.

@jenglund
Copy link
Copy Markdown
Contributor Author

jenglund commented Mar 1, 2026

bugbot run

@jenglund
Copy link
Copy Markdown
Contributor Author

jenglund commented Mar 1, 2026

@greptile

Comment thread crates/harmony-crypto/src/fernet.rs
Comment thread crates/harmony-identity/src/identity.rs Outdated
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Mar 1, 2026

Greptile Summary

This PR introduces the harmony-crypto and harmony-identity crates, providing the core cryptographic foundation for the Harmony network stack. harmony-crypto delivers well-audited primitives (HKDF-SHA256 with RFC 5869 vectors, Reticulum-compatible Fernet AES-256-CBC + HMAC-SHA256, and ChaCha20-Poly1305 AEAD), while harmony-identity builds Ed25519/X25519 identity management with Reticulum-interoperable encrypt/decrypt and verified Python test vectors.

Overall the implementation is sound — correct encrypt-then-MAC ordering, constant-time HMAC comparison via subtle, ZeroizeOnDrop on all private-key structs, and comprehensive unit tests. However, three issues in identity.rs need attention before this is merged:

  • to_private_bytes leaks key material — returns a plain [u8; 64] with no zeroization; should return Zeroizing<[u8; 64]>.
  • PartialEq compares by 128-bit address hash, not the full 64-byte public key — weaker security guarantee and semantic mismatch between routing identity and cryptographic identity.
  • HKDF info field is always empty — no domain separation between protocol versions or use-cases; fine for current Reticulum compatibility, but worth documenting explicitly.
  • Hardcoded private key test vectors lack a "test-only, do not use in production" warning — low-severity risk but important to guard against accidental production reuse.

Confidence Score: 2/5

  • Not safe to merge until to_private_bytes is hardened and PartialEq is corrected — both are security-sensitive in a cryptographic identity crate.
  • The primitive crates (harmony-crypto) are solid and well-tested. The identity crate has two logic-level security issues: private key material is returned without automatic zeroization, and identity equality is based only on a 128-bit truncated hash rather than the full public key. Either issue could be exploited in downstream code that trusts this API. The test file also commits real private key bytes without a warning, which is a reuse risk. Fixing these issues is straightforward and the overall cryptographic design is sound.
  • Pay close attention to crates/harmony-identity/src/identity.rs — all three flagged issues are concentrated there.

Important Files Changed

Filename Overview
crates/harmony-identity/src/identity.rs Core identity module with three issues: to_private_bytes returns unprotected key material (no ZeroizeOnDrop), PartialEq compares only the 128-bit truncated address hash instead of the full 64-byte public key, and HKDF derivation uses empty info with no domain separation.
crates/harmony-crypto/src/fernet.rs Well-structured Reticulum-compatible Fernet (AES-256-CBC + HMAC-SHA256) with correct encrypt-then-MAC ordering, constant-time HMAC verification via subtle::ConstantTimeEq, and proper minimum-length checks. Key material is passed in and out via caller-owned slices with FernetKey providing zeroization.
crates/harmony-crypto/src/aead.rs ChaCha20-Poly1305 AEAD wrapper with comprehensive nonce-reuse documentation. AeadKey zeroizes on drop. from_bytes accepts value by move, so the caller must manually zeroize their source array — minor but worth noting.
crates/harmony-crypto/src/hkdf.rs HKDF-SHA256 wrapper with RFC 5869 test vectors, DerivedKey zeroizing wrapper, and explicit bounds checking. Clean and correct implementation.
crates/harmony-identity/tests/reticulum_interop.rs Solid interop test suite with Reticulum Python vectors covering address derivation, signing, and decryption. Four hardcoded private keys committed to source without a "test-only" warning — risks accidental production reuse.

Sequence Diagram

sequenceDiagram
    participant Sender
    participant Identity
    participant HKDF
    participant Fernet

    Note over Sender,Fernet: Identity encrypt (Reticulum-compatible)
    Sender->>Identity: encrypt(rng, plaintext)
    Identity->>Identity: generate ephemeral X25519 keypair
    Identity->>Identity: X25519 ECDH → shared secret
    Identity->>HKDF: derive_key_256(shared_secret, salt=address_hash)
    HKDF-->>Identity: 64-byte Fernet key
    Identity->>Fernet: encrypt(rng, fernet_key, plaintext)
    Fernet->>Fernet: random IV (16 bytes)
    Fernet->>Fernet: AES-256-CBC encrypt (PKCS7)
    Fernet->>Fernet: HMAC-SHA256 over IV + ciphertext
    Fernet-->>Identity: token = IV + ciphertext + HMAC
    Identity->>Identity: zeroize fernet_key
    Identity-->>Sender: ephemeral_pub (32B) + token

    Note over Sender,Fernet: PrivateIdentity decrypt
    Sender->>Identity: decrypt(ciphertext)
    Identity->>Identity: extract ephemeral_pub (first 32 bytes)
    Identity->>Identity: X25519 ECDH → shared secret
    Identity->>HKDF: derive_key_256(shared_secret, salt=address_hash)
    HKDF-->>Identity: 64-byte Fernet key
    Identity->>Fernet: decrypt(fernet_key, token)
    Fernet->>Fernet: constant-time HMAC verify
    Fernet->>Fernet: AES-256-CBC decrypt + unpad
    Fernet-->>Identity: plaintext
    Identity->>Identity: zeroize fernet_key
    Identity-->>Sender: plaintext
Loading

Last reviewed commit: 5a63244

Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

18 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

@jenglund
Copy link
Copy Markdown
Contributor Author

jenglund commented Mar 1, 2026

bugbot run
@greptile

Comment thread crates/harmony-identity/src/identity.rs Outdated
Comment on lines +188 to +190
if ciphertext.len() < 33 {
return Err(IdentityError::DecryptionFailed);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minimum ciphertext check should be 96 bytes (32-byte ephemeral key + 64-byte minimum Fernet token), not 33

Suggested change
if ciphertext.len() < 33 {
return Err(IdentityError::DecryptionFailed);
}
if ciphertext.len() < 96 {
return Err(IdentityError::DecryptionFailed);
}

Comment on lines +24 to +31
/// Encrypt plaintext with ChaCha20-Poly1305.
///
/// - `key`: 32-byte symmetric key
/// - `nonce`: 12-byte nonce (must never be reused with the same key)
/// - `plaintext`: data to encrypt
/// - `aad`: additional authenticated data (authenticated but not encrypted)
///
/// Returns ciphertext with appended 16-byte Poly1305 tag.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add documentation warning that nonce MUST NEVER be reused with the same key in ChaCha20-Poly1305, as reuse completely breaks security

Comment thread crates/harmony-crypto/src/fernet.rs Outdated
@jenglund
Copy link
Copy Markdown
Contributor Author

jenglund commented Mar 1, 2026

bugbot run
@greptile

Comment thread crates/harmony-identity/tests/reticulum_interop.rs Outdated
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

@jenglund
Copy link
Copy Markdown
Contributor Author

jenglund commented Mar 1, 2026

As much as I could run tests/reviews one more time to be extra sure, I'm excited to move on, there's a lot more to build out! SO MERGING CHOO CHOO HERE WE GO

@jenglund jenglund merged commit 426b935 into main Mar 1, 2026
@jenglund jenglund deleted the jake-harmony-id-crypto branch March 1, 2026 21:02
@jenglund
Copy link
Copy Markdown
Contributor Author

jenglund commented Mar 6, 2026

bugbot run

fn eq(&self, other: &Self) -> bool {
self.address_hash == other.address_hash
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identity equality ignores actual cryptographic key material

Low Severity

The PartialEq impl for Identity compares only the 16-byte address_hash, not the actual encryption_key or verifying_key. This means two Identity values with different public keys but a colliding truncated hash would be considered equal, even though they behave differently for verify() and encrypt(). Comparing the full 64-byte public key material would be equally efficient and semantically correct, avoiding a potential foot-gun where code assumes a == b implies cryptographic interchangeability.

Fix in Cursor Fix in Web

@jenglund
Copy link
Copy Markdown
Contributor Author

jenglund commented Mar 6, 2026

bugbot run

@jenglund
Copy link
Copy Markdown
Contributor Author

jenglund commented Mar 6, 2026

@greptile

Comment on lines +222 to +228
/// Serialize the private key to 64 bytes: `[32B X25519 secret][32B Ed25519 secret]`.
pub fn to_private_bytes(&self) -> [u8; PRIVATE_KEY_LENGTH] {
let mut bytes = [0u8; PRIVATE_KEY_LENGTH];
bytes[..32].copy_from_slice(self.encryption_secret.as_bytes());
bytes[32..].copy_from_slice(self.signing_key.as_bytes());
bytes
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to_private_bytes returns unprotected key material

to_private_bytes returns a plain [u8; 64] containing raw private key bytes. Unlike PrivateIdentity itself (which is ZeroizeOnDrop), the returned array carries no automatic zeroization guarantee. Any caller that stores the returned bytes, passes them to another function, or simply lets the binding go out of scope without explicitly calling zeroize() will leave live private key material on the stack/heap until it is overwritten by unrelated data.

The function should return a zeroizing wrapper (e.g., zeroize::Zeroizing<[u8; PRIVATE_KEY_LENGTH]>) so that the secret bytes are automatically erased when the returned value is dropped:

use zeroize::Zeroizing;

pub fn to_private_bytes(&self) -> Zeroizing<[u8; PRIVATE_KEY_LENGTH]> {
    let mut bytes = Zeroizing::new([0u8; PRIVATE_KEY_LENGTH]);
    bytes[..32].copy_from_slice(self.encryption_secret.as_bytes());
    bytes[32..].copy_from_slice(self.signing_key.as_bytes());
    bytes
}

Zeroizing<T> implements Deref<Target = T>, so callers that read the bytes via as_slice() or indexing continue to work without change, while the secret material is guaranteed to be wiped on drop.

Comment on lines +135 to +139
impl PartialEq for Identity {
fn eq(&self, other: &Self) -> bool {
self.address_hash == other.address_hash
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PartialEq uses truncated 128-bit hash instead of full public key

The PartialEq implementation compares identities only by address_hash, which is a 16-byte (128-bit) truncation of SHA-256. This is weaker than comparing the full 64-byte public key for two reasons:

  1. Cryptographic strength: A birthday collision in 128-bit space requires roughly 2^64 attempts — feasible for a nation-state-level adversary targeting a specific identity. If equality is used in any security-sensitive context (access control, trust checks, deduplication of allowed peers), a crafted identity that collides in the address hash would pass these checks despite having different — and attacker-controlled — public keys.

  2. Semantic mismatch: The address hash is a routing address (Reticulum's network-layer notion of location); cryptographic identity equality should be based on the actual public key material. Using address hashes for equality silently conflates two different concepts, which can confuse future users of the API.

The comparison should use the full combined public key bytes:

impl PartialEq for Identity {
    fn eq(&self, other: &Self) -> bool {
        self.to_public_bytes() == other.to_public_bytes()
    }
}

impl Eq for Identity {}

If two Identity values are equal under this definition, their address_hash fields will also match (since the hash is derived deterministically from the keys). The converse is not guaranteed.

Comment on lines +115 to +118
let mut derived = hkdf::derive_key_256(
shared_secret.as_bytes(),
Some(&self.address_hash),
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing HKDF info context — no domain separation

derive_key_256 is called with an empty info slice (&[] is the default in the implementation). The HKDF info parameter is the standard mechanism for domain separation: it binds the derived key to a specific protocol, version, and usage. Without it, any other code that performs an ECDH exchange over the same X25519 keys and then applies HKDF-SHA256 with the same salt would derive the identical Fernet key, even if that code is meant for a completely different purpose.

Consider adding a well-known, constant context string:

let mut derived = hkdf::derive_key_256(
    shared_secret.as_bytes(),
    Some(&self.address_hash),
    b"harmony-identity-v1-fernet",  // add an info parameter to derive_key_256
);

Note: if Reticulum Python uses empty info, and the interoperability test vectors rely on that, changing the info would break wire compatibility. In that case, keep the empty info but document this explicitly so that future protocol versions (with a different info string) remain clearly separated. The same pattern applies to the decrypt path at line 202.

Comment on lines +11 to +51
mod vectors {
pub const X25519_PRIVATE: &str =
"20373f976f707a256b450d7bd63b7230978fa623395a9bc91edd890a5a38654b";
pub const ED25519_PRIVATE: &str =
"f25ade2a1327293ccb885f4bfdaabdd4a64708ba40c7a4410bdcf1ddaccd7dfa";
pub const X25519_PUBLIC: &str =
"e080f2b82f9a08581008c8391e0a5e1e29da4b49d5364e7dbb1a20caf4a1ea57";
pub const ED25519_PUBLIC: &str =
"685cf6f3f7887d5b10b6efcbdcecaf5efe908e7b002c57c53547d6df1f1a962c";
pub const ADDRESS_HASH: &str = "af0de81878d6685b62d3459968c2d9f2";
pub const COMBINED_PUBLIC: &str =
"e080f2b82f9a08581008c8391e0a5e1e29da4b49d5364e7dbb1a20caf4a1ea57\
685cf6f3f7887d5b10b6efcbdcecaf5efe908e7b002c57c53547d6df1f1a962c";

pub const SIGN_MESSAGE: &str = "6861726d6f6e7920696e7465726f702074657374206d657373616765";
pub const SIGNATURE: &str =
"f4074a5045d06b779004c08f4dfd0daf6923c8746e2493ddc70bf882e98abcea\
3a3fe75da5e4990c53dd0c293f501447312dba6f900710d9435e34a66294aa09";

pub const ENCRYPT_PLAINTEXT: &str =
"68656c6c6f2066726f6d207265746963756c756d20707974686f6e";
pub const ENCRYPT_CIPHERTEXT: &str =
"2fbe26fc3ce9f604e09d26881e1a805c21ea379cfd40b8741254bc92e270a20a\
ed8fb66b9d56b90d2483989aba56b5b27007c59738505076bf4169abe0492260\
1d1f98d358d3e4d70707320c594ae1d0e58b297164d662ca4d7c0fa92cfe50ad\
9ddf653538bd70f1d0d8a8aabff15f2e";

pub const ID2_X25519_PRIVATE: &str =
"400b83cab205e3383838868344ed4e52c94da309d1a6d6098289768104877353";
pub const ID2_ED25519_PRIVATE: &str =
"e6bdab8f004a72904d8c62c692a5e8cf1db023f7d6f973072fcf56265bac7175";
pub const ID2_ADDRESS_HASH: &str = "d9b3e6139565245d1a434975430956d2";

pub const CROSS_ENCRYPT_PLAINTEXT: &str =
"63726f73732d6964656e7469747920656e6372797074696f6e2074657374";
pub const CROSS_ENCRYPT_CIPHERTEXT: &str =
"6bee97688961561418bb283b163d9037307c42579f92728f45e5fddc44c78114\
6bfa9f0820e7e28f9ef3754cd19a7d9b66feb5c632cb6bf64123a1272053c2dc\
e1eef7ff4a61e18b0b2d7eadf069410bf35bb286afa2eb55903bdf458b250df7\
9f6060a48ce828f850e46d95f5aca45a";
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded private key bytes in source — add prominent "test-only" warning

vectors::X25519_PRIVATE, vectors::ED25519_PRIVATE, vectors::ID2_X25519_PRIVATE, and vectors::ID2_ED25519_PRIVATE are real X25519 and Ed25519 private keys committed to the repository. Anyone who can read this file now possesses those private keys. If these values are ever accidentally used in a production deployment (e.g., as "default" seed keys, for smoke-testing against a live node, or via copy-paste), all traffic encrypted to those identities would be trivially decryptable.

Add a clear comment block at the top of the vectors module:

/// ⚠ WARNING: These are TEST-ONLY private keys.
/// They are committed to a public repository and MUST NEVER be used
/// outside of unit/integration tests. Any identity or data protected
/// by these keys should be considered fully compromised.
mod vectors {

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

let token = harmony_crypto::fernet::encrypt(rng, &derived, plaintext)?;

// Zeroize derived key
derived.zeroize();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derived key not zeroized on error path in encrypt

Low Severity

In Identity::encrypt, the ? operator on the fernet::encrypt call causes an early return that skips derived.zeroize(), leaving 64 bytes of derived key material on the stack. The sibling PrivateIdentity::decrypt method correctly avoids this by capturing the result into a variable, calling derived.zeroize(), and only then returning — the encrypt path is inconsistent with this safer pattern.

Additional Locations (1)

Fix in Cursor Fix in Web

jenglund added a commit that referenced this pull request Mar 20, 2026
8-task TDD plan for Bead #1 (harmony-641): PQ tunnel state machine.
Covers handshake, frame encryption, session lifecycle, keepalive,
and adversarial tests. Includes signing_key() accessor prerequisite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jenglund added a commit that referenced this pull request May 3, 2026
Three coupled API contract changes from PR #267 review:

1. Rename `RuntimeEvent::SendUnicastToDevice.target` → `destination_hash`.
   The runtime resolves the field via `path_table().get(...)`, but the
   path table is keyed by Reticulum destination hashes
   (SHA256(name_hash || identity_address_hash)[:16]) populated from
   announces, NOT raw device identity hashes. The previous name was
   load-bearing-wrong: real announced peers would miss the lookup and
   surface as "target unknown" drops even when reachable. The runtime is
   generic plumbing; computing the destination hash from
   identity-hash + destination-name is a harmony-client concern (Phase
   3b, ZEB-227 — clients know their own destination naming: DM inbox,
   voice channel, file sync, etc.).

2. Type `RuntimeAction::UnicastReceived.source` and
   `NodeAction::DeliverLocally.source` as `Option<[u8; 16]>` instead of
   `[u8; 16]` with `[0u8; 16]` as a sentinel for "unknown until Phase
   3b wires terminal-link tracking." Sentinels can be misinterpreted as
   a real authenticated identity. The Option makes the unknown state
   explicit in the type and forces consumers to handle it. Construction
   site at `node.rs:route_packet` sets `None` (was `[0u8; 16]`); the
   runtime dispatch arm forwards the Option as-is.

3. Defer `SendUnicastToDevice` resolution from `push_event` to `tick`.
   The previous code did `path_table().get(...)` synchronously inside
   `push_event`, racing against any inbound announces queued in the
   same batch (announces sit in `router_queue` and aren't applied to
   the path table until `tick` drains them). Now `push_event` enqueues
   into `pending_unicast_sends`, and `tick` drains that queue AFTER the
   router queue inside the Tier 0 (Router) block — so any same-batch
   announces are applied first.

Tests updated to use the new field names + Option<source>:
- `runtime_event_send_unicast_to_device_constructs_and_compares`
- `runtime_action_unicast_received_constructs_and_compares`
- `dispatch_router_actions_surfaces_deliver_locally_as_unicast_received`
- `send_unicast_to_device_emits_send_on_interface_for_known_target`
- `send_unicast_to_device_drops_for_unknown_target`
- `unicast_round_trip_a_to_b_surfaces_as_unicast_received`

Round-trip test placeholder assertion is now `received.0 == None`
instead of `[0u8; 16]`. The seeded path-table value in the outbound
test is unchanged in semantics (the seed key was always a destination
hash by contract — only the field name shifts).

Refs: PR #267 review (Qodo #1, CodeRabbit, CodeAnt).
jenglund added a commit that referenced this pull request May 3, 2026
…26) (#267)

* docs(zeb-226): runtime unicast IPC implementation plan (Phase 3a)

ZEB-216 Sub-B Phase 3a — adds RuntimeEvent::SendUnicastToDevice +
RuntimeAction::UnicastReceived to harmony-runtime, wiring to the
existing Reticulum unicast plumbing (NodeAction::DeliverLocally
inbound, NodeAction::SendOnInterface outbound).

5 implementation tasks + 1 process task. Each implementation task
is TDD-shaped (failing test → impl → gates → commit) with bite-sized
steps. Two open implementation questions (target→interface routing,
source identity resolution) are surfaced as explicit investigation
steps with file:line pointers — the implementer reads the actual
code to choose the cleanest approach.

* feat(zeb-226): add RuntimeEvent::SendUnicastToDevice variant

Phase 3a, step 1 of 5. Adds the inbound side of the unicast IPC
contract that harmony-client's DM outbox (ZEB-216 Phase 3b) will
consume. Variant is wired in subsequent steps; this commit is just
the type addition + constructor smoke test + a placeholder no-op
arm in the event dispatcher (translation to SendOnInterface lands
in Task 4).

Scope is intentionally minimal — the surrounding tree's pre-existing
fmt drift (covered by chore-cargo-fmt-tree-cleanup PR #266) and
clippy warnings (separate concern, not in this PR's diff) are NOT
addressed here. Each task in the plan ships as its own commit; only
crates/harmony-runtime/src/runtime.rs changes in this one.

Test: runtime::tests::runtime_event_send_unicast_to_device_constructs_and_compares

* feat(zeb-226): add RuntimeAction::UnicastReceived variant

Phase 3a, step 2 of 5. Adds the outbound side of the unicast IPC
contract — the runtime emits this from the next tick() when a
unicast packet arrives at a locally-registered destination. Variant
is wired in step 3 (currently DeliverLocally is silently dropped
at runtime.rs:2350-2351 with the comment "Other router actions are
diagnostics — drop for now" — that's the call site Task 3 replaces).

Scope is intentionally minimal — only crates/harmony-runtime/src/runtime.rs
changes in this commit. Pre-existing fmt drift (PR #266) and
pre-existing clippy warnings are NOT addressed here.

Test: runtime::tests::runtime_action_unicast_received_constructs_and_compares

* feat(zeb-226): surface DeliverLocally as UnicastReceived to client

Phase 3a, step 3 of 5. Replaces the silent drop at runtime.rs:2350
("Other router actions are diagnostics — drop for now") with an
explicit DeliverLocally handler that emits RuntimeAction::UnicastReceived
in the next tick().

Source identity resolution: Option A — added a `source: [u8; 16]`
field to NodeAction::DeliverLocally so the identity travels with the
action. Reticulum data packets don't carry sender identity in their
wire header (identity is bound at the link handshake layer, in
Link::remote_identity), and the Node does not currently track
terminal-link state for delivered packets. The construction site at
process_data_packet populates [0u8; 16] as a placeholder; real
link → identity binding is deferred to ZEB-227 (harmony-client
Phase 3b, which owns the link-state side). This commit makes the
type contract honest end-to-end so the placeholder can be replaced
without rippling through the surface.

This activates the inbound half of harmony-client's ZEB-216 Sub-B
DM transport. The drop comment was a deliberate placeholder waiting
for the client-facing consumer — Phase 3a is that consumer.

Scope is intentionally minimal: only crates/harmony-runtime/src/runtime.rs
and crates/harmony-reticulum/src/node.rs. All existing DeliverLocally
matchers in the reticulum test suite use shape-matching (`..`), so
adding a field is non-breaking. Pre-existing fmt drift (PR #266) and
pre-existing clippy warnings are NOT addressed here.

* feat(zeb-226): translate SendUnicastToDevice to SendOnInterface

Phase 3a, step 4 of 5. Wires the outbound half of the unicast IPC
contract: RuntimeEvent::SendUnicastToDevice (client request) is
processed during push_event() and translated to
RuntimeAction::SendOnInterface via the inner Reticulum router's
path-table lookup (Node::path_table().get(&target) +
Node::route_packet on hit). On hit, route_packet() runs IFAC masking
and the existing send_on_interface plumbing unchanged; the resulting
SendOnInterface NodeAction is forwarded through the existing
dispatch_router_actions translator into pending_direct_actions so
it surfaces at the next tick().

Unknown targets are logged at WARN and dropped — retry/expiration
semantics live in harmony-client's outbox (ZEB-216 Sub-B Phase 3b,
ZEB-227). The two new tests cover both paths: a seeded-path-table
happy path asserting one SendOnInterface to the resolved interface
with packet bytes intact (weight=None, directed), and an empty
path table asserting drop-with-no-emission.

Adds a small symmetric API on Node: `path_table_mut()` alongside
the existing read-only `path_table()` accessor, used by the new
runtime test to seed a route without driving a full announce flow.
The doc comment flags it as test/bootstrap-only — production code
should still go through process_inbound so cooperation scoring,
reverse-table bookkeeping, and replay detection stay in sync.

Scope is intentionally minimal: the dispatch arm in runtime.rs and
one accessor in harmony-reticulum's node.rs. Pre-existing fmt drift
(PR #266) and pre-existing clippy warnings are NOT addressed here.

* test(zeb-226): end-to-end unicast round-trip A → B

Phase 3a, step 5 of 5 (final implementation task). Validates the
full plumbing wired across Tasks 1-4 by running two independent
NodeRuntime instances and hand-bridging A's outbound SendOnInterface
action into B's InboundPacket event:

  A: SendUnicastToDevice event   (Task 1)
    → SendOnInterface action      (Task 4)
  B: InboundPacket event
    → DeliverLocally NodeAction
    → UnicastReceived action      (Task 3)

Shape α (the recommended shape from the task brief): the test
exercises the full chain with B's local destination registered
directly on its router and A's path table seeded via the
path_table_mut() test seam from Task 4. Wire bytes are constructed
once with Packet::to_bytes(), shipped through A's outbound dispatch,
then fed verbatim into B's inbound queue.

Known limitation documented inline: source resolves to the
[0u8; 16] placeholder because Reticulum's process_data_packet does
not yet bind link → identity. Real link-identity binding lands in
ZEB-227 / Phase 3b — flip the source assertion to a real-identity
check then. The round-trip still proves payload bytes flow end-to-end
and the four wiring touchpoints from Tasks 1-4 stay connected.

Test name: unicast_round_trip_a_to_b_surfaces_as_unicast_received

Scope: test-only addition to runtime.rs. No production code touched.
Pre-existing fmt drift (PR #266) and clippy warnings unchanged.

* fix(zeb-226): tighten unicast IPC contracts (dest_hash + Option<source>)

Three coupled API contract changes from PR #267 review:

1. Rename `RuntimeEvent::SendUnicastToDevice.target` → `destination_hash`.
   The runtime resolves the field via `path_table().get(...)`, but the
   path table is keyed by Reticulum destination hashes
   (SHA256(name_hash || identity_address_hash)[:16]) populated from
   announces, NOT raw device identity hashes. The previous name was
   load-bearing-wrong: real announced peers would miss the lookup and
   surface as "target unknown" drops even when reachable. The runtime is
   generic plumbing; computing the destination hash from
   identity-hash + destination-name is a harmony-client concern (Phase
   3b, ZEB-227 — clients know their own destination naming: DM inbox,
   voice channel, file sync, etc.).

2. Type `RuntimeAction::UnicastReceived.source` and
   `NodeAction::DeliverLocally.source` as `Option<[u8; 16]>` instead of
   `[u8; 16]` with `[0u8; 16]` as a sentinel for "unknown until Phase
   3b wires terminal-link tracking." Sentinels can be misinterpreted as
   a real authenticated identity. The Option makes the unknown state
   explicit in the type and forces consumers to handle it. Construction
   site at `node.rs:route_packet` sets `None` (was `[0u8; 16]`); the
   runtime dispatch arm forwards the Option as-is.

3. Defer `SendUnicastToDevice` resolution from `push_event` to `tick`.
   The previous code did `path_table().get(...)` synchronously inside
   `push_event`, racing against any inbound announces queued in the
   same batch (announces sit in `router_queue` and aren't applied to
   the path table until `tick` drains them). Now `push_event` enqueues
   into `pending_unicast_sends`, and `tick` drains that queue AFTER the
   router queue inside the Tier 0 (Router) block — so any same-batch
   announces are applied first.

Tests updated to use the new field names + Option<source>:
- `runtime_event_send_unicast_to_device_constructs_and_compares`
- `runtime_action_unicast_received_constructs_and_compares`
- `dispatch_router_actions_surfaces_deliver_locally_as_unicast_received`
- `send_unicast_to_device_emits_send_on_interface_for_known_target`
- `send_unicast_to_device_drops_for_unknown_target`
- `unicast_round_trip_a_to_b_surfaces_as_unicast_received`

Round-trip test placeholder assertion is now `received.0 == None`
instead of `[0u8; 16]`. The seeded path-table value in the outbound
test is unchanged in semantics (the seed key was always a destination
hash by contract — only the field name shifts).

Refs: PR #267 review (Qodo #1, CodeRabbit, CodeAnt).

* fix(zeb-226): rename path_table_mut → path_table_mut_for_tests + doc(hidden)

Task 4 added `pub fn path_table_mut(&mut self) -> &mut PathTable` to
`Node` so harmony-runtime tests can seed routing entries directly.
The doc said "primarily intended for tests / bootstrap" but the
accessor was publicly available in production builds with no
visibility signal beyond the doc comment.

Renamed to `path_table_mut_for_tests` and added `#[doc(hidden)]` so
the test-only intent is loud at the call site and excluded from
rustdoc. The `_for_tests` suffix surfaces the constraint without a
feature flag (which would have required Cargo.toml changes outside
the agreed scope for this branch). Production code that touches the
path table directly (skipping cooperation scoring, reverse-table
bookkeeping, and replay detection) will now read as obviously wrong
during review.

Updated three call sites in harmony-runtime tests:
- `send_unicast_to_device_emits_send_on_interface_for_known_target`
- `unicast_round_trip_a_to_b_surfaces_as_unicast_received`
- (plus their inline doc-comment references)

Refs: PR #267 review (Qodo #4, CodeRabbit).

* docs(zeb-226): plan doc lint cleanups + scope correction

Three minor cleanups from PR #267 review:

1. File Structure section claimed "Single-crate change. All work in
   `crates/harmony-runtime/`" but Task 3's investigation explicitly
   anticipated harmony-reticulum changes (Option C: add a `source`
   field to `NodeAction::DeliverLocally`). The implementer chose
   exactly that path. Reword the claim to match what actually
   happened: primary work in harmony-runtime, companion change in
   harmony-reticulum for the DeliverLocally.source field and the
   path_table_mut_for_tests accessor.

2. MD031 (markdownlint): five fenced code blocks were missing a
   blank line between the closing fence and the following prose
   ("Expected: all three exit 0.") or between preceding prose and
   the opening fence. Added the blank lines.

3. ENGLISH_WORD_REPEAT_BEGINNING_RULE: the Step 3 "If this fails…
   If the issue…" pair triggered the linter. Reworded to "When this
   fails…" / "Should the second cause apply…".

No content changes — only lint and scope-claim corrections.

Refs: PR #267 review (CodeRabbit minor).

* fix(zeb-226): bound unicast drain + defer-on-miss when router has backlog

Round-10 review (CodeRabbit Major + Cursor Low). Two related issues
in the pending_unicast_sends drain inside tick():

1. Race (correctness): if router_max_per_tick caps the router-queue
   drain, an announce that would create the route for a queued
   unicast may not have applied to path_table yet. The unicast was
   dropped as "unknown" even though next tick's announce drain
   would populate the route.

2. Budget (latency): every other Tier 1 queue respects
   router_max_per_tick; the unicast drain was unbounded. A burst of
   1000 unicasts blocked the tick loop.

Fix combines both. After the router queue drain, capture
router_queue_drained, then bound the unicast drain by the same
limit AND defer-on-miss when the router still has work. Drops
also count against the budget so a flood of unknown targets can't
bypass the cap. Deferred entries are re-queued at the front to
preserve arrival-order priority over later-arriving sends.

New tests:
- unicast_drain_respects_router_max_per_tick: 5 known-target
  unicasts with cap=2 → 2/2/1 across three ticks.
- unicast_defers_when_router_queue_has_backlog: 2 TimerTicks +
  cap=1 forces router_queue non-empty at unicast-drain time;
  unknown-target unicast is deferred (NOT dropped) on tick 1,
  then drops cleanly on tick 2 after router fully drained.

Adds a #[doc(hidden)] pending_unicast_sends_len() accessor for
test introspection.

* fix(zeb-226): forward destination_hash through UnicastReceived

Round-10 review (Cursor Medium). The dispatch arm in
dispatch_router_actions previously discarded
NodeAction::DeliverLocally::destination_hash with `_`. But the
client (harmony-client / Phase 3b) registers MULTIPLE destination
types per its spec (DM inbox, voice channel, file sync, ...) and
needs the destination hash to dispatch each inbound packet to
the right handler. Without the hash, the client can't tell which
destination kind a packet was addressed to.

Fix: add `destination_hash: [u8; 16]` to
RuntimeAction::UnicastReceived and forward the field from the
DeliverLocally action. Field order: destination_hash, source,
packet — destination first so the client can match-bind it
without destructuring the whole variant.

Tests:
- runtime_action_unicast_received_constructs_and_compares: pin a
  distinct destination_hash so a future shape change trips it.
- dispatch_router_actions_surfaces_deliver_locally_as_unicast_received:
  assert destination_hash flows through (was previously discarded).
- unicast_round_trip_a_to_b_surfaces_as_unicast_received: assert
  the round-trip carries B's registered destination hash.

* docs(zeb-226): align plan terminology with landed API

Round-10 review (CodeRabbit Minor). The plan still used pre-execution
naming that the landed API has since refined:

- "target" → "destination_hash" everywhere it referred to the
  SendUnicastToDevice field (the field is a Reticulum DESTINATION
  hash, not a raw device identity hash; the doc now flags this
  distinction explicitly).
- "source: [u8; 16]" → "source: Option<[u8; 16]>" everywhere it
  referred to the UnicastReceived field. The Option type was
  tightened in round 9 to make the unknown-source case unforgeable
  — sentinels can be misinterpreted as real identities. Phase 3a
  emits None until ZEB-227 wires link-identity binding.
- Adds destination_hash to UnicastReceived per round 10 — the
  client registers MULTIPLE destination kinds (DM inbox, voice,
  file sync) and needs the dest hash to dispatch by kind.

Open Questions section now records that source-resolution Option C
("add field to DeliverLocally") was chosen during execution, and
the type was tightened to Option<[u8; 16]> per round-9 review.

* fix(zeb-226): handle RuntimeAction::UnicastReceived in harmony-node dispatch

Round 10 follow-up. The variant added in commit e079c6a (Task 2) broke
harmony-node's exhaustive match in event_loop.rs::dispatch_action.
The harmony-runtime test gate caught nothing because it only runs
against harmony-runtime; the workspace check was needed to surface it.

Per "test drift is our fault" — broken builds in the workspace caused
by our changes belong in this PR, not a follow-up. Added a no-op arm
that logs at debug and drops, with a comment explaining harmony-node
has no client-facing IPC channel for unicast delivery. The real
consumer is harmony-client (Phase 3b, ZEB-227) which embeds NodeRuntime
directly and consumes RuntimeAction::UnicastReceived through its own
tick loop, bypassing harmony-node entirely.

`cargo check --workspace` is now clean (one pre-existing dead-code
warning in SpeculationConfig remains — unrelated, present on
origin/main).

* fix(zeb-226): defer branch in unicast drain must consume budget

CodeRabbit Major / Greptile P1 (round 11): the unicast drain loop in
NodeRuntime::tick() was incrementing processed_unicasts ONLY in the
route-success and drop-with-warn branches, NOT in the defer-on-backlog
branch. So when router_queue had backlog (router_queue_drained=false),
deferred unicasts re-queued without consuming budget — the loop could
walk the entire pending_unicast_sends in a single tick regardless of
router_max_per_tick.

The fix moves processed_unicasts += 1 to the TOP of the loop so all
three branches (route, defer, drop) consume one budget slot each.

The existing test unicast_drain_respects_router_max_per_tick exercised
only the route branch (known target, empty router queue) and
unicast_defers_when_router_queue_has_backlog used cap=1 with 1
unicast — neither caught the defer-bypass. New regression test
unicast_drain_defers_count_against_router_max_per_tick exercises the
specific bypass: cap=2, 3 router events queued (forces drained=false),
3 unknown unicasts followed by 1 known unicast. With the fix, the 2
unknown defers consume budget and the known target is never reached
(0 SendOnInterface). Without the fix, the loop walks past all 3
unknowns (defers don't count) and routes the known target (1
SendOnInterface) — bypassing the cap.

Verified by temporarily reverting just the increment move while
keeping the new test: test fails with "left: 1 right: 0" matching the
predicted bug signature exactly.

* fix(zeb-226): tighten test-seam visibility (Greptile P2)

Two test-only methods were `pub fn` + `#[doc(hidden)]`, which only
hides from rustdoc. Downstream crates could still call them in
production builds. Round-11 Greptile P2:

- `NodeRuntime::pending_unicast_sends_len` (same-crate consumers
  only): downgraded to `pub(crate)`. Simplest fix; the seam never
  needed to escape harmony-runtime.

- `Node::path_table_mut_for_tests` (cross-crate — harmony-runtime's
  tests call it): gated behind `cfg(any(test, feature = "test-seams"))`.
  Added a new internal `test-seams` feature on harmony-reticulum and
  activated it via harmony-runtime's `[dev-dependencies]` block.

Cargo's resolver-2 isolation (verified in workspace Cargo.toml) keeps
the dev-only feature activation out of normal builds: `cargo tree -e
features` for harmony-node (which depends on harmony-reticulum but
not harmony-runtime tests) shows no `test-seams`. Workspace
`cargo check`, `cargo build --lib`, and full test suites all pass.

Approach chosen for path_table_mut_for_tests: feature-flag (option 1
in the implementer brief). Considered:
- pub(crate): rejected — cross-crate consumer needs access
- redesign as `seed_path_for_tests` taking owned args: less elegant
  and would force a wider API rewrite for the same outcome
- keep pub + loud `_for_tests` suffix (fallback): not needed; the
  feature-flag approach cleanly compiles and runs in the workspace

* docs(zeb-226): describe defer-then-drop semantics on SendUnicastToDevice

Round-10's race+budget fix changed SendUnicastToDevice from
log+drop-on-miss to defer-then-drop, but the variant doc still claimed
"the request is logged and dropped" on lookup miss. Update to describe:

  - Defer when the router queue still has backlog (announces that would
    create the route may be queued one event behind).
  - Final WARN+drop only after the router queue fully drains.
  - Both branches consume router_max_per_tick budget (round-11 fix), so
    a defer-only tick cannot walk the entire pending queue.
  - Retry/expiration lives in harmony-client's outbox; callers MUST NOT
    assume immediate drop on miss.

Doc-only — no behavior change. Addresses CodeRabbit Minor finding from
round-12 review.

* fix(zeb-226): move PacketDropped logging back to inline unicast call site

Round-9 added a top-level NodeAction::PacketDropped arm in
dispatch_router_actions to surface SendUnicastToDevice failures from
route_packet. Cursor's round-12 review pointed out that arm also fires
on routine inbound process_data_packet drops — broadcast traffic
addressed to unregistered local destinations returns
PacketDropped { NoLocalDestination }, which was silently dropped
pre-round-9 and now spams WARN logs on non-transport nodes.

Switch to Qodo's original alternative (option B): handle PacketDropped
specifically in the SendUnicastToDevice route-success branch where we
have full context (destination_hash), and let the top-level arm fall
through to the silent diagnostic catch-all.

  - Iterate node_actions before dispatch_router_actions, log any
    PacketDropped at WARN with destination_hash_first_4 + reason +
    interface_name, then move into dispatch.
  - Remove the broad NodeAction::PacketDropped match arm in
    dispatch_router_actions; comment notes why.
  - Inner PacketDropped arm inside the AnnounceNeeded branch is
    unchanged — it's announce-specific and was never the noise source.

127 harmony-runtime + 328 harmony-reticulum tests still pass; no
existing tests asserted the removed log message. Addresses Cursor Low
finding from round-12 review.

* fix(zeb-226): cfg(test) gate on pending_unicast_sends_len

Round 12 cleanup. The accessor added in round 10 (Greptile P2 → round
11 made it pub(crate)) has no production callers by design — it exists
solely for in-crate test introspection of the unicast drain. Without
cfg(test), non-test builds fire a dead_code warning.

Adding #[cfg(test)] is the accurate gate (matches the actual usage
shape) and removes the warning without needing #[allow(dead_code)]
on a function that genuinely IS used (just only in tests).

`cargo clippy --manifest-path crates/harmony-runtime/Cargo.toml --no-deps
--lib --tests` is now clean of the dead_code warning. Pre-existing
PR-#266-territory warnings (ReplicaStore, single_match, unused mut,
unused stranger_hash, field_reassign_with_default in test helpers)
remain — outside this PR's scope.

* docs(zeb-226): document per-destination FIFO + cross-destination best-effort ordering

CodeRabbit Major (round 13) flagged that the unicast drain doesn't
preserve strict FIFO across destinations and suggested
break-on-first-defer to enforce it. Pushing back: that change would
be a regression for the actual use case.

Scenario the change would break:
  - harmony-client sends DM 1 to recipient A (offline, route unknown)
  - harmony-client sends DM 2 to recipient B (online, route exists)
  - Current behavior: DM 2 sends immediately; DM 1 defers until A's
    route resolves or drops
  - Proposed strict-FIFO behavior: DM 2 blocks behind DM 1 until A
    resolves OR A drops — deliverable message waits for undeliverable

What we DO preserve and what we DON'T:
  - Per-destination FIFO IS preserved by the existing code: same-dest
    entries that all defer get re-queued at the front in arrival order;
    on the next tick they hit the same path-exists state and process
    in order.
  - Cross-destination ordering is intentionally routability-driven, not
    arrival-driven. The runtime is plumbing; ordering above the runtime
    is the client's concern.

Updated the SendUnicastToDevice doc to make this explicit so future
readers (and bot reviewers) don't re-flag it. Added an "Ordering
contract" section that:
  1. States per-destination FIFO is preserved
  2. States cross-destination is best-effort
  3. Explains WHY (the offline-blocks-online regression)
  4. Tells callers needing cross-destination order where to enforce it
     (harmony-client outbox layer)
  5. Notes the CodeRabbit suggestion was rejected for this reason

Doc-only change. No behavior change.
jenglund added a commit that referenced this pull request May 4, 2026
…le unregister test

Two bot-review findings on PR #268:

1. Cache-before-rate-limit (Qodo correctness bug). process_announce
   was inserting into identity_table BEFORE the transport-mode rate-
   limiting check, so an announce that drops with
   DropReason::AnnounceRateLimited still mutated the cache. Dropped
   traffic should have NO observable side effects. Moved the insert
   to after the rate-limit check (and any other early-drop branches),
   on the function's happy-path acceptance branch. Added regression
   test process_announce_does_not_cache_identity_for_rate_limited_announces
   to lock the ordering.

2. Stronger unregister test (CodeRabbit nitpick). Extended
   node_runtime_unregister_local_destination_returns_bool with an
   observable-behavior assertion: after unregister, inbound packets
   addressed to the dest must NOT surface as RuntimeAction::UnicastReceived.
   Verifies that the boolean return matches reality at the action layer.

Bug #1 from the same review (unbounded identity_table growth) is the
known issue tracked as ZEB-235 (Medium, Backlog). The bot-suggested
fix (path_table.retain coupling) would break the DM use case — DM
identities are stable for years; paths expire in 7d/1d/6h, so coupling
would force contacts to lose verifiability every time a route aged
out. Independent identity-cache expiry (longer TTL) plus cardinality
cap is the right design and stays in ZEB-235.
jenglund added a commit that referenced this pull request May 19, 2026
…B-156 R1)

Round-1 bot finding from Qodo on PR #269: `StorageTier::remove` delegates
to `ContentStore`'s `BookStore::remove` impl, which clears every cache
metadata slot — including the pinned set. So an unguarded burn of a
pinned CID would silently drop user-protected data and unpin it in the
same call, violating the pin-retention contract if the caller forgot to
unpin first.

The doc comment said callers must unpin first, but nothing enforced it.
The harmony-client burn path always unpins before removing, but future
callers could trip the foot-gun.

Fix: add an `is_pinned` guard at the top of `StorageTier::remove`. If
the CID is still pinned, return `None` (safe no-op) without touching
any cache state. Matches Qodo's suggested approach #1 ("safe no-op
on pinned CID"). Doc comment updated to describe the guard and the
"pinned-as-refusal" semantic.

`BookStore::remove` on `ContentStore` is unchanged — it still clears
the pinned set when invoked directly. The guard lives at the
`StorageTier::remove` layer (which is the only public surface anyway,
via `NodeRuntime::remove_content`).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jenglund added a commit that referenced this pull request May 19, 2026
* feat(runtime): narrow remove_content delegate for ZEB-156 burn-with-expunge

ZEB-156 in harmony-client distinguishes Burn from Unpin by adding immediate
cache eviction (W-TinyLFU bypass) for the burned root's descendants. The
existing NodeRuntime API exposes pin_content / unpin_content as narrow
delegates but does not expose a direct cache.remove primitive — &mut
StorageTier access is deliberately gated to prevent callers from reaching
the eviction-buffer + broadcast machinery.

Add two matching narrow delegates following the exact pin_content /
unpin_content pattern:

- StorageTier::remove(cid) — calls ContentStore::remove (cleans up every
  cache metadata slot via the existing BookStore impl). No tier-side
  accounting (disk_index/archive_index) needs touching — the burn path is
  for the in-RAM cache layer that client builds use.

- NodeRuntime::remove_content(cid) — narrow delegate to StorageTier::remove,
  mirrors pin_content / unpin_content's visibility and doc style.

Both return Option<Vec<u8>>: the bytes that were stored, or None if the
CID was not present (no-op safe).

Doc comments explicitly call out the W-TinyLFU bypass — this method is for
the "no really, destroy this" burn verb specifically, NOT for routine
eviction (which runs through the admission/probation/protected dance and
emits EvictionPush actions; this delegate is silent on purpose).

Consumed by harmony-client PR for ZEB-156 (root-pin-set cascade
correctness for Burn after Unpin's keep-set fix lands).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(content): refuse to remove pinned CIDs in StorageTier::remove (ZEB-156 R1)

Round-1 bot finding from Qodo on PR #269: `StorageTier::remove` delegates
to `ContentStore`'s `BookStore::remove` impl, which clears every cache
metadata slot — including the pinned set. So an unguarded burn of a
pinned CID would silently drop user-protected data and unpin it in the
same call, violating the pin-retention contract if the caller forgot to
unpin first.

The doc comment said callers must unpin first, but nothing enforced it.
The harmony-client burn path always unpins before removing, but future
callers could trip the foot-gun.

Fix: add an `is_pinned` guard at the top of `StorageTier::remove`. If
the CID is still pinned, return `None` (safe no-op) without touching
any cache state. Matches Qodo's suggested approach #1 ("safe no-op
on pinned CID"). Doc comment updated to describe the guard and the
"pinned-as-refusal" semantic.

`BookStore::remove` on `ContentStore` is unchanged — it still clears
the pinned set when invoked directly. The guard lives at the
`StorageTier::remove` layer (which is the only public surface anyway,
via `NodeRuntime::remove_content`).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
jenglund added a commit that referenced this pull request May 23, 2026
…clarification, resolver style)

- relay.rs PUT: 5xx/non-429 errors now mark cooldown + continue rotating
  instead of aborting the pool; last HTTP error status is surfaced only
  after all relays are exhausted (finding #3)
- relay.rs GET: empty available_relays() (all on cooldown) now returns
  Err(NoRelaysAvailable) immediately instead of Ok(None), preventing
  false negative-caching of keys that were simply unreachable (finding #4)
- resolver.rs: alloc::vec::Vec::new() → Vec::new() style cleanup (finding #5)
- derive.rs: inline comment on Case B salt arm pointing to security rationale
  in doc comment + spec §5.3 (finding #6 clarification only, no logic change)

Round-1 fixes confirmed landed: publisher collect-then-publish schedule,
wrap_bep44_envelope u64 seq, parse_and_verify 8-byte LE read all verified
correct in b58952b (findings #1 and #2 already resolved).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
jenglund added a commit that referenced this pull request May 24, 2026
…ne DHT HTTP-relay (#270)

* docs(zeb-322): impl plan for harmony-pkarr crate (Phase 2a of cross-WAN connectivity initiative)

* docs(zeb-322): scope gates to -p harmony-pkarr (pre-existing main breakage tracked in ZEB-324)

* feat(zeb-322): harmony-pkarr crate skeleton + PkarrError

* fix(zeb-322): address code-quality review (extern crate alloc cfg + unused workspace pkarr dep)

* feat(zeb-322): week-aligned epoch math + tolerance window

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(zeb-322): HKDF-SHA256 per-case key derivation + reference vectors

Add derive.rs with PkarrCase enum (Invite/Identity/Community), derive_ephemeral_key
via HKDF-SHA256 with domain-separated salts, and 5 unit tests including 3 pinned
reference-vector tests that lock the v1 keying scheme against future drift.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(zeb-322): PkarrRoutingRecord wire format + inner-sig + skew verify + pin test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(zeb-322): mock pkarr-relay axum server behind test-fixtures feature

Adds MockPkarrRelay (PUT/{key} + GET/{key} in-memory store) as an axum
server in src/testing.rs, gated on #[cfg(any(test, feature = "test-fixtures"))].
Updates Cargo.toml with reqwest added to test-fixtures feature, optional
axum/tokio/hyper/reqwest deps with explicit features, and dev-dependencies
for tokio + reqwest so tests run without the feature flag. Two unit tests
pass: round-trip PUT-then-GET and GET-missing-returns-404.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(zeb-322): HTTP relay client + pool rotation + cooldown

* feat(zeb-322): PkarrPublisher background task + BEP44 envelope sign

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(zeb-322): PkarrResolver with LRU cache + parallel epoch-window queries

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(zeb-322): add axum+hyper as dev-deps so workspace nextest builds without test-fixtures

testing.rs is compiled under cfg(test), but axum was only available via the
optional test-fixtures feature. cargo nextest run --workspace builds tests
without that feature active, so axum was unresolved. Adding both as
dev-dependencies makes the test build self-contained regardless of features.

* chore(zeb-322): update Cargo.lock for axum+hyper dev-deps in harmony-pkarr

* fix(zeb-322): round-1 review fixes (relay all_404, resolver silent-drop, publisher schedule, seq u64, sign_new key match)

- relay.rs: add polled_any flag so empty available_relays() returns Err(NoRelaysAvailable) instead of Ok(None)
- resolver.rs: RPK1/RPK2 silent-drop discipline — parse_and_verify and verify_inner_sig failures now cache negative and return Ok(None); add verify_inner_sig call; update seq parsing to u64/8-byte
- publisher.rs: defer next_publish_at update until after publish attempt (success→epoch slot, failure→60s backoff); change seq from u32 to u64 with 8-byte LE envelope format
- record.rs: sign_new key-match guard — returns Err(IdentityMismatch) if signing key doesn't match harmony_identity_pub[32..64]; add test
- derive.rs: doc comment clarifying case-B intentional public-key derivation (CodeRabbit misread)
- Fix 6 (InvalidRecord() parens) was already correct — no change needed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(zeb-322): round-2 review fixes (PUT 5xx rotation, relay cooldown clarification, resolver style)

- relay.rs PUT: 5xx/non-429 errors now mark cooldown + continue rotating
  instead of aborting the pool; last HTTP error status is surfaced only
  after all relays are exhausted (finding #3)
- relay.rs GET: empty available_relays() (all on cooldown) now returns
  Err(NoRelaysAvailable) immediately instead of Ok(None), preventing
  false negative-caching of keys that were simply unreachable (finding #4)
- resolver.rs: alloc::vec::Vec::new() → Vec::new() style cleanup (finding #5)
- derive.rs: inline comment on Case B salt arm pointing to security rationale
  in doc comment + spec §5.3 (finding #6 clarification only, no logic change)

Round-1 fixes confirmed landed: publisher collect-then-publish schedule,
wrap_bep44_envelope u64 seq, parse_and_verify 8-byte LE read all verified
correct in b58952b (findings #1 and #2 already resolved).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(zeb-322): round-3 review fixes — RPK4 skew in resolve + GET 5xx cooldown

Two P1 findings from Greptile on a2f251a:

1. RPK4 (±30min skew) was claimed enforced in the PR description but never
   actually called inside `PkarrResolver::resolve`. Stale records were being
   accepted, positively cached for 15min, and returned to callers. Add the
   `verify_skew(now_ms)` check between RPK2 and the positive cache-put;
   negative-cache on skew failure so we don't spam the relay during the
   60s window.

2. `RelayClient::get` did not call `mark_cooldown` on non-success non-429
   responses (5xx), while `put` did. A misbehaving relay returning persistent
   5xx errors would be retried on every GET with no backoff. Mirror the PUT
   path: mark cooldown + continue.

32/32 tests pass.

* fix(zeb-322): round-4 review fixes (resolve_window partial err + publisher unregister/register race)

- resolver: track any_ok_none separately from any_err in resolve_window; prefer Ok(None) when at least one key confirmed absence, even if sibling keys errored transiently — only Err when every key errored
- publisher: add Arc<AtomicBool> cancelled field to ActivePublication; register() cancels prior entry before replacing; unregister() cancels removed entry; publish_one() short-circuits before PUT if cancelled; post-publish next_publish_at write-back skips if cancelled (fixes unregister in-flight race + mid-flight register overwrite)
- relay: verified CodeRabbit finding already addressed (polled_any && all_404 guard in place since round 2) — no change needed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(zeb-322): round-5 review fixes — cache skew re-check + MockPkarrRelay Drop

Two findings from round-4 Cursor review of 1e1edc3:

1. resolver.rs cache_get (MEDIUM): A record cached near the +30min skew edge
   could be served from the 15min positive cache up to 14min after it falls
   outside the RPK4 skew window. Re-check verify_skew on cache hit; on
   failure, evict + treat as miss so the next resolve refetches.

2. testing.rs MockPkarrRelay (LOW): The struct documented "drop to stop the
   server" but had no Drop impl. Add one that aborts the spawned axum task
   for prompt cleanup across test boundaries. (oneshot::Sender drop would
   trigger graceful shutdown via RecvError, but abort() is belt-and-
   suspenders.)

32/32 tests pass.

* fix(zeb-322): round-6 — rotate ephemeral key on every publish (Greptile P1)

Change `PkarrPublisher::register` to take an `EphemeralKeyBuilder`
(`Arc<dyn Fn(u64) -> SigningKey>`) instead of a fixed `SigningKey`. The
key is now re-derived on every publish cycle, so it tracks the current
epoch as time advances.

Before this fix, callers computed `current_epoch_id(now_ms())` once at
registration time and passed the resulting key to `register`. After the
first epoch boundary, the publisher kept writing under the previous
epoch's DHT key while resolvers (which derive against the current epoch
± tolerance) stopped finding the record — silently breaking discovery
for all three pkarr cases (invite, identity, community fallback) after
one or two epoch boundaries.

Greptile flagged this as P1 on harmony-client PR #158; the architectural
fix lives in this crate because `publish_one` is where the key is
actually consumed.

Also fixes Cursor LOW: removed dead `polled_any` variable in
`RelayClient::get` (the empty-relays guard above the loop already
ensures we polled at least one relay, so the variable was logically
dead). Replaced with a comment explaining the remaining `all_404`
guard's purpose.

New test `ephemeral_key_builder_invoked_each_publish` registers a
builder that returns two distinct keys on successive calls and verifies
both surfaces land on the mock relay across forced publish cycles.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant