# Zero-DRAM Recursive Folding Pipeline (SVE2 -> BabyJubJub)

This notebook provides a conceptual walkthrough of the in-register telemetry attestation pipeline implemented in `src/telemetry_fold_sve2.c`.


## 1) Conceptual sketch of recursive folding

Stages: **Ingest Gate -> NTT Furnace -> LWE Accumulator -> Divergence Track -> SNARK Wrapper -> Register Scrub**.

- Ingest loads telemetry polynomials (256 coefficients in Z_3329).
- The NTT butterflies and Montgomery reduction run branchlessly.
- The fold accumulator updates its state in-register, held in the equivalent of SVE2 registers `Z16`-`Z23`.
- The integer-point divergence from a Gaussian-seeded midpoint is folded into the accumulator at every step.
- SNARK witness generation is tail-called only at the end of each epoch (10^8 items).
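The Python mirror in the next cell models Montgomery reduction but not the NTT butterfly stage. A minimal NumPy sketch of one lane-parallel Cooley-Tukey butterfly with branchless Montgomery multiplication follows, assuming Kyber-style constants (R = 2^16, 3327 = -q^-1 mod 2^16) and using 17, the root of unity ML-KEM/Kyber uses with q = 3329, purely as an example twiddle. `mont_reduce_vec` and `ct_butterfly_vec` are hypothetical names, not functions from `telemetry_fold_sve2.c`.

```python
import numpy as np

Q = 3329
R = 1 << 16          # Montgomery radix for 16-bit lanes (assumed)
QINV_NEG = 3327      # -q^-1 mod 2^16, since 3329 * 3327 = -1 (mod 2^16)

def mont_reduce_vec(z):
    """Branchless Montgomery reduction per lane: z * 2^-16 mod q, for 0 <= z < q * 2^16."""
    m = (z * QINV_NEG) & 0xFFFF
    t = ((z + m * Q) >> 16) - Q      # in [-q, q)
    return t + ((t >> 63) & Q)       # add q back only in lanes that went negative

def ct_butterfly_vec(a, b, zeta_mont):
    """Cooley-Tukey butterfly on whole vectors: (a, b) -> (a + zeta*b, a - zeta*b) mod q.

    zeta_mont is the twiddle in Montgomery form (zeta * 2^16 mod q), so the
    reduction cancels the extra 2^16 factor.
    """
    t = mont_reduce_vec(zeta_mont * b)   # zeta * b mod q, in [0, q)
    hi = a + t - Q
    hi = hi + ((hi >> 63) & Q)           # (a + t) mod q without a branch
    lo = a - t
    lo = lo + ((lo >> 63) & Q)           # (a - t) mod q without a branch
    return hi, lo

# Usage: 256 random lanes with the example twiddle zeta = 17.
rng = np.random.default_rng(0)
a = rng.integers(0, Q, size=256, dtype=np.int64)
b = rng.integers(0, Q, size=256, dtype=np.int64)
hi, lo = ct_butterfly_vec(a, b, (17 * R) % Q)
assert np.array_equal(hi, (a + 17 * b) % Q)
assert np.array_equal(lo, (a - 17 * b) % Q)
```

The sign-mask trick (`x >> 63` is all-ones only when `x` is negative) keeps every lane on the same instruction sequence, which is the property the stage list attributes to the SVE2 code.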


In [None]:
import numpy as np
import matplotlib.pyplot as plt

Q = 3329
N = 256
MIDPOINT_SEED = 1663

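def montgomery_reduce_scalar(z, q=Q, qinv=3327):
    """Python mirror of montgomery_reduce_sve2 lane behavior.

    Returns z * 2^-16 mod q in [0, q) for 0 <= z < q * 2^16. qinv is
    -q^-1 mod 2^16 (3329 * 3327 = -1 mod 2^16), so z + m*q is divisible by 2^16.
    """
    m = (z * qinv) & 0xFFFF
    t = (z + m * q) >> 16  # t lies in [0, 2q)
    return t - q if t >= q else t  # one conditional subtraction lands in [0, q)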

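def fold_step(z_in, z_acc, r_i):
    """One fold: add the r_i-scaled input to z_acc, then fold in the per-lane divergence."""
    # fold_accumulator: per-lane Montgomery product z_in * r_i * 2^-16 mod q
    scaled = np.array([montgomery_reduce_scalar(int(v) * int(r_i)) for v in z_in], dtype=np.int64)
    z_acc = (z_acc + scaled) % Q
    # integer_divergence_track: centered distance of each lane from MIDPOINT_SEED, squared
    delta = (z_acc - MIDPOINT_SEED) % Q
    abs_delta = np.minimum(delta, (Q - delta) % Q)
    divergence = np.array([montgomery_reduce_scalar(int(v) * int(v)) for v in abs_delta], dtype=np.int64)
    z_acc = (z_acc + divergence) % Q
    return z_acc, divergence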
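Because `montgomery_reduce_scalar` divides by R = 2^16 modulo q, `fold_step` adds v * r_i * 2^-16 mod q per lane, not v * r_i mod q. The self-contained check below (it repeats the function body from the cell above) confirms that identity and the [0, q) output range over a strided sample of the valid input range. The second loop shows that pre-scaling r_i into Montgomery form recovers the plain product; whether the C code pre-scales r_i is an assumption this notebook does not settle.

```python
Q = 3329
R = 1 << 16
R_INV = pow(R, -1, Q)  # 2^-16 mod q (three-argument pow with -1 needs Python 3.8+)

def montgomery_reduce_scalar(z, q=Q, qinv=3327):
    # Same body as the notebook cell above; qinv = -q^-1 mod 2^16.
    m = (z * qinv) & 0xFFFF
    t = (z + m * q) >> 16
    return t - q if t >= q else t

# Strided sweep over the valid input range 0 <= z < q * 2^16.
for z in range(0, Q * R, 9973):
    out = montgomery_reduce_scalar(z)
    assert 0 <= out < Q
    assert out == (z * R_INV) % Q

# Pre-scaling r_i into Montgomery form (r_i * 2^16 mod q) recovers v * r_i mod q.
r_i = 23
r_mont = (r_i * R) % Q
for v in range(Q):
    assert montgomery_reduce_scalar(v * r_mont) == (v * r_i) % Q
```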


### Mapping to SVE2 C implementation
- `montgomery_reduce_scalar` ↔ `montgomery_reduce_sve2`
- The first multiply-add in `fold_step` ↔ `fold_accumulator`
- `abs_delta` and `divergence` ↔ `integer_divergence_track`
- The final limb packing in the next cell ↔ `snark_wrap_edwards`
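The mapping above pairs a per-lane Python loop with vector code. A hedged NumPy sketch that updates all 256 lanes at once, closer to how an SVE2 loop processes whole vectors, is below. `montgomery_reduce_lanes` and `fold_step_lanes` are hypothetical names, not functions from `telemetry_fold_sve2.c`, and `np.where` merely stands in for whatever lane-wise select the C code uses.

```python
import numpy as np

Q = 3329
MIDPOINT_SEED = 1663
QINV = 3327  # -q^-1 mod 2^16, as in montgomery_reduce_scalar

def montgomery_reduce_lanes(z):
    """Whole-vector montgomery_reduce_scalar: z * 2^-16 mod q for 0 <= z < q * 2^16."""
    m = (z * QINV) & 0xFFFF
    t = (z + m * Q) >> 16
    return np.where(t >= Q, t - Q, t)  # lane-wise select instead of a Python branch

def fold_step_lanes(z_in, z_acc, r_i):
    """Vectorized fold_step: every coefficient lane runs the same op sequence."""
    z_acc = (z_acc + montgomery_reduce_lanes(z_in * r_i)) % Q       # fold_accumulator
    delta = (z_acc - MIDPOINT_SEED) % Q
    abs_delta = np.minimum(delta, (Q - delta) % Q)                  # centered distance
    divergence = montgomery_reduce_lanes(abs_delta * abs_delta)     # integer_divergence_track
    return (z_acc + divergence) % Q, divergence

# Usage: a drop-in for fold_step in a simulation loop.
rng = np.random.default_rng(7)
z_acc = np.zeros(256, dtype=np.int64)
for _ in range(512):
    z_acc, div = fold_step_lanes(rng.integers(0, Q, size=256, dtype=np.int64), z_acc, 23)
```

Besides mirroring the lane structure, the vectorized form avoids 512 Python-level calls per fold step.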


In [None]:
# Simulate a tiny 512-item epoch for visualization (production epochs are 10^8 items).
rng = np.random.default_rng(7)
items = 512
r_i = 23
z_acc = np.zeros(N, dtype=np.int64)
divergence_energy = []

for _ in range(items):
    z_in = rng.integers(0, Q, size=N, dtype=np.int64)
    z_acc, div = fold_step(z_in, z_acc, r_i)
    divergence_energy.append(int(div.sum()))

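# Conceptual BabyJubJub witness-style packing: 4 limbs x 128 bits = 512 bits.
# Each limb is one 64-coefficient block of z_acc summed mod q (so < 3329),
# zero-padded to 32 hex digits.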
limbs = [
    int(np.sum(z_acc[0:64]) % Q),
    int(np.sum(z_acc[64:128]) % Q),
    int(np.sum(z_acc[128:192]) % Q),
    int(np.sum(z_acc[192:256]) % Q),
]
witness_512 = ''.join(f'{x:032x}' for x in limbs)
print('Conceptual 512-bit witness:', witness_512)
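The packed string is four 128-bit limbs of 32 hex digits each. The round-trip check below, with a hypothetical `unpack_witness` helper and a random stand-in for `z_acc`, makes that layout explicit and shows that each limb occupies at most 12 of its 128 bits because it is reduced mod q. For context, BabyJubJub is defined over the BN254 scalar field (about 254 bits), so one 128-bit limb fits in a single field element; how `snark_wrap_edwards` actually maps limbs to field elements is not stated in this notebook.

```python
import numpy as np

Q = 3329

def pack_witness(z_acc):
    """Same packing as the cell above: four 64-coefficient sums mod q, each as 32 hex digits."""
    limbs = [int(s) % Q for s in z_acc.reshape(4, 64).sum(axis=1)]
    return ''.join(f'{x:032x}' for x in limbs), limbs

def unpack_witness(witness_hex):
    """Split a 128-hex-digit string back into four 128-bit limbs (hypothetical helper)."""
    assert len(witness_hex) == 128  # 4 limbs x 32 hex digits = 512 bits
    return [int(witness_hex[i:i + 32], 16) for i in range(0, 128, 32)]

rng = np.random.default_rng(7)
z_acc = rng.integers(0, Q, size=256, dtype=np.int64)
witness_512, limbs = pack_witness(z_acc)
assert unpack_witness(witness_512) == limbs
assert all(x.bit_length() <= 12 for x in limbs)  # 3328 < 2^12
```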


In [None]:
fig, ax = plt.subplots(1, 2, figsize=(12, 4))
ax[0].plot(divergence_energy)
ax[0].set_title('In-register divergence progression')
ax[0].set_xlabel('Fold step')
ax[0].set_ylabel('Divergence energy (sum of lanes)')

ax[1].plot(z_acc[:64])
ax[1].set_title('Accumulator lanes (first 64 coeffs)')
ax[1].set_xlabel('Coefficient index')
ax[1].set_ylabel('Value mod q')

plt.tight_layout()
plt.show()


## 2) Deterministic timing and zero-DRAM notes
- In the C implementation, every arithmetic step uses branchless vector operations, so control flow does not depend on coefficient values.
- The fold and divergence updates run per lane on a fixed schedule.
- Witness emission occurs once per epoch (tail-call SNARK model).
- The final register scrub (`EOR Z, Z, Z`, i.e. XOR-ing each Z register with itself to zero it) corresponds to the exit path in C.
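The exit-path scrub relies on the identity x XOR x = 0. A NumPy model of that step, with a hypothetical `scrub` helper, is below. Python gives no control over CPU registers or constant-time behavior, so this only illustrates the data effect of the scrub, not its security property.

```python
import numpy as np

def scrub(buffers):
    """Zero each int array in place by XOR-ing it with itself (x ^ x == 0 for every x).

    Hypothetical helper modeling `EOR Z, Z, Z`; it does not touch CPU registers
    or guarantee that no copies of the data remain elsewhere in memory.
    """
    for buf in buffers:
        np.bitwise_xor(buf, buf, out=buf)

# Stand-ins for the accumulator and divergence lanes at the end of an epoch.
z_acc = np.arange(256, dtype=np.int64) * 13 % 3329
divergence = np.full(256, 1663, dtype=np.int64)
scrub([z_acc, divergence])
assert not z_acc.any() and not divergence.any()
```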
