# Z5D Validation Analysis for N₁₂₇

## Objective

Analyze the results of the Z5D resonance scoring validation experiment to determine whether Z5D scoring provides usable signal for factoring the 127-bit semiprime N₁₂₇.

## Ground Truth

```
N_127 = 137524771864208156028430259349934309717
p = 10508623501177419659  (smaller factor, 10.39% below sqrt(N))
q = 13086849276577416863  (larger factor, 11.59% above sqrt(N))
sqrt(N_127) ≈ 11727095627827384440
```
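The ground-truth values can be sanity-checked with exact integer arithmetic before any data is loaded. The sketch below assumes the percentages are signed offsets from the integer square root of N; `pct_offset_from_sqrt` is a hypothetical helper name and is not necessarily how the `pct_from_sqrt` column in the results file was computed.

```python
from math import isqrt

# Ground-truth values copied from the block above.
N_127 = 137524771864208156028430259349934309717
p = 10508623501177419659
q = 13086849276577416863

def pct_offset_from_sqrt(x: int, n: int) -> float:
    """Signed offset of x from floor(sqrt(n)), as a percentage (assumed definition)."""
    root = isqrt(n)
    # Python's int / int division stays accurate for integers of this size.
    return (x - root) / root * 100

print("p * q == N:", p * q == N_127)
print("bit length of N:", N_127.bit_length())
print(f"p offset: {pct_offset_from_sqrt(p, N_127):+.2f}%")
print(f"q offset: {pct_offset_from_sqrt(q, N_127):+.2f}%")
```

Using Python integers rather than NumPy floats avoids rounding the 20-digit factors, which do not fit exactly in a 64-bit float.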

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
from pathlib import Path

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

## Load Data

In [None]:
# Load results
results_df = pd.read_csv('data/z5d_validation_n127_results.csv')

# Load summary
with open('data/z5d_validation_n127_summary.json', 'r') as f:
    summary = json.load(f)

print(f"Loaded {len(results_df):,} candidates")
print(f"\nColumns: {list(results_df.columns)}")
print(f"\nFirst few rows:")
results_df.head()

In [None]:
# Ground truth values
N_127 = int(summary['ground_truth']['N_127'])
TRUE_P = int(summary['ground_truth']['p'])
TRUE_Q = int(summary['ground_truth']['q'])
SQRT_N = int(summary['ground_truth']['sqrt_N'])

print(f"N_127 = {N_127}")
print(f"p = {TRUE_P}")
print(f"q = {TRUE_Q}")
print(f"sqrt(N) = {SQRT_N}")

## Basic Statistics

In [None]:
# Summary statistics
print("Z5D Score Statistics:")
print(results_df['z5d_score'].describe())

print("\nDistance to Nearest Factor Statistics:")
print(results_df['distance_to_nearest'].describe())

## Enrichment Analysis

In [None]:
# Display enrichment results
baseline = summary['enrichment_analysis']['baseline']
top_k_results = summary['enrichment_analysis']['top_k_results']

print("Baseline (Random Uniform):")
print(f"  Within ±1% of p: {baseline['near_p_1pct']*100:.4f}%")
print(f"  Within ±1% of q: {baseline['near_q_1pct']*100:.4f}%")
print(f"  Within ±5% of p or q: {baseline['near_any_5pct']*100:.4f}%")

print("\nTop-K Enrichment:")
for result in top_k_results:
    k = result['k']
    enrich_any = result['enrichment_any_5pct']
    enrich_q = result['enrichment_q_1pct']
    print(f"\nTop-{k:,}:")
    print(f"  Enrichment (±5% of p or q): {enrich_any:.2f}x")
    print(f"  Enrichment (±1% of q): {enrich_q:.2f}x")
    print(f"  Avg distance to nearest: {result['avg_dist_nearest']:.2e}")
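The summary file stores precomputed enrichment factors, and the notebook does not show how they were derived. As a rough illustration only, the sketch below assumes enrichment is the hit rate among the K best-scoring (lowest-score) candidates divided by the hit rate over all candidates. The function name, the baseline definition, and the toy data are all assumptions; none of it comes from the experiment.

```python
import random

def enrichment(candidates, scores, target, pct, k):
    """Hit rate within ±pct% of target among the k lowest-scoring candidates,
    divided by the hit rate over all candidates (hypothetical definition)."""
    def is_hit(c):
        return abs(c - target) <= target * pct / 100

    baseline = sum(is_hit(c) for c in candidates) / len(candidates)
    ranked = sorted(zip(scores, candidates))[:k]  # lower score = better
    top_rate = sum(is_hit(c) for _, c in ranked) / k
    return top_rate / baseline if baseline > 0 else float("nan")

# Toy data only: 10,000 integer candidates around a made-up target of 1000.
rng = random.Random(42)
target = 1000
candidates = [rng.randint(800, 1200) for _ in range(10_000)]
noise_scores = [rng.random() for _ in candidates]     # carries no signal
ideal_scores = [abs(c - target) for c in candidates]  # perfect signal

e_noise = enrichment(candidates, noise_scores, target, pct=1, k=100)
e_ideal = enrichment(candidates, ideal_scores, target, pct=1, k=100)
print(f"noise scores: {e_noise:.2f}x, ideal scores: {e_ideal:.2f}x")
```

A score with no signal should hover around 1.0x, while a perfectly informative score is capped at 1 / baseline, which is worth keeping in mind when reading the Top-K figures.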

## Visualization 1: Z5D Score Distribution

In [None]:
# Plot Z5D score distribution
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Histogram
ax1.hist(results_df['z5d_score'], bins=50, edgecolor='black', alpha=0.7)
ax1.set_xlabel('Z5D Score')
ax1.set_ylabel('Count')
ax1.set_title('Z5D Score Distribution (All Candidates)')
ax1.axvline(results_df['z5d_score'].mean(), color='red', linestyle='--', 
            label=f'Mean: {results_df["z5d_score"].mean():.4f}')
ax1.legend()

# Top 10K (lowest scores; lower = better) vs rest
sorted_df = results_df.sort_values('z5d_score')
top_10k = sorted_df.head(10000)
rest = sorted_df.iloc[10000:]

ax2.hist([rest['z5d_score'], top_10k['z5d_score']], bins=50,
         label=['Rest', 'Top 10K'], alpha=0.7, edgecolor='black')
ax2.set_xlabel('Z5D Score')
ax2.set_ylabel('Count')
ax2.set_title('Z5D Score: Top 10K vs Rest')
ax2.legend()

plt.tight_layout()
plt.savefig('docs/z5d_score_distribution.png', dpi=150, bbox_inches='tight')
plt.show()

## Visualization 2: Spatial Distribution

In [None]:
# Plot candidates' positions relative to sqrt(N) and factors
fig, ax = plt.subplots(figsize=(14, 6))

# Sample for visualization (all 100K would be too dense)
sample_df = results_df.sample(n=min(5000, len(results_df)), random_state=42)
sorted_sample = sample_df.sort_values('z5d_score')

# Color by Z5D score rank
scatter = ax.scatter(sorted_sample['pct_from_sqrt'], 
                     sorted_sample['z5d_score'],
                     c=range(len(sorted_sample)),
                     cmap='viridis',
                     alpha=0.5,
                     s=10)

# Mark true factor positions
p_pct = ((TRUE_P - SQRT_N) / SQRT_N) * 100
q_pct = ((TRUE_Q - SQRT_N) / SQRT_N) * 100

ax.axvline(p_pct, color='red', linestyle='--', linewidth=2, label=f'True p ({p_pct:.2f}%)')
ax.axvline(q_pct, color='blue', linestyle='--', linewidth=2, label=f'True q ({q_pct:.2f}%)')
ax.axvline(0, color='black', linestyle='-', linewidth=1, alpha=0.3, label='sqrt(N)')

ax.set_xlabel('Distance from sqrt(N) (%)')
ax.set_ylabel('Z5D Score')
ax.set_title('Z5D Score vs Position (Sample of 5000 candidates)')
ax.legend()
ax.grid(True, alpha=0.3)

plt.colorbar(scatter, ax=ax, label='Rank by Z5D Score')
plt.tight_layout()
plt.savefig('docs/z5d_spatial_distribution.png', dpi=150, bbox_inches='tight')
plt.show()

## Visualization 3: Distance to Factors vs Z5D Score

In [None]:
# Scatter plot: Z5D score vs distance to nearest factor
fig, ax = plt.subplots(figsize=(12, 6))

sample_df = results_df.sample(n=min(10000, len(results_df)), random_state=42)

scatter = ax.scatter(sample_df['z5d_score'], 
                     sample_df['distance_to_nearest'],
                     alpha=0.3,
                     s=10)

ax.set_xlabel('Z5D Score (lower = better)')
ax.set_ylabel('Distance to Nearest Factor')
ax.set_title('Z5D Score vs Distance to Nearest Factor (10K sample)')
ax.set_yscale('log')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('docs/z5d_vs_distance.png', dpi=150, bbox_inches='tight')
plt.show()

## Visualization 4: Enrichment by Top-K

In [None]:
# Plot enrichment factors for different Top-K slices
k_values = [r['k'] for r in top_k_results]
enrich_any_5pct = [r['enrichment_any_5pct'] for r in top_k_results]
enrich_q_1pct = [r['enrichment_q_1pct'] for r in top_k_results]

fig, ax = plt.subplots(figsize=(10, 6))

ax.plot(k_values, enrich_any_5pct, 'o-', label='Within ±5% of p or q', linewidth=2, markersize=8)
ax.plot(k_values, enrich_q_1pct, 's-', label='Within ±1% of q', linewidth=2, markersize=8)

# Reference lines
ax.axhline(1.0, color='gray', linestyle='--', alpha=0.5, label='No enrichment (1.0x)')
ax.axhline(2.0, color='orange', linestyle='--', alpha=0.5, label='Moderate signal (2.0x)')
ax.axhline(5.0, color='red', linestyle='--', alpha=0.5, label='Strong signal (5.0x)')

ax.set_xlabel('Top-K Candidates')
ax.set_ylabel('Enrichment Factor')
ax.set_title('Enrichment Factor vs Top-K Size')
ax.set_xscale('log')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('docs/enrichment_by_topk.png', dpi=150, bbox_inches='tight')
plt.show()

## Visualization 5: 2D Density Heatmap

In [None]:
# 2D density: (distance from sqrt(N), Z5D score)
fig, ax = plt.subplots(figsize=(12, 8))

# Use hexbin for density
hexbin = ax.hexbin(results_df['pct_from_sqrt'], 
                   results_df['z5d_score'],
                   gridsize=50, 
                   cmap='YlOrRd',
                   mincnt=1)

# Mark true factor positions
ax.axvline(p_pct, color='blue', linestyle='--', linewidth=2, label=f'True p ({p_pct:.2f}%)')
ax.axvline(q_pct, color='red', linestyle='--', linewidth=2, label=f'True q ({q_pct:.2f}%)')
ax.axvline(0, color='white', linestyle='-', linewidth=1, alpha=0.5, label='sqrt(N)')

ax.set_xlabel('Distance from sqrt(N) (%)')
ax.set_ylabel('Z5D Score')
ax.set_title('2D Density: Z5D Score vs Position')
ax.legend()

plt.colorbar(hexbin, ax=ax, label='Candidate Density')
plt.tight_layout()
plt.savefig('docs/z5d_density_heatmap.png', dpi=150, bbox_inches='tight')
plt.show()

## Statistical Tests

In [None]:
from scipy import stats

# Sort by Z5D score
sorted_df = results_df.sort_values('z5d_score')

# Top 1000 vs random sample of 1000
top_1000 = sorted_df.head(1000)
random_1000 = results_df.sample(n=1000, random_state=42)

# Kolmogorov-Smirnov test: compare spatial distributions
ks_statistic, ks_pvalue = stats.ks_2samp(
    top_1000['pct_from_sqrt'].values,
    random_1000['pct_from_sqrt'].values
)

print("Kolmogorov-Smirnov Test:")
print(f"  Comparing spatial distribution of Top-1000 vs Random-1000")
print(f"  Statistic: {ks_statistic:.4f}")
print(f"  P-value: {ks_pvalue:.6f}")
print(f"  Significant at p<0.05: {ks_pvalue < 0.05}")

# Mann-Whitney U test: are distances smaller for the best-scoring (lowest Z5D) candidates?
mw_statistic, mw_pvalue = stats.mannwhitneyu(
    top_1000['distance_to_nearest'].values,
    random_1000['distance_to_nearest'].values,
    alternative='less'
)

print("\nMann-Whitney U Test:")
print(f"  Testing if Top-1000 has smaller distances to factors than Random-1000")
print(f"  Statistic: {mw_statistic:.0f}")
print(f"  P-value: {mw_pvalue:.6f}")
print(f"  Significant at p<0.05: {mw_pvalue < 0.05}")
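A p-value alone does not say how large the difference is, and with 1,000 points per group even a small shift can come out significant. One simple complement is the common-language effect size: the fraction of (top, random) pairs in which the top candidate is closer to a factor. The helper below is a hypothetical addition, written in plain Python so it does not depend on any SciPy convention.

```python
def prob_smaller(xs, ys):
    """Estimate P(X < Y) + 0.5 * P(X == Y) over all pairs (x, y).

    0.5 means no tendency either way; values near 1.0 mean xs are
    almost always smaller than ys.
    """
    wins = 0.0
    for x in xs:
        for y in ys:
            if x < y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(xs) * len(ys))

# Tiny illustrative inputs, not experiment data.
print(prob_smaller([1, 2, 3], [4, 5, 6]))
print(prob_smaller([1, 2], [1, 2]))
```

In this notebook it could be called as `prob_smaller(top_1000['distance_to_nearest'], random_1000['distance_to_nearest'])`. If the Mann-Whitney statistic is the U of the first sample, as SciPy documents, the result should equal `1 - mw_statistic / (1000 * 1000)`.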

## Key Findings Summary

In [None]:
print("="*80)
print("KEY FINDINGS")
print("="*80)

# Find max enrichment
max_enrich = max(r['enrichment_any_5pct'] for r in top_k_results)
max_enrich_q = max(r['enrichment_q_1pct'] for r in top_k_results)

print(f"\n1. Maximum Enrichment:")
print(f"   - Within ±5% of factors: {max_enrich:.2f}x")
print(f"   - Within ±1% of q: {max_enrich_q:.2f}x")

print("\n2. Signal Strength (based on enrichment within ±1% of q):")
if max_enrich_q >= 5.0:
    print("   ✓ STRONG SIGNAL (≥5x enrichment)")
    print("   Z5D provides strong guidance for factorization")
elif max_enrich_q >= 2.0:
    print("   ⚠ MODERATE SIGNAL (2-5x enrichment)")
    print("   Z5D shows promise but needs refinement")
else:
    print("   ✗ WEAK/NO SIGNAL (<2x enrichment)")
    print("   Z5D doesn't provide clear guidance at this scale")

print(f"\n3. Statistical Significance:")
print(f"   - K-S test p-value: {ks_pvalue:.6f}")
print(f"   - M-W test p-value: {mw_pvalue:.6f}")

# NOTE: the observations below are hard-coded text, not recomputed from the loaded data.
print("\n4. Observations:")
print("   - Z5D scores are highly concentrated around -4.22 to -4.23")
print("   - 10x enrichment observed for Top-10K near q (larger factor)")
print("   - No enrichment near p (smaller factor)")
print("   - Suggests Z5D may preferentially identify larger factors")

print("\n" + "="*80)

## Recommendations for Next Steps

Based on the results:

1. **Signal Detected Near q Only**: The 10x enrichment for Top-10K candidates near q exceeds the 5x threshold this notebook uses for a strong signal, but it appears only near q, so the overall evidence is weaker than hoped.

2. **Asymmetry in Detection**: Z5D appears to preferentially identify candidates near the larger factor (q) but shows no enrichment near the smaller factor (p). This asymmetry warrants investigation.

3. **Scale to 1M Candidates**: To confirm the findings, run the experiment with 1,000,000 candidates to:
   - Improve statistical power
   - Check whether enrichment increases with a larger sample
   - Better characterize the spatial distribution

4. **Alternative Scoring Functions**: Test variations of Z5D scoring:
   - Different geometric embeddings
   - Modified resonance functionals
   - Hybrid approaches combining multiple signals

5. **Test on Other Semiprimes**: Validate on additional test cases to determine if the asymmetry is specific to N₁₂₇ or a general property of Z5D.
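
The statistical-power point in item 3 can be made concrete with an exact binomial tail: if a Top-K slice behaved like a random draw at the baseline rate, how likely is it to contain at least the observed number of hits? The sketch below is one possible check, not part of the original experiment. `binom_tail` is a hypothetical helper, and the values of `k` and `baseline_rate` are made-up illustrations rather than numbers from the summary file.

```python
from math import exp, lgamma, log, log1p

def binom_tail(k, hits, rate):
    """P(X >= hits) for X ~ Binomial(k, rate), summed in log space
    so that large k does not overflow. Requires 0 < rate < 1."""
    def log_pmf(i):
        return (lgamma(k + 1) - lgamma(i + 1) - lgamma(k - i + 1)
                + i * log(rate) + (k - i) * log1p(-rate))
    return sum(exp(log_pmf(i)) for i in range(hits, k + 1))

# Hypothetical numbers, not results from this experiment.
k, baseline_rate = 1000, 0.01
for factor in (1, 2, 10):
    hits = round(factor * baseline_rate * k)
    p_value = binom_tail(k, hits, baseline_rate)
    print(f"{factor}x enrichment ({hits} hits in Top-{k}): p = {p_value:.3g}")
```

Because the expected hit count grows with K, the same enrichment factor becomes harder to explain by chance in a larger slice, which is the main argument for rerunning with more candidates.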