# DisasterAI: AI Alignment & Echo Chamber Experiments

This notebook runs all three experiment protocols, in the order listed:

| # | Experiment | Duration | Purpose |
|---|-----------|----------|--------|
| C | Agent Improvements | 150 ticks | Verify fixes (Issues 2-5) |
| A | Dual-Timeline Feedback | 150 ticks | Validate learning mechanism |
| B | Filter Bubble Dynamics | 200 ticks | Test echo chamber hypotheses |

**Research question:** Does AI that confirms human beliefs create self-reinforcing echo chambers that degrade decision-making in disaster response?

## 0. Setup

In [None]:
# Install dependencies
!pip install -q mesa numpy matplotlib networkx seaborn scipy

In [None]:
# Clone the repo and check out the correct branch
import os

REPO_URL = "https://github.com/tinacomes/DisasterAI.git"
BRANCH = "claude/ai-alignment-echo-chamber-experiments-WiQWs"
REPO_DIR = "/content/DisasterAI"

try:
    from google.colab import drive
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    if not os.path.exists(REPO_DIR):
        !git clone -b {BRANCH} {REPO_URL} {REPO_DIR}
    else:
        # Ensure we're on the right branch
        !cd {REPO_DIR} && git fetch origin {BRANCH} && git checkout {BRANCH} && git pull origin {BRANCH}
    os.chdir(REPO_DIR)
    # Mount Drive for saving results
    drive.mount('/content/drive', force_remount=False)
    OUTPUT_DIR = "/content/drive/MyDrive/DisasterAI_Results"
else:
    # Local: assume we're already in the repo directory
    OUTPUT_DIR = "test_results"

os.makedirs(OUTPUT_DIR, exist_ok=True)
print(f"Working directory: {os.getcwd()}")
print(f"Output directory:  {OUTPUT_DIR}")
print(f"Branch: {BRANCH}")

In [None]:
# Core imports
import numpy as np
import matplotlib
matplotlib.use('Agg')  # Non-interactive backend: figures are written to files, then shown via IPython.display
import matplotlib.pyplot as plt
import math
import gc
import shutil  # Used by every experiment cell to copy figures into OUTPUT_DIR

from DisasterAI_Model import DisasterModel, HumanAgent

print("Imports successful.")

---
## Experiment C: Agent Improvement Verification (Issues 2-5)

Validates that the four fixes (Issues 2-5) work correctly before running the main experiments.

In [None]:
from test_agent_improvements import (
    base_params as agent_params,
    test_weighted_q_reward,
    test_belief_accuracy_reward,
    test_phase_structure,
    test_uncertainty_seeking,
    run_integration_test,
    visualize_results as visualize_agent_results,
)

print("=" * 70)
print("EXPERIMENT C: Agent Improvement Verification (Issues 2-5)")
print("=" * 70)

# Unit tests
test_weighted_q_reward()           # Issue 5
test_belief_accuracy_reward()      # Issue 2
test_phase_structure()             # Issue 3

In [None]:
# Issue 4: Explorer uncertainty-seeking (needs warmed-up model)
print("\n--- Warming up model for uncertainty-seeking test ---")
warmup_params = agent_params.copy()
warmup_params['ai_alignment_level'] = 0.5
warmup_model = DisasterModel(**warmup_params)
for _ in range(20):
    warmup_model.step()
test_uncertainty_seeking(warmup_model)
del warmup_model
gc.collect()

In [None]:
# Integration tests + visualization
results_c_high = run_integration_test(0.9, "Confirming AI")
results_c_low = run_integration_test(0.1, "Truthful AI")

visualize_agent_results(results_c_high, results_c_low)

# Copy output to the shared results directory and show it inline
import shutil
src = "test_results/test_agent_improvements.png"
if os.path.exists(src):
    shutil.copy(src, os.path.join(OUTPUT_DIR, "C_agent_improvements.png"))
    from IPython.display import Image, display
    display(Image(filename=src))

# Free memory
del results_c_high, results_c_low
gc.collect()
print("\nExperiment C complete.")

---
## Experiment A: Dual-Timeline Feedback Mechanism

Tests whether agents learn to distinguish truthful from confirming AI through fast info-quality feedback (3-15 ticks) and slow relief-outcome feedback (15-25 ticks).
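
The two feedback timelines can be pictured as delayed events queued at the moment an agent acts. The following is a minimal sketch, not the model's actual implementation: only the delay windows (3-15 and 15-25 ticks) come from the text above, while `schedule_feedback`, `pop_due`, and the event labels are hypothetical names introduced for illustration.

```python
import heapq
import random

# Delay windows in ticks, taken from the description above.
FAST_DELAY = (3, 15)    # info-quality feedback
SLOW_DELAY = (15, 25)   # relief-outcome feedback


def schedule_feedback(queue, tick, rng):
    """Queue one fast and one slow feedback event for an action taken at `tick`.

    Hypothetical helper: the real model may schedule feedback differently.
    """
    heapq.heappush(queue, (tick + rng.randint(*FAST_DELAY), "info_quality"))
    heapq.heappush(queue, (tick + rng.randint(*SLOW_DELAY), "relief_outcome"))


def pop_due(queue, tick):
    """Return every queued event whose due tick has been reached."""
    due = []
    while queue and queue[0][0] <= tick:
        due.append(heapq.heappop(queue))
    return due


rng = random.Random(0)
queue = []
schedule_feedback(queue, tick=0, rng=rng)

# Record the tick at which each kind of feedback arrives.
arrivals = {}
for tick in range(30):
    for due_tick, kind in pop_due(queue, tick):
        arrivals[kind] = due_tick

print(arrivals)
```

Under these assumptions the info-quality signal always arrives no later than the relief-outcome signal, so an agent can begin revising its estimate of a source before the slower outcome is known.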

**Key hypothesis:** Exploratory agents should decrease Q(AI) under confirming AI (alignment=0.9) because they weight accuracy 0.8 vs confirmation 0.2.
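
The arithmetic behind this hypothesis can be sketched as follows. Only the 0.8 / 0.2 weights come from the text. The incremental update rule, the learning rate, the starting value, and the accuracy and confirmation signals are illustrative assumptions, and the function names are hypothetical rather than taken from `DisasterAI_Model`.

```python
# Exploratory-agent weights from the hypothesis above.
W_ACCURACY = 0.8
W_CONFIRMATION = 0.2
LEARNING_RATE = 0.1  # assumed value, for illustration only


def weighted_reward(accuracy, confirmation):
    """Blend accuracy and confirmation signals (both assumed to lie in [0, 1])."""
    return W_ACCURACY * accuracy + W_CONFIRMATION * confirmation


def update_q(q, reward, lr=LEARNING_RATE):
    """Assumed incremental update: move Q a fraction `lr` toward the reward."""
    return q + lr * (reward - q)


def simulate(accuracy, confirmation, q0=0.5, steps=50):
    """Apply the same reward repeatedly and return the final Q-value."""
    q = q0
    for _ in range(steps):
        q = update_q(q, weighted_reward(accuracy, confirmation))
    return q


# Assumed signals: a confirming AI agrees with the agent but is often wrong,
# while a truthful AI is usually right but often disagrees.
q_confirming = simulate(accuracy=0.2, confirmation=0.9)
q_truthful = simulate(accuracy=0.9, confirmation=0.2)

print(f"Q(AI) under confirming AI: {q_confirming:.2f}")
print(f"Q(AI) under truthful AI:   {q_truthful:.2f}")
```

With these assumed signals the blended reward is 0.34 for the confirming AI and 0.76 for the truthful AI, so Q(AI) drifts down from 0.5 in the first case and up in the second, which is the direction the hypothesis predicts.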

In [None]:
from test_dual_feedback import (
    run_test as run_dual_feedback_test,
    visualize_results as visualize_dual_feedback,
)

print("=" * 70)
print("EXPERIMENT A: Dual-Timeline Feedback Mechanism")
print("=" * 70)

results_a_high = run_dual_feedback_test(ai_alignment=0.9, test_name="High Alignment (Confirming AI)")
results_a_low = run_dual_feedback_test(ai_alignment=0.1, test_name="Low Alignment (Truthful AI)")

In [None]:
visualize_dual_feedback(results_a_high, results_a_low)

src = "test_results/dual_feedback_test.png"
if os.path.exists(src):
    shutil.copy(src, os.path.join(OUTPUT_DIR, "A_dual_feedback.png"))
    from IPython.display import Image, display
    display(Image(filename=src))

del results_a_high, results_a_low
gc.collect()
print("\nExperiment A complete.")

---
## Experiment B: Filter Bubble Dynamics

Tests whether AI alignment creates, amplifies, or breaks social echo chambers.

**Conditions:** Control (no AI), Truthful (0.1), Mixed (0.5), Confirming (0.9); 200 ticks each.

**Hypotheses:**
- H1: Confirming AI amplifies social filter bubbles (SECI more negative)
- H2: Truthful AI breaks filter bubbles (SECI less negative)
- H3: High AECI + confirming AI = strongest combined bubbles
- H4: Exploratory agents show weaker filter bubble effects
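
This notebook does not define SECI. One definition consistent with the sign convention in H1 and H2 compares belief variance inside an agent's social group with the variance across the whole population, so that a more homogeneous group yields a more negative value. The sketch below assumes that definition; `seci_sketch` is a hypothetical helper, and the metric computed in `test_filter_bubbles` may differ.

```python
from statistics import pvariance


def seci_sketch(group_beliefs, all_beliefs):
    """Assumed index: (group variance - global variance) / global variance.

    Clipped to [-1, 1]. Negative values mean the group is more homogeneous
    than the population, i.e. a stronger filter bubble.
    """
    global_var = pvariance(all_beliefs)
    if global_var == 0:
        return 0.0  # no population-level disagreement to compare against
    ratio = (pvariance(group_beliefs) - global_var) / global_var
    return max(-1.0, min(1.0, ratio))


# Illustrative belief levels, not model output.
all_beliefs = [0, 1, 2, 3, 4, 5]
tight_group = [2, 2, 3]     # nearly identical beliefs
diverse_group = [0, 3, 5]   # beliefs spread across the range

seci_tight = seci_sketch(tight_group, all_beliefs)
seci_diverse = seci_sketch(diverse_group, all_beliefs)

print(f"tight group:   {seci_tight:.2f}")
print(f"diverse group: {seci_diverse:.2f}")
```

Under this assumed definition, H1 predicts that group values move toward -1 when agents rely on a confirming AI, and H2 predicts that they move back toward zero with a truthful AI.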

In [None]:
from test_filter_bubbles import (
    run_filter_bubble_experiment,
    visualize_filter_bubble_results,
)

print("=" * 70)
print("EXPERIMENT B: Filter Bubble Dynamics")
print("=" * 70)

conditions = {
    'Control': None,
    'Truthful (0.1)': 0.1,
    'Mixed (0.5)': 0.5,
    'Confirming (0.9)': 0.9,
}

results_b = {}
for name, alignment in conditions.items():
    results_b[name] = run_filter_bubble_experiment(alignment, name)
    gc.collect()  # Free memory between conditions

In [None]:
visualize_filter_bubble_results(results_b)

src = "test_results/filter_bubble_experiment.png"
if os.path.exists(src):
    shutil.copy(src, os.path.join(OUTPUT_DIR, "B_filter_bubbles.png"))
    from IPython.display import Image, display
    display(Image(filename=src))

del results_b
gc.collect()
print("\nExperiment B complete.")

---
## Results Summary

All outputs are saved to the results directory (`OUTPUT_DIR`). Check the following files:

| File | Experiment | Contents |
|------|-----------|----------|
| `C_agent_improvements.png` | Agent fixes | Q-values, self-action, query distance |
| `A_dual_feedback.png` | Dual feedback | Q-values, trust, feedback events, MAE |
| `B_filter_bubbles.png` | Filter bubbles | SECI, AECI, belief accuracy, hypotheses |

In [None]:
print("All experiments complete.")
print(f"Results saved to: {OUTPUT_DIR}")
print()
for f in sorted(os.listdir(OUTPUT_DIR)):
    if f.endswith('.png'):
        path = os.path.join(OUTPUT_DIR, f)
        size_kb = os.path.getsize(path) / 1024
        print(f"  {f} ({size_kb:.0f} KB)")