# Dicee Simulation Data Exploration

This notebook demonstrates how to load and explore simulation results from `@dicee/simulation`.

## Setup

First, ensure you have run some simulations:
```bash
# From the dicee root directory
pnpm sim:run --profiles professor,carmen,riley --games 1000 --output ./results
```

In [None]:
# Import the analysis library
import polars as pl
from dicee_analysis import (
    load_games,
    describe_scores,
    plot_score_distribution,
    plot_score_boxplot,
)

# Configure matplotlib for notebook display
import matplotlib.pyplot as plt
%matplotlib inline
# This style name requires matplotlib >= 3.6 (earlier releases call it 'seaborn-whitegrid')
plt.style.use('seaborn-v0_8-whitegrid')

## Load Data

Load game results from an NDJSON or Parquet file.

In [None]:
# Update this path to point at the games file inside your results directory
RESULTS_PATH = "../../../results/games.ndjson"

# Load games (set progress=True for large files)
games = load_games(RESULTS_PATH, progress=True)

print(f"Loaded {games['game_id'].n_unique()} games")
print(f"Profiles: {games['profile_id'].unique().to_list()}")
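
NDJSON stores one JSON object per line, so a results file can also be inspected without the analysis library. The following is a minimal sketch using only the standard library; `read_ndjson` is a hypothetical helper, and the field names in the sample records are assumed from the columns used elsewhere in this notebook rather than taken from the simulator's actual output schema.

```python
import io
import json


def read_ndjson(stream):
    """Parse newline-delimited JSON: one object per non-blank line."""
    records = []
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"Invalid JSON on line {line_number}: {exc}") from exc
    return records


# In-memory stand-in for a results file; the values are made up for illustration.
sample = io.StringIO(
    '{"game_id": "g1", "profile_id": "professor", "final_score": 287}\n'
    "\n"
    '{"game_id": "g2", "profile_id": "riley", "final_score": 214}\n'
)
records = read_ndjson(sample)
print(len(records), records[0]["profile_id"])
```

For a real file, pass an open file handle instead of the `io.StringIO` object.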

In [None]:
# Preview the data
games.head(10)

## Basic Statistics

In [None]:
# Overall statistics
overall_stats = describe_scores(games)
print("Overall Score Statistics:")
print(f"  N: {overall_stats.n}")
print(f"  Mean: {overall_stats.mean:.2f} ± {overall_stats.std:.2f}")
print(f"  Median: {overall_stats.median:.1f}")
print(f"  Range: [{overall_stats.min:.0f}, {overall_stats.max:.0f}]")
print(f"  95% CI: [{overall_stats.ci95_lower:.2f}, {overall_stats.ci95_upper:.2f}]")
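
One common way to obtain a 95% confidence interval for a mean score is the normal approximation: mean ± 1.96 × (sample standard deviation / √n). The sketch below illustrates that calculation on made-up scores. It is an assumption for illustration only, and `describe_scores` may compute `ci95_lower` and `ci95_upper` differently (for example with a t-distribution or a bootstrap); `mean_ci95` is a hypothetical helper.

```python
import math
import statistics


def mean_ci95(scores):
    """Return (mean, lower, upper) using the normal approximation."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate spread")
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    half_width = 1.96 * std / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# Made-up final scores, for illustration only.
scores = [210, 245, 198, 267, 231, 254, 223, 240]
mean, lower, upper = mean_ci95(scores)
print(f"Mean: {mean:.2f}, 95% CI: [{lower:.2f}, {upper:.2f}]")
```

With 1000 games per profile the normal approximation is usually adequate; for very small samples a t-based interval would be wider.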

In [None]:
# Statistics by profile, printed from highest to lowest mean score
profile_stats = describe_scores(games, by_profile=True)

for profile, stats in sorted(profile_stats.items(), key=lambda x: x[1].mean, reverse=True):
    print(f"\n{profile.upper()}:")
    print(f"  Mean: {stats.mean:.2f} ± {stats.std:.2f}")
    print(f"  95% CI: [{stats.ci95_lower:.2f}, {stats.ci95_upper:.2f}]")

## Score Distributions

In [None]:
# Histogram by profile
fig = plot_score_distribution(games, by_profile=True, title="Score Distribution by AI Profile")
plt.show()

In [None]:
# Box plot comparison
fig = plot_score_boxplot(games, title="Score Distribution by Profile")
plt.show()

## Quick Polars Queries

The `games` table accepts Polars expressions directly, which makes ad hoc queries quick to write.

In [None]:
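# Score summary by profile
# Note: bonus_rate is a true rate only if upper_bonus is a boolean or 0/1 flag;
# if it stores bonus points, this value is the mean bonus per game instead.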
games.group_by("profile_id").agg([
    pl.col("final_score").mean().alias("mean_score"),
    pl.col("final_score").std().alias("std_score"),
    pl.col("final_score").median().alias("median_score"),
    pl.col("upper_bonus").mean().alias("bonus_rate"),
    pl.len().alias("n_games"),
]).sort("mean_score", descending=True)

In [None]:
# Top 10 games by final score
games.sort("final_score", descending=True).head(10).select([
    "game_id", "profile_id", "final_score", "upper_bonus", "dicee_count"
])

## Next Steps

- See `02_profile_comparison.ipynb` for statistical comparisons
- See `03_calibration_analysis.ipynb` for AI calibration validation