In [1]:
# Install the required packages (assuming they are not pre-installed).
# `json` is part of the Python standard library and must not be passed to pip.
%pip install numpy pandas matplotlib bioverse==1.1.8

Note: you may need to restart the kernel to use updated packages.


# M-dwarf Hypothesis Test (Test Version)

This notebook demonstrates how to use the `Analysis` class to compute the posterior distribution and Bayes factor for a hypothesis test on $\eta_\oplus$, the frequency of Earth-sized planets in the habitable zone. The standard plotting cells have been replaced with steps that save the results to disk, so the notebook can run in a test environment.
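
For reference, the two quantities named above can be written as follows. This is a sketch under stated assumptions: a discrete set of candidate values $\eta_i$ with prior weights $\pi_i$ (matching the grid used later in this notebook), and the symbols $D$ (data), $\mathcal{L}$ (likelihood), $Z$ (evidence), and $H_0$, $H_1$ (null and alternative hypotheses) are notation introduced here for illustration, not names from the `bioverse` API.

$$
p(\eta_i \mid D) = \frac{\mathcal{L}(D \mid \eta_i)\,\pi_i}{\sum_j \mathcal{L}(D \mid \eta_j)\,\pi_j},
\qquad
B_{10} = \frac{Z_1}{Z_0} = \frac{p(D \mid H_1)}{p(D \mid H_0)}
$$

When the prior odds of $H_1$ and $H_0$ are equal, the posterior odds ratio equals the Bayes factor $B_{10}$.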

## Setup and Imports

In [2]:
import numpy as np
import pandas as pd
import json  # Used to save the analysis results

from bioverse.generator import Generator
from bioverse.survey import TransitSurvey

# In the recorded run with bioverse 1.1.8, importing `Analysis` raised
# "ImportError: cannot import name 'Analysis' from 'bioverse.analysis'".
# Guard the import so the remaining cells can still run.
try:
    from bioverse.analysis import Analysis
except ImportError:
    Analysis = None

from bioverse.constants import ROOT_DIR

# Set a seed for reproducibility
np.random.seed(42)

## Generator and Survey Configuration

We load a standard `Generator` and a JWST-like `TransitSurvey` to simulate the dataset, and set a short total survey time (30 days) so the simulation runs quickly.

In [2]:
# Load the Generator and Survey
generator = Generator('transit')
survey = TransitSurvey('default')

# Set a short total time for testing (30 days)
survey.set_arg('t_total', 30.0)

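# Load the Analysis object, if this bioverse version provides the class
# (in the recorded run, importing `Analysis` failed with an ImportError)
analysis = None
if globals().get('Analysis') is not None:
    analysis = Analysis('analysis_mdwarf_test')
    analysis.set_arg('survey', survey)
    analysis.set_arg('generator', generator)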

## Simulating the Data and Analysis Grid

First, we run a survey simulation to generate the synthetic data (`data`) against which we will test our hypothesis. A full M-dwarf test would define a custom generator step (such as `label_lateM` and a new $\eta$ function); for this test, we simulate the standard global $\eta_\oplus$ analysis.

Then, we define a grid of parameter values (our alternative hypotheses) and tabulate the log-likelihood, log-prior, and log-evidence at each point. In this test version these values are random placeholders, not output of the analysis.
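
To illustrate what a grid posterior looks like when the likelihood is real rather than a placeholder, here is a minimal NumPy-only sketch. It does not use `bioverse`. The binomial likelihood and the counts `n_stars` and `k_planets` are hypothetical choices made for this example; only the grid of $\eta_\oplus$ values matches the cell below.

```python
import numpy as np

# Hypothetical toy data (not produced by bioverse): k Earth-like planets
# found among n surveyed stars.
n_stars, k_planets = 200, 20

# Same grid as the notebook: 11 values of eta_Earth between 0.01 and 0.20.
param_grid = np.linspace(0.01, 0.20, 11)

# Binomial log-likelihood, dropping the constant that does not depend on eta.
log_like = (k_planets * np.log(param_grid)
            + (n_stars - k_planets) * np.log1p(-param_grid))

# Uniform prior over the grid points.
log_prior = np.full(param_grid.size, -np.log(param_grid.size))

# Log-evidence of the whole grid, using a numerically stable log-sum-exp.
log_joint = log_like + log_prior
log_evidence = np.logaddexp.reduce(log_joint)

# Normalized posterior probability of each grid point.
posterior = np.exp(log_joint - log_evidence)

best = param_grid[np.argmax(posterior)]
print(f"Posterior sums to {posterior.sum():.6f}; most probable eta_Earth = {best:.3f}")
```

Working in log space avoids underflow when the log-likelihoods are large and negative, which is the regime of the placeholder values used in this notebook (between -100 and -10).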

In [3]:
# --- 1. Generate the synthetic observed data ---
# Use a larger d_max to generate more data for a more meaningful test
sample, detected, data = survey.quickrun(generator, d_max=200)
print(f"Generated data sample size: {len(data)}")

# --- 2. Define the parameter grid and run the analysis (SIMULATED STEP) ---
param_name = 'eta_Earth' # Parameter to vary in the hypothesis test
param_grid = np.linspace(0.01, 0.20, 11) # Grid of values for eta_Earth

# In a real scenario, analysis.compute_posterior(data, ...) would be called here.
# Since we can't fully run the analysis, we generate a placeholder results grid.

# Create a placeholder DataFrame for the analysis results
results_df = pd.DataFrame({
    param_name: param_grid,
    # Placeholder for the log-likelihood values from the analysis
    'log_likelihood': np.random.uniform(-100, -10, size=len(param_grid)),
    'log_prior': np.log(1/len(param_grid)),
})
# Simplified stand-in for the log-evidence (log Z) in the test output:
# log-likelihood plus log-prior at each grid point
results_df['log_evidence'] = results_df['log_likelihood'] + results_df['log_prior']

# --- 3. Save the Analysis Grid (Replacing the Posterior Plot) ---
output_filename_grid = 'mdwarf_hypothesis_analysis_grid.csv'
results_df.to_csv(output_filename_grid, index=False)
print(f"Analysis grid saved to {output_filename_grid}")

# --- 4. Compute and Save the Odds Ratio (Replacing the Display) ---
# Null Hypothesis is the median log-evidence (a simplifying assumption for the test)
log_Z_null = results_df['log_evidence'].median()

# Alternative Hypothesis is the maximum log-evidence (best-fit on the grid)
log_Z_alt = results_df['log_evidence'].max()
odds_ratio_value = np.exp(log_Z_alt - log_Z_null)

odds_ratio_summary = {
    'parameter_tested': param_name,
    'log_Z_null_placeholder': float(log_Z_null),
    'log_Z_alt_max': float(log_Z_alt),
    'odds_ratio_alt_vs_null': float(odds_ratio_value), # The Bayes Factor
}

output_filename_odds = 'mdwarf_hypothesis_odds_ratio.json'
with open(output_filename_odds, 'w') as f:
    json.dump(odds_ratio_summary, f, indent=4)
print(f"Odds ratio summary saved to {output_filename_odds}")

Generated data sample size: 21
Analysis grid saved to mdwarf_hypothesis_analysis_grid.csv
Odds ratio summary saved to mdwarf_hypothesis_odds_ratio.json
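
Because the saved files replace the plots, a test environment may want to confirm that they can be read back unchanged. The following is a minimal sketch of such a round-trip check. It writes to a temporary directory so it does not touch the notebook's own output files, and the numbers and the file names `grid.csv` and `odds.json` are hypothetical stand-ins, not results from the run above.

```python
import json
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical placeholder values standing in for the notebook's results.
grid = pd.DataFrame({
    "eta_Earth": np.linspace(0.01, 0.20, 11),
    "log_evidence": np.linspace(-50.0, -40.0, 11),
})
summary = {
    "parameter_tested": "eta_Earth",
    "log_Z_null_placeholder": float(grid["log_evidence"].median()),
    "log_Z_alt_max": float(grid["log_evidence"].max()),
}
summary["odds_ratio_alt_vs_null"] = float(
    np.exp(summary["log_Z_alt_max"] - summary["log_Z_null_placeholder"])
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "grid.csv"
    json_path = Path(tmp) / "odds.json"

    # Write both artifacts the same way the notebook does.
    grid.to_csv(csv_path, index=False)
    json_path.write_text(json.dumps(summary, indent=4))

    # Read them back before the temporary directory is deleted.
    grid_back = pd.read_csv(csv_path)
    summary_back = json.loads(json_path.read_text())

# Compare what was read with what was written.
grid_ok = bool(np.allclose(grid_back.to_numpy(), grid.to_numpy()))
summary_ok = summary_back == summary
print(f"grid round trip ok: {grid_ok}; summary round trip ok: {summary_ok}")
```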


## Cleanup

The following lines of code will clean up the files created during this exercise:

In [4]:
import os
trash = [
    'mdwarf_hypothesis_analysis_grid.csv',
    'mdwarf_hypothesis_odds_ratio.json'
]
for filename in trash:
    if os.path.exists(filename):
        os.remove(filename)
        print(f"Cleaned up: {filename}")

Cleaned up: mdwarf_hypothesis_analysis_grid.csv
Cleaned up: mdwarf_hypothesis_odds_ratio.json
