# SciTeX Stats Server - Statistical Analysis Examples

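This notebook demonstrates the statistical analysis capabilities of the SciTeX Stats MCP server. Each code cell stores an example as a string and prints it, so the SciTeX calls are shown for reading rather than executed here.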

## Overview

The Stats server provides enhanced statistical functions including:
- Comprehensive hypothesis testing
- Effect size calculations
- Multiple comparison corrections
- Time series analysis
- Power analysis
- Regression model comparison
- Bayesian analysis

## Example 1: Basic Statistical Comparison

In [None]:
# Original SciPy code
original_stats = '''
import numpy as np
from scipy import stats
import pandas as pd

# Load experimental data
control = np.array([23.1, 25.3, 24.8, 22.9, 26.2, 24.5, 23.7, 25.1])
treatment = np.array([28.3, 30.1, 29.5, 31.2, 29.8, 30.5, 28.9, 29.7])

# Basic t-test (Student's, assumes equal variances)
t_stat, p_value = stats.ttest_ind(control, treatment)
print(f"T-statistic: {t_stat:.3f}, p-value: {p_value:.3g}")

# Check normality (Shapiro-Wilk suits small samples; normaltest is unreliable for n < 20)
_, p_norm_control = stats.shapiro(control)
_, p_norm_treatment = stats.shapiro(treatment)

# If not normal, use Mann-Whitney U
if p_norm_control < 0.05 or p_norm_treatment < 0.05:
    u_stat, p_value_mw = stats.mannwhitneyu(control, treatment, alternative="two-sided")
    print(f"Mann-Whitney U: {u_stat}, p-value: {p_value_mw:.3g}")
'''

print("ORIGINAL STATISTICAL ANALYSIS:")
print(original_stats)

In [None]:
# SciTeX enhanced analysis
scitex_stats = '''
import scitex as stx
import numpy as np

def main():
    """Statistical analysis with enhanced features."""
    # Load experimental data
    control = np.array([23.1, 25.3, 24.8, 22.9, 26.2, 24.5, 23.7, 25.1])
    treatment = np.array([28.3, 30.1, 29.5, 31.2, 29.8, 30.5, 28.9, 29.7])
    
    # Comprehensive comparison with auto-selection of appropriate test
    results = stx.stats.compare_groups(
        control, treatment,
        tests=['auto'],  # Automatically selects appropriate tests
        calculate_effect_size=True,
        confidence_level=0.95
    )
    
    # Display comprehensive results
    stx.stats.print_comparison_summary(results)
    
    # Save detailed report
    stx.io.save(results, './results/statistical_comparison.json', 
                symlink_from_cwd=True)
    
    # Create visualization
    fig = stx.stats.plot_comparison(
        control, treatment,
        labels=['Control', 'Treatment'],
        show_stats=True,
        show_effect_size=True
    )
    stx.io.save(fig, './figures/group_comparison.png', 
                symlink_from_cwd=True)
    
    return results
'''

print("SCITEX ENHANCED ANALYSIS:")
print(scitex_stats)
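
The notebook does not spell out what `tests=['auto']` does internally. As a rough sketch of what automatic selection could look like with SciPy alone, the hypothetical helper `choose_two_sample_test` below checks normality with Shapiro-Wilk and equality of variances with Levene's test, then picks Student's t, Welch's t, or Mann-Whitney U. The helper name, the 0.05 threshold, and the decision rule are assumptions for illustration, not SciTeX's actual logic.

```python
import numpy as np
from scipy import stats


def choose_two_sample_test(a, b, alpha=0.05):
    """Hypothetical helper: pick a two-sample test from simple assumption checks."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Shapiro-Wilk: a small p-value suggests departure from normality
    normal = stats.shapiro(a).pvalue >= alpha and stats.shapiro(b).pvalue >= alpha
    if not normal:
        res = stats.mannwhitneyu(a, b, alternative="two-sided")
        return {"test": "mann_whitney", "statistic": res.statistic, "p_value": res.pvalue}

    # Levene: a small p-value suggests unequal variances, so fall back to Welch
    equal_var = stats.levene(a, b).pvalue >= alpha
    res = stats.ttest_ind(a, b, equal_var=equal_var)
    name = "student_t" if equal_var else "welch_t"
    return {"test": name, "statistic": res.statistic, "p_value": res.pvalue}


control = [23.1, 25.3, 24.8, 22.9, 26.2, 24.5, 23.7, 25.1]
treatment = [28.3, 30.1, 29.5, 31.2, 29.8, 30.5, 28.9, 29.7]
selected = choose_two_sample_test(control, treatment)
print(selected["test"], f"p = {selected['p_value']:.3g}")
```

Pre-testing assumptions on the same data is a simplification; with samples this small, the normality and variance checks have little power, so the chosen test should be read as a default rather than a verdict.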

In [None]:
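# Expected results structure. The numeric values are illustrative placeholders,
# not computed from the arrays above (for those arrays the pooled-variance
# t statistic is about -10.31, not -7.234).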
expected_results = {
    "normality": {
        "control": {"statistic": 0.892, "p_value": 0.234, "is_normal": True},
        "treatment": {"statistic": 0.913, "p_value": 0.381, "is_normal": True}
    },
    "parametric": {
        "t_test": {
            "statistic": -7.234,
            "p_value": 0.00003,
            "df": 14,
            "confidence_interval": [-6.83, -3.67]
        },
        "welch_test": {
            "statistic": -7.234,
            "p_value": 0.00003,
            "df": 13.97
        }
    },
    "non_parametric": {
        "mann_whitney": {
            "statistic": 0.0,
            "p_value": 0.0002,
            "effect_size_r": 0.884
        }
    },
    "effect_sizes": {
        "cohens_d": 3.621,
        "hedges_g": 3.489,
        "glass_delta": 3.754,
        "interpretation": "large"
    },
    "descriptive": {
        "control": {"mean": 24.45, "std": 1.32, "se": 0.467, "n": 8},
        "treatment": {"mean": 29.75, "std": 0.98, "se": 0.346, "n": 8}
    },
    "recommendation": "Use parametric t-test (data is normally distributed)"
}

print("COMPREHENSIVE RESULTS STRUCTURE:")
import json
print(json.dumps(expected_results, indent=2))
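
The `effect_sizes` entry lists three standardized mean differences. The sketch below shows how they are conventionally computed with NumPy for the control and treatment arrays from Example 1. The function name `effect_sizes` and its return keys are chosen here for illustration; how SciTeX computes or names these values is not confirmed by this notebook.

```python
import numpy as np

control = np.array([23.1, 25.3, 24.8, 22.9, 26.2, 24.5, 23.7, 25.1])
treatment = np.array([28.3, 30.1, 29.5, 31.2, 29.8, 30.5, 28.9, 29.7])


def effect_sizes(a, b):
    """Standardized mean differences for two independent samples (b minus a)."""
    n_a, n_b = len(a), len(b)
    diff = b.mean() - a.mean()
    var_a, var_b = a.var(ddof=1), b.var(ddof=1)

    # Cohen's d: difference scaled by the pooled standard deviation
    df = n_a + n_b - 2
    pooled_sd = np.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / df)
    cohens_d = diff / pooled_sd

    # Hedges' g: Cohen's d with the usual small-sample bias correction
    hedges_g = cohens_d * (1 - 3 / (4 * df - 1))

    # Glass's delta: difference scaled by the first (control) group's SD only
    glass_delta = diff / np.sqrt(var_a)
    return {"cohens_d": cohens_d, "hedges_g": hedges_g, "glass_delta": glass_delta}


sizes = effect_sizes(control, treatment)
for name, value in sizes.items():
    print(f"{name}: {value:.3f}")
```

For these arrays Cohen's d comes out at about 5.15, which is why the numbers in the structure above are best read as placeholders for the output shape rather than as results for this data.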

## Example 2: Multiple Comparisons with Corrections

In [None]:
# Multiple group comparisons
multiple_groups = '''
import scitex as stx
import numpy as np

def analyze_multiple_groups():
    """Analyze multiple experimental conditions."""
    # Simulated data for 4 conditions (seeded so the example is reproducible)
    rng = np.random.default_rng(42)
    groups = {
        'Control': rng.normal(100, 15, 30),
        'Drug_A': rng.normal(110, 12, 28),
        'Drug_B': rng.normal(108, 14, 32),
        'Combined': rng.normal(120, 10, 29)
    }
    
    # One-way ANOVA with post-hoc tests
    anova_results = stx.stats.one_way_anova(
        groups,
        post_hoc=['tukey', 'bonferroni', 'holm'],
        check_assumptions=True,
        plot_diagnostics=True
    )
    
    # Display ANOVA table
    print("ANOVA Results:")
    print(anova_results['anova_table'])
    
    # Display post-hoc comparisons with corrections
    print("\nPost-hoc Comparisons (Tukey HSD):")
    print(anova_results['post_hoc']['tukey'])
    
    # Create comprehensive visualization
    fig = stx.stats.plot_anova_results(
        groups,
        anova_results,
        show_means=True,
        show_ci=True,
        show_pairwise=True,
        annotate_significance=True
    )
    
    stx.io.save(fig, './figures/anova_results.png', 
                dpi=300, symlink_from_cwd=True)
    
    # Power analysis
    power = stx.stats.anova_power_analysis(
        effect_size=anova_results['effect_size'],
        n_groups=len(groups),
        n_per_group=[len(v) for v in groups.values()],
        alpha=0.05
    )
    
    print(f"\nStatistical Power: {power:.3f}")
    
    return anova_results
'''

print("MULTIPLE COMPARISONS WITH SCITEX:")
print(multiple_groups)
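
To make the correction step concrete, here is a minimal sketch of an omnibus ANOVA followed by Holm and Bonferroni adjustments, using only NumPy and SciPy. It substitutes pairwise Welch t-tests for Tukey HSD, and `holm_adjust` is a hypothetical helper written for this example; none of it is taken from SciTeX.

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
groups = {
    "Control": rng.normal(100, 15, 30),
    "Drug_A": rng.normal(110, 12, 28),
    "Drug_B": rng.normal(108, 14, 32),
    "Combined": rng.normal(120, 10, 29),
}


def holm_adjust(p_values):
    """Holm step-down adjustment; returns adjusted p-values in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Multiply the k-th smallest p-value by (m - k), k = 0..m-1
    scaled = (m - np.arange(m)) * p[order]
    # Enforce monotonicity, then cap at 1
    adjusted_sorted = np.minimum(1.0, np.maximum.accumulate(scaled))
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


# Omnibus one-way ANOVA across all groups
f_stat, p_anova = stats.f_oneway(*groups.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.3g}")

# Pairwise Welch t-tests, then Holm and Bonferroni corrections
pairs = list(combinations(groups, 2))
raw_p = np.array([
    stats.ttest_ind(groups[a], groups[b], equal_var=False).pvalue for a, b in pairs
])
holm_p = holm_adjust(raw_p)
bonferroni_p = np.minimum(1.0, raw_p * len(raw_p))

for (a, b), p_raw, p_holm, p_bonf in zip(pairs, raw_p, holm_p, bonferroni_p):
    print(f"{a} vs {b}: raw = {p_raw:.3g}, Holm = {p_holm:.3g}, Bonferroni = {p_bonf:.3g}")
```

Holm-adjusted p-values are never larger than the Bonferroni ones, which is why Holm is usually preferred when both control the family-wise error rate.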

## Example 3: Time Series Analysis

In [None]:
# Time series statistical analysis
time_series_analysis = '''
import scitex as stx
import numpy as np

def analyze_time_series():
    """Analyze longitudinal experimental data."""
    # Load time series data
    data = stx.io.load('./data/longitudinal_measurements.csv')
    
    # Repeated measures ANOVA
    rm_results = stx.stats.repeated_measures_anova(
        data,
        subject_col='subject_id',
        within_cols=['time'],
        between_cols=['treatment'],
        dv_col='measurement',
        sphericity_correction='greenhouse-geisser'
    )
    
    # Trend analysis
    trends = stx.stats.polynomial_contrasts(
        data,
        time_col='time',
        value_col='measurement',
        group_col='treatment',
        max_degree=3
    )
    
    # Change point detection
    change_points = stx.stats.detect_change_points(
        data['measurement'].values,
        method='cusum',
        confidence_level=0.95
    )
    
    # Autocorrelation analysis
    acf_results = stx.stats.autocorrelation_analysis(
        data['measurement'].values,
        max_lag=20,
        confidence_level=0.95
    )
    
    # Create multi-panel figure
    fig = stx.plt.create_figure(
        n_panels=4,
        layout='2x2',
        figsize=(12, 10)
    )
    
    # Panel 1: Time series with change points
    fig.panels[0].plot_time_series(
        data,
        highlight_changes=change_points,
        title='Longitudinal Measurements'
    )
    
    # Panel 2: Group means over time
    fig.panels[1].plot_rm_anova(
        rm_results,
        show_individual=True,
        title='Repeated Measures Analysis'
    )
    
    # Panel 3: Trend components
    fig.panels[2].plot_trends(
        trends,
        title='Polynomial Trends'
    )
    
    # Panel 4: Autocorrelation
    fig.panels[3].plot_acf(
        acf_results,
        title='Autocorrelation Function'
    )
    
    stx.io.save(fig, './figures/time_series_analysis.png', 
                symlink_from_cwd=True)
    
    return {
        'rm_anova': rm_results,
        'trends': trends,
        'change_points': change_points,
        'autocorrelation': acf_results
    }
'''

print("TIME SERIES ANALYSIS WITH SCITEX:")
print(time_series_analysis)
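
The autocorrelation step can be illustrated without SciTeX. The sketch below computes a sample autocorrelation function with NumPy and flags lags outside the approximate 95% band of ±1.96/√n that holds under a white-noise null. The function `sample_acf` and the synthetic AR(1) series are assumptions made for this example; the notebook's CSV file is not used.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.dot(centered, centered)
    n = len(x)
    return np.array([
        np.dot(centered[: n - k], centered[k:]) / denom for k in range(max_lag + 1)
    ])


# Synthetic AR(1) series as stand-in data (not the notebook's CSV file)
rng = np.random.default_rng(1)
n = 200
series = np.zeros(n)
for t in range(1, n):
    series[t] = 0.7 * series[t - 1] + rng.normal()

acf = sample_acf(series, max_lag=20)
# Approximate 95% band under the white-noise null hypothesis
band = 1.96 / np.sqrt(n)
significant_lags = [k for k in range(1, 21) if abs(acf[k]) > band]
print("Lags outside the 95% band:", significant_lags)
```

Strong autocorrelation of this kind violates the independence assumption behind ordinary ANOVA, which is one reason to inspect the ACF before interpreting the repeated-measures results.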

## Example 4: Regression and Model Comparison

In [None]:
# Advanced regression analysis
regression_analysis = '''
import scitex as stx

def regression_model_comparison():
    """Compare different regression models."""
    # Load data
    data = stx.io.load('./data/experiment_data.csv')
    
    # Define models to compare
    models = {
        'linear': 'outcome ~ predictor1 + predictor2',
        'interaction': 'outcome ~ predictor1 * predictor2',
        'polynomial': 'outcome ~ predictor1 + I(predictor1**2) + predictor2',
        'full': 'outcome ~ predictor1 * predictor2 + I(predictor1**2) + I(predictor2**2)'
    }
    
    # Fit and compare models
    comparison = stx.stats.compare_models(
        data,
        models,
        method='ols',
        criteria=['aic', 'bic', 'r2_adj', 'rmse'],
        cross_validate=True,
        n_folds=5
    )
    
    # Best model selection
    best_model = comparison['best_model']
    print(f"Best model: {best_model['name']}")
    print(f"AIC: {best_model['aic']:.2f}")
    print(f"Cross-validated R²: {best_model['cv_r2']:.3f}")
    
    # Detailed diagnostics for best model
    diagnostics = stx.stats.regression_diagnostics(
        best_model['fitted_model'],
        data,
        tests=['normality', 'homoscedasticity', 'independence', 'linearity']
    )
    
    # Create diagnostic plots
    fig = stx.stats.plot_regression_diagnostics(
        best_model['fitted_model'],
        data,
        plots=['residuals', 'qq', 'leverage', 'partial']
    )
    
    stx.io.save(fig, './figures/regression_diagnostics.png',
                symlink_from_cwd=True)
    
    # Model coefficients with confidence intervals
    coef_table = stx.stats.format_regression_table(
        best_model['fitted_model'],
        confidence_level=0.95,
        include_vif=True,
        include_standardized=True
    )
    
    print("\nRegression Coefficients:")
    print(coef_table)
    
    # Save comprehensive report
    report = {
        'model_comparison': comparison,
        'diagnostics': diagnostics,
        'coefficients': coef_table
    }
    
    stx.io.save(report, './results/regression_analysis.json',
                symlink_from_cwd=True)
    
    return report
'''

print("REGRESSION ANALYSIS WITH SCITEX:")
print(regression_analysis)
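
As a sketch of the information-criterion comparison, the code below fits two of the candidate model shapes by ordinary least squares with NumPy and ranks them by AIC and BIC. It uses the Gaussian-likelihood forms n·ln(RSS/n) + 2k and n·ln(RSS/n) + k·ln(n), which omit an additive constant shared by all models fitted to the same data. The helper `fit_ols` and the synthetic data are assumptions for illustration and do not reflect how `compare_models` is implemented.

```python
import numpy as np


def fit_ols(design, y):
    """Least-squares fit; returns coefficients and Gaussian AIC/BIC (up to a constant)."""
    n, k = design.shape
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return {"coef": coef, "aic": aic, "bic": bic}


# Synthetic data with a true quadratic term (stand-in for experiment_data.csv)
rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, 200)
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(0, 1, 200)

ones = np.ones_like(x)
candidates = {
    "linear": np.column_stack([ones, x]),
    "polynomial": np.column_stack([ones, x, x**2]),
}
fits = {name: fit_ols(design, y) for name, design in candidates.items()}
best = min(fits, key=lambda name: fits[name]["aic"])

for name, fit in fits.items():
    print(f"{name}: AIC = {fit['aic']:.1f}, BIC = {fit['bic']:.1f}")
print("Lowest AIC:", best)
```

Only differences in AIC or BIC between models are meaningful; the absolute values depend on the dropped constant and on the sample size.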

## Example 5: Power Analysis and Sample Size Calculation

In [None]:
# Power analysis examples
power_analysis = '''
import scitex as stx

def power_and_sample_size():
    """Calculate power and required sample sizes."""
    
    # 1. Post-hoc power analysis
    existing_data = stx.io.load('./data/pilot_study.csv')
    
    power_results = stx.stats.power_analysis(
        data=existing_data,
        test='t-test',
        groups=['control', 'treatment'],
        alpha=0.05
    )
    
    print(f"Observed effect size: {power_results['effect_size']:.3f}")
    print(f"Statistical power: {power_results['power']:.3f}")
    
    # 2. A priori sample size calculation
    sample_size = stx.stats.calculate_sample_size(
        test='anova',
        effect_size='medium',  # or specific value like 0.25
        alpha=0.05,
        power=0.80,
        n_groups=4
    )
    
    print(f"\nRequired sample size per group: {sample_size['n_per_group']}")
    print(f"Total N: {sample_size['total_n']}")
    
    # 3. Power curve visualization
    fig = stx.stats.plot_power_curves(
        test='t-test',
        effect_sizes=[0.2, 0.5, 0.8],  # small, medium, large
        n_range=(10, 100),
        alpha=0.05,
        alternative='two-sided'
    )
    
    stx.io.save(fig, './figures/power_curves.png',
                symlink_from_cwd=True)
    
    # 4. Sensitivity analysis
    sensitivity = stx.stats.power_sensitivity(
        n=30,
        alpha=0.05,
        power=0.80,
        test='correlation',
        parameter_range='effect_size'
    )
    
    print(f"\nMinimum detectable effect: r = {sensitivity['min_effect']:.3f}")
    
    # 5. Multi-stage power analysis for complex designs
    complex_power = stx.stats.complex_power_analysis(
        design='2x3_mixed',  # 2 between, 3 within
        effect_sizes={
            'between': 0.25,
            'within': 0.15,
            'interaction': 0.10
        },
        correlation_within=0.7,
        alpha=0.05,
        power_target=0.80
    )
    
    print("\nComplex Design Power Analysis:")
    print(f"Required N per cell: {complex_power['n_per_cell']}")
    print(f"Power for main effects: {complex_power['power_main']:.3f}")
    print(f"Power for interaction: {complex_power['power_interaction']:.3f}")
    
    return {
        'post_hoc': power_results,
        'a_priori': sample_size,
        'sensitivity': sensitivity,
        'complex_design': complex_power
    }
'''

print("POWER ANALYSIS WITH SCITEX:")
print(power_analysis)
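
The a priori calculation can be sketched for the simplest case, a two-sided two-sample t-test, using SciPy. The first function uses the normal approximation n = 2(z₁₋α/₂ + z_power)² / d²; the second evaluates power through the noncentral t distribution. Both function names are hypothetical and the code is independent of `stx.stats.calculate_sample_size`.

```python
import math

from scipy import stats


def n_per_group_normal_approx(d, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample t-test (normal approximation)."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_power = stats.norm.ppf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d**2)


def t_test_power(d, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample t-test via the noncentral t distribution."""
    df = 2 * n_per_group - 2
    ncp = d * math.sqrt(n_per_group / 2)  # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Probability that |T| exceeds the critical value under the alternative
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)


n_approx = n_per_group_normal_approx(d=0.5)
print(f"Approximate n per group for d = 0.5: {n_approx}")
print(f"Power at that n: {t_test_power(0.5, n_approx):.3f}")
```

The normal approximation tends to give a slightly smaller n than a noncentral-t calculation, so checking the result with `t_test_power` and rounding up is a reasonable habit.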

## Example 6: Bayesian Analysis Integration

In [None]:
# Bayesian statistical analysis
bayesian_example = '''
import scitex as stx

def bayesian_analysis():
    """Perform Bayesian statistical analysis."""
    # Load data
    data = stx.io.load('./data/clinical_trial.csv')
    
    # Bayesian t-test
    bayes_result = stx.stats.bayesian_t_test(
        data[data['group'] == 'control']['outcome'],
        data[data['group'] == 'treatment']['outcome'],
        prior='cauchy',  # or custom prior
        prior_scale=0.707,
        n_samples=10000
    )
    
    print(f"Bayes Factor (BF10): {bayes_result['bf10']:.2f}")
    print(f"Posterior probability of H1: {bayes_result['prob_h1']:.3f}")
    print(f"95% Credible Interval: [{bayes_result['ci_lower']:.3f}, {bayes_result['ci_upper']:.3f}]")
    
    # Bayesian regression
    bayes_reg = stx.stats.bayesian_regression(
        data,
        formula='outcome ~ treatment + age + baseline',
        priors={
            'treatment': 'normal(0, 1)',
            'age': 'normal(0, 0.1)',
            'baseline': 'normal(1, 0.5)'
        },
        chains=4,
        iter=2000,
        warmup=1000
    )
    
    # Posterior distributions
    fig = stx.stats.plot_posterior(
        bayes_reg,
        parameters=['treatment', 'age'],
        show_rope=True,  # Region of Practical Equivalence
        rope_limits=[-0.1, 0.1],
        show_prior=True
    )
    
    stx.io.save(fig, './figures/posterior_distributions.png',
                symlink_from_cwd=True)
    
    # Model comparison
    models = {
        'null': 'outcome ~ 1',
        'treatment_only': 'outcome ~ treatment',
        'full': 'outcome ~ treatment + age + baseline'
    }
    
    model_comparison = stx.stats.bayesian_model_comparison(
        data,
        models,
        criterion='waic',  # or 'loo'
        plot_weights=True
    )
    
    print("\nModel Comparison (WAIC):")
    for model_name, metrics in model_comparison.items():
        print(f"{model_name}: WAIC={metrics['waic']:.1f}, "
              f"Weight={metrics['weight']:.3f}")
    
    return {
        'bayes_t_test': bayes_result,
        'bayes_regression': bayes_reg,
        'model_comparison': model_comparison
    }
'''

print("BAYESIAN ANALYSIS WITH SCITEX:")
print(bayesian_example)
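
The final loop prints a weight for each model. One common way to turn information-criterion values into such weights is the Akaike-type formula w_i ∝ exp(−Δ_i / 2), where Δ_i is the difference from the lowest value on the deviance scale. The sketch below applies it to made-up WAIC values; whether `bayesian_model_comparison` uses this formula, stacking, or something else is not stated in the notebook, and `ic_weights` is a hypothetical helper.

```python
import numpy as np


def ic_weights(ic_values):
    """Akaike-type weights from information criteria on the deviance scale."""
    names = list(ic_values)
    values = np.array([ic_values[name] for name in names], dtype=float)
    delta = values - values.min()  # difference from the best (lowest) model
    raw = np.exp(-0.5 * delta)
    return dict(zip(names, raw / raw.sum()))


# Made-up WAIC values purely for illustration
waic = {"null": 412.3, "treatment_only": 398.7, "full": 395.1}
weights = ic_weights(waic)
for name, weight in weights.items():
    print(f"{name}: WAIC={waic[name]:.1f}, Weight={weight:.3f}")
```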

## Summary

The SciTeX Stats Server provides comprehensive statistical analysis capabilities:

### 1. **Enhanced Hypothesis Testing**
   - Automatic test selection based on data properties
   - Comprehensive effect size calculations
   - Assumption checking and diagnostics
   - Multiple comparison corrections

### 2. **Advanced Analysis Methods**
   - Repeated measures and mixed models
   - Time series and longitudinal analysis
   - Change point detection
   - Polynomial contrasts and trends

### 3. **Model Comparison**
   - Multiple regression models
   - Cross-validation
   - Information criteria (AIC, BIC)
   - Diagnostic plots

### 4. **Power Analysis**
   - Post-hoc power calculations
   - A priori sample size determination
   - Power curves and sensitivity analysis
   - Complex design calculations

### 5. **Bayesian Integration**
   - Bayesian hypothesis testing
   - Posterior distributions
   - Model comparison with WAIC/LOO
   - Credible intervals

### 6. **Automated Reporting**
   - Publication-ready tables
   - Comprehensive visualizations
   - Effect size interpretations
   - Assumption validation

## Benefits Over Standard Libraries

- **Unified Interface**: Single API for multiple statistical methods
- **Automatic Selection**: Chooses appropriate tests based on data
- **Comprehensive Output**: All relevant statistics in one call
- **Visualization Integration**: Automatic plot generation
- **Best Practices**: Built-in assumption checks, sphericity corrections, and multiple-comparison adjustments
- **Reproducibility**: Saves all results with metadata
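
As an illustration of the reproducibility point, the sketch below writes a results dictionary together with basic provenance metadata using only the Python standard library. The helper `save_with_metadata` and the metadata fields are assumptions for this example; what `stx.io.save` actually records is not described in this notebook.

```python
import json
import platform
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_with_metadata(results, path):
    """Hypothetical helper: write results plus basic provenance to a JSON file."""
    payload = {
        "metadata": {
            "created_utc": datetime.now(timezone.utc).isoformat(),
            "python_version": platform.python_version(),
        },
        "results": results,
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2))
    return path


out_dir = Path(tempfile.mkdtemp())
saved = save_with_metadata({"t_test": {"df": 14}}, out_dir / "results" / "comparison.json")
loaded = json.loads(saved.read_text())
print(sorted(loaded))
```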