# SciTeX PD Server - Enhanced Pandas Operations

This notebook demonstrates the enhanced pandas functionality provided by the SciTeX PD MCP server. Each example is defined as a string and printed rather than executed.

## Overview

The PD server enhances pandas with:
- Automatic type inference and conversion
- Statistical analysis helpers
- Data cleaning utilities
- Time series enhancements
- Multi-dataframe operations

## Example 1: Enhanced DataFrame Creation and Type Handling

In [None]:
# Standard pandas approach
standard_pandas = '''
import pandas as pd
import numpy as np

# Create dataframe with mixed types
data = {
    'subject_id': ['S001', 'S002', 'S003', 'S004', 'S005'],
    'age': ['25', '30', '28', '35', '32'],  # Strings that should be numeric
    'group': ['Control', 'Treatment', 'Control', 'Treatment', 'Control'],
    'score': [85.5, 92.3, 78.9, 88.2, 91.1],
    'date': ['2024-01-15', '2024-01-16', '2024-01-15', '2024-01-17', '2024-01-16']
}

df = pd.DataFrame(data)

# Manual type conversions
df['age'] = pd.to_numeric(df['age'])
df['date'] = pd.to_datetime(df['date'])
df['group'] = pd.Categorical(df['group'])

# Calculate statistics manually
mean_by_group = df.groupby('group')['score'].mean()
std_by_group = df.groupby('group')['score'].std()
'''

print("STANDARD PANDAS APPROACH:")
print(standard_pandas)

In [None]:
# SciTeX enhanced approach
scitex_pandas = '''
import scitex as stx
import numpy as np

def enhanced_dataframe_operations():
    """Demonstrate enhanced pandas operations."""
    # Create dataframe with automatic type inference
    data = {
        'subject_id': ['S001', 'S002', 'S003', 'S004', 'S005'],
        'age': ['25', '30', '28', '35', '32'],
        'group': ['Control', 'Treatment', 'Control', 'Treatment', 'Control'],
        'score': [85.5, 92.3, 78.9, 88.2, 91.1],
        'date': ['2024-01-15', '2024-01-16', '2024-01-15', '2024-01-17', '2024-01-16'],
        'response_time': [250, 180, 320, 195, 210],
        'accuracy': [0.95, 0.98, 0.87, 0.92, 0.96]
    }
    
    # Force DataFrame with automatic type optimization
    df = stx.pd.force_df(
        data,
        infer_types=True,      # Automatically detect and convert types
        parse_dates=True,      # Find and parse date columns
        categorical_threshold=0.5,  # Convert to categorical if < 50% unique
        optimize_memory=True   # Use optimal dtypes for memory efficiency
    )
    
    # Enhanced statistical summary
    summary = stx.pd.describe_enhanced(
        df,
        include_categorical=True,
        include_missing=True,
        include_outliers=True,
        percentiles=[0.05, 0.25, 0.5, 0.75, 0.95],
        by_group='group'  # Separate statistics by group
    )
    
    # Find p-values for group comparisons
    p_values = stx.pd.find_pval(
        df,
        value_col='score',
        group_col='group',
        test='auto',  # Automatically select appropriate test
        correct_multiple=True,  # Apply multiple comparison correction
        include_effect_size=True
    )
    
    # Advanced groupby with multiple aggregations
    grouped = stx.pd.groupby_enhanced(
        df,
        by='group',
        agg={
            'score': ['mean', 'std', 'sem', 'ci95'],
            'response_time': ['median', 'iqr'],
            'accuracy': ['mean', 'min', 'max']
        },
        add_counts=True,
        add_missing=True
    )
    
    # Pivot with enhanced features
    pivoted = stx.pd.pivot_enhanced(
        df,
        index='subject_id',
        columns='date',
        values='score',
        fill_method='interpolate',  # Smart filling of missing values
        add_margins=True,  # Add row/column totals
        add_statistics=True  # Add mean, std rows
    )
    
    # Time series resampling with multiple methods
    ts_data = stx.pd.resample_enhanced(
        df.set_index('date'),
        rule='D',  # Daily
        agg_numeric='mean',
        agg_categorical='mode',
        interpolate_missing=True,
        add_rolling_stats=True,
        window=3
    )
    
    # Save with enhanced options
    stx.io.save(
        df, 
        './data/experiment_data.csv',
        include_metadata=True,  # Save data types and descriptions
        compress=True,  # Automatic compression
        symlink_from_cwd=True
    )
    
    return df, summary, p_values, grouped
'''

print("SCITEX ENHANCED APPROACH:")
print(scitex_pandas)
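
For comparison, the type handling described above (numeric strings, date strings, and low-cardinality columns) can be sketched in plain pandas. This is only an illustration under stated assumptions: the helper name `infer_column_types` is hypothetical, it is not SciTeX's implementation, and it only recognises ISO `YYYY-MM-DD` dates.

```python
import pandas as pd


def infer_column_types(df, categorical_threshold=0.5):
    """Return a copy of df with string columns converted where it is safe.

    Hypothetical helper (not part of SciTeX): an all-numeric string column
    becomes numeric, an all-ISO-date column becomes datetime, and a column
    whose share of distinct values is below the threshold becomes categorical.
    """
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if not pd.api.types.is_string_dtype(series):
            continue  # already numeric, datetime, etc.

        as_number = pd.to_numeric(series, errors="coerce")
        if as_number.notna().all():
            out[col] = as_number
            continue

        as_date = pd.to_datetime(series, format="%Y-%m-%d", errors="coerce")
        if as_date.notna().all():
            out[col] = as_date
            continue

        # Few distinct values relative to the row count: use a categorical.
        if series.nunique() / len(series) < categorical_threshold:
            out[col] = series.astype("category")
    return out


data = {
    "subject_id": ["S001", "S002", "S003", "S004", "S005"],
    "age": ["25", "30", "28", "35", "32"],
    "group": ["Control", "Treatment", "Control", "Treatment", "Control"],
    "score": [85.5, 92.3, 78.9, 88.2, 91.1],
    "date": ["2024-01-15", "2024-01-16", "2024-01-15", "2024-01-17", "2024-01-16"],
}
typed = infer_column_types(pd.DataFrame(data))
print(typed.dtypes)
```

With the sample data, `subject_id` stays a string column because every value is distinct, while `group` has two distinct values in five rows (ratio 0.4) and falls under the 0.5 threshold.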

## Example 2: Data Cleaning and Preprocessing

In [None]:
# Data cleaning with SciTeX
data_cleaning = '''
import scitex as stx
import numpy as np

def advanced_data_cleaning():
    """Demonstrate advanced data cleaning capabilities."""
    # Create messy data
    messy_data = {
        'id': ['001', '002', '003', '004', '005', '006'],
        'value': [10.5, -999, 12.3, np.nan, 15.2, 1000],  # With missing and outliers
        'category': ['A', 'B', 'A', 'C', None, 'B'],
        'text': ['  Hello  ', 'WORLD', '  test\\n', 'Data', None, 'Analysis '],
        'date': ['2024-01-01', '2024/01/02', '01-03-2024', 'invalid', '2024-01-05', None]
    }
    
    df = stx.pd.DataFrame(messy_data)
    
    # Comprehensive cleaning
    cleaned = stx.pd.clean_dataframe(
        df,
        # Missing value handling
        missing_values=[-999, 'invalid', 'N/A'],  # Custom missing indicators
        fill_strategy={
            'numeric': 'interpolate',  # or 'mean', 'median', 'forward_fill'
            'categorical': 'mode',
            'datetime': 'forward_fill'
        },
        
        # Outlier detection and handling
        outlier_method='iqr',  # or 'zscore', 'isolation_forest'
        outlier_threshold=1.5,
        outlier_action='clip',  # or 'remove', 'nan'
        
        # Text cleaning
        text_operations=[
            'strip',           # Remove whitespace
            'lowercase',       # Convert to lowercase
            'remove_special',  # Remove special characters
            'normalize_whitespace'  # Fix multiple spaces
        ],
        
        # Date parsing
        parse_dates=True,
        date_formats=['%Y-%m-%d', '%Y/%m/%d', '%m-%d-%Y'],  # Try multiple formats; the last is month-first, so '01-03-2024' is 3 January
        
        # Duplicate handling
        remove_duplicates=True,
        duplicate_subset=None,  # Check all columns
        keep='first'
    )
    
    # Data validation
    validation_report = stx.pd.validate_dataframe(
        cleaned,
        rules={
            'id': {'type': 'string', 'unique': True, 'not_null': True},
            'value': {'type': 'numeric', 'range': (0, 100)},
            'category': {'type': 'categorical', 'values': ['A', 'B', 'C']},
            'date': {'type': 'datetime', 'after': '2024-01-01'}
        },
        return_clean=True,  # Return only valid rows
        report_format='detailed'  # Get detailed violation report
    )
    
    # Feature engineering
    engineered = stx.pd.feature_engineering(
        cleaned,
        operations=[
            # Temporal features from date
            {'type': 'extract_date_features', 
             'column': 'date',
             'features': ['year', 'month', 'day', 'dayofweek', 'quarter']},
            
            # Categorical encoding
            {'type': 'encode_categorical',
             'column': 'category',
             'method': 'onehot',  # or 'label', 'target', 'ordinal'
             'handle_unknown': 'ignore'},
            
            # Numeric transformations
            {'type': 'transform_numeric',
             'column': 'value',
             'methods': ['log1p', 'sqrt', 'reciprocal'],
             'keep_original': True},
            
            # Interaction features
            {'type': 'create_interactions',
             'columns': ['value', 'category_A', 'category_B'],
             'degree': 2}
        ]
    )
    
    # Data quality report
    quality_report = stx.pd.generate_quality_report(
        original=df,
        cleaned=cleaned,
        include_plots=True,
        save_path='./reports/data_quality_report.html'
    )
    
    print("Data Cleaning Summary:")
    print(f"Original shape: {df.shape}")
    print(f"Cleaned shape: {cleaned.shape}")
    print(f"Valid rows: {validation_report['n_valid']}")
    print(f"Features after engineering: {engineered.shape[1]}")
    
    return cleaned, engineered, quality_report
'''

print("DATA CLEANING WITH SCITEX:")
print(data_cleaning)
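
The individual cleaning steps named above (custom missing-value markers, IQR-based clipping, text normalisation, and trying several date formats) can each be written in a few lines of plain pandas. The sketch below is not the SciTeX implementation; `parse_mixed_dates` and `clip_iqr` are hypothetical helper names, and it assumes `'01-03-2024'` is month-first (3 January).

```python
import numpy as np
import pandas as pd


def parse_mixed_dates(values, formats):
    """Try each format in turn; the first format that parses a value wins."""
    parsed = None
    for fmt in formats:
        attempt = pd.to_datetime(values, format=fmt, errors="coerce")
        parsed = attempt if parsed is None else parsed.fillna(attempt)
    return parsed


def clip_iqr(series, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR]; NaN stays NaN."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    spread = q3 - q1
    return series.clip(lower=q1 - k * spread, upper=q3 + k * spread)


messy = pd.DataFrame({
    "value": [10.5, -999, 12.3, np.nan, 15.2, 1000],
    "text": ["  Hello  ", "WORLD", "  test\n", "Data", None, "Analysis "],
    "date": ["2024-01-01", "2024/01/02", "01-03-2024", "invalid", "2024-01-05", None],
})

clean = messy.copy()
# Treat the -999 sentinel as missing before looking for outliers.
clean["value"] = clip_iqr(clean["value"].replace(-999, np.nan))
# Strip surrounding whitespace (including the trailing newline) and lowercase.
clean["text"] = clean["text"].str.strip().str.lower()
# Unparseable entries such as 'invalid' become NaT.
clean["date"] = parse_mixed_dates(clean["date"], ["%Y-%m-%d", "%Y/%m/%d", "%m-%d-%Y"])
print(clean)
```

On a sample this small the quartiles are themselves pulled upward by the extreme value, so the clipped result is still large; a median-based rule or a larger sample would behave more sensibly.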

## Example 3: Advanced Merging and Joining

In [None]:
# Advanced merging operations
advanced_merging = '''
import scitex as stx

def advanced_merge_operations():
    """Demonstrate enhanced merging and joining capabilities."""
    # Create sample datasets
    subjects = stx.pd.DataFrame({
        'subject_id': ['S001', 'S002', 'S003', 'S004'],
        'age': [25, 30, 28, 35],
        'group': ['Control', 'Treatment', 'Control', 'Treatment']
    })
    
    measurements = stx.pd.DataFrame({
        'subject_id': ['S001', 'S001', 'S002', 'S003', 'S003', 'S005'],
        'timepoint': [1, 2, 1, 1, 2, 1],
        'value': [10.5, 12.3, 11.8, 9.7, 10.2, 15.3]
    })
    
    demographics = stx.pd.DataFrame({
        'id': ['S001', 'S002', 'S004'],  # Note: different column name
        'gender': ['M', 'F', 'M'],
        'education': ['Bachelor', 'Master', 'PhD']
    })
    
    # Smart merge with automatic key detection
    merged = stx.pd.merge_smart(
        [subjects, measurements, demographics],
        keys='auto',  # Automatically detect common keys
        how='outer',  # Preserve all data
        indicator=True,  # Track merge source
        validate='many_to_many',  # Validate merge type
        fuzzy_match=True,  # Handle slight key differences
        fuzzy_threshold=0.9
    )
    
    # Merge with conflict resolution
    merged_conflicts = stx.pd.merge_with_conflicts(
        subjects,
        demographics.rename(columns={'id': 'subject_id'}),
        on='subject_id',
        conflict_resolution={
            'prefer_left': ['age'],  # Keep left DataFrame values
            'prefer_right': ['gender'],  # Keep right DataFrame values
            'combine': ['group', 'education']  # Combine into single column
        },
        track_conflicts=True
    )
    
    # Time-aware merge for longitudinal data
    time_series1 = stx.pd.DataFrame({
        'subject_id': ['S001', 'S001', 'S002'],
        'date': ['2024-01-01', '2024-01-03', '2024-01-02'],
        'measure1': [10, 12, 11]
    })
    time_series1['date'] = stx.pd.to_datetime(time_series1['date'])
    
    time_series2 = stx.pd.DataFrame({
        'subject_id': ['S001', 'S001', 'S002'],
        'date': ['2024-01-01', '2024-01-04', '2024-01-02'],
        'measure2': [20, 25, 22]
    })
    time_series2['date'] = stx.pd.to_datetime(time_series2['date'])
    
    time_merged = stx.pd.merge_asof_enhanced(
        time_series1,
        time_series2,
        on='date',
        by='subject_id',
        tolerance='1D',  # Allow 1 day tolerance
        direction='nearest',  # or 'backward', 'forward'
        interpolate_missing=True,
        add_time_diff=True  # Add column showing time difference
    )
    
    # Multi-level merge for hierarchical data
    hierarchical = stx.pd.merge_hierarchical(
        {'level1': subjects, 'level2': measurements, 'level3': demographics},
        hierarchy=['subject_id', 'timepoint'],
        propagate_down=True,  # Propagate parent values to children
        aggregate_up={'value': 'mean'},  # Aggregate child values to parent
        maintain_structure=True
    )
    
    # Merge with data quality checks
    quality_merge = stx.pd.merge_with_quality(
        subjects,
        measurements,
        on='subject_id',
        quality_checks=[
            'no_duplicate_keys',
            'no_null_keys',
            'consistent_types',
            'value_ranges'
        ],
        report_issues=True,
        clean_before_merge=True
    )
    
    # Save merge report
    merge_report = stx.pd.generate_merge_report(
        original_dfs=[subjects, measurements, demographics],
        merged_df=merged,
        report_path='./reports/merge_analysis.html'
    )
    
    return merged, time_merged, hierarchical
'''

print("ADVANCED MERGING WITH SCITEX:")
print(advanced_merging)
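
Two of the ideas above, tracking where each merged row came from and joining on the nearest timestamp within a tolerance, are already available in standard pandas through `DataFrame.merge(..., indicator=True, validate=...)` and `pd.merge_asof`. The following sketch reuses the sample frames from this example; the `matched_date` and `time_diff` column names are assumptions chosen for illustration.

```python
import pandas as pd

subjects = pd.DataFrame({
    "subject_id": ["S001", "S002", "S003", "S004"],
    "group": ["Control", "Treatment", "Control", "Treatment"],
})
measurements = pd.DataFrame({
    "subject_id": ["S001", "S001", "S002", "S003", "S003", "S005"],
    "timepoint": [1, 2, 1, 1, 2, 1],
    "value": [10.5, 12.3, 11.8, 9.7, 10.2, 15.3],
})

# Outer merge that records the source of each row in a `_merge` column and
# raises if `subject_id` is not unique on the left side.
merged = subjects.merge(
    measurements, on="subject_id", how="outer",
    indicator=True, validate="one_to_many",
)
print(merged["_merge"].value_counts())

ts1 = pd.DataFrame({
    "subject_id": ["S001", "S001", "S002"],
    "date": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-02"]),
    "measure1": [10, 12, 11],
})
ts2 = pd.DataFrame({
    "subject_id": ["S001", "S001", "S002"],
    "date": pd.to_datetime(["2024-01-01", "2024-01-04", "2024-01-02"]),
    "measure2": [20, 25, 22],
})

# merge_asof requires both frames to be sorted by the `on` column.
# Copy the right-hand date so the matched timestamp survives the merge.
right = ts2.sort_values("date").assign(matched_date=lambda d: d["date"])
time_merged = pd.merge_asof(
    ts1.sort_values("date"), right,
    on="date", by="subject_id",
    tolerance=pd.Timedelta("1D"), direction="nearest",
)
time_merged["time_diff"] = time_merged["matched_date"] - time_merged["date"]
print(time_merged)
```

`S004` appears only in `subjects` and `S005` only in `measurements`, so the indicator column marks them `left_only` and `right_only` respectively.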

## Example 4: Statistical Operations and Hypothesis Testing

In [None]:
# Statistical operations with pandas
statistical_pandas = '''
import scitex as stx
import numpy as np

def pandas_statistical_operations():
    """Enhanced statistical operations with pandas."""
    # Generate experimental data
    np.random.seed(42)
    n_subjects = 100
    
    data = stx.pd.DataFrame({
        'subject_id': [f'S{i:03d}' for i in range(n_subjects)],
        'group': np.random.choice(['Control', 'Drug_A', 'Drug_B'], n_subjects),
        'baseline': np.random.normal(100, 15, n_subjects),
        'week_1': np.random.normal(105, 18, n_subjects),
        'week_2': np.random.normal(110, 20, n_subjects),
        'week_4': np.random.normal(115, 22, n_subjects),
        'age': np.random.randint(20, 60, n_subjects),
        'gender': np.random.choice(['M', 'F'], n_subjects)
    })
    
    # Add group effects
    data.loc[data['group'] == 'Drug_A', ['week_1', 'week_2', 'week_4']] += 10
    data.loc[data['group'] == 'Drug_B', ['week_1', 'week_2', 'week_4']] += 15
    
    # Comprehensive statistical summary
    stats_summary = stx.pd.statistical_summary(
        data,
        value_cols=['baseline', 'week_1', 'week_2', 'week_4'],
        group_cols=['group', 'gender'],
        statistics=[
            'mean', 'std', 'sem', 'median', 'iqr',
            'ci95', 'skew', 'kurtosis', 'shapiro_p'
        ],
        add_change_scores=True,  # Calculate change from baseline
        add_effect_sizes=True    # Cohen's d, Hedges' g
    )
    
    # Multiple hypothesis testing
    hypothesis_results = stx.pd.hypothesis_testing(
        data,
        hypotheses=[
            # Between-group comparisons
            {'type': 'between', 'groups': ['Control', 'Drug_A'], 
             'variable': 'week_4', 'test': 'auto'},
            
            # Within-subject comparisons
            {'type': 'within', 'timepoints': ['baseline', 'week_4'],
             'by_group': True, 'test': 'paired_t'},
            
            # Correlation analysis
            {'type': 'correlation', 'x': 'age', 'y': 'week_4',
             'method': 'pearson', 'by_group': True},
            
            # ANOVA with covariates (ANCOVA)
            {'type': 'anova', 'factor': 'group', 'dependent': 'week_4',
             'covariates': ['age', 'baseline'], 'post_hoc': 'tukey'}
        ],
        alpha=0.05,
        correction='fdr_bh',  # False discovery rate correction
        report_format='comprehensive'
    )
    
    # Longitudinal analysis
    long_format = stx.pd.melt_enhanced(
        data,
        id_vars=['subject_id', 'group', 'age', 'gender'],
        value_vars=['baseline', 'week_1', 'week_2', 'week_4'],
        var_name='timepoint',
        value_name='measurement',
        add_numeric_time=True,  # Convert timepoint to numeric
        time_mapping={'baseline': 0, 'week_1': 1, 'week_2': 2, 'week_4': 4}
    )
    
    # Mixed effects model preparation
    model_data = stx.pd.prepare_for_mixed_model(
        long_format,
        fixed_effects=['timepoint', 'group', 'age'],
        random_effects=['subject_id'],
        interactions=[('timepoint', 'group')],
        center_predictors=True,
        scale_predictors=True,
        create_dummy_coding=True
    )
    
    # Effect size calculations
    effect_sizes = stx.pd.calculate_effect_sizes(
        data,
        group_col='group',
        value_cols=['week_4'],
        baseline_col='baseline',
        effect_types=['cohens_d', 'hedges_g', 'glass_delta', 'odds_ratio'],
        confidence_level=0.95,
        bootstrap_ci=True,
        n_bootstrap=1000
    )
    
    # Power analysis for future studies
    power_analysis = stx.pd.power_analysis_from_data(
        data,
        primary_outcome='week_4',
        group_col='group',
        target_power=0.80,
        alpha=0.05,
        include_dropouts=True,
        dropout_rate=0.15
    )
    
    # Generate statistical report
    report = stx.pd.generate_statistical_report(
        data=data,
        stats_summary=stats_summary,
        hypothesis_results=hypothesis_results,
        effect_sizes=effect_sizes,
        output_format='latex',  # or 'html', 'markdown'
        include_figures=True,
        save_path='./reports/statistical_analysis.tex'
    )
    
    return stats_summary, hypothesis_results, effect_sizes
'''

print("STATISTICAL OPERATIONS WITH PANDAS:")
print(statistical_pandas)
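
To make the effect-size and long-format steps concrete, here is a minimal sketch using only NumPy and pandas. It assumes the pooled-standard-deviation form of Cohen's d and the common approximate small-sample correction for Hedges' g; the function names and the six-row toy table are illustrative and do not come from SciTeX.

```python
import numpy as np
import pandas as pd


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation (sample variances)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def hedges_g(a, b):
    """Cohen's d scaled by the approximate correction 1 - 3 / (4*df - 1)."""
    dof = len(a) + len(b) - 2
    return cohens_d(a, b) * (1 - 3 / (4 * dof - 1))


# Toy values chosen for easy arithmetic, not real measurements.
wide = pd.DataFrame({
    "subject_id": ["S000", "S001", "S002", "S003", "S004", "S005"],
    "group": ["Control"] * 3 + ["Drug_A"] * 3,
    "baseline": [100.0, 98.0, 102.0, 101.0, 99.0, 100.0],
    "week_4": [104.0, 101.0, 107.0, 115.0, 112.0, 118.0],
})

drug = wide.loc[wide["group"] == "Drug_A", "week_4"]
control = wide.loc[wide["group"] == "Control", "week_4"]
d = cohens_d(drug, control)
g = hedges_g(drug, control)
print(f"Cohen's d = {d:.3f}, Hedges' g = {g:.3f}")

# Wide-to-long reshape with a numeric time column, as in the melt step above.
long_format = wide.melt(
    id_vars=["subject_id", "group"],
    value_vars=["baseline", "week_4"],
    var_name="timepoint",
    value_name="measurement",
)
long_format["weeks"] = long_format["timepoint"].map({"baseline": 0, "week_4": 4})
print(long_format.head())
```

Each group here has a sample variance of 9, so the pooled standard deviation is 3 and d is the mean difference (11) divided by 3.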

## Example 5: Time Series and Panel Data

In [None]:
# Time series operations
time_series_pandas = '''
import scitex as stx
import numpy as np

def time_series_panel_operations():
    """Advanced time series and panel data operations."""
    # Generate time series data (seeded so the random noise is reproducible)
    np.random.seed(42)
    dates = stx.pd.date_range('2024-01-01', periods=365, freq='D')
    n_series = 5
    
    # Multi-variate time series
    data = stx.pd.DataFrame({
        'date': np.tile(dates, n_series),
        'series_id': np.repeat([f'Series_{i}' for i in range(n_series)], len(dates)),
        'value': np.concatenate([
            100 + 10*np.sin(2*np.pi*np.arange(len(dates))/365) + 
            5*np.random.randn(len(dates)) + 20*i
            for i in range(n_series)
        ]),
        'temperature': np.tile(20 + 10*np.sin(2*np.pi*np.arange(len(dates))/365) + 
                              2*np.random.randn(len(dates)), n_series),
        'event': np.random.choice([0, 1], size=len(dates)*n_series, p=[0.95, 0.05])
    })
    
    # Convert to panel data structure
    panel = stx.pd.to_panel(
        data,
        index=['series_id', 'date'],
        verify_integrity=True,
        sort_index=True
    )
    
    # Time series decomposition for each series
    decomposition = stx.pd.time_series_decompose(
        panel,
        value_col='value',
        method='stl',  # or 'x11', 'seats'
        period='auto',  # Automatically detect seasonality
        robust=True,
        by_group='series_id'
    )
    
    # Advanced rolling operations
    rolling_stats = stx.pd.rolling_enhanced(
        panel,
        window='30D',  # 30-day window
        operations={
            'value': ['mean', 'std', 'skew', 'zscore'],
            'temperature': ['mean', 'correlation:value']
        },
        min_periods=15,
        center=True,
        by_group='series_id'
    )
    
    # Lag and lead features
    lagged = stx.pd.create_lags_and_leads(
        panel,
        columns=['value', 'temperature'],
        lags=[1, 7, 30],  # Previous day, week, month
        leads=[1, 7],     # Next day, week
        by_group='series_id',
        fill_method='interpolate'
    )
    
    # Change point detection
    change_points = stx.pd.detect_change_points(
        panel,
        value_col='value',
        method='pelt',  # or 'binary_segmentation', 'window_sliding'
        penalty='bic',
        min_segment_length=30,
        by_group='series_id'
    )
    
    # Anomaly detection
    anomalies = stx.pd.detect_anomalies(
        panel,
        value_col='value',
        methods=['isolation_forest', 'local_outlier_factor', 'seasonal_decompose'],
        contamination='auto',  # Automatically estimate contamination
        ensemble_method='voting',  # Combine multiple methods
        by_group='series_id'
    )
    
    # Cross-correlation analysis
    cross_corr = stx.pd.cross_correlation(
        panel,
        x='value',
        y='temperature',
        max_lag=30,
        method='pearson',
        by_group='series_id',
        plot=True
    )
    
    # Granger causality testing
    causality = stx.pd.granger_causality(
        panel,
        cause='temperature',
        effect='value',
        max_lag=10,
        by_group='series_id',
        include_instantaneous=True
    )
    
    # Forecasting preparation
    forecast_data = stx.pd.prepare_for_forecasting(
        panel,
        target='value',
        features=['temperature', 'event'],
        horizon=30,  # 30-day forecast
        train_size=0.8,
        gap=0,  # No gap between train and test
        include_calendar_features=True,
        include_fourier_features=True,
        by_group='series_id'
    )
    
    # Create time series visualization
    fig = stx.pd.plot_time_series_panel(
        panel,
        value_col='value',
        group_col='series_id',
        show_decomposition=True,
        show_anomalies=anomalies,
        show_change_points=change_points,
        facet_scales='free_y',
        figsize=(15, 10)
    )
    
    stx.io.save(fig, './figures/time_series_analysis.png',
                symlink_from_cwd=True)
    
    return panel, decomposition, anomalies, forecast_data
'''

print("TIME SERIES AND PANEL DATA OPERATIONS:")
print(time_series_pandas)
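
The per-series rolling statistics and lag/lead features described above map onto `groupby` combined with `rolling` and `shift` in standard pandas. The sketch below uses a small deterministic panel so the results are easy to check by hand; the column names `roll_mean_3`, `lag_1`, and `lead_1` are assumptions. It uses a row-count window, which on gap-free daily data covers the same span as a three-day window.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2024-01-01", periods=10, freq="D")
panel = pd.DataFrame({
    "series_id": np.repeat(["Series_0", "Series_1"], len(dates)),
    "date": np.tile(dates, 2),
    "value": np.arange(20, dtype=float),  # Series_0: 0..9, Series_1: 10..19
}).set_index(["series_id", "date"]).sort_index()

by_series = panel.groupby(level="series_id")["value"]

# Rolling mean over the current and two previous rows, computed per series so
# that one series never leaks into another.
panel["roll_mean_3"] = by_series.transform(
    lambda s: s.rolling(window=3, min_periods=1).mean()
)

# Lag and lead features; the first/last row of each series has no neighbour.
panel["lag_1"] = by_series.shift(1)
panel["lead_1"] = by_series.shift(-1)

print(panel.head(4))
```

Grouping before shifting matters: a plain `panel["value"].shift(1)` would copy the last value of `Series_0` into the first row of `Series_1`.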

## Example 6: DataFrame Export and Reporting

In [None]:
# Enhanced export and reporting
export_reporting = '''
import scitex as stx

def dataframe_export_reporting():
    """Advanced DataFrame export and reporting capabilities."""
    # Create sample analysis results
    results = stx.pd.DataFrame({
        'Model': ['Linear', 'Random Forest', 'XGBoost', 'Neural Net'],
        'Accuracy': [0.85, 0.92, 0.94, 0.93],
        'Precision': [0.83, 0.91, 0.93, 0.92],
        'Recall': [0.87, 0.93, 0.95, 0.94],
        'F1_Score': [0.85, 0.92, 0.94, 0.93],
        'Training_Time': [0.5, 12.3, 8.7, 45.2],
        'Parameters': [10, 500, 300, 1000]
    })
    
    # Export to multiple formats with formatting
    stx.pd.export_formatted(
        results,
        base_path='./results/model_comparison',
        formats=['excel', 'latex', 'html', 'markdown'],
        
        # Excel-specific options
        excel_options={
            'sheet_name': 'Model Results',
            'index': False,
            'freeze_panes': (1, 1),
            'conditional_format': {
                'Accuracy': {'type': 'data_bar', 'color': 'green'},
                'Training_Time': {'type': 'color_scale', 'palette': 'RdYlGn_r'}
            },
            'column_widths': 'auto',
            'add_chart': True
        },
        
        # LaTeX-specific options
        latex_options={
            'caption': 'Model Performance Comparison',
            'label': 'tab:model_comparison',
            'column_format': 'l|rrrr|rr',
            'bold_max': True,  # Bold best values
            'escape': False,
            'position': 'htbp'
        },
        
        # HTML-specific options
        html_options={
            'table_id': 'model-comparison',
            'classes': 'table table-striped table-hover',
            'include_css': True,
            'sortable': True,
            'searchable': True
        },
        
        # Markdown-specific options
        markdown_options={
            'tablefmt': 'github',
            'floatfmt': '.3f',
            'numalign': 'right'
        }
    )
    
    # Create comprehensive report
    report = stx.pd.create_analysis_report(
        title='Model Comparison Analysis',
        sections=[
            {
                'title': 'Executive Summary',
                'type': 'text',
                'content': 'XGBoost showed the best overall performance...'
            },
            {
                'title': 'Performance Metrics',
                'type': 'dataframe',
                'data': results,
                'styling': {
                    'highlight_max': ['Accuracy', 'F1_Score'],
                    'highlight_min': ['Training_Time'],
                    'precision': 3
                }
            },
            {
                'title': 'Performance Visualization',
                'type': 'plot',
                'plot_type': 'radar',
                'data': results[['Model', 'Accuracy', 'Precision', 'Recall', 'F1_Score']]
            },
            {
                'title': 'Statistical Comparison',
                'type': 'statistics',
                'tests': [
                    {'type': 'friedman', 'metrics': ['Accuracy', 'F1_Score']},
                    {'type': 'nemenyi', 'metric': 'F1_Score'}
                ]
            }
        ],
        output_format='html',  # or 'pdf', 'docx'
        template='academic',   # or 'business', 'technical'
        include_code=True,
        include_data=True,
        save_path='./reports/model_comparison_report.html'
    )
    
    # Create interactive dashboard
    dashboard = stx.pd.create_dashboard(
        data={'results': results},
        layout=[
            {'type': 'metric_cards', 'metrics': ['Best Accuracy', 'Fastest Model']},
            {'type': 'comparison_table', 'data': 'results'},
            {'type': 'bar_chart', 'x': 'Model', 'y': ['Accuracy', 'F1_Score']},
            {'type': 'scatter', 'x': 'Training_Time', 'y': 'Accuracy', 'size': 'Parameters'}
        ],
        filters=['Model'],
        export_options=['png', 'pdf', 'pptx'],
        save_path='./dashboards/model_comparison.html'
    )
    
    # Generate presentation slides
    slides = stx.pd.create_presentation(
        title='Model Performance Analysis',
        data=results,
        slide_templates=[
            'title_slide',
            'overview_table',
            'comparison_chart',
            'winner_announcement',
            'recommendations'
        ],
        theme='corporate',
        save_path='./presentations/model_comparison.pptx'
    )
    
    return report, dashboard
'''

print("DATAFRAME EXPORT AND REPORTING:")
print(export_reporting)
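
A reduced version of the export step can be sketched with standard pandas writers. The example below renders the same results table to HTML and CSV strings in memory and picks out the best model per metric, which is the information a "bold best values" option would need. It is an illustration only and does not reproduce the Excel, LaTeX, dashboard, or slide outputs.

```python
import pandas as pd

results = pd.DataFrame({
    "Model": ["Linear", "Random Forest", "XGBoost", "Neural Net"],
    "Accuracy": [0.85, 0.92, 0.94, 0.93],
    "Precision": [0.83, 0.91, 0.93, 0.92],
    "Recall": [0.87, 0.93, 0.95, 0.94],
    "F1_Score": [0.85, 0.92, 0.94, 0.93],
    "Training_Time": [0.5, 12.3, 8.7, 45.2],
    "Parameters": [10, 500, 300, 1000],
})

metrics = results.set_index("Model")
# Model with the highest score for each metric, and the quickest to train.
best = metrics[["Accuracy", "F1_Score"]].idxmax()
fastest = metrics["Training_Time"].idxmin()
print(best)
print("Fastest:", fastest)

# HTML table with an id and CSS classes; floats shown to three decimals.
html = results.to_html(
    index=False,
    table_id="model-comparison",
    classes="table table-striped",
    float_format="{:.3f}".format,
)

# Plain CSV text; pass a path instead of capturing the string to write a file.
csv_text = results.to_csv(index=False)
print(csv_text.splitlines()[0])
```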

## Summary

The SciTeX PD Server provides comprehensive enhancements to pandas:

### 1. **Smart Data Handling**
   - Automatic type inference and optimization
   - Intelligent missing value handling
   - Outlier detection and treatment
   - Data validation with rules

### 2. **Enhanced Operations**
   - Smart merging with conflict resolution
   - Time-aware joins
   - Hierarchical data support
   - Advanced groupby operations

### 3. **Statistical Integration**
   - Built-in hypothesis testing
   - Effect size calculations
   - Power analysis
   - Multiple comparison corrections

### 4. **Time Series Features**
   - Panel data structures
   - Decomposition and seasonality
   - Anomaly detection
   - Forecasting preparation

### 5. **Export and Reporting**
   - Multi-format export with styling
   - Automated report generation
   - Interactive dashboards
   - Presentation creation

### 6. **Data Quality**
   - Comprehensive cleaning pipelines
   - Validation and quality reports
   - Feature engineering
   - Data profiling

## Key Benefits

- **Reduced Code**: Common operations require fewer lines
- **Better Defaults**: Intelligent parameter selection
- **Integrated Workflow**: Statistical analysis built-in
- **Production Ready**: Validation and quality checks
- **Reproducible**: All operations tracked and logged