# Data Analysis with ostruct in Jupyter

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yaniv-golan/ostruct/blob/main/examples/data-science/notebooks/ostruct_data_analysis.ipynb)

This notebook demonstrates how to use ostruct for data analysis within Jupyter notebooks, combining the power of AI-driven analysis with interactive data science workflows.

## What You'll Learn

- 📊 Run ostruct analysis from Jupyter cells
- 🔄 Integrate ostruct results with pandas workflows
- 📈 Generate visualizations using AI + Code Interpreter
- 🚀 Build automated analysis pipelines
- 💡 Best practices for production data science

## Setup and Installation

First, let's install ostruct and set up our environment:

In [None]:
# Install ostruct (run this once)
# NOTE: Using release candidate v1.6.0rc1 to test Code Interpreter file download fix
# TODO: Revert to stable version after testing: !pip install ostruct-cli
!pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ ostruct-cli==1.6.0rc1

# Import required libraries
import json
import subprocess
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from IPython.display import Image, display, HTML

# Set up plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("✅ Setup complete!")

In [None]:
# Set up OpenAI API key (required)
import os

# Option 1: Set via environment variable
# os.environ['OPENAI_API_KEY'] = 'your-api-key-here'

# Option 2: Use getpass for secure input
import getpass
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('Enter your OpenAI API key: ')

print("🔑 API key configured")
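
Every later cell shells out to the `ostruct` executable, so a quick check that it is on the `PATH` can save a confusing `FileNotFoundError` further down. The sketch below uses only the standard library; `cli_available` is a hypothetical helper name, not part of ostruct.

```python
import shutil

def cli_available(name: str = "ostruct") -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None

if cli_available():
    print("✅ ostruct CLI found")
else:
    print("❌ ostruct CLI not found; re-run the install cell or restart the kernel")
```

If the check fails right after installation, restarting the kernel is often enough for the new entry point to be picked up.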

## Core ostruct Integration Functions

Let's create helper functions for running ostruct from Jupyter:

In [None]:
def run_ostruct_analysis(template_file, schema_file, data_file=None, model='gpt-4o-mini', 
                        enable_tools=None, web_query=None, output_file=None):
    """
    Run ostruct analysis from Jupyter and return results.
    
    Args:
        template_file: Path to Jinja2 template
        schema_file: Path to JSON schema
        data_file: Optional data file path
        model: OpenAI model to use
        enable_tools: List of tools to enable ['code-interpreter', 'web-search', 'file-search']
        web_query: Web search query if web-search enabled
        output_file: Optional output file path
    
    Returns:
        Dictionary with analysis results
    """
    cmd = ['ostruct', 'run', template_file, schema_file, '--model', model]
    
    # Add data file if provided
    if data_file:
        cmd.extend(['--file', 'ci:data', data_file])
    
    # Enable tools
    if enable_tools:
        for tool in enable_tools:
            cmd.extend(['--enable-tool', tool])
    
    # Add web query
    if web_query:
        cmd.extend(['--web-query', web_query])
    
    # Add output file
    if output_file:
        cmd.extend(['--output-file', output_file])
    
    print(f"🚀 Running: {' '.join(cmd)}")
    
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        
        if output_file:
            # Read from output file
            with open(output_file, 'r') as f:
                return json.load(f)
        else:
            # Parse stdout
            return json.loads(result.stdout)
            
    except subprocess.CalledProcessError as e:
        print(f"❌ Error: {e}")
        print(f"Stderr: {e.stderr}")
        raise
    except json.JSONDecodeError as e:
        print(f"❌ JSON parsing error: {e}")
        print(f"Raw output: {result.stdout}")
        raise

def display_analysis_summary(results):
    """
    Display a formatted summary of analysis results.
    """
    if 'summary' in results:
        summary = results['summary']
        print("📊 ANALYSIS SUMMARY")
        print("=" * 40)
        for key, value in summary.items():
            if isinstance(value, (int, float)):
                if 'sales' in key.lower() or 'revenue' in key.lower():
                    print(f"{key.replace('_', ' ').title():<20} ${value:,.2f}")
                else:
                    print(f"{key.replace('_', ' ').title():<20} {value:,}")
            else:
                print(f"{key.replace('_', ' ').title():<20} {value}")
        print()
    
    # Display chart info if available
    if 'chart_info' in results:
        chart = results['chart_info']
        print(f"📈 Generated Chart: {chart.get('filename', 'N/A')}")
        print(f"   Description: {chart.get('description', 'N/A')}")
        
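        # Try to display the chart if it exists (skip when no filename was returned)
        filename = chart.get('filename')
        chart_path = Path('downloads') / filename if filename else None
        if chart_path is not None and chart_path.is_file():
            display(Image(str(chart_path)))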

print("✅ Helper functions defined")
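
One possible refinement is to separate command construction from execution, so the argument list can be inspected or unit-tested without calling the API. The sketch below is an illustration only: `build_ostruct_command` is a hypothetical name, and it uses just the flags that already appear in `run_ostruct_analysis`.

```python
import shlex

def build_ostruct_command(template_file, schema_file, data_file=None,
                          model="gpt-4o-mini", enable_tools=None,
                          web_query=None, output_file=None):
    """Assemble the argument list without running anything."""
    cmd = ["ostruct", "run", template_file, schema_file, "--model", model]
    if data_file:
        cmd += ["--file", "ci:data", data_file]
    for tool in enable_tools or []:
        cmd += ["--enable-tool", tool]
    if web_query:
        cmd += ["--web-query", web_query]
    if output_file:
        cmd += ["--output-file", output_file]
    return cmd

cmd = build_ostruct_command("basic_template.j2", "basic_schema.json",
                            data_file="my sales.csv",
                            enable_tools=["code-interpreter"])
# shlex.join quotes arguments that contain spaces, so the printed
# line can be pasted into a shell as-is
print(shlex.join(cmd))
```

Printing with `shlex.join` instead of `' '.join` matters only when a path or query contains spaces or shell metacharacters, which web queries usually do.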

## Example 1: Basic Data Analysis

Let's start with a simple CSV analysis using the data science template:

In [None]:
# Create sample data for analysis
sample_data = {
    'date': ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'],
    'product': ['Widget A', 'Widget B', 'Widget A', 'Widget C', 'Widget B'],
    'quantity': [10, 15, 8, 12, 20],
    'price': [25.50, 30.00, 25.50, 45.00, 30.00]
}

df = pd.DataFrame(sample_data)
df['revenue'] = df['quantity'] * df['price']

# Save to CSV for ostruct analysis
df.to_csv('sample_sales.csv', index=False)

print("📋 Sample Data Created:")
display(df)

# Create basic template and schema for this demo
# (In practice, you'd have these files prepared)

basic_template = """
You are a data analyst. Analyze the provided sales data and generate insights.

## Sales Data Analysis

**Data to analyze:**
{{ data.content }}

## Analysis Requirements:
1. **Summary Statistics**: Calculate total sales, average price, transaction count
2. **Product Performance**: Identify top-performing products
3. **Trends**: Analyze sales patterns and trends
4. **Data Quality**: Assess data completeness and note any issues
5. **Visualization**: Create a chart showing sales by product

## Output Format:
Provide analysis in the structured JSON format specified in the schema.
"""

with open('basic_template.j2', 'w') as f:
    f.write(basic_template)

basic_schema = {
    "type": "object",
    "properties": {
        "summary": {
            "type": "object",
            "properties": {
                "total_sales": {"type": "number"},
                "average_price": {"type": "number"},
                "product_count": {"type": "integer"},
                "total_transactions": {"type": "integer"}
            }
        },
        "sales_by_product": {
            "type": "object",
            "description": "Sales totals by product name"
        },
        "chart_info": {
            "type": "object",
            "properties": {
                "filename": {"type": "string"},
                "description": {"type": "string"},
                "chart_type": {"type": "string"}
            }
        },
        "data_quality": {
            "type": "object",
            "properties": {
                "rows_processed": {"type": "integer"},
                "missing_values": {"type": "integer"},
                "data_issues": {"type": "array", "items": {"type": "string"}}
            }
        }
    },
    "required": ["summary", "chart_info", "data_quality"]
}

with open('basic_schema.json', 'w') as f:
    json.dump(basic_schema, f, indent=2)

print("✅ Template and schema files created for demo")
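
Before spending an API call, it can be worth running a cheap local sanity check on a hand-written schema. The sketch below looks for keys listed under `required` that are never declared under `properties`, which is an easy slip when editing schemas by hand. `undeclared_required` is a hypothetical helper, and this is a lint-style check, not full JSON Schema validation.

```python
def undeclared_required(schema, path="$"):
    """Yield (path, key) for every `required` key missing from `properties`."""
    if not isinstance(schema, dict):
        return
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in props:
            yield path, key
    # Recurse into nested object properties and array items
    for name, sub in props.items():
        yield from undeclared_required(sub, f"{path}.{name}")
    if "items" in schema:
        yield from undeclared_required(schema["items"], f"{path}[]")

# Deliberately inconsistent schema, used only for illustration
demo_schema = {
    "type": "object",
    "properties": {"summary": {"type": "object"}},
    "required": ["summary", "chart_info"],
}
print(list(undeclared_required(demo_schema)))
```

Passing `basic_schema` from the cell above should produce an empty list, since every required key there is declared.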

In [None]:
# Run ostruct analysis with better error handling and debugging
print("🚀 Starting ostruct analysis...")

try:
    # First, let's verify our files exist
    print("📋 Checking required files:")
    required_files = ['basic_template.j2', 'basic_schema.json', 'sample_sales.csv']
    for file in required_files:
        if Path(file).exists():
            print(f"  ✅ {file} ({Path(file).stat().st_size} bytes)")
        else:
            print(f"  ❌ {file} - MISSING!")
            
    print("\n🔑 Checking API key...")
    if os.environ.get('OPENAI_API_KEY'):
        # Never echo any part of the key: notebook outputs are saved and often shared
        print("  ✅ API key set (value hidden)")
    else:
        print("  ❌ OPENAI_API_KEY not found!")
        
    print("\n⏰ Running analysis (this may take 30-60 seconds)...")
    
    # Run with a timeout and verbose output (subprocess is already imported in the setup cell)
    
    cmd = [
        'ostruct', 'run', 
        'basic_template.j2', 
        'basic_schema.json',
        '--file', 'ci:data', 'sample_sales.csv',
        '--model', 'gpt-4o-mini',
        '--enable-tool', 'code-interpreter',
        '--output-file', 'analysis_results.json',
        '--verbose'  # Add verbose output
    ]
    
    print(f"🔧 Command: {' '.join(cmd)}")
    
    # Run with timeout
    try:
        result = subprocess.run(
            cmd, 
            capture_output=True, 
            text=True, 
            timeout=120,  # 2 minute timeout
            check=True
        )
        
        print("✅ Command completed successfully!")
        print(f"📤 Return code: {result.returncode}")
        
        if result.stdout:
            print("📄 Output:")
            print(result.stdout[:1000])  # First 1000 chars
            
        # Load results
        if Path('analysis_results.json').exists():
            with open('analysis_results.json', 'r') as f:
                results = json.load(f)
            print("✅ Results loaded successfully!")
            display_analysis_summary(results)
        else:
            print("❌ Output file not created")
            
    except subprocess.TimeoutExpired:
        print("⏰ Command timed out after 2 minutes")
        print("This might indicate:")
        print("  - API is slow to respond")
        print("  - Network connectivity issues")
        print("  - API key problems")
        
    except subprocess.CalledProcessError as e:
        print(f"❌ Command failed with exit code {e.returncode}")
        print(f"📄 Error output:")
        print(e.stderr)
        
except Exception as e:
    print(f"❌ Unexpected error: {e}")
    import traceback
    traceback.print_exc()

print("\n📁 Current directory contents:")
for item in sorted(Path('.').iterdir()):
    if item.is_file():
        print(f"  📄 {item.name} ({item.stat().st_size} bytes)")
    else:
        print(f"  📁 {item.name}/")

In [None]:
# Load and display the analysis results
print("📊 Loading analysis results...")

try:
    # Load the results file that was created
    with open('analysis_results.json', 'r') as f:
        results = json.load(f)
    
    print("✅ Results loaded successfully!")
    
    # Display the full results first
    print("\n📋 COMPLETE ANALYSIS RESULTS:")
    print(json.dumps(results, indent=2))
    
    # Use our display function
    display_analysis_summary(results)
    
    # Look for any generated chart files with the RC fix
    print("\n🔍 Looking for generated charts (should work with v1.6.0rc1)...")
    
    # Check for image files in current directory and downloads
    image_extensions = ['.png', '.jpg', '.jpeg', '.svg', '.gif']
    
    # Search locations where charts might be saved
    search_locations = [
        Path('.'),  # Current directory
        Path('downloads'),  # Default download location
    ]
    
    all_found_images = []
    
    for location in search_locations:
        if location.exists() and location.is_dir():
            print(f"\n📁 Checking {location}:")
            try:
                items = list(location.iterdir())
                image_files = [f for f in items if f.suffix.lower() in image_extensions]
                
                if image_files:
                    print(f"  🎯 Found {len(image_files)} image(s):")
                    for img in image_files:
                        print(f"    📊 {img.name} ({img.stat().st_size} bytes)")
                        all_found_images.append(img)
                else:
                    all_files = [f for f in items if f.is_file()]
                    print(f"  📄 {len(all_files)} files, no images")
                        
            except Exception as e:
                print(f"  ❌ Error: {e}")
        else:
            print(f"📁 {location}: doesn't exist")
    
    # Display all found images
    if all_found_images:
        print(f"\n🎨 Displaying {len(all_found_images)} chart(s):")
        for img in all_found_images:
            print(f"\n📊 Chart: {img}")
            try:
                display(Image(str(img)))
                print(f"✅ Successfully displayed {img.name}")
            except Exception as e:
                print(f"❌ Could not display {img}: {e}")
    else:
        print("\n📈 Creating backup chart with matplotlib...")
        
        # Create a fallback chart from the data
        import pandas as pd
        import matplotlib.pyplot as plt
        
        df = pd.read_csv('sample_sales.csv')
        
        # Create sales by product chart
        sales_by_product = df.groupby('product')['revenue'].sum().sort_values(ascending=False)
        
        plt.figure(figsize=(10, 6))
        bars = plt.bar(sales_by_product.index, sales_by_product.values, 
                      color=['#1f77b4', '#ff7f0e', '#2ca02c'])
        plt.title('Sales by Product', fontsize=16, fontweight='bold')
        plt.xlabel('Product', fontsize=12)
        plt.ylabel('Revenue ($)', fontsize=12)
        plt.xticks(rotation=45)
        
        # Add value labels on bars
        for bar, value in zip(bars, sales_by_product.values):
            plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 5, 
                    f'${value:.0f}', ha='center', va='bottom')
        
        plt.tight_layout()
        plt.grid(axis='y', alpha=0.3)
        plt.show()
        
        print("✅ Backup chart displayed!")

except Exception as e:
    print(f"❌ Error: {e}")
    import traceback
    traceback.print_exc()
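
Because the sample data is tiny, the model's reported totals can be checked against a direct calculation. The sketch below is one way to do that under stated assumptions: `cross_check` is a hypothetical helper, the 1% tolerance is arbitrary, and the `reported` dictionary is made-up output used only to exercise the check.

```python
import math

def cross_check(reported, expected, rel_tol=0.01):
    """Return {metric: (reported, expected)} for values that differ by more than rel_tol."""
    mismatches = {}
    for name, want in expected.items():
        got = reported.get(name)
        if got is None or not math.isclose(got, want, rel_tol=rel_tol):
            mismatches[name] = (got, want)
    return mismatches

# Same figures as the sample sales data in this example
quantity = [10, 15, 8, 12, 20]
price = [25.50, 30.00, 25.50, 45.00, 30.00]
expected = {
    "total_sales": sum(q * p for q, p in zip(quantity, price)),
    "total_transactions": len(quantity),
}

# Hypothetical model output, not a real ostruct result
reported = {"total_sales": 2049.0, "total_transactions": 4}
print(cross_check(reported, expected))
```

In the notebook, `reported` would be `results['summary']`, and an empty dictionary means the model's figures agree with the local calculation.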

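## Example 2: Multi-Tool Analysis

This example combines Code Interpreter with web search so the model can relate the internal sales data to external market context:

In [None]: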
# Create a more complex template for multi-tool analysis
multi_tool_template = """
You are a senior data analyst. Perform comprehensive analysis combining internal data with external market research.

# Multi-Source Analysis Task

## Internal Data Analysis
{{ data.content }}

## Market Context (Web Research)
{% if web_search_results %}
{{ web_search_results }}
{% else %}
No web research data available. Focus on internal data analysis.
{% endif %}

## Analysis Requirements
1. **Internal Performance**: Analyze sales trends, top products, growth patterns
2. **Market Context**: Compare with industry trends and competitor performance  
3. **Strategic Insights**: Identify opportunities and risks
4. **Recommendations**: Provide 3-5 actionable recommendations
5. **Visualization**: Create a professional chart showing key insights

## Output Format
Provide comprehensive analysis in the specified JSON structure with business insights and actionable recommendations.
"""

# Save template
with open('multi_tool_template.j2', 'w') as f:
    f.write(multi_tool_template)

print("✅ Multi-tool template created")

In [None]:
# Create enhanced schema for multi-tool analysis
enhanced_schema = {
    "type": "object",
    "properties": {
        "internal_analysis": {
            "type": "object",
            "properties": {
                "total_revenue": {"type": "number"},
                "top_product": {"type": "string"},
                "growth_trend": {"type": "string"},
                "key_metrics": {"type": "object"}
            }
        },
        "market_insights": {
            "type": "object",
            "properties": {
                "industry_trends": {"type": "array", "items": {"type": "string"}},
                "competitive_position": {"type": "string"},
                "market_opportunities": {"type": "array", "items": {"type": "string"}}
            }
        },
        "recommendations": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "recommendation": {"type": "string"},
                    "priority": {"type": "string", "enum": ["high", "medium", "low"]},
                    "expected_impact": {"type": "string"}
                }
            }
        },
        "visualization": {
            "type": "object",
            "properties": {
                "chart_type": {"type": "string"},
                "filename": {"type": "string"},
                "insights": {"type": "string"}
            }
        }
    },
    "required": ["internal_analysis", "recommendations"]
}

# Save schema
with open('enhanced_schema.json', 'w') as f:
    json.dump(enhanced_schema, f, indent=2)

print("✅ Enhanced schema created")

In [None]:
# Run multi-tool analysis
enhanced_results = run_ostruct_analysis(
    template_file='multi_tool_template.j2',
    schema_file='enhanced_schema.json', 
    data_file='sample_sales.csv',
    model='gpt-4o',
    enable_tools=['code-interpreter', 'web-search'],
    web_query='widget sales industry trends 2024 market analysis',
    output_file='enhanced_results.json'
)

print("✅ Enhanced analysis complete!")
print(json.dumps(enhanced_results, indent=2))

## Example 3: Interactive Data Science Workflow

Let's create an interactive workflow that combines pandas analysis with AI insights:

In [None]:
def interactive_analysis_workflow(dataframe, analysis_question):
    """
    Interactive workflow combining pandas analysis with AI insights.
    """
    print(f"🔍 Analyzing: {analysis_question}")
    print("=" * 50)
    
    # Step 1: Basic pandas analysis
    print("📊 Step 1: Basic Statistics")
    print(dataframe.describe())
    print()
    
    # Step 2: Save data for AI analysis
    temp_file = 'temp_analysis.csv'
    dataframe.to_csv(temp_file, index=False)
    
    # Step 3: Create dynamic template based on question
    dynamic_template = f"""
You are a data scientist. Answer this specific question about the provided dataset:

**Question**: {analysis_question}

**Dataset**: Analyze the provided CSV data to answer the question.

## Analysis Requirements:
1. Load and examine the data thoroughly
2. Perform relevant statistical analysis to answer the question
3. Create appropriate visualizations
4. Provide clear, data-driven insights
5. Include confidence levels and any caveats

Focus specifically on answering: "{analysis_question}"

Provide your analysis in the structured format below.
"""
    
    with open('dynamic_template.j2', 'w') as f:
        f.write(dynamic_template)
    
    # Dynamic schema
    dynamic_schema = {
        "type": "object",
        "properties": {
            "question": {"type": "string"},
            "answer": {"type": "string"},
            "supporting_evidence": {"type": "array", "items": {"type": "string"}},
            "confidence_level": {"type": "string", "enum": ["high", "medium", "low"]},
            "key_insights": {"type": "array", "items": {"type": "string"}},
            "chart_info": {
                "type": "object",
                "properties": {
                    "filename": {"type": "string"},
                    "description": {"type": "string"}
                }
            }
        },
        "required": ["question", "answer", "confidence_level"]
    }
    
    with open('dynamic_schema.json', 'w') as f:
        json.dump(dynamic_schema, f, indent=2)
    
    # Step 4: Run AI analysis; remove the temporary CSV even if the run fails
    print("🤖 Step 2: AI Analysis")
    try:
        ai_results = run_ostruct_analysis(
            template_file='dynamic_template.j2',
            schema_file='dynamic_schema.json',
            data_file=temp_file,
            model='gpt-4o',
            enable_tools=['code-interpreter']
        )
    finally:
        Path(temp_file).unlink(missing_ok=True)
    
    # Step 5: Display results
    print(f"\n🎯 Answer: {ai_results['answer']}")
    print(f"🔒 Confidence: {ai_results['confidence_level']}")
    
    if 'key_insights' in ai_results:
        print("\n💡 Key Insights:")
        for insight in ai_results['key_insights']:
            print(f"  • {insight}")
    
    return ai_results

print("✅ Interactive workflow function defined")

In [None]:
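# Test the interactive workflow
# (the sample data has no cost column, so the question targets revenue rather than profit margin)
question = "Which product generates the most revenue, and what factors contribute to its performance?"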

workflow_results = interactive_analysis_workflow(df, question)
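
The workflow writes the question into the template text with an f-string, so any `{{ ... }}` or `{% ... %}` sequence inside a user-supplied question would be interpreted as Jinja2 syntax when the template is rendered. One way to guard against that, sketched below, is to wrap free text in a Jinja2 `raw` block. `as_jinja_literal` is a hypothetical helper, and rejecting text that contains an `endraw` tag is a simplifying assumption.

```python
import re

# Matches endraw tags with optional whitespace-control dashes
_ENDRAW = re.compile(r"\{%-?\s*endraw\s*-?%\}")

def as_jinja_literal(text: str) -> str:
    """Wrap text in a Jinja2 raw block so braces inside it are not rendered."""
    if _ENDRAW.search(text):
        raise ValueError("text must not contain an endraw tag")
    return "{% raw %}" + text + "{% endraw %}"

demo_question = "What does the {{ revenue }} column measure?"
print(as_jinja_literal(demo_question))
```

Inside `interactive_analysis_workflow`, the f-string would then interpolate `as_jinja_literal(analysis_question)` instead of the bare question.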

## Example 4: Batch Processing Multiple Datasets

For production scenarios, you often need to analyze multiple datasets:

In [None]:
def batch_analysis(file_list, template_file, schema_file, output_dir='batch_results'):
    """
    Analyze multiple datasets in batch using ostruct.
    """
    Path(output_dir).mkdir(exist_ok=True)
    batch_results = {}
    
    for i, file_path in enumerate(file_list):
        print(f"\n📊 Processing {i+1}/{len(file_list)}: {file_path}")
        
        try:
            output_file = Path(output_dir) / f"{Path(file_path).stem}_analysis.json"
            
            results = run_ostruct_analysis(
                template_file=template_file,
                schema_file=schema_file,
                data_file=file_path,
                model='gpt-4o-mini',
                enable_tools=['code-interpreter'],
                output_file=str(output_file)
            )
            
            batch_results[file_path] = {
                'status': 'success',
                'results': results,
                'output_file': str(output_file)
            }
            
            print(f"  ✅ Success: {output_file}")
            
        except Exception as e:
            print(f"  ❌ Error: {e}")
            batch_results[file_path] = {
                'status': 'error',
                'error': str(e)
            }
    
    return batch_results

# Create multiple sample datasets
datasets = []
for month in ['Jan', 'Feb', 'Mar']:
    monthly_data = df.copy()
    monthly_data['month'] = month
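    monthly_data['quantity'] = monthly_data['quantity'] * (1 + 0.1 * len(datasets))  # Simulate growth
    monthly_data['revenue'] = monthly_data['quantity'] * monthly_data['price']  # Keep revenue consistent with the scaled quantity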
    
    filename = f'sales_{month.lower()}.csv'
    monthly_data.to_csv(filename, index=False)
    datasets.append(filename)

print(f"✅ Created {len(datasets)} datasets for batch processing")

In [None]:
# Run batch analysis
batch_results = batch_analysis(
    file_list=datasets,
    template_file='../analysis/templates/main.j2',
    schema_file='../analysis/schemas/main.json'
)

# Summary of batch results
successful = sum(1 for r in batch_results.values() if r['status'] == 'success')
print(f"\n📈 Batch Analysis Complete: {successful}/{len(datasets)} successful")

# Display summary of results
for file_path, result in batch_results.items():
    if result['status'] == 'success':
        summary = result['results'].get('summary', {})
        total_sales = summary.get('total_sales', 0)
        print(f"  {Path(file_path).stem}: ${total_sales:,.2f} total sales")
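
Batch runs make transient API failures more likely simply because there are more calls. A common mitigation is to retry with exponential backoff; the sketch below shows the idea with a hypothetical `retry` helper and a stand-in function in place of a real ostruct call. The attempt count and delays are illustrative defaults.

```python
import time

def retry(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**n seconds, then try again."""
    for n in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if n == attempts - 1:
                raise  # Out of attempts: surface the last error
            delay = base_delay * 2 ** n
            print(f"  ⚠️ Attempt {n + 1} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)

# Stand-in for an API call that fails twice before succeeding
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

# Pass a no-op sleep so the demo does not actually wait
print(retry(flaky, sleep=lambda seconds: None))
```

Inside `batch_analysis`, the call could be wrapped as `retry(lambda: run_ostruct_analysis(...))`. Note that every retry is a new billable request.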

## Example 5: Real-time Analysis Dashboard

Create a simple dashboard that updates with new analysis:

In [None]:
from IPython.display import clear_output
import time

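def create_analysis_dashboard(data_files, refresh_interval=30):
    """
    Create a simple analysis dashboard.

    Renders the dashboard once and returns the update function. Nothing is
    scheduled automatically: call the returned function again (for example
    every `refresh_interval` seconds) to refresh the display.
    """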
    def update_dashboard():
        clear_output(wait=True)
        
        print("📊 OSTRUCT ANALYSIS DASHBOARD")
        print("=" * 50)
        print(f"Last Updated: {time.strftime('%Y-%m-%d %H:%M:%S')}")
        print()
        
        total_revenue = 0
        total_transactions = 0
        
        for file_path in data_files:
            try:
                # Quick analysis for dashboard
                results = run_ostruct_analysis(
                    template_file='../analysis/templates/main.j2',
                    schema_file='../analysis/schemas/main.json',
                    data_file=file_path,
                    model='gpt-4o-mini',
                    enable_tools=['code-interpreter']
                )
                
                summary = results.get('summary', {})
                revenue = summary.get('total_sales', 0)
                transactions = summary.get('total_transactions', 0)
                
                total_revenue += revenue
                total_transactions += transactions
                
                print(f"📈 {Path(file_path).stem.upper()}:")
                print(f"   Revenue: ${revenue:,.2f}")
                print(f"   Transactions: {transactions:,}")
                print()
                
            except Exception as e:
                print(f"❌ Error analyzing {file_path}: {e}")
        
        print("🎯 TOTALS:")
        print(f"   Total Revenue: ${total_revenue:,.2f}")
        print(f"   Total Transactions: {total_transactions:,}")
        print(f"   Average per Transaction: ${total_revenue/total_transactions if total_transactions > 0 else 0:.2f}")
        
        print(f"\n⏰ Suggested refresh interval: {refresh_interval} seconds")
    
    # Run initial update
    update_dashboard()
    
    return update_dashboard

# Create dashboard (run once for demo)
dashboard = create_analysis_dashboard(datasets[:2])  # Use first 2 datasets for demo
print("✅ Dashboard created (static version for demo)")
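
The function above returns `update_dashboard` rather than scheduling it, so periodic refreshing needs a small driver loop. The sketch below shows a bounded loop as one option; `refresh_loop` is a hypothetical helper, and the demo substitutes a stand-in update function and a no-op sleep so it finishes immediately.

```python
import time

def refresh_loop(update, interval=30, iterations=3, sleep=time.sleep):
    """Call update() `iterations` times, pausing `interval` seconds between calls."""
    for i in range(iterations):
        update()
        if i < iterations - 1:  # No need to wait after the final refresh
            sleep(interval)

# Stand-in update function that records each refresh
ticks = []
refresh_loop(lambda: ticks.append(len(ticks)), interval=30, iterations=3,
             sleep=lambda seconds: None)
print(ticks)
```

With the real dashboard this would be `refresh_loop(dashboard, interval=30, iterations=3)`. Keeping the loop bounded is deliberate, since each refresh triggers one ostruct run per data file.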

## Best Practices and Tips

Here are some best practices for using ostruct in Jupyter notebooks:

In [None]:
# Best Practices Demo

def data_science_best_practices():
    """
    Demonstrate best practices for ostruct in data science workflows.
    """
    print("🎯 OSTRUCT DATA SCIENCE BEST PRACTICES")
    print("=" * 50)
    
    practices = [
        {
            "category": "🔧 Performance Optimization",
            "tips": [
                "Use gpt-4o-mini for exploratory analysis, gpt-4o for complex insights",
                "Cache results using --output-file to avoid re-running expensive analyses",
                "Sample large datasets for development, full data for production",
                "Use --dry-run for template validation before API calls"
            ]
        },
        {
            "category": "💰 Cost Management",
            "tips": [
                "Start with cheaper models and upgrade only when needed",
                "Use batch processing to reduce per-request overhead",
                "Monitor token usage with verbose output",
                "Reuse schemas across similar analyses"
            ]
        },
        {
            "category": "🛡️ Reliability & Security",
            "tips": [
                "Always validate schemas before production use",
                "Handle API errors gracefully with try/except blocks",
                "Don't commit API keys to notebooks",
                "Use environment variables for configuration"
            ]
        },
        {
            "category": "📊 Analysis Quality",
            "tips": [
                "Design schemas that capture business value, not just technical metrics",
                "Include confidence levels and caveats in your schemas",
                "Combine AI insights with traditional statistical validation",
                "Document assumptions and limitations in templates"
            ]
        }
    ]
    
    for practice in practices:
        print(f"\n{practice['category']}")
        for tip in practice['tips']:
            print(f"  ✓ {tip}")
    
    print("\n🚀 Ready to build amazing data science workflows with ostruct!")

data_science_best_practices()
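
The caching tip can be made concrete with a small load-or-compute wrapper around a JSON file. This is a minimal sketch under stated assumptions: `cached_json` is a hypothetical helper, the cache never expires, and the demo uses a lambda in place of a real analysis call.

```python
import json
import tempfile
from pathlib import Path

def cached_json(path, compute):
    """Return JSON from `path` if it exists; otherwise call compute() and save the result."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result, indent=2), encoding="utf-8")
    return result

# Demo in a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "cache" / "demo.json"
    first = cached_json(target, lambda: {"total_sales": 2049.0})
    second = cached_json(target, lambda: {"total_sales": -1})  # Not called: cache hit
    print(first == second)
```

In practice `compute` would wrap `run_ostruct_analysis(...)`, and deleting the cache file forces a fresh run when the data or template changes.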

## Example 6: Advanced Workflows from Data Science Guide

Let's implement the complete workflows from the Data Science Integration Guide, including Financial Analysis, Research Synthesis, Business Intelligence, and Market Research examples.

In [None]:
# Financial Analysis Workflow Example

def create_financial_analysis_example():
    """Create complete financial analysis workflow from integration guide."""
    
    # Create sample financial data
    financial_data = {
        # 'MS' (month start) works across pandas versions; the 'M' alias is deprecated since pandas 2.2
        'date': pd.date_range('2024-01-01', periods=12, freq='MS'),
        'revenue': [1500000, 1620000, 1580000, 1750000, 1690000, 1820000,
                   1950000, 1880000, 2100000, 2050000, 2200000, 2350000],
        'expenses': [1200000, 1250000, 1180000, 1300000, 1220000, 1350000,
                    1400000, 1380000, 1450000, 1420000, 1500000, 1550000],
        'market_segment': ['Consumer'] * 6 + ['Enterprise'] * 6
    }
    
    df_financial = pd.DataFrame(financial_data)
    df_financial['net_income'] = df_financial['revenue'] - df_financial['expenses']
    df_financial['profit_margin'] = (df_financial['net_income'] / df_financial['revenue']) * 100
    
    # Save financial data
    df_financial.to_csv('quarterly_financial_data.csv', index=False)
    
    print("📊 Financial Data Created:")
    display(df_financial.head())
    
    # Create financial analysis template (from integration guide)
    financial_template = """
You are a senior financial analyst. Perform comprehensive analysis of the provided financial data.

## Financial Analysis for Company - 2024

### Market Data Analysis
Analyze the following financial data and provide comprehensive insights:

**Raw Data:**
{{ data.content }}

### Analysis Requirements:
1. **Performance Metrics**: Calculate key ratios (ROE, EBITDA margin, profit margins)
2. **Trend Analysis**: Compare performance across time periods
3. **Market Position**: Analyze segment performance 
4. **Risk Assessment**: Identify potential financial risks
5. **Growth Projection**: Forecast trends based on current data

### Regulatory Compliance Check:
Review all metrics and flag any concerning trends for stakeholder reporting.

Create professional visualization showing key financial trends.
"""
    
    with open('financial_analysis_template.j2', 'w') as f:
        f.write(financial_template)
    
    # Financial analysis schema (from integration guide)
    financial_schema = {
        "type": "object",
        "properties": {
            "executive_summary": {
                "type": "string",
                "description": "2-3 sentence summary of financial health"
            },
            "key_metrics": {
                "type": "object",
                "properties": {
                    "total_revenue": {"type": "number"},
                    "net_income": {"type": "number"},
                    "average_profit_margin": {"type": "number"},
                    "revenue_growth_rate": {"type": "number"}
                },
                "required": ["total_revenue", "net_income", "average_profit_margin"]
            },
            "trend_analysis": {
                "type": "object",
                "properties": {
                    "revenue_trend": {"type": "string"},
                    "profit_margin_trend": {"type": "string"},
                    "quarter_over_quarter_change": {"type": "number"}
                }
            },
            "risk_factors": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "risk_type": {"type": "string"},
                        "severity": {"type": "string", "enum": ["low", "medium", "high", "critical"]},
                        "description": {"type": "string"},
                        "mitigation_suggestions": {"type": "string"}
                    },
                    "required": ["risk_type", "severity", "description"]
                }
            },
            "growth_forecast": {
                "type": "object",
                "properties": {
                    "next_quarter_revenue_estimate": {"type": "number"},
                    "confidence_level": {"type": "string", "enum": ["low", "medium", "high"]},
                    "key_assumptions": {"type": "array", "items": {"type": "string"}}
                }
            }
        },
        "required": ["executive_summary", "key_metrics", "risk_factors"]
    }
    
    with open('financial_analysis_schema.json', 'w') as f:
        json.dump(financial_schema, f, indent=2)
    
    print("✅ Financial analysis template and schema created")
    
    # Run financial analysis
    financial_results = run_ostruct_analysis(
        template_file='financial_analysis_template.j2',
        schema_file='financial_analysis_schema.json',
        data_file='quarterly_financial_data.csv',
        model='gpt-4o',
        enable_tools=['code-interpreter', 'web-search'],
        web_query='financial market trends Q4 2024 analysis',
        output_file='financial_analysis_results.json'
    )
    
    print("✅ Financial Analysis Complete!")
    
    # Display key results
    print("\n💼 FINANCIAL ANALYSIS SUMMARY:")
    print(f"📊 Executive Summary: {financial_results['executive_summary']}")
    
    if 'key_metrics' in financial_results:
        metrics = financial_results['key_metrics']
        print(f"💰 Total Revenue: ${metrics.get('total_revenue', 0):,.2f}")
        print(f"💸 Net Income: ${metrics.get('net_income', 0):,.2f}")
        print(f"📈 Avg Profit Margin: {metrics.get('average_profit_margin', 0):.1f}%")
    
    if 'risk_factors' in financial_results:
        print("\n⚠️ Risk Factors:")
        for risk in financial_results['risk_factors'][:3]:  # Show the first 3 in the order returned (not sorted by severity)
            severity = risk.get('severity', 'unknown').upper()
            print(f"  • [{severity}] {risk.get('risk_type', 'Unknown')}: {risk.get('description', 'N/A')}")
    
    return financial_results

# Run financial analysis example
financial_results = create_financial_analysis_example()
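
The summary above prints the first three risk factors in whatever order the model returned them. If the most severe risks should appear first, the list can be reordered using the `severity` enum from the schema. The following is a minimal sketch rather than part of the original workflow: `sort_risks_by_severity` is a hypothetical helper, and `sample_results` is made-up data standing in for a real `financial_results` dictionary.

```python
# Severity levels in ascending order, copied from the schema's enum.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]

def sort_risks_by_severity(results):
    """Return risk factors ordered from most to least severe (hypothetical helper)."""
    rank = {level: i for i, level in enumerate(SEVERITY_ORDER)}
    risks = results.get("risk_factors", [])
    # Unknown or missing severities get rank -1, so they sort last.
    return sorted(risks, key=lambda r: rank.get(r.get("severity"), -1), reverse=True)

# Made-up sample data for illustration only.
sample_results = {
    "risk_factors": [
        {"risk_type": "Liquidity", "severity": "medium", "description": "Cash reserves are thin"},
        {"risk_type": "Market", "severity": "critical", "description": "Demand is falling"},
        {"risk_type": "Currency", "severity": "low", "description": "Minor FX exposure"},
    ]
}

for risk in sort_risks_by_severity(sample_results)[:3]:
    print(f"[{risk['severity'].upper()}] {risk['risk_type']}: {risk['description']}")
```

Passing the sorted list to `pd.DataFrame(...)` gives a table that fits into the pandas workflows used elsewhere in this notebook.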

In [None]:
# Business Intelligence Report Generation Example

def create_business_intelligence_example():
    """Create Business Intelligence workflow from integration guide."""
    
    # Create sample business data (seeded so reruns produce the same sample)
    np.random.seed(42)
    business_data = {
        'date': pd.date_range('2024-01-01', periods=100, freq='D'),
        'customer_segment': np.random.choice(['Enterprise', 'SMB', 'Consumer'], 100),
        'product_category': np.random.choice(['Software', 'Hardware', 'Services'], 100),
        'revenue': np.random.normal(50000, 15000, 100),
        'customer_satisfaction': np.random.normal(4.2, 0.8, 100),
        'market_share': np.random.normal(0.15, 0.05, 100)
    }
    
    df_business = pd.DataFrame(business_data)
    df_business['revenue'] = np.maximum(df_business['revenue'], 1000)  # Ensure positive revenue
    df_business['customer_satisfaction'] = np.clip(df_business['customer_satisfaction'], 1, 5)
    df_business['market_share'] = np.clip(df_business['market_share'], 0.01, 0.5)
    
    # Save business data
    df_business.to_csv('business_intelligence_data.csv', index=False)
    
    print("📊 Business Intelligence Data Created:")
    display(df_business.head())
    
    # Create BI analysis template (from integration guide)
    bi_template = """
You are a senior business analyst. Perform comprehensive competitive analysis and business intelligence.

## Business Intelligence Report - Q4 2024

### Internal Performance Analysis
**Sales Data:**
{{ sales_data.content }}

### Analysis Requirements:
1. **Market Position**: Analyze our position vs competitors across key metrics
2. **Growth Opportunities**: Identify untapped segments and expansion possibilities  
3. **Competitive Threats**: Assess emerging competitors and market disruptions
4. **Pricing Analysis**: Evaluate price positioning and optimization opportunities
5. **Strategic Recommendations**: Provide actionable next steps with ROI projections

### Executive Briefing Elements:
- Top 3 strategic priorities
- Revenue impact projections
- Resource requirements
- Timeline for implementation

Create professional visualizations showing competitive positioning and market trends.
"""
    
    with open('bi_analysis_template.j2', 'w') as f:
        f.write(bi_template)
    
    # BI analysis schema (from integration guide)
    bi_schema = {
        "type": "object",
        "properties": {
            "executive_summary": {
                "type": "string",
                "description": "CEO-ready 2-3 sentence summary of strategic position"
            },
            "market_position": {
                "type": "object",
                "properties": {
                    "market_share": {"type": "number"},
                    "competitive_ranking": {"type": "integer"},
                    "differentiation_strengths": {"type": "array", "items": {"type": "string"}},
                    "competitive_gaps": {"type": "array", "items": {"type": "string"}}
                }
            },
            "growth_opportunities": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "opportunity": {"type": "string"},
                        "market_size": {"type": "number"},
                        "revenue_potential": {"type": "number"},
                        "time_to_market": {"type": "string"},
                        "investment_required": {"type": "number"},
                        "risk_level": {"type": "string", "enum": ["low", "medium", "high"]}
                    },
                    "required": ["opportunity", "revenue_potential", "risk_level"]
                }
            },
            "strategic_recommendations": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "recommendation": {"type": "string"},
                        "priority": {"type": "string", "enum": ["critical", "high", "medium", "low"]},
                        "expected_roi": {"type": "number"},
                        "implementation_timeline": {"type": "string"},
                        "resource_requirements": {"type": "array", "items": {"type": "string"}},
                        "success_metrics": {"type": "array", "items": {"type": "string"}}
                    },
                    "required": ["recommendation", "priority", "expected_roi"]
                }
            },
            "competitive_analysis": {
                "type": "object",
                "properties": {
                    "top_competitors": {"type": "array", "items": {"type": "string"}},
                    "competitive_advantages": {"type": "array", "items": {"type": "string"}},
                    "market_threats": {"type": "array", "items": {"type": "string"}}
                }
            }
        },
        "required": ["executive_summary", "market_position", "growth_opportunities", "strategic_recommendations"]
    }
    
    with open('bi_analysis_schema.json', 'w') as f:
        json.dump(bi_schema, f, indent=2)
    
    print("✅ Business Intelligence template and schema created")
    
    # Run BI analysis
    bi_results = run_ostruct_analysis(
        template_file='bi_analysis_template.j2',
        schema_file='bi_analysis_schema.json',
        data_file='business_intelligence_data.csv',
        model='gpt-4o',
        enable_tools=['code-interpreter', 'web-search'],
        web_query='business intelligence market trends 2024 competitive analysis',
        output_file='bi_analysis_results.json'
    )
    
    print("✅ Business Intelligence Analysis Complete!")
    
    # Display key results
    print("\n🏢 BUSINESS INTELLIGENCE SUMMARY:")
    print(f"📊 Executive Summary: {bi_results['executive_summary']}")
    
    if 'market_position' in bi_results:
        position = bi_results['market_position']
        print(f"📈 Market Share: {position.get('market_share', 0):.1%}")
        print(f"🏆 Competitive Ranking: #{position.get('competitive_ranking', 'N/A')}")
    
    if 'strategic_recommendations' in bi_results:
        print("\n💡 TOP STRATEGIC RECOMMENDATIONS:")
        for i, rec in enumerate(bi_results['strategic_recommendations'][:3], 1):
            priority = rec.get('priority', 'medium').upper()
            recommendation = rec.get('recommendation', 'N/A')
            roi = rec.get('expected_roi', 0)
            print(f"  {i}. [{priority}] {recommendation}")
            print(f"     Expected ROI: {roi:.1%}")
    
    return bi_results

# Run business intelligence example
bi_results = create_business_intelligence_example()
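
The display code reads `bi_results['executive_summary']` directly, so it relies on the result containing every field the schema marks as required. This notebook does not show whether that is checked anywhere, so a lightweight check of your own can be useful before indexing into the result. The following is a sketch under that assumption: `missing_required_fields` is a hypothetical helper that only inspects `required`, `properties`, and `items`, not a full JSON Schema validator, and the small schema and result below are made up for illustration.

```python
def missing_required_fields(schema, data, path=""):
    """Return dotted paths of required fields that are absent from data."""
    missing = []
    if schema.get("type") == "object" and isinstance(data, dict):
        # Report required keys that are missing at this level.
        for key in schema.get("required", []):
            if key not in data:
                missing.append(f"{path}{key}")
        # Recurse into the properties that are present.
        for key, sub_schema in schema.get("properties", {}).items():
            if key in data:
                missing.extend(missing_required_fields(sub_schema, data[key], f"{path}{key}."))
    elif schema.get("type") == "array" and isinstance(data, list):
        # Check each array item against the "items" sub-schema.
        for i, item in enumerate(data):
            item_path = f"{path[:-1]}[{i}]."
            missing.extend(missing_required_fields(schema.get("items", {}), item, item_path))
    return missing

# Made-up, cut-down schema and result for illustration only.
demo_schema = {
    "type": "object",
    "properties": {
        "executive_summary": {"type": "string"},
        "growth_opportunities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "opportunity": {"type": "string"},
                    "risk_level": {"type": "string"},
                },
                "required": ["opportunity", "risk_level"],
            },
        },
    },
    "required": ["executive_summary", "growth_opportunities"],
}
demo_result = {"growth_opportunities": [{"opportunity": "Expand SMB segment"}]}

print(missing_required_fields(demo_schema, demo_result))
```

In the notebook itself, the same call could take the `bi_schema` dictionary and the `bi_results` value; an empty list means every required field is present.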

## Cleanup

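Remove the temporary files created during this notebook. The wildcard patterns below match every CSV, JSON, and J2 file in the current working directory, so run this cell only in a directory that holds nothing else you want to keep: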

In [None]:
# Cleanup temporary files
# WARNING: the wildcard patterns match every CSV, JSON, and J2 file in the
# current working directory, not only the files this notebook created.
import glob
import shutil

temp_files = [
    '*.csv', '*.json', '*.j2', 'downloads/*', 'batch_results/*'
]

for pattern in temp_files:
    for file in glob.glob(pattern):
        try:
            Path(file).unlink()
            print(f"🗑️ Removed: {file}")
        except OSError:
            pass  # Skip directories and files that cannot be removed

# Remove directories and anything left inside them
for dir_name in ['downloads', 'batch_results']:
    try:
        shutil.rmtree(dir_name)
        print(f"🗑️ Removed directory: {dir_name}")
    except OSError:
        pass  # Directory does not exist or cannot be removed

print("✅ Cleanup complete!")

## Next Steps

🎉 **Congratulations!** You've learned how to integrate ostruct with Jupyter notebooks for powerful data science workflows.

### What to try next:

1. **🔄 Adapt for your data**: Replace the sample data with your own datasets
2. **🎨 Custom templates**: Create domain-specific templates for your analysis needs
3. **📊 Advanced schemas**: Design schemas that capture your business metrics
4. **🚀 Production deployment**: Build automated pipelines using these patterns
5. **🔗 Tool integration**: Combine with other data science tools in your stack
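
As a starting point for designing your own schemas, note that the schemas in this notebook repeat a few fragments, such as string arrays and string enums. The following is a minimal sketch of small builder functions for those fragments. The helpers `string_array`, `string_enum`, and `object_schema`, and the churn-related field names, are all hypothetical and chosen only for illustration.

```python
import json

def string_array():
    """Schema fragment for a list of strings."""
    return {"type": "array", "items": {"type": "string"}}

def string_enum(*values):
    """Schema fragment for a string restricted to the given values."""
    return {"type": "string", "enum": list(values)}

def object_schema(properties, required=()):
    """Schema fragment for an object, with an optional list of required keys."""
    schema = {"type": "object", "properties": properties}
    if required:
        schema["required"] = list(required)
    return schema

# Hypothetical schema for a customer churn analysis.
churn_schema = object_schema(
    {
        "executive_summary": {"type": "string"},
        "churn_rate": {"type": "number"},
        "risk_level": string_enum("low", "medium", "high"),
        "drivers": string_array(),
    },
    required=["executive_summary", "churn_rate"],
)

print(json.dumps(churn_schema, indent=2))
```

The resulting dictionary can be written to a file with `json.dump`, in the same way the schemas earlier in this notebook are saved.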

### Resources:

- [ostruct Documentation](https://ostruct.readthedocs.io/)
- [Data Science Integration Guide](https://ostruct.readthedocs.io/en/latest/user-guide/data_science_integration.html)
- [More Examples](../)
- [GitHub Repository](https://github.com/yaniv-golan/ostruct)

Happy analyzing! 🚀📊