# Task 1: Exploratory Data Analysis (EDA)
## Insurance Risk Analytics & Predictive Modeling

This notebook performs comprehensive EDA on the insurance claim data to:
- Understand data structure and quality
- Discover patterns in risk and profitability
- Answer key business questions
- Prepare for hypothesis testing and modeling


## 1. Setup and Imports


In [None]:
import sys
sys.path.append('..')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings

from src.data_loader import load_insurance_data, prepare_data_for_analysis
from src.utils import calculate_loss_ratio, detect_outliers_iqr, get_data_summary

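# Silence library warnings to keep cell output readable
# (note: this also hides deprecation notices)
warnings.filterwarnings('ignore')

# Set plotting style: prefer the seaborn dark grid (named 'seaborn-v0_8-*'
# since matplotlib 3.6), then the older name, then fall back to 'ggplot'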
try:
    plt.style.use('seaborn-v0_8-darkgrid')
except OSError:
    try:
        plt.style.use('seaborn-darkgrid')
    except OSError:
        plt.style.use('ggplot')
sns.set_palette("husl")

%matplotlib inline


## 2. Load Data

Note: Pass `sample_size` to `load_insurance_data` for faster initial exploration; omit it to load the full dataset.
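
How `load_insurance_data` applies `sample_size` is not shown in this notebook. As a rough sketch of the idea, assuming the loader simply draws a random subset of rows, the hypothetical helper `sample_rows` below does this with `DataFrame.sample`. The helper name, the `seed` parameter, and the toy columns `PolicyID` and `TotalPremium` are illustrative assumptions, not the project's actual code or schema.

```python
import pandas as pd


def sample_rows(df: pd.DataFrame, sample_size=None, seed=42) -> pd.DataFrame:
    """Return a reproducible random subset of rows (hypothetical helper).

    With sample_size=None, or a size at least as large as the frame,
    the frame is returned unchanged.
    """
    if sample_size is None or sample_size >= len(df):
        return df
    # A fixed random_state makes the same rows come back on every run
    return df.sample(n=sample_size, random_state=seed)


# Toy data standing in for the real file (column names are assumptions)
toy = pd.DataFrame({"PolicyID": range(1000), "TotalPremium": [100.0] * 1000})
subset = sample_rows(toy, sample_size=50)
print(subset.shape)  # (50, 2)
```

Fixing the seed keeps exploratory results comparable between runs; conclusions drawn from a sample should still be rechecked on the full dataset.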


In [None]:
# Load a sample for quick exploration (sample_size=50000);
# drop the argument to load the full dataset
df = load_insurance_data(sample_size=50000)
df = prepare_data_for_analysis(df)

print(f"Data shape: {df.shape}")
print(f"Date range: {df['TransactionMonth'].min()} to {df['TransactionMonth'].max()}")
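
After loading, a quick look at column types and missing values helps with the "data structure and quality" goal. The sketch below is one possible way to do that with plain pandas; it is not the project's `get_data_summary`, whose output is not shown here. The helper name `missing_value_report` and the toy column `Province` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def missing_value_report(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize dtype, missing count and missing percentage per column."""
    report = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(2),
    })
    # Most incomplete columns first
    return report.sort_values("pct_missing", ascending=False)


# Toy frame standing in for the loaded data (values are made up)
toy = pd.DataFrame({
    "TransactionMonth": pd.to_datetime(["2015-01-01", "2015-02-01", None, "2015-04-01"]),
    "TotalPremium": [120.0, np.nan, 95.5, np.nan],
    "Province": ["Gauteng", "Gauteng", "Western Cape", "Limpopo"],
})
report = missing_value_report(toy)
print(report)
```

In the notebook the same call would be `missing_value_report(df)` on the frame loaded above.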


## 3. Run Full EDA

The `InsuranceEDA` class runs the complete EDA pipeline in one call. It is constructed here without arguments, so it does not receive the `df` loaded above:


In [None]:
from src.eda import InsuranceEDA

# Initialize and run full EDA
eda = InsuranceEDA()
eda.run_full_eda()
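
What `run_full_eda` computes is defined in `src/eda.py` and is not shown in this notebook. Since the setup cell imports `calculate_loss_ratio` and `detect_outliers_iqr`, two likely ingredients are a loss ratio and an IQR outlier rule. The sketch below uses the textbook definitions of both, not the project's implementations. The function names `loss_ratio_by` and `iqr_outlier_mask`, the columns `TotalClaims`, `TotalPremium` and `Province`, and the multiplier `k=1.5` are all assumptions.

```python
import pandas as pd


def loss_ratio_by(df, group_col, claims_col="TotalClaims", premium_col="TotalPremium"):
    """Loss ratio per group: sum of claims divided by sum of premiums."""
    totals = df.groupby(group_col)[[claims_col, premium_col]].sum()
    return totals[claims_col] / totals[premium_col]


def iqr_outlier_mask(series, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Toy data (made-up values, assumed column names)
toy = pd.DataFrame({
    "Province": ["A", "A", "B", "B", "B"],
    "TotalPremium": [100.0, 100.0, 200.0, 100.0, 100.0],
    "TotalClaims": [50.0, 0.0, 0.0, 0.0, 800.0],
})

ratios = loss_ratio_by(toy, "Province")
print(ratios)  # A: 0.25, B: 2.0

mask = iqr_outlier_mask(toy["TotalClaims"])
print(toy[mask])  # only the row with TotalClaims == 800.0
```

A loss ratio above 1 means a group paid out more in claims than it collected in premiums. Summing before dividing keeps a few tiny-premium policies from dominating the group figure.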
