# Reservoir AI - Industrial Forecasting Analysis
## SPE9 Benchmark Dataset - Production Forecasting

## 1. 🎯 Project Overview
**Business Problem:** Reservoir production forecasting for optimal field management  
**Dataset:** SPE9 benchmark, an industry-standard reservoir simulation model  
**Objective:** Predict oil production rates using ensemble machine learning
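
The overview does not say which models make up the ensemble. The snippet below is a minimal, hypothetical sketch of the general idea: an equal-weight average of three simple rate-vs-time models fitted to one well's history. The three models are a linear trend, a quadratic trend and an exponential decline fitted in log space. The function name `ensemble_forecast` and the choice of member models are illustrative assumptions, not the project's actual method.

```python
import numpy as np

def ensemble_forecast(t_hist, q_hist, t_future):
    """Hypothetical model-averaging ensemble for one well's oil rate.

    Fits three simple members to the history and returns the mean of their
    predictions at t_future, together with each member's forecast.
    """
    t_hist = np.asarray(t_hist, dtype=float)
    q_hist = np.asarray(q_hist, dtype=float)
    t_future = np.asarray(t_future, dtype=float)

    members = {}
    # Member 1: linear trend in rate
    members["linear"] = np.polyval(np.polyfit(t_hist, q_hist, 1), t_future)
    # Member 2: quadratic trend in rate
    members["quadratic"] = np.polyval(np.polyfit(t_hist, q_hist, 2), t_future)
    # Member 3: exponential decline q = qi * exp(-D * t), fitted as a straight
    # line in log(q); this requires strictly positive rates
    slope, intercept = np.polyfit(t_hist, np.log(q_hist), 1)
    members["exponential"] = np.exp(intercept + slope * t_future)

    # Equal-weight average across members
    stacked = np.vstack(list(members.values()))
    return stacked.mean(axis=0), members

# Usage on a synthetic, noise-free exponential decline (illustration only)
t = np.linspace(0.0, 5.0, 21)        # years
q = 1000.0 * np.exp(-0.3 * t)        # bbl/day
mean_pred, parts = ensemble_forecast(t, q, [6.0, 7.0])
print(mean_pred)
```

Equal weights are the simplest choice. Weighting members by their error on a held-out period is a common alternative.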

```python
# Cell 1: Imports and Setup
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from src.data_loader import OPMDataLoader
from src.config import config

print("🚀 Reservoir AI - Data Analysis")
```

```python
# Cell 2: Load SPE9 Data
loader = OPMDataLoader()
production_data = loader.load_production_data()

# Count distinct producer wells directly. Dividing producer rows by the
# number of time steps assumes every producer reports at every step.
n_producers = production_data.loc[
    production_data['well_type'] == 'PRODUCER', 'well_id'
].nunique()

print("📊 Dataset Overview:")
print(f"   • Records: {len(production_data):,}")
print(f"   • Wells: {production_data['well_id'].nunique()}")
print(f"   • Producers: {n_producers}")
print(f"   • Time Range: {production_data['years'].min():.1f} - {production_data['years'].max():.1f} years")
```
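
You can only get the producer count by dividing producer rows by the number of time steps if every producer reports at every time step. A more direct summary counts distinct `well_id` values per `well_type`. In the sketch below, `summarize_production` is a hypothetical helper. The toy DataFrame is made-up illustration data, not SPE9 output, and the `INJECTOR` label is an assumption. The other column names match those used in Cell 2.

```python
import pandas as pd

def summarize_production(df: pd.DataFrame) -> dict:
    """Hypothetical helper: dataset overview that tolerates missing time steps."""
    wells_by_type = df.groupby("well_type")["well_id"].nunique()
    return {
        "records": len(df),
        "wells": df["well_id"].nunique(),
        "producers": int(wells_by_type.get("PRODUCER", 0)),
        "years_min": float(df["years"].min()),
        "years_max": float(df["years"].max()),
    }

# Toy data (not SPE9): producer P2 is missing its last time step, so
# producer rows // time steps = 5 // 3 = 1 undercounts the 2 producers.
toy = pd.DataFrame({
    "well_id":    ["P1", "P1", "P1", "P2", "P2", "I1", "I1", "I1"],
    "well_type":  ["PRODUCER"] * 5 + ["INJECTOR"] * 3,
    "time_index": [0, 1, 2, 0, 1, 0, 1, 2],
    "years":      [0.0, 0.5, 1.0, 0.0, 0.5, 0.0, 0.5, 1.0],
})
print(summarize_production(toy))
```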

```python
# Cell 3: Basic Visualizations
plt.figure(figsize=(12, 4))

# Oil Rate Distribution
plt.subplot(1, 2, 1)
production_data['oil_rate'].hist(bins=50, alpha=0.7)
plt.title('Oil Rate Distribution')
plt.xlabel('Oil Rate (bbl/day)')
plt.ylabel('Frequency')

# Bottomhole Pressure Distribution
plt.subplot(1, 2, 2)
production_data['bottomhole_pressure'].hist(bins=50, alpha=0.7, color='orange')
plt.title('Bottomhole Pressure Distribution')
plt.xlabel('Pressure (psi)')
plt.ylabel('Frequency')

plt.tight_layout()
plt.show()
```
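
Histograms can hide values that matter for forecasting, such as shut-in periods with zero rate or physically impossible negative values. A small numeric check can complement the plots. In the sketch below, `distribution_checks` is a hypothetical helper, and treating a zero oil rate as a possible shut-in is an assumption. The toy data is illustrative only.

```python
import numpy as np
import pandas as pd

def distribution_checks(df: pd.DataFrame,
                        columns=("oil_rate", "bottomhole_pressure")) -> pd.DataFrame:
    """Hypothetical helper: per-column counts of missing, negative and zero
    values plus a few percentiles (NaNs are skipped), one row per column."""
    rows = {}
    for col in columns:
        s = df[col]
        rows[col] = {
            "missing": int(s.isna().sum()),
            "negative": int((s < 0).sum()),
            "zero": int((s == 0).sum()),  # for oil_rate, may indicate shut-in (assumption)
            "p01": s.quantile(0.01),
            "median": s.median(),
            "p99": s.quantile(0.99),
        }
    return pd.DataFrame.from_dict(rows, orient="index")

# Toy data (illustration only), not SPE9 values
toy = pd.DataFrame({
    "oil_rate": [1200.0, 950.0, 0.0, np.nan, -5.0],
    "bottomhole_pressure": [3000.0, 2800.0, 2750.0, 2700.0, 2650.0],
})
print(distribution_checks(toy))
```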

## 2. 📈 Key Insights
- Data successfully loaded and visualized
- Oil rate and bottomhole pressure distributions look plausible on visual inspection
- Ready for advanced analysis and modeling
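
Forecasting data is usually split by time rather than at random before modeling. The test set then lies strictly after the training period, and no future information leaks into training. The sketch below assumes the `well_id`, `years` and `oil_rate` columns used above. `make_lag_features` and `time_split` are hypothetical helpers, and the 2-step lag window is an arbitrary choice.

```python
import pandas as pd

def make_lag_features(df: pd.DataFrame, n_lags: int = 2) -> pd.DataFrame:
    """Hypothetical helper: add previous oil rates of the same well as features."""
    out = df.sort_values(["well_id", "years"]).copy()
    grouped = out.groupby("well_id")["oil_rate"]
    lag_cols = []
    for k in range(1, n_lags + 1):
        # Shift within each well so lags never cross well boundaries
        out[f"oil_rate_lag{k}"] = grouped.shift(k)
        lag_cols.append(f"oil_rate_lag{k}")
    # Drop rows whose lag window is incomplete (earliest steps of each well)
    return out.dropna(subset=lag_cols)

def time_split(df: pd.DataFrame, cutoff_years: float):
    """Chronological split: train on years <= cutoff, test on years > cutoff."""
    train = df[df["years"] <= cutoff_years]
    test = df[df["years"] > cutoff_years]
    return train, test

# Toy data (illustration only): two producers, five half-year steps each
toy = pd.DataFrame({
    "well_id": ["P1"] * 5 + ["P2"] * 5,
    "years": [0.0, 0.5, 1.0, 1.5, 2.0] * 2,
    "oil_rate": [1000, 900, 810, 730, 660, 800, 760, 720, 690, 650],
})
features = make_lag_features(toy, n_lags=2)
train, test = time_split(features, cutoff_years=1.5)
print(len(features), len(train), len(test))  # 6 4 2
```

The lag features on test rows use observed past rates, so this setup corresponds to one-step-ahead forecasting. Multi-step forecasts would need earlier predictions fed back in as lags.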