# Week 1: Revenue Intelligence Platform - Foundation

**Goal**: Build demand and revenue forecasting models with feature engineering and a what-if simulator

## Deliverables:
1. ✅ Demand forecasting model with enhanced features
2. ✅ Revenue prediction model  
3. ✅ Profit calculation layer
4. ✅ Price elasticity estimator
5. ✅ What-if simulator for pricing scenarios

---

**Timeline**: Days 1-7  
**Target Accuracy**: Demand MAPE < 12%, Revenue MAPE < 18%
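
The accuracy targets above are stated as MAPE (mean absolute percentage error). As a minimal sketch of how such a score could be computed, the helper below uses the common per-observation definition. The function name `mape` and the choice to skip zero actuals are assumptions for illustration; the notebook's own metric may be defined differently.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumption: observations whose actual value is zero are skipped,
    because the percentage error is undefined for them.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = y_true != 0
    if not mask.any():
        raise ValueError("MAPE is undefined when all actual values are zero")
    pct_err = np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])
    return float(pct_err.mean() * 100)

# Toy values, not results from this project: each forecast is off by 10%.
demand_mape = mape([100, 200, 50], [110, 180, 55])
meets_demand_target = demand_mape < 12  # demand target from the line above
```

With real data, `y_true` and `y_pred` would be the held-out demand (or revenue) values and the model's predictions for the same time bins.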

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import yaml
import pickle
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Machine Learning
from sklearn.cluster import MiniBatchKMeans
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import xgboost as xgb

# Display settings
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')
pd.set_option('display.max_columns', None)

print('✅ Libraries loaded successfully')

## 1. Configuration & Setup

In [None]:
# Load configuration
with open('../config/config.yaml', 'r') as f:
    config = yaml.safe_load(f)

print('Configuration loaded:')
print(f"Clusters: {config['clustering']['n_clusters']}")
print(f"Time bin size: {config['timeseries']['bin_size_seconds']}s")
print(f"Platform margin: {config['business']['platform_margin'] * 100:.1f}%")
print(f"Default elasticity: {config['business']['default_elasticity']}")

## 2. Load Data

In [None]:
# Load training data (2015-01)
data_dir = config['data']['raw_data_dir']
train_file = config['data']['train_files'][0]

print(f"Loading {train_file}...")
df_train = pd.read_parquet(f"{data_dir}/{train_file}")

print(f"✅ Loaded {len(df_train):,} records")
print(f"Columns: {list(df_train.columns)}")
print(f"\nData shape: {df_train.shape}")
df_train.head()

## 3. Data Cleaning & Preprocessing

This section follows the original pipeline with a cleaner implementation.

In [None]:
# Create copy for processing
df = df_train.copy()

# Convert timestamps (supports both the 'tpep_'-prefixed and the plain column names)
if 'tpep_pickup_datetime' in df.columns:
    df['pickup_datetime'] = pd.to_datetime(df['tpep_pickup_datetime'])
    df['dropoff_datetime'] = pd.to_datetime(df['tpep_dropoff_datetime'])
elif 'pickup_datetime' in df.columns:
    df['pickup_datetime'] = pd.to_datetime(df['pickup_datetime'])
    df['dropoff_datetime'] = pd.to_datetime(df['dropoff_datetime'])
else:
    raise KeyError('No pickup/dropoff datetime columns found in df_train')

# Calculate trip duration in minutes
df['trip_duration_minutes'] = (df['dropoff_datetime'] - df['pickup_datetime']).dt.total_seconds() / 60

# Calculate speed in mph; zero-duration trips yield inf (or NaN for 0/0), so map inf to NaN
df['speed_mph'] = df['trip_distance'] / (df['trip_duration_minutes'] / 60)
df['speed_mph'] = df['speed_mph'].replace([np.inf, -np.inf], np.nan)

print("✅ Time features created")
print(f"Records before cleaning: {len(df):,}")
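
The notebook reads several nested keys from `config` (`clustering.n_clusters`, `timeseries.bin_size_seconds`, `business.platform_margin`, `business.default_elasticity`, `data.raw_data_dir`, `data.train_files`). A minimal sketch of a check that reports missing keys before any cell depends on them is shown below. The helper name `missing_config_keys` and every value in `example_config` are hypothetical placeholders, not the project's actual settings.

```python
# Keys the notebook cells look up, as (section, key) pairs.
REQUIRED_KEYS = [
    ("clustering", "n_clusters"),
    ("timeseries", "bin_size_seconds"),
    ("business", "platform_margin"),
    ("business", "default_elasticity"),
    ("data", "raw_data_dir"),
    ("data", "train_files"),
]

def missing_config_keys(config):
    """Return the required 'section.key' names that are absent from config."""
    missing = []
    for section, key in REQUIRED_KEYS:
        section_values = config.get(section)
        if not isinstance(section_values, dict) or key not in section_values:
            missing.append(f"{section}.{key}")
    return missing

# Placeholder values for illustration only; 'data' is left out on purpose.
example_config = {
    "clustering": {"n_clusters": 10},
    "timeseries": {"bin_size_seconds": 600},
    "business": {"platform_margin": 0.2, "default_elasticity": -1.0},
}

problems = missing_config_keys(example_config)
```

Here `problems` lists the two `data.*` keys, so the gap would surface right after loading the configuration instead of as a `KeyError` in a later cell.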