# Phase 5: Recommendation Logic & Inference

**Goal**: Build an end-to-end pipeline that takes a **new customer** and recommends the **best offer** for them.

## Process
1.  **Define Strategy**: Analyze which offers perform best for each Cluster (using historical data).
2.  **Load Operational Models**: Load the saved Scaler and Classifier.
3.  **Build Pipeline**:
    *   New Data -> Preprocess -> Predict Cluster -> Lookup Best Offer.
4.  **Demonstrate**: Run the pipeline on sample data.

In [1]:
import pandas as pd
import numpy as np
import joblib
import json

# For parsing the 'value' column in transcript if needed, though we likely did this in Step 1
import ast

print("Libraries imported.")

Libraries imported.


## 1. Load Data & Models
We need:
*   `portfolio.csv`: To know what the offers actually are (BOGO, Discount, etc.).
*   `customer_clusters.csv`: To know who belongs to which group.
*   `transcript.csv`: To see which offers were actually *completed* by these groups.
*   **Models**: The pickle files from Phase 4.

In [2]:
# Load Models
scaler = joblib.load('../models/customer_scaler.pkl')
model = joblib.load('../models/customer_classifier.pkl')

print("Operational models loaded successfully.")

# Load Data
portfolio = pd.read_csv('../data/portfolio.csv')
clusters = pd.read_csv('../data/customer_clusters.csv')
transcript = pd.read_csv('../data/transcript.csv')

print(f"Portfolio options: {len(portfolio)}")
print(f"Labeled Customers: {len(clusters)}")

Operational models loaded successfully.
Portfolio options: 10
Labeled Customers: 14825


## 2. Define Recommendation Strategy
**Objective**: Create a mapping `{Cluster_ID: Best_Offer_ID}`.

Logic:
1.  Filter `transcript` for `offer completed` events.
2.  Extract the `offer_id` from the value column.
3.  Merge with `clusters` to see which group completed which offer.
4.  Find the most frequent completed offer for each cluster.
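
Raw completion counts tend to favor offers that were simply sent more often. As a complementary check, the sketch below ranks offers by completion rate (completions divided by offers received) within each cluster. The toy `events` frame, its values, and the name `rate_recommendations` are assumptions for illustration only, not the project's data; the sketch presumes `cluster` and `offer_id` have already been attached to every event row.

```python
import pandas as pd

# Toy event log (hypothetical data): one row per offer event, already
# joined with the customer's cluster and the parsed offer_id.
rows = (
    [(0, "A", "offer received")] * 4
    + [(0, "A", "offer completed")] * 2
    + [(0, "B", "offer received")] * 1
    + [(0, "B", "offer completed")] * 1
    + [(1, "A", "offer received"), (1, "A", "offer completed")]
    + [(1, "B", "offer received")] * 2
)
events = pd.DataFrame(rows, columns=["cluster", "offer_id", "event"])

# Count received and completed events per (cluster, offer_id).
counts = (
    events.groupby(["cluster", "offer_id", "event"]).size()
          .unstack("event", fill_value=0)
          .reset_index()
)

# Completion rate = completed / received.
counts["completion_rate"] = counts["offer completed"] / counts["offer received"]

# Keep the highest-rate offer for each cluster.
best_by_rate = (
    counts.sort_values(["cluster", "completion_rate"], ascending=[True, False])
          .groupby("cluster").head(1)
)
rate_recommendations = dict(zip(best_by_rate["cluster"], best_by_rate["offer_id"]))
print(rate_recommendations)
```

In this toy data, offer A has more completions in cluster 0 (2 versus 1), but offer B has the higher completion rate (1 of 1 versus 2 of 4), so the two rankings disagree. Whether counts or rates are the better criterion depends on how evenly offers were distributed.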

In [3]:
# 1. Parsing Transcript for Offer IDs (if not already clean in your data folder)
# Usually, in 'transcript', the 'value' column is a string dict like "{'offer_id': '...'}"
# We'll extract it efficiently.

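def get_offer_id(val):
    """Return the offer ID stored in a stringified 'value' dict, or None."""
    if not isinstance(val, str):
        return None
    try:
        val_dict = ast.literal_eval(val)
    except (ValueError, SyntaxError):
        # Malformed string: treat as "no offer ID"
        return None
    if not isinstance(val_dict, dict):
        return None
    # Keys can be 'offer_id' or 'offer id'
    return val_dict.get('offer_id') or val_dict.get('offer id')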

# Filter for completions only
completed_offers = transcript[transcript['event'] == 'offer completed'].copy()
completed_offers['offer_id'] = completed_offers['value'].apply(get_offer_id)

# 2. Merge with Clusters
# We need to map person (customer_id) -> cluster -> offer
merged = pd.merge(completed_offers, clusters, left_on='person', right_on='customer_id')

# 3. Find Top Offer per Cluster
popular_offers = merged.groupby(['cluster', 'offer_id']).size().reset_index(name='count')
# Sort by cluster and count (descending)
popular_offers = popular_offers.sort_values(['cluster', 'count'], ascending=[True, False])

# Get top 1 for each cluster
best_offers = popular_offers.groupby('cluster').head(1).reset_index(drop=True)

# Create the Map
cluster_recommendations = dict(zip(best_offers['cluster'], best_offers['offer_id']))

print("Recommendation Strategy (Best Offer per Cluster):")
for cluster, offer in cluster_recommendations.items():
    offer_type = portfolio[portfolio['id'] == offer]['offer_type'].values[0]
    print(f"Cluster {cluster}: {offer} ({offer_type})")

Recommendation Strategy (Best Offer per Cluster):
Cluster 0: 9b98b8c7a33c4b65b9aebfe6a799e6d9 (bogo)
Cluster 1: fafdcd668e3743c1bb461111dcafc2a4 (discount)
Cluster 2: fafdcd668e3743c1bb461111dcafc2a4 (discount)
Cluster 3: 9b98b8c7a33c4b65b9aebfe6a799e6d9 (bogo)


## 3. Build Inference Pipeline
Now we wrap everything into functions to simulate a production API.

Input: Dictionary of Customer Features.
Output: Recommended Offer.
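
Recent scikit-learn versions check both the names and the order of incoming columns when a transformer was fitted on a DataFrame, so one way to make the input step more forgiving is to reorder the dictionary to the expected feature list before scaling. The sketch below is a minimal illustration under assumptions: `align_features` is a hypothetical helper, and the short `expected_features` list stands in for `scaler.feature_names_in_`. It fails loudly on missing keys rather than silently filling them.

```python
import pandas as pd

def align_features(customer_data, expected_features):
    """Return a one-row DataFrame whose columns follow expected_features."""
    missing = [f for f in expected_features if f not in customer_data]
    if missing:
        raise KeyError(f"Missing features: {missing}")
    # Selecting by the expected list drops extra keys and fixes the order.
    return pd.DataFrame([customer_data])[list(expected_features)]

# Hypothetical stand-in for scaler.feature_names_in_
expected_features = ["age", "income", "transaction_count"]

# Keys arrive in a different order and include an extra field.
raw_input = {"income": 60000, "age": 40, "transaction_count": 10, "note": "x"}
row = align_features(raw_input, expected_features)
print(list(row.columns))
```

Raising on missing features is a deliberate choice here: filling them with zeros would let the pipeline run but could place the customer in the wrong segment without any warning.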

In [4]:
def get_recommendation_for_new_customer(customer_data):
    """
    Full pipeline: Preprocess -> Scale -> Predict -> Recommend
    """
    
    # 1. Convert the dictionary to a one-row DataFrame.
    # Note: the keys MUST match the training features exactly, and column order matters.
    # Ideally the feature list would have been saved in Phase 4; here we rely on
    # the names the scaler recorded at fit time (scaler.feature_names_in_).
    
    input_df = pd.DataFrame([customer_data])
    
    # 2. Preprocess / Scale
    try:
        scaled_features = scaler.transform(input_df)
    except ValueError:
        return f"Error: Feature mismatch. Ensure input has columns: {list(scaler.feature_names_in_)}"

    # 3. Predict Segment
    predicted_cluster = model.predict(scaled_features)[0]
    
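    # 4. Get Recommendation
    rec_offer_id = cluster_recommendations.get(predicted_cluster)
    if rec_offer_id is None:
        # Guard against a cluster that has no entry in the strategy map
        return f"Error: No recommendation defined for cluster {predicted_cluster}"
    
    # Get Offer Details
    offer_details = portfolio[portfolio['id'] == rec_offer_id].iloc[0]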
    
    return {
        "Assigned_Segment": int(predicted_cluster),
        "Recommended_Offer_ID": rec_offer_id,
        "Offer_Type": offer_details['offer_type'],
        "Reward": offer_details['reward'],
        "Difficulty": offer_details['difficulty']
    }

# --- TEST THE PIPELINE ---
# Define a dummy customer (Make sure this matches your Phase 4 columns!)
# Example: High income, high spend -> likely wants discounts?
# Note: You need to replace these keys with YOUR ACTUAL features from Phase 4.
new_customer_example = {
    'age': 35,
    'income': 72000,
    'membership_days_log': np.log(100), # Short member
    'total_amount': 50.0,
    'avg_transaction_value': 15.0,
    'transaction_count': 3,
    'offer_completion_rate': 0.5,
    'gender_F': 0, 'gender_M': 1, 'gender_O': 0
    # Add other features if your model used them (e.g., channel columns)
}

# The cell below needs to be adjusted by the user to match exact columns!
print("Pipeline function defined.")

Pipeline function defined.


## 4. Operational Demo
Run the function with the sample data. *Note: If this fails, check that the keys in `new_customer_example` match `scaler.feature_names_in_`.*
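
A quick way to diagnose such a mismatch is to compare the two sets of names directly. The helper below, `diff_feature_names`, is a hypothetical name introduced for illustration, and the short `expected` list is only a stand-in for `scaler.feature_names_in_`.

```python
def diff_feature_names(provided, expected):
    """Report which expected names are missing and which provided names are unexpected."""
    provided_set, expected_set = set(provided), set(expected)
    return {
        "missing": sorted(expected_set - provided_set),
        "unexpected": sorted(provided_set - expected_set),
    }

# Hypothetical stand-in for list(scaler.feature_names_in_)
expected = ["age", "income", "membership_days", "completion_rate"]

# Keys resembling the sample customer, two of which are named differently
provided = {
    "age": 35,
    "income": 72000,
    "membership_days_log": 4.6,
    "offer_completion_rate": 0.5,
}

report = diff_feature_names(provided, expected)
print(report)
```

Names that appear under `unexpected` alongside similar names under `missing` usually point to a renamed or transformed column rather than a feature that is truly absent.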

In [6]:
# Check what features the scaler expects
print("Model expects features:", list(scaler.feature_names_in_))

# IMPORTANT: Construct a valid test vector based on the printed feature names
# For demonstration, we'll create a zero-vector and fill known values
test_input = {feature: 0 for feature in scaler.feature_names_in_}
test_input['age'] = 40
test_input['income'] = 60000
test_input['transaction_count'] = 10
# ... fill others as needed ...

# Run
result = get_recommendation_for_new_customer(test_input)
print("\n--- Recommendation Result ---")

# Helper to handle NumPy types (int64, float64) during JSON serialization
def convert_numpy(o):
    if isinstance(o, (np.integer, np.floating)):
        return o.item()
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")

print(json.dumps(result, indent=4, default=convert_numpy))

Model expects features: ['Unnamed: 0', 'age', 'income', 'membership_days', 'total_amount', 'transaction_count', 'average_transaction_value', 'offer completed', 'offer received', 'offer viewed', 'completion_rate', 'bogo_completed', 'discount_completed', 'channel_web_count', 'channel_email_count', 'channel_mobile_count', 'channel_social_count', 'gender_F', 'gender_M', 'gender_O']

--- Recommendation Result ---
{
    "Assigned_Segment": 0,
    "Recommended_Offer_ID": "9b98b8c7a33c4b65b9aebfe6a799e6d9",
    "Offer_Type": "bogo",
    "Reward": 5,
    "Difficulty": 5
}
