# Objective
Learn how to define ML problems correctly, choose the right approach, and connect business problems to data and ML goals.

### 1. What is Problem Framing?
ML success depends on asking the *right question*.
Example: Are we predicting a number (regression) or a category (classification)?


In [None]:
# Let's look at examples and decide the ML type:

examples = [
    ("Predict house price based on features", "?"),
    ("Detect if an email is spam or not", "?"),
    ("Group customers by shopping patterns", "?"),
    ("Teach a robot to walk", "?")
]

for problem, _ in examples:
    print(problem)
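
To check your answers, it helps to ask two questions about each problem: is there a labeled target, and what kind of value is it? The sketch below encodes that rule of thumb. The function name `ml_type` and its arguments are hypothetical, made up for this illustration, and the rule is a rough heuristic rather than a complete taxonomy.

```python
def ml_type(has_labels: bool, target_kind: str = "", learns_from_rewards: bool = False) -> str:
    """Map basic facts about a problem to a broad ML category (rough heuristic)."""
    if learns_from_rewards:
        return "reinforcement learning"
    if not has_labels:
        return "unsupervised learning"
    if target_kind == "number":
        return "supervised learning (regression)"
    if target_kind == "category":
        return "supervised learning (classification)"
    raise ValueError(f"unknown target kind: {target_kind!r}")


# The four example problems, described by the facts the heuristic needs.
answers = {
    "Predict house price based on features": ml_type(True, "number"),
    "Detect if an email is spam or not": ml_type(True, "category"),
    "Group customers by shopping patterns": ml_type(False),
    "Teach a robot to walk": ml_type(False, learns_from_rewards=True),
}

for problem, kind in answers.items():
    print(f"{problem} -> {kind}")
```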

In [None]:
# ## 2. Framing a Business Problem
# Example business problem: Reduce customer churn.

# Step 1: Define Objective
#     "Predict if a customer will leave next month"
# Step 2: Identify Data
#     Customer activity, payment history, support tickets
# Step 3: Define Target Variable
#     churn = 1 if customer left, else 0
# Step 4: Choose ML Type
#     Classification problem
# Step 5: Define Evaluation Metric
#     Accuracy? Precision? Recall?
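
The five steps above can be written down as a small record so that an unfinished framing is easy to spot. This is only a sketch: the `ProblemFrame` class and its `missing_steps` method are hypothetical names invented for this notebook, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class ProblemFrame:
    """One field per framing step (hypothetical helper for illustration)."""
    objective: str
    data_sources: list
    target: str
    ml_type: str
    metrics: list

    def missing_steps(self):
        # A step counts as missing if its field is empty ("" or []).
        return [name for name, value in vars(self).items() if not value]


churn_frame = ProblemFrame(
    objective="Predict if a customer will leave next month",
    data_sources=["customer activity", "payment history", "support tickets"],
    target="churn = 1 if customer left, else 0",
    ml_type="classification",
    metrics=[],  # Step 5 is still an open question at this point
)

print("Steps still to decide:", churn_frame.missing_steps())
```

Leaving `metrics` empty mirrors the open question in Step 5: the framing is not finished until a metric has been chosen.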

In [None]:
# Let's simulate a small churn dataset and see how framing works.

import pandas as pd
import numpy as np

np.random.seed(42)

# Create dummy churn data (labels are drawn at random, so there is no real signal to learn)
data = {
    'monthly_spend': np.random.randint(20, 200, 20),
    'months_active': np.random.randint(1, 24, 20),
    'support_calls': np.random.randint(0, 10, 20),
    'churned': np.random.choice([0, 1], size=20)
}
df = pd.DataFrame(data)
print(df.head())  # a bare df.head() only displays when it is the last line of a cell

# ## 3. Identify Inputs (Features) and Output (Label)

X = df[['monthly_spend', 'months_active', 'support_calls']]
y = df['churned']

print("Features (X):", list(X.columns))
print("Label (y): churned")
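
In the dummy data the `churned` column already exists. In real data the label usually has to be derived from raw records, and that definition is part of the framing. Here is a minimal sketch, assuming churn means "no activity for more than 30 days before a snapshot date". The function `label_churn`, the 30-day window, and the customer names and dates are all made up for illustration.

```python
from datetime import date


def label_churn(last_active: date, snapshot: date, inactivity_days: int = 30) -> int:
    """Return 1 if the customer was inactive for more than the window, else 0."""
    return int((snapshot - last_active).days > inactivity_days)


snapshot = date(2024, 3, 1)  # hypothetical date on which labels are computed
last_seen = {
    "alice": date(2024, 2, 25),  # active 5 days before the snapshot
    "bob": date(2024, 1, 10),    # inactive for 51 days
}

labels = {name: label_churn(seen, snapshot) for name, seen in last_seen.items()}
print(labels)
```

Changing `inactivity_days` changes who counts as churned, so the window should be agreed with the business before any model is trained.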


In [None]:
# ## 4. Choose ML Category
# We have labels (0 or 1) → Classification Problem.

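# Let's train a quick baseline model to check that the framing is workable.
# With 20 rows, test_size=0.2 leaves only 4 test rows, so the scores below are noisy.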

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)

print("Training accuracy:", model.score(X_train, y_train))
print("Testing accuracy:", model.score(X_test, y_test))
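
An accuracy score only means something next to a trivial baseline. The sketch below computes the accuracy of always predicting the most common training label. The helper `majority_baseline_accuracy` and the label lists are made up for illustration.

```python
from collections import Counter


def majority_baseline_accuracy(y_train, y_test):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority_class, _ = Counter(y_train).most_common(1)[0]
    correct = sum(1 for label in y_test if label == majority_class)
    return majority_class, correct / len(y_test)


# Made-up labels: 0 = stayed, 1 = churned
y_train_demo = [0, 0, 0, 1, 0, 1, 0, 0]
y_test_demo = [0, 1, 0, 0]

baseline_class, baseline_acc = majority_baseline_accuracy(y_train_demo, y_test_demo)
print(f"Always predicting {baseline_class} gives accuracy {baseline_acc:.2f}")
```

If a trained model does not beat this number, the features may carry no signal or the framing may need another look. scikit-learn offers the same idea as `DummyClassifier(strategy="most_frequent")`.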

# ## 5. Feature-Target Relationship Visualization
import matplotlib.pyplot as plt
plt.scatter(df['months_active'], df['monthly_spend'], c=df['churned'], cmap='cool')
plt.xlabel('Months Active')
plt.ylabel('Monthly Spend ($)')
plt.title('Customer Churn Visualization')
plt.show()

# ## 6. Reframing the Problem
# Can the same dataset be used for other ML goals?
# Examples:
# - Predict expected monthly spend (Regression)
# - Cluster customers by usage pattern (Unsupervised)
# - Recommend retention actions (Reinforcement Learning)
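
Reframing mostly comes down to choosing a different target column, or none at all. The sketch below shows the same rows split three ways. The helper `frame` and the three sample rows are hypothetical. The reinforcement learning framing is left out because it would need logged actions and outcomes, which this table does not have.

```python
rows = [
    {"monthly_spend": 120, "months_active": 14, "support_calls": 2, "churned": 0},
    {"monthly_spend": 45, "months_active": 3, "support_calls": 7, "churned": 1},
    {"monthly_spend": 80, "months_active": 9, "support_calls": 4, "churned": 0},
]


def frame(rows, target=None, exclude=()):
    """Split rows into (feature_names, X, y); target=None means no labels."""
    skip = set(exclude)
    if target is not None:
        skip.add(target)
    feature_names = [key for key in rows[0] if key not in skip]
    X = [[row[key] for key in feature_names] for row in rows]
    y = [row[target] for row in rows] if target is not None else None
    return feature_names, X, y


# Classification: predict churned from the other columns
clf_names, clf_X, clf_y = frame(rows, target="churned")

# Regression: predict monthly_spend (churned is excluded as a feature here)
reg_names, reg_X, reg_y = frame(rows, target="monthly_spend", exclude=["churned"])

# Clustering: no target at all
clu_names, clu_X, clu_y = frame(rows, exclude=["churned"])

print("classification:", clf_names, "->", clf_y)
print("regression:    ", reg_names, "->", reg_y)
print("clustering:    ", clu_names, "->", clu_y)
```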


In [None]:
# Quick unsupervised demo: Cluster customers
from sklearn.cluster import KMeans

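# n_init is set explicitly because its default differs between scikit-learn versions.
# Features are unscaled here, so monthly_spend (the widest range) dominates the distances.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)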
kmeans.fit(X)
plt.scatter(df['monthly_spend'], df['months_active'], c=kmeans.labels_, cmap='Set1')
plt.xlabel('Monthly Spend')
plt.ylabel('Months Active')
plt.title('Customer Segments via Clustering')
plt.show()

# ## 7. Evaluation Metric Choice
# Let's compare Accuracy vs Recall for the churn problem.

from sklearn.metrics import accuracy_score, recall_score, ConfusionMatrixDisplay

y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
# zero_division=0 avoids a warning if the small test set contains no churned customers
print("Recall:", recall_score(y_test, y_pred, zero_division=0))
ConfusionMatrixDisplay.from_estimator(model, X_test, y_test)
plt.show()

In [None]:
# Which metric to prioritize depends on which error costs more:
# - If missing a churned customer is costly → Recall is more important.
# - If false alarms annoy customers → Precision matters more.
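
To see this trade-off in numbers, the sketch below computes precision and recall by hand at two decision thresholds. The labels and predicted churn probabilities are made up for illustration, and `precision_recall` is a hypothetical helper rather than the scikit-learn function.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for the positive class (1 = churned)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up data: true labels and a model's predicted churn probabilities
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.55, 0.3, 0.2, 0.1, 0.5]

results = {}
for threshold in (0.35, 0.65):
    y_pred = [int(score >= threshold) for score in scores]
    results[threshold] = precision_recall(y_true, y_pred)
    precision, recall = results[threshold]
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

With these numbers the lower threshold catches every churned customer but raises two false alarms, while the higher threshold raises none and misses one churned customer.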



In [None]:
# ## 8. Recap
# Problem framing = Translating goals → ML-ready formulation.
# Steps:
# 1. Define objective
# 2. Identify data
# 3. Choose target variable
# 4. Select ML type
# 5. Pick evaluation metric
# 6. Validate assumptions
