# Stage 09 — Homework Starter Notebook

In the lecture, we learned how to create engineered features. Now it’s your turn to apply those ideas to your own project data.

In [1]:
import pandas as pd
import numpy as np

# Example synthetic data (replace with your project dataset)
np.random.seed(0)
n = 100
df = pd.DataFrame({
    'income': np.random.normal(60000, 15000, n).astype(int),
    'monthly_spend': np.random.normal(2000, 600, n).astype(int),
    'credit_score': np.random.normal(680, 50, n).astype(int)
})
df.head()

| | income | monthly_spend | credit_score |
|---|---|---|---|
| 0 | 86460 | 3129 | 661 |
| 1 | 66002 | 1191 | 668 |
| 2 | 74681 | 1237 | 734 |
| 3 | 93613 | 2581 | 712 |
| 4 | 88013 | 1296 | 712 |


## TODO: Implement at least 2 engineered features here

In [None]:
# Example template:
df['spend_income_ratio'] = df['monthly_spend'] / df['income']  # TODO: Your feature
# Add rationale in markdown below
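
A ratio feature like the template above returns `inf` (or `NaN` for 0/0) whenever the denominator is zero. The following is a minimal sketch of one way to guard against that, assuming a zero income should be treated as missing; the `demo` frame and `safe_income` name are hypothetical and not part of the starter data.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, including a zero income to show the edge case
demo = pd.DataFrame({
    "income": [50000, 0, 80000],
    "monthly_spend": [2000, 1500, 1600],
})

# Replace zero income with NaN so the ratio becomes missing instead of inf
safe_income = demo["income"].replace(0, np.nan)
demo["spend_income_ratio"] = demo["monthly_spend"] / safe_income

print(demo)
```

Whether missing is the right treatment depends on the dataset; dropping or flagging those rows may suit your project better.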

In [None]:
# Feature 1: log1p transform of the first numeric column (reduces right skew; negatives are shifted to zero first)
num_cols = df.select_dtypes(include="number").columns.tolist()
assert len(num_cols) >= 1, "No numeric columns found."
c0 = num_cols[0]

x = df[c0]
# log1p(0) = 0 is valid, so shift only when negative values are present
x_shift = x - x.min() if x.min() < 0 else x
df[f"{c0}_log1p"] = np.log1p(x_shift)
df[[c0, f"{c0}_log1p"]].head(3)


**Feature 1 — `log1p` of first numeric column**

- **What:** `df["<c0>_log1p"] = log1p(<c0> shifted to non-negative if needed)`
- **Why:** Compresses large values, which reduces right skew and dampens the extreme values observed in EDA; this can make linear models more stable and their coefficients easier to interpret.
- **When helpful:** Skewed/heavy-tailed variables; keeps all rows (no dropping).
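
The skew claim above can be checked numerically. This is a minimal sketch, assuming a hypothetical log-normal sample (`skewed`) rather than the project data, that compares pandas' sample skewness before and after `np.log1p`.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed sample (log-normal); not drawn from the project data
rng = np.random.default_rng(0)
skewed = pd.Series(rng.lognormal(mean=10, sigma=1, size=1000), name="skewed_income")

# All values are positive here, so log1p can be applied directly
transformed = np.log1p(skewed)

skew_before = skewed.skew()
skew_after = transformed.skew()
print(f"skew before: {skew_before:.2f}, after: {skew_after:.2f}")
```

If the skewness of your own column is already close to zero, as with the normally distributed synthetic data in this notebook, the transform is unlikely to add much.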


In [None]:
# Feature 2: z-score standardization of the same column (unit-free, so it is comparable across features)
mu = df[c0].mean()
sigma = df[c0].std(ddof=0)
df[f"{c0}_z"] = (df[c0] - mu) / sigma if sigma != 0 else 0.0
df[[c0, f"{c0}_z"]].head(3)


**Feature 2 — z-score of the first numeric column**

- **What:** `(<c0> - mean) / std`, using the population standard deviation (`ddof=0`)
- **Why:** Centers and scales to unit variance; useful when combining features on different scales or using regularized models.
- **Note:** If std = 0 (constant column), the feature is set to 0 for every row, which avoids the NaN values that 0/0 would otherwise produce.
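
The properties listed above can be verified on a small example. This is a sketch only: the `zscore` helper and the `demo` frame are hypothetical names introduced for illustration, and the helper mirrors the `ddof=0` choice and the constant-column fallback described in the bullets.

```python
import numpy as np
import pandas as pd

def zscore(col: pd.Series) -> pd.Series:
    """Population z-score (ddof=0); a constant column maps to all zeros."""
    sigma = col.std(ddof=0)
    if sigma == 0:
        return pd.Series(0.0, index=col.index, name=col.name)
    return (col - col.mean()) / sigma

# Hypothetical data: one varying column and one constant column
demo = pd.DataFrame({
    "varying": [1.0, 2.0, 3.0, 4.0],
    "constant": [5.0, 5.0, 5.0, 5.0],
})

z_var = zscore(demo["varying"])
z_const = zscore(demo["constant"])

print(z_var.mean(), z_var.std(ddof=0))  # approximately 0 and 1
print(z_const.tolist())
```

Note that pandas' `Series.std()` defaults to `ddof=1`, so the standardized column only has a standard deviation of exactly 1 when checked with the same `ddof` used to compute it.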


### Rationale for Feature 1
Explain why this feature may help a model. Reference your EDA.

In [None]:
# TODO: Add another feature
# Example: df['rolling_spend_mean'] = df['monthly_spend'].rolling(3).mean()
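
A rolling mean such as the example above only makes sense when rows are ordered, for instance by time. The sketch below assumes a hypothetical, already-sorted `spend` series (the synthetic data in this notebook has no time order) and shows how `min_periods` controls the leading `NaN` values.

```python
import pandas as pd

# Hypothetical monthly spend for one customer, already sorted by time
spend = pd.Series([100, 200, 300, 400, 500], name="monthly_spend")

# Default: the first window - 1 rows are NaN because the window is incomplete
rolling_default = spend.rolling(3).mean()

# min_periods=1 averages whatever values are available, so no rows are lost
rolling_partial = spend.rolling(3, min_periods=1).mean()

print(rolling_default.tolist())
print(rolling_partial.tolist())
```

Each window includes the current row, so if the feature is meant to predict the current row's spend, shifting the series by one step first (`spend.shift(1).rolling(3).mean()`) may be needed to avoid leaking the target.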

### Rationale for Feature 2
Explain why this feature may help a model. Reference your EDA.