{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Customer Churn Feature Engineering\n",
    "\n",
    "This notebook focuses on transforming and creating features to improve our churn prediction model. We'll use insights from our exploratory data analysis to engineer meaningful features that capture important patterns in the data.\n",
    "\n",
    "## Feature Engineering Objectives\n",
    "\n",
    "1. **Data Preprocessing**: Handle missing values and convert data types\n",
    "2. **Categorical Feature Encoding**: Convert categorical variables to numerical format\n",
    "3. **Service Usage Features**: Create metrics of service adoption and engagement\n",
    "4. **Customer Value Features**: Develop features related to customer lifetime value\n",
    "5. **Tenure-Based Features**: Create meaningful tenure groupings and related metrics\n",
    "6. **Interaction Features**: Capture relationships between different variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "source": [
    "# Import libraries\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
    "\n",
    "# Set plot style\n",
    "plt.style.use('seaborn-whitegrid')\n",
    "sns.set_palette('colorblind')\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "source": [
    "# Import modular code from our package\n",
    "import sys\n",
    "import os\n",
    "sys.path.append('..')\n",
    "from src.data_processor import load_data, handle_missing_values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Load and Preprocess Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "source": [
    "# Load the dataset\n",
    "file_path = '../data/WA_Fn-UseC_-Telco-Customer-Churn.csv'\n",
    "df = load_data(file_path)\n",
    "\n",
    "# Handle missing values\n",
    "df = handle_missing_values(df)\n",
    "\n",
    "# Check data after handling missing values\n",
    "print(\"Missing values after preprocessing:\")\n",
    "print(df.isNone().sum())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Categorical Feature Encoding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "source": [
    "# Binary variable encoding\n",
    "binary_vars = ['Partner', 'Dependents', 'PhoneService', 'PaperlessBilling', 'Churn']\n",
    "for var in binary_vars:\n",
    "    df[var] = df[var].map({'Yes': 1, 'No': 0})\n",
    "\n",
    "# Verify binary encoding\n",
    "df[binary_vars].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "source": [
    "# Categorical columns that will need one-hot encoding for modeling\n",
    "categorical_cols = ['gender', 'MultipleLines', 'InternetService', 'OnlineSecurity', \n",
    "                     'OnlineBackup', 'DeviceProtection', 'TechSupport', \n",
    "                     'StreamingTV', 'StreamingMovies', 'Contract', 'PaymentMethod']\n",
    "\n",
    "# Preview categorical columns\n",
    "for col in categorical_cols[:5]:  # Show just the first 5 for brevity\n",
    "    print(f\"\\nUnique values in {col}:\")\n",
    "    print(df[col].value_counts())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Service Usage Features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "source": [
    "# Calculate total number of services\n",
    "df['TotalServices'] = df[['OnlineSecurity', 'OnlineBackup', 'DeviceProtection',\n",
    "                           'TechSupport', 'StreamingTV', 'StreamingMovies']].apply(\n",
    "    lambda row: sum(1 for item in row if item == 'Yes'), axis=1\n",
    ")\n",
    "\n",
    "# Create service flags\n",
    "df['HasTechSupport'] = df['TechSupport'].apply(lambda x: 1 if x == 'Yes' else 0)\n",
    "df['HasOnlineSecurity'] = df['OnlineSecurity'].apply(lambda x: 1 if x == 'Yes' else 0)\n",
    "df['HasStreamingServices'] = ((df['StreamingTV'] == 'Yes') & \n",
    "                             (df['StreamingMovies'] == 'Yes')).astype(int)\n",
    "\n",
    "# Service feature categories\n",
    "df['HasBasicProtection'] = ((df['OnlineSecurity'] == 'Yes') | \n",
    "                           (df['DeviceProtection'] == 'Yes')).astype(int)\n",
    "\n",
    "# Visualize the relationship between total services and churn\n",
    "plt.figure(figsize=(10, 6))\n",
    "service_churn = df.groupby('TotalServices')['Churn'].mean() * 100\n",
    "service_churn.plot(kind='bar')\n",
    "plt.title('Churn Rate by Number of Additional Services')\n",
    "plt.xlabel('Number of Services')\n",
    "plt.ylabel('Churn Rate (%)')\n",
    "plt.xticks(rotation=0)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "source": [
    "# Calculate service adoption rate as percentage of possible services\n",
    "# For customers with internet service\n",
    "internet_customers = df[df['InternetService'] != 'No']\n",
    "internet_customers['ServiceAdoptionRate'] = internet_customers['TotalServices'] / 6 * 100\n",
    "\n",
    "# Fill ServiceAdoptionRate for non-internet customers with 0\n",
    "df['ServiceAdoptionRate'] = df['TotalServices'] / 6 * 100\n",
    "\n",
    "# Visualize service adoption rate distribution\n",
    "plt.figure(figsize=(10, 6))\n",
    "sns.histplot(data=internet_customers, x='ServiceAdoptionRate', hue='Churn', bins=7, \n",
    "             multiple='stack')\n",
    "plt.title('Service Adoption Rate Distribution by Churn Status (Internet Customers)')\n",
    "plt.xlabel('Service Adoption Rate (%)')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Customer Value Features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "source": [
    "# Customer Lifetime Value (CLV) approximation\n",
    "df['CLV'] = df['tenure'] * df['MonthlyCharges']\n",
    "\n",
    "# Average monthly spend\n",
    "df['AvgMonthlySpend'] = df['TotalCharges'] / df['tenure'].replace(0, 1)\n",
    "\n",
    "# Monthly-to-annual ratio (indicates if charges are increasing over time)\n",
    "df['MonthlyToAnnualRatio'] = df['MonthlyCharges'] * 12 / df['TotalCharges'].replace(0, 1)\n",
    "df['MonthlyToAnnualRatio'] = df['MonthlyToAnnualRatio'].clip(0, 5)  # Cap extreme values\n",
    "\n",
    "# Normalize CLV by tenure for better comparison\n",
    "df['NormalizedCLV'] = df['CLV'] / df['tenure'].replace(0, 1)\n",
    "\n",
    "# Visualize CLV distribution by churn\n",
    "plt.figure(figsize=(12, 6))\n",
    "plt.subplot(1, 2, 1)\n",
    "sns.histplot(data=df, x='CLV', hue='Churn', bins=20, kde=True)\n",
    "plt.title('Customer Lifetime Value by Churn Status')\n",
    "\n",
    "plt.subplot(1, 2, 2)\n",
    "sns.boxplot(x='Churn', y='CLV', data=df)\n",
    "plt.title('CLV Distribution by Churn Status')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "source": [
    "# Create CLV segments\n",
    "df['CLVSegment'] = pd.qcut(df['CLV'], 4, labels=['Low', 'Medium-Low', 'Medium-High', 'High'])\n",
    "\n",
    "# Churn rate by CLV segment\n",
    "plt.figure(figsize=(10, 6))\n",
    "clv_segment_churn = df.groupby('CLVSegment')['Churn'].mean() * 100\n",
    "clv_segment_churn.plot(kind='bar')\n",
    "plt.title('Churn Rate by Customer Lifetime Value Segment')\n",
    "plt.ylabel('Churn Rate (%)')\n",
    "plt.xticks(rotation=0)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Tenure-Based Features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "source": [
    "# Create tenure groups\n",
    "df['TenureGroup'] = pd.cut(df['tenure'], \n",
    "                           bins=[0, 12, 24, 36, 48, 60, np.inf], \n",
    "                           labels=['0-1 year', '1-2 years', '2-3 years', \n",
    "                                  '3-4 years', '4-5 years', '5+ years'])\n",
    "\n",
    "# Identify new customers (less than 6 months)\n",
    "df['NewCustomer'] = (df['tenure'] <= 6).astype(int)\n",
    "\n",
    "# Identify loyal customers (more than 2 years)\n",
    "df['LoyalCustomer'] = (df['tenure'] > 24).astype(int)\n",
    "\n",
    "# Visualize churn rate by tenure group\n",
    "plt.figure(figsize=(10, 6))\n",
    "tenure_group_churn = df.groupby('TenureGroup')['Churn'].mean() * 100\n",
    "tenure_group_churn.plot(kind='bar')\n",
    "plt.title('Churn Rate by Tenure Group')\n",
    "plt.ylabel('Churn Rate (%)')\n",
    "plt.ylim(0, 50)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "source": [
    "# Create tenure-based payment ratio (monthly charges as percentage of total charges)\n",
    "df['PaymentToTenureRatio'] = (df['MonthlyCharges'] * df['tenure']) / df['TotalCharges'].replace(0, 1)\n",
    "# Values close to 1 indicate stable monthly payments, values > 1 may indicate increasing payments\n",
    "\n",
    "# Clip extreme values\n",
    "df['PaymentToTenureRatio'] = df['PaymentToTenureRatio'].clip(0.8, 1.2)\n",
    "\n",
    "# Analyze payment ratio distribution\n",
    "plt.figure(figsize=(10, 6))\n",
    "sns.histplot(data=df, x='PaymentToTenureRatio', hue='Churn', bins=20, kde=True)\n",
    "plt.title('Payment to Tenure Ratio by Churn Status')\n",
    "plt.axvline(x=1, color='red', linestyle='--')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Contract Risk Features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "source": [
    "# Contract risk factor (higher for month-to-month)\n",
    "df['ContractRiskFactor'] = df['Contract'].map({\n",
    "    'Month-to-month': 2, \n",
    "    'One year': 1, \n",
    "    'Two year': 0\n",
    "})\n",
    "\n",
    "# Payment method risk (electronic check has highest risk)\n",
    "df['PaymentRiskFactor'] = df['PaymentMethod'].map({\n",
    "    'Electronic check': 3,\n",
    "    'Mailed check': 2,\n",
    "    'Bank transfer (automatic)': 1,\n",
    "    'Credit card (automatic)': 0\n",
    "})\n",
    "\n",
    "# Composite risk score\n",
    "df['CompositeRiskScore'] = (\n",
    "    df['ContractRiskFactor'] + \n",
    "    df['PaymentRiskFactor'] * 0.5 +\n",
    "    (1 - df['HasOnlineSecurity']) * 0.5 +\n",
    "    (1 - df['HasTechSupport']) * 0.5 +\n",
    "    (df['NewCustomer']) * 1.5\n",
    ")\n",
    "\n",
    "# Normalize to 0-10 scale\n",
    "min_score = df['CompositeRiskScore'].min()\n",
    "max_score = df['CompositeRiskScore'].max()\n",
    "df['CompositeRiskScore'] = ((df['CompositeRiskScore'] - min_score) / \n",
    "                           (max_score - min_score)) * 10\n",
    "\n",
    "# Visualize composite risk score\n",
    "plt.figure(figsize=(10, 6))\n",
    "sns.histplot(data=df, x='CompositeRiskScore', hue='Churn', bins=20, kde=True)\n",
    "plt.title('Composite Risk Score Distribution by Churn Status')\n",
    "plt.xlabel('Composite Risk Score (0-10)')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "source": [
    "# Create risk segments\n",
    "df['RiskSegment'] = pd.qcut(df['CompositeRiskScore'], 4, \n",
    "                            labels=['Low Risk', 'Medium-Low Risk', \n",
    "                                   'Medium-High Risk', 'High Risk'])\n",
    "\n",
    "# Analyze churn rate by risk segment\n",
    "plt.figure(figsize=(10, 6))\n",
    "risk_segment_churn = df.groupby('RiskSegment')['Churn'].mean() * 100\n",
    "risk_segment_churn.plot(kind='bar')\n",
    "plt.title('Churn Rate by Risk Segment')\n",
    "plt.ylabel('Churn Rate (%)')\n",
    "plt.xticks(rotation=0)\n",
    "plt.show()\n",
    "\n",
    "# Print churn rates\n",
    "print(\"Churn Rate by Risk Segment:\")\n",
    "print(risk_segment_churn)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Interaction Features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "source": [
    "# Create interaction features\n",
    "\n",
    "# Interaction between contract type and tenure\n",
    "df['ContractTenureInteraction'] = df['ContractRiskFactor'] * df['tenure']\n",
    "\n",
    "# High-value customer with short-term contract (high risk)\n",
    "df['HighValueShortContract'] = ((df['MonthlyCharges'] > df['MonthlyCharges'].median()) & \n",
    "                               (df['Contract'] == 'Month-to-month')).astype(int)\n",
    "\n",
    "# Senior citizen with Fiber service (high risk)\n",
    "df['SeniorWithFiber'] = ((df['SeniorCitizen'] == 1) & \n",
    "                         (df['InternetService'] == 'Fiber optic')).astype(int)\n",
    "\n",
    "# New customer with high monthly charge (high risk)\n",
    "df['NewCustomerHighCharge'] = ((df['NewCustomer'] == 1) & \n",
    "                              (df['MonthlyCharges'] > df['MonthlyCharges'].median())).astype(int)\n",
    "\n",
    "# Fiber without protection (high risk)\n",
    "df['FiberNoProtection'] = ((df['InternetService'] == 'Fiber optic') & \n",
    "                          (df['HasBasicProtection'] == 0)).astype(int)\n",
    "\n",
    "# Analyze some of these interaction features\n",
    "interaction_features = ['HighValueShortContract', 'SeniorWithFiber', \n",
    "                        'NewCustomerHighCharge', 'FiberNoProtection']\n",
    "\n",
    "fig, axes = plt.subplots(2, 2, figsize=(15, 10))\n",
    "axes = axes.flatten()\n",
    "\n",
    "for i, feature in enumerate(interaction_features):\n",
    "    feature_churn = df.groupby(feature)['Churn'].mean() * 100\n",
    "    feature_counts = df[feature].value_counts()\n",
    "    \n",
    "    ax = axes[i]\n",
    "    bars = ax.bar([0, 1], feature_churn, color=['skyblue', 'coral'])\n",
    "    \n",
    "    # Add count labels\n",
    "    for j, bar in enumerate(bars):\n",
    "        count = feature_counts.get(j, 0)\n",
    "        ax.text(bar.get_x() + bar.get_width()/2., 5, \n",
    "                f'n={count}', ha='center', color='white', fontweight='bold')\n",
    "    \n",
    "    ax.set_title(f'Churn Rate by {feature}')\n",
    "    ax.set_xticks([0, 1])\n",
    "    ax.set_xticklabels(['No', 'Yes'])\n",
    "    ax.set_ylabel('Churn Rate (%)')\n",
    "    ax.set_ylim(0, 100)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Final Feature Set Summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "source": [
    "# Display all created features\n",
    "engineered_features = [\n",
    "    # Service Features\n",
    "    'TotalServices', 'HasTechSupport', 'HasOnlineSecurity', 'HasStreamingServices',\n",
    "    'HasBasicProtection', 'ServiceAdoptionRate',\n",
    "    \n",
    "    # Customer Value Features\n",
    "    'CLV', 'AvgMonthlySpend', 'MonthlyToAnnualRatio', 'NormalizedCLV', 'CLVSegment',\n",
    "    \n",
    "    # Tenure Features\n",
    "    'TenureGroup', 'NewCustomer', 'LoyalCustomer', 'PaymentToTenureRatio',\n",
    "    \n",
    "    # Risk Features\n",
    "    'ContractRiskFactor', 'PaymentRiskFactor', 'CompositeRiskScore', 'RiskSegment',\n",
    "    \n",
    "    # Interaction Features\n",
    "    'ContractTenureInteraction', 'HighValueShortContract', 'SeniorWithFiber',\n",
    "    'NewCustomerHighCharge', 'FiberNoProtection'\n",
    "]\n",
    "\n",
    "print(f\"Number of engineered features: {len(engineered_features)}\")\n",
    "print(\"\\nEngineered features:\")\n",
    "for i, feature in enumerate(engineered_features):\n",
    "    print(f\"{i+1}. {feature}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "source": [
    "# Save the preprocessed and feature-engineered dataframe\n",
    "df.to_csv('../data/telco_churn_engineered.csv', index=False)\n",
    "print(\"Engineered dataset saved to '../data/telco_churn_engineered.csv'\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 9. Preliminary Feature Importance Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "source": [
    "# Calculate correlation with churn for numerical features\n",
    "numerical_features = df.select_dtypes(include=['int64', 'float64']).columns.tolist()\n",
    "churn_corr = df[numerical_features].corr()['Churn'].sort_values(ascending=False)\n",
    "\n",
    "# Display top correlations\n",
    "print(\"Top 10 Positive Correlations with Churn:\")\n",
    "print(churn_corr.head(11))  # 11 to include Churn itself\n",
    "\n",
    "print(\"\\nTop 10 Negative Correlations with Churn:\")\n",
    "print(churn_corr.tail(10))\n",
    "\n",
    "# Visualize top correlations\n",
    "plt.figure(figsize=(12, 8))\n",
    "churn_corr_filtered = churn_corr[churn_corr.index != 'Churn']\n",
    "top_features = churn_corr_filtered.abs().nlargest(15).index\n",
    "churn_corr_plot = churn_corr_filtered[top_features]\n",
    "\n",
    "# Sort by absolute correlation\n",
    "churn_corr_plot = churn_corr_plot.reindex(churn_corr_plot.abs().sort_values().index)\n",
    "\n",
    "colors = ['red' if x < 0 else 'green' for x in churn_corr_plot]\n",
    "plt.barh(churn_corr_plot.index, churn_corr_plot, color=colors)\n",
    "plt.title('Top 15 Features by Correlation with Churn')\n",
    "plt.xlabel('Correlation Coefficient')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 10. Categorical Feature Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "source": [
    "# Analyze churn rate by categorical features\n",
    "categorical_features = df.select_dtypes(include=['object']).columns.tolist()\n",
    "categorical_features = [f for f in categorical_features if f != 'customerID']\n",
    "\n",
    "# Calculate churn rate and count for each category\n",
    "cat_analysis = []\n",
    "for feature in categorical_features:\n",
    "    grouped = df.groupby(feature)['Churn'].agg(['mean', 'count'])\n",
    "    grouped['mean'] = grouped['mean'] * 100  # Convert to percentage\n",
    "    grouped['feature'] = feature\n",
    "    cat_analysis.append(grouped.reset_index().rename(columns={'mean': 'churn_rate', 'index': 'category'}))\n",
    "\n",
    "cat_df = pd.concat(cat_analysis)\n",
    "\n",
    "# Sort by churn rate for visualization\n",
    "cat_df = cat_df.sort_values('churn_rate', ascending=False)\n",
    "\n",
    "# Display top categories by churn rate\n",
    "print(\"Top 10 Categories by Churn Rate:\")\n",
    "print(cat_df.head(10)[['feature', 'category', 'churn_rate', 'count']])\n",
    "\n",
    "# Visualize top categories by churn rate\n",
    "plt.figure(figsize=(12, 8))\n",
    "top_cat = cat_df.head(15)\n",
    "bars = plt.barh(top_cat['feature'] + ' - ' + top_cat['category'].astype(str), top_cat['churn_rate'])\n",
    "\n",
    "# Add count labels\n",
    "for i, bar in enumerate(bars):\n",
    "    plt.text(bar.get_width() + 1, bar.get_y() + bar.get_height()/2, \n",
    "             f'n={top_cat.iloc[i][\"count\"]}', va='center')\n",
    "\n",
    "plt.title('Top 15 Categories by Churn Rate')\n",
    "plt.xlabel('Churn Rate (%)')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Feature Engineering Summary\n",
    "\n",
    "We've successfully engineered a comprehensive set of features for our churn prediction model. Here's a summary of the key feature types created:\n",
    "\n",
    "1. **Service Usage Features**:\n",
    "   - Total number of services\n",
    "   - Service adoption rate\n",
    "   - Specific service flags\n",
    "\n",
    "2. **Customer Value Features**:\n",
    "   - Customer Lifetime Value (CLV)\n",
    "   - Average monthly spend\n",
    "   - Normalized CLV and segmentation\n",
    "\n",
    "3. **Tenure-Based Features**:\n",
    "   - Tenure groups\n",
    "   - New customer flag\n",
    "   - Loyal customer flag\n",
    "\n",
    "4. **Risk Features**:\n",
    "   - Contract risk factor\n",
    "   - Payment method risk factor\n",
    "   - Composite risk score\n",
    "\n",
    "5. **Interaction Features**:\n",
    "   - High-value customers with short-term contracts\n",
    "   - Seniors with fiber service\n",
    "   - New customers with high charges\n",
    "   - Fiber customers without protection services\n",
    "\n",
    "### Key Findings:\n",
    "\n",
    "- The **Composite Risk Score** shows strong separation between churned and non-churned customers, suggesting it will be valuable for prediction.\n",
    "- **Contract type** and its interaction with other features remains one of the strongest predictors of churn.\n",
    "- **Tenure** and related features show strong negative correlation with churn.\n",
    "- **Service adoption**, particularly security and support services, correlates strongly with customer retention.\n",
    "- Several **interaction features** identify specific high-risk customer segments.\n",
    "\n",
    "### Next Steps:\n",
    "\n",
    "1. Prepare the data for model training (one-hot encoding of remaining categorical variables)\n",
    "2. Train and evaluate various machine learning models\n",
    "3. Analyze feature importance from the trained models\n",
    "4. Develop targeted retention strategies based on model insights"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
