### Project Overview: Customer Churn Prediction

#### Objective:
The goal of this project is to predict customer churn for a telecommunications company. Churn occurs when a customer stops doing business with a company; here, that means canceling service with the telecom provider. Predicting churn lets the business act early to retain customers and reduce revenue loss.

1. **Dataset Creation:** 
   - We generated a synthetic dataset that simulates customer data for a telecommunications company. It includes customer demographics (age, gender), subscribed services (internet service type, contract type), and usage patterns (tenure, monthly charges, total charges). The binary `Churn` label is drawn at random, with a probability that depends on contract type, monthly charges, and tenure.

In [1]:
!pip install faker


In [12]:
from faker import Faker
import random
import pandas as pd

fake = Faker()

# Generate synthetic dataset with logical dependencies
def generate_logical_dataset(num_samples=1000):
    data = []
    for _ in range(num_samples):
        age = random.randint(18, 85)
        gender = random.choice(['Male', 'Female'])
        
        # Internet service influences monthly charges and possibly churn
        internet_service = random.choices(
            ['DSL', 'Fiber optic', 'None'], 
            weights=[0.3, 0.5, 0.2], k=1
        )[0]
        
        # Contract type influences churn likelihood
        contract = random.choices(
            ['Month-to-month', 'One year', 'Two year'], 
            weights=[0.5, 0.3, 0.2], k=1
        )[0]
        
        # Contract type determines the tenure range
        if contract == 'Month-to-month':
            tenure = random.randint(1, 12)
        elif contract == 'One year':
            tenure = random.randint(12, 24)
        else:
            tenure = random.randint(24, 72)
        
        # Monthly charges depend on the type of internet service
        if internet_service == 'DSL':
            monthly_charges = random.uniform(25, 75)
        elif internet_service == 'Fiber optic':
            monthly_charges = random.uniform(50, 100)
        else:
            monthly_charges = random.uniform(0, 25)
        
        # Total charges are calculated based on tenure and monthly charges
        total_charges = monthly_charges * tenure
        
        # Churn probability is influenced by tenure, contract type, and monthly charges
        if contract == 'Month-to-month':
            churn_prob = 0.5
        elif contract == 'One year':
            churn_prob = 0.2
        else:
            churn_prob = 0.1
        
        # Increase churn probability for very high monthly charges or low tenure
        if monthly_charges > 80:
            churn_prob += 0.1
        if tenure < 12:
            churn_prob += 0.1
        
        churn = 1 if random.random() < churn_prob else 0
        
        data.append([age, gender, internet_service, contract, tenure, monthly_charges, total_charges, churn])
    
    columns = ['Age', 'Gender', 'InternetService', 'Contract', 'Tenure', 'MonthlyCharges', 'TotalCharges', 'Churn']
    return pd.DataFrame(data, columns=columns)

# Generate dataset with 1000 samples
dataset = generate_logical_dataset(num_samples=1000)

# Display the first few rows of the dataset
dataset.head()


| | Age | Gender | InternetService | Contract | Tenure | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|
| 0 | 36 | Male | Fiber optic | Two year | 69 | 95.413218 | 6583.512073 | 0 |
| 1 | 48 | Female | DSL | Two year | 67 | 36.942517 | 2475.148639 | 0 |
| 2 | 75 | Male | DSL | Month-to-month | 9 | 36.247359 | 326.226229 | 1 |
| 3 | 18 | Female | DSL | One year | 17 | 31.539804 | 536.176662 | 0 |
| 4 | 70 | Female | None | Month-to-month | 7 | 12.915698 | 90.409883 | 1 |
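
The churn rules in the generator can be restated as a small standalone function, which makes it easier to see how noisy the labels are. The sketch below is only an illustration: `expected_churn_prob` and `best_case_accuracy` are hypothetical helper names, not part of the notebook, and the sample customers are made up.

```python
def expected_churn_prob(contract: str, monthly_charges: float, tenure: int) -> float:
    """Churn probability implied by the generator's rules (hypothetical helper)."""
    base = {'Month-to-month': 0.5, 'One year': 0.2, 'Two year': 0.1}[contract]
    if monthly_charges > 80:   # very high monthly charges
        base += 0.1
    if tenure < 12:            # short tenure
        base += 0.1
    return base


def best_case_accuracy(churn_prob: float) -> float:
    """Accuracy of always guessing the more likely outcome for one customer."""
    return max(churn_prob, 1 - churn_prob)


# Made-up customers: (contract, monthly charges, tenure in months)
examples = [
    ('Month-to-month', 90.0, 5),
    ('One year', 60.0, 18),
    ('Two year', 60.0, 30),
]
for contract, charges, tenure in examples:
    p = expected_churn_prob(contract, charges, tenure)
    print(f'{contract}: churn_prob={p:.1f}, best-case accuracy={best_case_accuracy(p):.1f}')
```

Because the label is a random draw against this probability, even a model that recovered the rules exactly could not classify every customer correctly. For month-to-month customers in particular, the probability sits close to 0.5.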


2. **Data Preprocessing:**
   - We preprocessed the dataset to prepare it for machine learning modeling. The categorical variables (gender, internet service type, contract type) were converted to numeric indicator columns using one-hot encoding, dropping the first category of each variable as the baseline. The target column `Churn` was then separated from the features.

In [13]:
# Convert categorical variables to numerical using one-hot encoding
dataset_encoded = pd.get_dummies(dataset, columns=['Gender', 'InternetService', 'Contract'], drop_first=True)

# Separate features and target variable
X = dataset_encoded.drop('Churn', axis=1)
y = dataset_encoded['Churn']
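
To see what `drop_first=True` does to the column names, here is a minimal sketch on a toy frame (the `toy` and `single` frames are made-up data, not the notebook's dataset). It also shows one way to align a single new row to the training columns with `reindex`, since encoding one row on its own only produces the categories that row contains.

```python
import pandas as pd

# Toy frame containing every contract category once
toy = pd.DataFrame({
    'Tenure': [5, 18, 40],
    'Contract': ['Month-to-month', 'One year', 'Two year'],
})

# The alphabetically first category ('Month-to-month') becomes the baseline
encoded = pd.get_dummies(toy, columns=['Contract'], drop_first=True)
print(list(encoded.columns))

# A single new row only yields a column for its own category...
single = pd.get_dummies(
    pd.DataFrame({'Tenure': [7], 'Contract': ['Two year']}),
    columns=['Contract'],
)
# ...so align it to the training columns and fill the missing indicators with 0
aligned = single.reindex(columns=encoded.columns, fill_value=0)
print(aligned)
```

A row whose indicator columns are all 0 belongs to the dropped baseline category.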

3. **Model Building:**
   - We trained a machine learning model, specifically a `RandomForestClassifier`, to predict whether a customer is likely to churn based on the provided features.
   - The data was split into 80% for training and 20% for testing. During training, the model learns patterns in the features that distinguish customers who churn from those who stay.

In [14]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)
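
One way to inspect which patterns a fitted forest relies on is its `feature_importances_` attribute. The sketch below uses a small made-up dataset rather than the notebook's data: the feature names and the rule that churn depends only on short tenure are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 300

# Toy features (hypothetical stand-ins for the notebook's columns)
tenure = rng.integers(1, 73, size=n)
monthly = rng.uniform(0, 100, size=n)
noise = rng.normal(size=n)
X_toy = np.column_stack([tenure, monthly, noise])

# Toy label: churn is likelier for short tenure and ignores the other features
y_toy = (rng.random(n) < np.where(tenure < 12, 0.7, 0.1)).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_toy, y_toy)

# Impurity-based importances sum to 1 across features
for name, importance in zip(['Tenure', 'MonthlyCharges', 'Noise'],
                            forest.feature_importances_):
    print(f'{name}: {importance:.2f}')
```

The same attribute is available on the notebook's `model` after fitting. Impurity-based importances tend to favor continuous features, so they are best read as a rough guide.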

4. **Model Evaluation:**
   - We evaluated the trained model on the held-out test set using the accuracy score and the classification report.
   - The accuracy score is the fraction of test customers whose churn label the model predicted correctly, while the classification report breaks performance down per class into precision, recall, and F1-score.

In [15]:
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Display classification report
print(classification_report(y_test, y_pred))

Accuracy: 0.67
              precision    recall  f1-score   support

           0       0.70      0.71      0.71       112
           1       0.63      0.61      0.62        88

    accuracy                           0.67       200
   macro avg       0.66      0.66      0.66       200
weighted avg       0.67      0.67      0.67       200
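
The report's numbers can be reproduced from confusion-matrix counts. The counts in this sketch are not printed by the notebook; they are inferred from the rounded report (80 of 112 non-churners and 54 of 88 churners classified correctly) and should be treated as an assumption.

```python
# Confusion-matrix counts inferred from the rounded report (assumption)
tn, fp = 80, 32   # actual class 0: correctly kept / wrongly flagged as churn
fn, tp = 34, 54   # actual class 1: missed churners / correctly flagged

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # of predicted churners, how many actually churned
recall = tp / (tp + fn)      # of actual churners, how many were caught
f1 = 2 * precision * recall / (precision + recall)

# Accuracy of always predicting the majority class, for comparison
majority_baseline = max(tn + fp, fn + tp) / total

print(f'accuracy={accuracy:.2f} precision={precision:.2f} '
      f'recall={recall:.2f} f1={f1:.2f} baseline={majority_baseline:.2f}')
# → accuracy=0.67 precision=0.63 recall=0.61 f1=0.62 baseline=0.56
```

Under these counts, the model improves on the always-predict-"no churn" baseline by about 11 percentage points.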



5. **User Interface (UI) Development:**
   - We created an interactive UI using `ipywidgets` within a Jupyter Notebook.
   - The UI allows users to input customer information (e.g., age, gender, contract details) through sliders and dropdowns.
   - Upon clicking the "Predict" button, the model uses the provided input to predict whether the customer is likely to churn and displays the prediction along with the probability.

In [16]:
import ipywidgets as widgets
from IPython.display import display

# UI components
age_input = widgets.IntSlider(min=18, max=85, description='Age:')
gender_dropdown = widgets.Dropdown(options=['Male', 'Female'], description='Gender:')
internet_service_dropdown = widgets.Dropdown(options=['DSL', 'Fiber optic', 'None'], description='Internet Service:')
contract_dropdown = widgets.Dropdown(options=['Month-to-month', 'One year', 'Two year'], description='Contract:')
tenure_input = widgets.IntSlider(min=1, max=72, description='Tenure:')
# Range matches the monthly charges present in the generated training data (0 to 100)
monthly_charges_input = widgets.FloatSlider(min=0.0, max=100.0, step=1.0, description='Monthly Charges:')
predict_button = widgets.Button(description='Predict')

output_label = widgets.Label()

# Function to predict churn and update output
def predict_churn(sender):
    # Build one row with the same feature names the model was trained on
    input_data = {
        'Age': [age_input.value],
        'Gender_Male': [1 if gender_dropdown.value == 'Male' else 0],
        'InternetService_Fiber optic': [1 if internet_service_dropdown.value == 'Fiber optic' else 0],
        'InternetService_None': [1 if internet_service_dropdown.value == 'None' else 0],
        'Contract_One year': [1 if contract_dropdown.value == 'One year' else 0],
        'Contract_Two year': [1 if contract_dropdown.value == 'Two year' else 0],
        'Tenure': [tenure_input.value],
        'MonthlyCharges': [monthly_charges_input.value],
        # The training data defined TotalCharges as MonthlyCharges * Tenure
        'TotalCharges': [monthly_charges_input.value * tenure_input.value],
    }

    # Create input_df with correct columns
    input_df = pd.DataFrame(input_data, columns=X_train.columns)

    # Predict churn
    churn_prediction = model.predict(input_df)[0]
    churn_prob = model.predict_proba(input_df)[0][1]

    if churn_prediction == 1:
        output_label.value = f'Predicted Churn: Yes (Probability: {churn_prob:.2f})'
    else:
        output_label.value = f'Predicted Churn: No (Probability: {1 - churn_prob:.2f})'


predict_button.on_click(predict_churn)

# Display UI
ui = widgets.VBox([age_input, gender_dropdown, internet_service_dropdown, contract_dropdown, tenure_input,
                   monthly_charges_input, predict_button, output_label])
display(ui)

VBox(children=(IntSlider(value=18, description='Age:', max=85, min=18), Dropdown(description='Gender:', option…
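
Keeping the feature construction separate from the widget callback makes it testable without a running notebook. The sketch below is one possible arrangement: `build_feature_row` is a hypothetical helper name, and it assumes the column names produced by the one-hot encoding step, with `TotalCharges` computed as monthly charges times tenure, as in the generated data.

```python
def build_feature_row(age, gender, internet_service, contract, tenure, monthly_charges):
    """Map raw customer inputs to the encoded feature names (hypothetical helper)."""
    return {
        'Age': age,
        'Tenure': tenure,
        'MonthlyCharges': monthly_charges,
        # Same definition the dataset generator uses
        'TotalCharges': monthly_charges * tenure,
        # Baseline categories (Female, DSL, Month-to-month) are all-zero rows
        'Gender_Male': int(gender == 'Male'),
        'InternetService_Fiber optic': int(internet_service == 'Fiber optic'),
        'InternetService_None': int(internet_service == 'None'),
        'Contract_One year': int(contract == 'One year'),
        'Contract_Two year': int(contract == 'Two year'),
    }


# Made-up customer for illustration
row = build_feature_row(40, 'Female', 'None', 'Two year', 30, 20.0)
print(row)
```

Inside the callback, the row could then be turned into a frame with `pd.DataFrame([row])` and aligned to `X_train.columns` before calling `model.predict`.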

#### Conclusion:
This project demonstrates a practical application of machine learning in the telecommunications industry. By predicting customer churn, telecom companies can take proactive measures such as targeted marketing campaigns or personalized retention strategies to reduce churn rates and improve customer satisfaction.