# PROJECT OVERVIEW

## 1. Objective
The primary goal of this project is to build a classifier that predicts whether a customer will soon stop doing business with SyriaTel, a telecommunications company. This is a binary classification problem, where the target variable indicates whether a customer churns or not. The insights generated from this project will help SyriaTel to take proactive measures to reduce customer churn, which directly impacts the company’s revenue and profitability.

## 2. Business Understanding
Customer churn is a critical issue for telecommunications companies like SyriaTel, as acquiring new customers is often more costly than retaining existing ones. By identifying customers who are likely to churn, SyriaTel can implement targeted retention strategies, such as offering discounts or personalized services, to keep those customers from leaving. The aim is to minimize revenue loss and enhance customer satisfaction.

The project focuses on identifying patterns in customer behavior that indicate the likelihood of churn. These patterns can include factors such as customer service interactions, usage metrics, and plan subscriptions. The goal is to create a predictive model that classifies customers as churners or non-churners accurately enough for the company to intervene before a customer leaves.

## 3. Data Understanding
The dataset, titled "Churn in Telecom's dataset," contains 3,333 customer records with 21 columns covering account information, plan subscriptions, usage metrics, and customer service calls. The key challenge is to understand how these features relate to the likelihood of a customer churning.

- Features: The 20 predictor columns cover account information (state, account length, area code, phone number), plan subscriptions (international plan, voice mail plan, number of voicemail messages), usage metrics (total minutes, calls, and charges for day, evening, night, and international calls), and the number of customer service calls.
- Target Variable (y): The `churn` column is a boolean indicating whether a customer has churned.
- Class Distribution: The distribution of the target variable should be assessed to determine whether the dataset is balanced or one class (e.g., churners) is underrepresented. With an imbalanced target, accuracy alone can be misleading.
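A quick way to assess class balance is to count the target values and their proportions. The sketch below assumes the target column is named `churn`, as in this dataset's column listing. The helper name `class_distribution` and the toy data are illustrative only and are not part of the original notebook.

```python
import pandas as pd


def class_distribution(df: pd.DataFrame, target: str = "churn") -> pd.DataFrame:
    """Return counts and proportions of each class in the target column."""
    counts = df[target].value_counts()
    proportions = df[target].value_counts(normalize=True)
    return pd.DataFrame({"count": counts, "proportion": proportions})


# Toy data for illustration only (not the real SyriaTel records).
toy = pd.DataFrame({"churn": [False, False, False, True]})
summary = class_distribution(toy)
print(summary)
```

On the real data, the same call would be `class_distribution(df)` once `df` has been loaded. A strongly skewed proportion column would suggest looking at recall and precision for the churn class rather than accuracy alone.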

## 4. Data Preparation

### 4.1 Import the Necessary Libraries

In [3]:
# Data handling
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Modeling and evaluation
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Model persistence
import joblib

### 4.2 Load the Dataset

In [15]:
df = pd.read_csv('Data/Churn_in_Telecoms_dataset.csv')

### 4.3 Display the First Few Rows

In [16]:
print("First few rows of the dataset:")
print(df.head())

First few rows of the dataset:
  state  account length  area code phone number international plan  \
0    KS             128        415     382-4657                 no   
1    OH             107        415     371-7191                 no   
2    NJ             137        415     358-1921                 no   
3    OH              84        408     375-9999                yes   
4    OK              75        415     330-6626                yes   

  voice mail plan  number vmail messages  total day minutes  total day calls  \
0             yes                     25              265.1              110   
1             yes                     26              161.6              123   
2              no                      0              243.4              114   
3              no                      0              299.4               71   
4              no                      0              166.7              113   

   total day charge  ...  total eve calls  total eve charge  \
[output truncated]

### 4.4 Check for Missing Values

In [17]:
print("\nMissing values in each column:")
print(df.isnull().sum())


Missing values in each column:
state                     0
account length            0
area code                 0
phone number              0
international plan        0
voice mail plan           0
number vmail messages     0
total day minutes         0
total day calls           0
total day charge          0
total eve minutes         0
total eve calls           0
total eve charge          0
total night minutes       0
total night calls         0
total night charge        0
total intl minutes        0
total intl calls          0
total intl charge         0
customer service calls    0
churn                     0
dtype: int64


The output shows that there are no missing values in the dataset.
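For wider datasets, scanning a full per-column listing can be tedious. As a minimal sketch, the hypothetical helper `missing_report` below (not part of the original notebook) lists only the columns that actually contain missing values, with their share of rows. The toy frame is made up for illustration.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """List only the columns that contain missing values, with counts and percentages."""
    missing = df.isnull().sum()
    report = pd.DataFrame({
        "missing": missing,
        "percent": 100 * missing / len(df),
    })
    # Keep only columns with at least one missing value, worst first.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame for illustration only; the real dataset reports zero missing values.
toy = pd.DataFrame({"a": [1.0, None, 3.0, None], "b": ["x", "y", "z", "w"]})
report = missing_report(toy)
print(report)
```

Applied to the churn dataset, `missing_report(df)` would be expected to return an empty frame, consistent with the all-zero counts shown above.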

### 4.5 Summary Statistics

`df.describe()` provides summary statistics for the numerical columns: count, mean, standard deviation, minimum, maximum, and quartiles.

In [18]:
print("\nSummary statistics of the dataset:")
print(df.describe())


Summary statistics of the dataset:
       account length    area code  number vmail messages  total day minutes  \
count     3333.000000  3333.000000            3333.000000        3333.000000   
mean       101.064806   437.182418               8.099010         179.775098   
std         39.822106    42.371290              13.688365          54.467389   
min          1.000000   408.000000               0.000000           0.000000   
25%         74.000000   408.000000               0.000000         143.700000   
50%        101.000000   415.000000               0.000000         179.400000   
75%        127.000000   510.000000              20.000000         216.400000   
max        243.000000   510.000000              51.000000         350.800000   

       total day calls  total day charge  total eve minutes  total eve calls  \
count      3333.000000       3333.000000        3333.000000      3333.000000   
mean        100.435644         30.562307         200.980348       100.114311   
[output truncated]

### 4.6 Check Data Types

List every column in the dataset along with its data type.

In [19]:
print("\nData types of each column:")
print(df.dtypes)


Data types of each column:
state                      object
account length              int64
area code                   int64
phone number               object
international plan         object
voice mail plan            object
number vmail messages       int64
total day minutes         float64
total day calls             int64
total day charge          float64
total eve minutes         float64
total eve calls             int64
total eve charge          float64
total night minutes       float64
total night calls           int64
total night charge        float64
total intl minutes        float64
total intl calls            int64
total intl charge         float64
customer service calls      int64
churn                        bool
dtype: object
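The listing shows four `object` columns, which scikit-learn estimators such as `LogisticRegression` do not accept as raw strings. One possible preprocessing step, sketched here as an assumption rather than the notebook's actual method, maps the yes/no plan columns to 0/1, casts `churn` to an integer, and drops `phone number` on the assumption that it acts as an identifier. The function name `encode_basic` and the toy rows are hypothetical.

```python
import pandas as pd


def encode_basic(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with yes/no columns as 0/1, churn as int, phone number dropped."""
    out = df.copy()
    for col in ["international plan", "voice mail plan"]:
        out[col] = out[col].map({"no": 0, "yes": 1})
    out["churn"] = out["churn"].astype(int)
    # Assumption: phone number is a unique identifier with no predictive value.
    return out.drop(columns=["phone number"])


# Toy rows for illustration only; values are made up, not taken from the dataset.
toy = pd.DataFrame({
    "phone number": ["000-0000", "111-1111"],
    "international plan": ["no", "yes"],
    "voice mail plan": ["yes", "no"],
    "churn": [False, True],
})
encoded = encode_basic(toy)
print(encoded.dtypes)
```

The `state` column is left untouched in this sketch. It would need its own encoding (for example, one-hot encoding) before being passed to a model.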


# Modeling