# Projeto Previsão de Churn Rate

### A pergunta inicial e se existe uma correlação entre as variaveis do data-set e com o qual precisão o modelo pode prever a porbabilidade de um cliente evadir

### Variaveis
- Churn: Yes or Not 
- Contract: Month-to-month , One year, Two year
- Dependents: Yes or Not
- DeviceProtection: Yes or Not
- InternetService: Fiber Optic, DSL, No 
- MonthlyCharges= How much the customer spent per month
- MultipleLines: Yes or Not
- OnlineBackup: Yes or Not 
- OnlineSecurity: Yes or Not
- PaperlessBilling: Yes or Not
- Partner: Yes or Not
- PaymentMethod: Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic)
- PhoneService: Yes or Not 
- SeniorCitizen: 0 or 1
- StreamingMovies: Yes or Not
- StreamingTV: Yes or Not
- TechSupport: Yes or Not
- TotalCharges= how much the customer spent in total
- customerID: ID
- gender: The customer is a male or a female
- tenure = Months The customer has stayed with the company

Importando bibliotecas

In [38]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.preprocessing import OrdinalEncoder
from sklearn.preprocessing import OneHotEncoder

Chamando arquivo

In [39]:
df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')

Vendo tamanho do arquivo e informações do arquivo

In [40]:
df.shape

(7043, 21)

In [41]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 


Convertendo total charges em numerico e encontrando se há valores missing e os removendo

In [42]:
df.TotalCharges = pd.to_numeric(df.TotalCharges, errors='coerce')
df.isnull().sum()

customerID           0
gender               0
SeniorCitizen        0
Partner              0
Dependents           0
tenure               0
PhoneService         0
MultipleLines        0
InternetService      0
OnlineSecurity       0
OnlineBackup         0
DeviceProtection     0
TechSupport          0
StreamingTV          0
StreamingMovies      0
Contract             0
PaperlessBilling     0
PaymentMethod        0
MonthlyCharges       0
TotalCharges        11
Churn                0
dtype: int64

In [43]:
df = df.dropna()

In [44]:
df.shape

(7032, 21)

In [45]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7032 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7032 non-null   object 
 1   gender            7032 non-null   object 
 2   SeniorCitizen     7032 non-null   int64  
 3   Partner           7032 non-null   object 
 4   Dependents        7032 non-null   object 
 5   tenure            7032 non-null   int64  
 6   PhoneService      7032 non-null   object 
 7   MultipleLines     7032 non-null   object 
 8   InternetService   7032 non-null   object 
 9   OnlineSecurity    7032 non-null   object 
 10  OnlineBackup      7032 non-null   object 
 11  DeviceProtection  7032 non-null   object 
 12  TechSupport       7032 non-null   object 
 13  StreamingTV       7032 non-null   object 
 14  StreamingMovies   7032 non-null   object 
 15  Contract          7032 non-null   object 
 16  PaperlessBilling  7032 non-null   object 


In [46]:
df.loc[df['StreamingMovies']=='No internet service', 'StreamingMovies'] = 'No'
df.loc[df['StreamingTV']=='No internet service', 'StreamingTV'] = 'No'
df.loc[df['OnlineSecurity']=='No internet service', 'OnlineSecurity'] = 'No'
df.loc[df['OnlineBackup']=='No internet service', 'OnlineBackup'] = 'No'
df.loc[df['DeviceProtection']=='No internet service', 'DeviceProtection'] = 'No'
df.loc[df['TechSupport']=='No internet service', 'TechSupport'] = 'No'
df.loc[df['OnlineSecurity']=='No internet service', 'OnlineSecurity'] = 'No'
df.loc[df['MultipleLines']=='No phone service', 'MultipleLines'] = 'No'

In [47]:
df.columns

Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')

In [48]:
transform = OrdinalEncoder()

Agora vamos tranformas os Yes ou Not em argumentos binarios 1 ou 0

Não esquecer e especificar a colina após o DF que você quer modificar!

In [49]:
col_bin = ['Churn','gender', 'Partner','Dependents','PhoneService','StreamingTV','PaperlessBilling','StreamingMovies','OnlineSecurity',
           'OnlineBackup', 'DeviceProtection', 'TechSupport', 'OnlineSecurity', 'MultipleLines']
df[col_bin] = pd.DataFrame(pd.DataFrame(columns=col_bin, data=transform.fit_transform(df[col_bin])))

In [50]:
df.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,0.0,0,1.0,0.0,1,0.0,0.0,DSL,0.0,...,0.0,0.0,0.0,0.0,Month-to-month,1.0,Electronic check,29.85,29.85,0.0
1,5575-GNVDE,1.0,0,0.0,0.0,34,1.0,0.0,DSL,1.0,...,1.0,0.0,0.0,0.0,One year,0.0,Mailed check,56.95,1889.5,0.0
2,3668-QPYBK,1.0,0,0.0,0.0,2,1.0,0.0,DSL,1.0,...,0.0,0.0,0.0,0.0,Month-to-month,1.0,Mailed check,53.85,108.15,1.0
3,7795-CFOCW,1.0,0,0.0,0.0,45,0.0,0.0,DSL,1.0,...,1.0,1.0,0.0,0.0,One year,0.0,Bank transfer (automatic),42.3,1840.75,0.0
4,9237-HQITU,0.0,0,0.0,0.0,2,1.0,0.0,Fiber optic,0.0,...,0.0,0.0,0.0,0.0,Month-to-month,1.0,Electronic check,70.7,151.65,1.0


Agora tranformamos as colunas categoricas menos customerID em colunas

In [51]:
Cat = ['InternetService', 'Contract', 'PaymentMethod']
df = pd.get_dummies(df, columns=Cat)

In [52]:
df.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,OnlineSecurity,OnlineBackup,...,InternetService_DSL,InternetService_Fiber optic,InternetService_No,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check
0,7590-VHVEG,0.0,0,1.0,0.0,1,0.0,0.0,0.0,1.0,...,1,0,0,1,0,0,0,0,1,0
1,5575-GNVDE,1.0,0,0.0,0.0,34,1.0,0.0,1.0,0.0,...,1,0,0,0,1,0,0,0,0,1
2,3668-QPYBK,1.0,0,0.0,0.0,2,1.0,0.0,1.0,1.0,...,1,0,0,1,0,0,0,0,0,1
3,7795-CFOCW,1.0,0,0.0,0.0,45,0.0,0.0,1.0,0.0,...,1,0,0,0,1,0,1,0,0,0
4,9237-HQITU,0.0,0,0.0,0.0,2,1.0,0.0,0.0,0.0,...,0,1,0,1,0,0,0,0,1,0


## Vamos iniciar a analise exploratoria dos dados

In [53]:
df.corr()

Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,OnlineSecurity,OnlineBackup,DeviceProtection,...,InternetService_DSL,InternetService_Fiber optic,InternetService_No,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check
gender,1.0,-0.008189,-0.001233,0.010396,-0.00437,-0.007377,-0.009545,-0.016626,-0.013128,-0.001117,...,0.000386,0.016827,-0.020731,0.003716,-0.000503,-0.00385,0.001334,-0.003086,0.00084,0.000766
SeniorCitizen,-0.008189,1.0,-0.00278,-0.007028,0.015683,0.017513,0.001083,-0.004419,-0.007786,-0.00566,...,-0.108276,0.254923,-0.182519,0.137752,-0.046491,-0.116205,-0.016235,-0.024359,0.171322,-0.152987
Partner,-0.001233,-0.00278,1.0,0.452609,0.044033,0.018721,0.142786,0.142901,0.141633,0.153671,...,-0.020062,0.025309,-0.007369,-0.033035,0.008761,0.030141,0.021147,0.016504,-0.023405,-0.010686
Dependents,0.010396,-0.007028,0.452609,1.0,0.042703,-0.00105,-0.024365,0.080815,0.023145,0.013984,...,-0.012032,0.013689,-0.002624,-0.038186,0.010296,0.034681,-0.001837,0.02482,-0.011287,-0.009827
tenure,-0.00437,0.015683,0.044033,0.042703,1.0,-0.002827,0.023107,0.015404,0.026769,0.028203,...,0.013786,0.01793,-0.037529,-0.649346,0.202338,0.563801,0.243822,0.2328,-0.210197,-0.232181
PhoneService,-0.007377,0.017513,0.018721,-0.00105,-0.002827,1.0,0.279579,-0.091387,-0.05189,-0.070159,...,-0.045929,0.0448,-0.001031,0.017158,-0.010329,-0.01015,-0.013904,0.005875,0.024223,-0.019328
MultipleLines,-0.009545,0.001083,0.142786,-0.024365,0.023107,0.279579,1.0,0.099143,0.203262,0.201771,...,-0.020224,0.039338,-0.024096,-0.004721,-0.011572,0.016526,-0.007551,0.027341,0.002563,-0.022262
OnlineSecurity,-0.016626,-0.004419,0.142901,0.080815,0.015404,-0.091387,0.099143,1.0,0.282987,0.275224,...,0.033465,-0.014292,-0.02137,-0.019237,0.004956,0.01769,0.007849,0.0255,-0.029245,0.000174
OnlineBackup,-0.013128,-0.007786,0.141633,0.023145,0.026769,-0.05189,0.203262,0.282987,1.0,0.30309,...,-0.002579,0.017214,-0.017777,-0.018534,-0.011174,0.03224,0.006995,0.034865,-0.021862,-0.016483
DeviceProtection,-0.001117,-0.00566,0.153671,0.013984,0.028203,-0.070159,0.201771,0.275224,0.30309,1.0,...,0.009059,0.022909,-0.038067,-0.015832,-0.006694,0.024824,0.018738,0.031375,-0.021652,-0.024872


Churn                                      1.000000
PaperlessBilling                           0.191291
StreamingTV                                0.062715
StreamingMovies                            0.060802
MultipleLines                              0.040312
Contract_Month-to-month                    0.035835
PaymentMethod_Electronic check             0.016775
PhoneService                               0.011590
InternetService_DSL                        0.010885
PaymentMethod_Mailed check                 0.009573
InternetService_No                        -0.002737
PaymentMethod_Credit card (automatic)     -0.003313
SeniorCitizen                             -0.006838
InternetService_Fiber optic               -0.008144
gender                                    -0.008813
MonthlyCharges                            -0.013002
Contract_Two year                         -0.020439
Contract_One year                         -0.022367
PaymentMethod_Bank transfer (automatic)   -0.025573
TotalCharges