# Telco customer churn with XGBoost

#### DATA - IBM BASE SAMPLES DATASET 

#### The goal of this project is to predict: `whether or not a customer will stop using a company's service` 

#### In business lingo this is called Customer Churn.

In [1]:
# Importing modules
import pandas as pd
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import balanced_accuracy_score, roc_auc_score, make_scorer
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

In [2]:
df_raw = pd.read_excel('data/telco_customer_churn.xlsx')

In [3]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [4]:
df_raw.head()

Unnamed: 0,CustomerID,Count,Country,State,City,Zip Code,Lat Long,Latitude,Longitude,Gender,Senior Citizen,Partner,Dependents,Tenure Months,Phone Service,Multiple Lines,Internet Service,Online Security,Online Backup,Device Protection,Tech Support,Streaming TV,Streaming Movies,Contract,Paperless Billing,Payment Method,Monthly Charges,Total Charges,Churn Label,Churn Value,Churn Score,CLTV,Churn Reason
0,3668-QPYBK,1,United States,California,Los Angeles,90003,"33.964131, -118.272783",33.964131,-118.272783,Male,No,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes,1,86,3239,Competitor made better offer
1,9237-HQITU,1,United States,California,Los Angeles,90005,"34.059281, -118.30742",34.059281,-118.30742,Female,No,No,Yes,2,Yes,No,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes,1,67,2701,Moved
2,9305-CDSKC,1,United States,California,Los Angeles,90006,"34.048013, -118.293953",34.048013,-118.293953,Female,No,No,Yes,8,Yes,Yes,Fiber optic,No,No,Yes,No,Yes,Yes,Month-to-month,Yes,Electronic check,99.65,820.5,Yes,1,86,5372,Moved
3,7892-POOKP,1,United States,California,Los Angeles,90010,"34.062125, -118.315709",34.062125,-118.315709,Female,No,Yes,Yes,28,Yes,Yes,Fiber optic,No,No,Yes,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,104.8,3046.05,Yes,1,84,5003,Moved
4,0280-XJGEX,1,United States,California,Los Angeles,90015,"34.039224, -118.266293",34.039224,-118.266293,Male,No,No,Yes,49,Yes,Yes,Fiber optic,No,Yes,Yes,No,Yes,Yes,Month-to-month,Yes,Bank transfer (automatic),103.7,5036.3,Yes,1,89,5340,Competitor had better devices


#### The last three columns ('Churn Score', 'CLTV' and 'Churn Reason') contain exit interview data. That information is gathered only after a customer has left, so it would not be available at prediction time and cannot be used as a feature; we have to remove those columns. The 'Churn Label' column is also dropped because it duplicates 'Churn Value', using Yes/No instead of 1/0.
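
#### Before dropping a column as a duplicate it is worth confirming that it really carries the same information. The snippet below is a minimal sketch on a hypothetical toy frame (`toy` and its values are made up for illustration, only the column names come from the dataset); it maps Yes/No to 1/0 and compares the result with the numeric column.

```python
import pandas as pd

# Hypothetical toy frame mimicking the two target columns of the dataset
toy = pd.DataFrame({
    "Churn Label": ["Yes", "No", "No", "Yes"],
    "Churn Value": [1, 0, 0, 1],
})

# Map Yes/No to 1/0 and compare element-wise with the numeric column
label_as_int = toy["Churn Label"].map({"Yes": 1, "No": 0})
is_duplicate = bool((label_as_int == toy["Churn Value"]).all())

print(is_duplicate)  # True for this toy frame
```

#### The same comparison could be run on `df_raw` before the drop; if it returned False, the two columns would disagree somewhere and would need a closer look.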

In [5]:
df_eda = df_raw.copy()
df_eda.drop(['Churn Label', 'Churn Score', 'CLTV', 'Churn Reason'], axis =1, inplace = True)
df_eda.head()

Unnamed: 0,CustomerID,Count,Country,State,City,Zip Code,Lat Long,Latitude,Longitude,Gender,Senior Citizen,Partner,Dependents,Tenure Months,Phone Service,Multiple Lines,Internet Service,Online Security,Online Backup,Device Protection,Tech Support,Streaming TV,Streaming Movies,Contract,Paperless Billing,Payment Method,Monthly Charges,Total Charges,Churn Value
0,3668-QPYBK,1,United States,California,Los Angeles,90003,"33.964131, -118.272783",33.964131,-118.272783,Male,No,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,1
1,9237-HQITU,1,United States,California,Los Angeles,90005,"34.059281, -118.30742",34.059281,-118.30742,Female,No,No,Yes,2,Yes,No,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,1
2,9305-CDSKC,1,United States,California,Los Angeles,90006,"34.048013, -118.293953",34.048013,-118.293953,Female,No,No,Yes,8,Yes,Yes,Fiber optic,No,No,Yes,No,Yes,Yes,Month-to-month,Yes,Electronic check,99.65,820.5,1
3,7892-POOKP,1,United States,California,Los Angeles,90010,"34.062125, -118.315709",34.062125,-118.315709,Female,No,Yes,Yes,28,Yes,Yes,Fiber optic,No,No,Yes,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,104.8,3046.05,1
4,0280-XJGEX,1,United States,California,Los Angeles,90015,"34.039224, -118.266293",34.039224,-118.266293,Male,No,No,Yes,49,Yes,Yes,Fiber optic,No,Yes,Yes,No,Yes,Yes,Month-to-month,Yes,Bank transfer (automatic),103.7,5036.3,1


In [6]:
df_eda.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 29 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   CustomerID         7043 non-null   object 
 1   Count              7043 non-null   int64  
 2   Country            7043 non-null   object 
 3   State              7043 non-null   object 
 4   City               7043 non-null   object 
 5   Zip Code           7043 non-null   int64  
 6   Lat Long           7043 non-null   object 
 7   Latitude           7043 non-null   float64
 8   Longitude          7043 non-null   float64
 9   Gender             7043 non-null   object 
 10  Senior Citizen     7043 non-null   object 
 11  Partner            7043 non-null   object 
 12  Dependents         7043 non-null   object 
 13  Tenure Months      7043 non-null   int64  
 14  Phone Service      7043 non-null   object 
 15  Multiple Lines     7043 non-null   object 
 16  Internet Service   7043 

In [7]:
df_eda.describe(include='all')

Unnamed: 0,CustomerID,Count,Country,State,City,Zip Code,Lat Long,Latitude,Longitude,Gender,Senior Citizen,Partner,Dependents,Tenure Months,Phone Service,Multiple Lines,Internet Service,Online Security,Online Backup,Device Protection,Tech Support,Streaming TV,Streaming Movies,Contract,Paperless Billing,Payment Method,Monthly Charges,Total Charges,Churn Value
count,7043,7043.0,7043,7043,7043,7043.0,7043,7043.0,7043.0,7043,7043,7043,7043,7043.0,7043,7043,7043,7043,7043,7043,7043,7043,7043,7043,7043,7043,7043.0,7043.0,7043.0
unique,7043,,1,1,1129,,1652,,,2,2,2,2,,2,3,3,3,3,3,3,3,3,3,2,4,,6531.0,
top,3668-QPYBK,,United States,California,Los Angeles,,"33.964131, -118.272783",,,Male,No,No,No,,Yes,No,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,,20.2,
freq,1,,7043,7043,305,,5,,,3555,5901,3641,5416,,6361,3390,3096,3498,3088,3095,3473,2810,2785,3875,4171,2365,,11.0,
mean,,1.0,,,,93521.964646,,36.282441,-119.79888,,,,,32.371149,,,,,,,,,,,,,64.761692,,0.26537
std,,0.0,,,,1865.794555,,2.455723,2.157889,,,,,24.559481,,,,,,,,,,,,,30.090047,,0.441561
min,,1.0,,,,90001.0,,32.555828,-124.301372,,,,,0.0,,,,,,,,,,,,,18.25,,0.0
25%,,1.0,,,,92102.0,,34.030915,-121.815412,,,,,9.0,,,,,,,,,,,,,35.5,,0.0
50%,,1.0,,,,93552.0,,36.391777,-119.730885,,,,,29.0,,,,,,,,,,,,,70.35,,0.0
75%,,1.0,,,,95351.0,,38.224869,-118.043237,,,,,55.0,,,,,,,,,,,,,89.85,,1.0


#### After a quick check of the dataset:

* We can drop 'CustomerID' because it has no predictive power: it is just an identifier assigned to each customer (all 7043 values are unique)
* We can also drop 'Count' because that column contains only 1's
* We can drop 'Country' and 'State' because the dataset only has data from the United States, California
* We can also drop the 'Lat Long' column because it is just the 'Latitude' and 'Longitude' columns combined
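
#### The checks in the list above can also be done programmatically instead of reading them off the `describe()` table. This is only a sketch on a hypothetical toy frame (`toy` is made up; the column names are borrowed from the dataset): columns with a single unique value are constant, and columns with as many unique values as rows behave like identifiers.

```python
import pandas as pd

# Hypothetical toy frame with an ID column, two constant columns and a real feature
toy = pd.DataFrame({
    "CustomerID": ["a1", "b2", "c3", "d4"],
    "Count": [1, 1, 1, 1],
    "Country": ["United States"] * 4,
    "Gender": ["Male", "Female", "Male", "Female"],
})

n_unique = toy.nunique()

# Constant columns: exactly one distinct value
constant_cols = n_unique[n_unique == 1].index.tolist()

# ID-like columns: one distinct value per row
id_like_cols = n_unique[n_unique == len(toy)].index.tolist()

print(constant_cols)  # ['Count', 'Country']
print(id_like_cols)   # ['CustomerID']
```

#### The "one distinct value per row" test is only a heuristic: a continuous numeric feature can also be unique in every row, so the result should be reviewed by hand before dropping anything.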


In [8]:
df_eda.drop(['CustomerID', 'Count', 'Country', 'State', 'Lat Long'], axis = 1, inplace = True)
df_eda.head()

Unnamed: 0,City,Zip Code,Latitude,Longitude,Gender,Senior Citizen,Partner,Dependents,Tenure Months,Phone Service,Multiple Lines,Internet Service,Online Security,Online Backup,Device Protection,Tech Support,Streaming TV,Streaming Movies,Contract,Paperless Billing,Payment Method,Monthly Charges,Total Charges,Churn Value
0,Los Angeles,90003,33.964131,-118.272783,Male,No,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,1
1,Los Angeles,90005,34.059281,-118.30742,Female,No,No,Yes,2,Yes,No,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,1
2,Los Angeles,90006,34.048013,-118.293953,Female,No,No,Yes,8,Yes,Yes,Fiber optic,No,No,Yes,No,Yes,Yes,Month-to-month,Yes,Electronic check,99.65,820.5,1
3,Los Angeles,90010,34.062125,-118.315709,Female,No,Yes,Yes,28,Yes,Yes,Fiber optic,No,No,Yes,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,104.8,3046.05,1
4,Los Angeles,90015,34.039224,-118.266293,Male,No,No,Yes,49,Yes,Yes,Fiber optic,No,Yes,Yes,No,Yes,Yes,Month-to-month,Yes,Bank transfer (automatic),103.7,5036.3,1


#### For XGBoost classification it doesn't matter if there are whitespaces between words (as in the 'City' column), but to draw a tree at the end we cannot have them. The same applies to column names.

In [9]:
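# Remove spaces from city names; assigning back avoids relying on an in-place edit of a column selection
df_eda['City'] = df_eda['City'].str.replace(' ', '', regex=False)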

In [10]:
df_eda['City'].unique()[:10]

array(['LosAngeles', 'BeverlyHills', 'HuntingtonPark', 'Lynwood',
       'MarinaDelRey', 'Inglewood', 'SantaMonica', 'Torrance', 'Whittier',
       'LaHabra'], dtype=object)

In [11]:
df_eda.columns = df_eda.columns.str.replace(' ','')

## Missing data

In [12]:
df_eda.isna().sum()

City                0
ZipCode             0
Latitude            0
Longitude           0
Gender              0
SeniorCitizen       0
Partner             0
Dependents          0
TenureMonths        0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
ChurnValue          0
dtype: int64

#### At first glance it seems that there are no missing values. But let's make sure ...
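
#### `isna()` only counts real missing values (NaN/None), so a blank string would not show up in the table above. The sketch below uses a hypothetical toy frame (`toy` and its values are made up) to show the difference and to count cells that are strings made only of whitespace.

```python
import pandas as pd

# Hypothetical toy frame: one cell holds a single space instead of a number
toy = pd.DataFrame({
    "MonthlyCharges": [52.55, 20.4, 60.9],
    "TotalCharges": [" ", 20.4, 1785.65],  # mixed types, so the dtype is object
})

# isna() reports nothing because " " is a valid (non-null) string
na_counts = toy.isna().sum()

# Count cells that are strings consisting only of whitespace
blank_counts = toy.apply(
    lambda col: col.map(lambda v: isinstance(v, str) and v.strip() == "").sum()
)

print(na_counts)
print(blank_counts)
```

#### In the toy frame `na_counts` is zero everywhere while `blank_counts` flags one cell in 'TotalCharges'.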

In [13]:
for column in df_eda:
    print(df_eda[column].unique())

['LosAngeles' 'BeverlyHills' 'HuntingtonPark' ... 'Standish' 'Tulelake'
 'OlympicValley']
[90003 90005 90006 ... 96128 96134 96146]
[33.964131 34.059281 34.048013 ... 40.346634 41.813521 39.191797]
[-118.272783 -118.30742  -118.293953 ... -120.386422 -121.492666
 -120.212401]
['Male' 'Female']
['No' 'Yes']
['No' 'Yes']
['No' 'Yes']
[ 2  8 28 49 10  1 47 17  5 34 11 15 18  9  7 12 25 68 55 37  3 27 20  4
 58 53 13  6 19 59 16 52 24 32 38 54 43 63 21 69 22 61 60 48 40 23 39 35
 56 65 33 30 45 46 62 70 50 44 71 26 14 41 66 64 29 42 67 51 31 57 36 72
  0]
['Yes' 'No']
['No' 'Yes' 'No phone service']
['DSL' 'Fiber optic' 'No']
['Yes' 'No' 'No internet service']
['Yes' 'No' 'No internet service']
['No' 'Yes' 'No internet service']
['No' 'Yes' 'No internet service']
['No' 'Yes' 'No internet service']
['No' 'Yes' 'No internet service']
['Month-to-month' 'Two year' 'One year']
['Yes' 'No']
['Mailed check' 'Electronic check' 'Bank transfer (automatic)'
 'Credit card (automatic)']
[ 53.85  70.7  

#### To be sure that everything is OK we should also check whether the datatypes match the values in the columns. Doing this, we can see that even though the 'TotalCharges' column looks like it contains only numbers, its datatype is 'object'. We have to take a closer look at this column.

#### From the descriptive statistics we can see that it has 6531 unique values, so it would be very difficult to spot what is wrong by eye. Let's try to convert those values to numeric and see what happens.

In [14]:
# This conversion is expected to fail while the column still contains non-numeric strings
df_eda['TotalCharges'] = pd.to_numeric(df_eda['TotalCharges'])

ValueError: Unable to parse string " " at position 2234

In [15]:
df_eda[2230:2240]

Unnamed: 0,City,ZipCode,Latitude,Longitude,Gender,SeniorCitizen,Partner,Dependents,TenureMonths,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,ChurnValue
2230,Yucaipa,92399,34.04597,-117.011825,Female,No,Yes,No,72,Yes,Yes,Fiber optic,No,Yes,Yes,Yes,Yes,Yes,One year,Yes,Bank transfer (automatic),108.5,8003.8,0
2231,SanBernardino,92404,34.183286,-117.221722,Male,No,Yes,Yes,72,Yes,Yes,Fiber optic,No,No,No,No,No,Yes,Two year,Yes,Credit card (automatic),84.5,6130.85,0
2232,SanBernardino,92405,34.142747,-117.300864,Female,No,No,No,15,Yes,Yes,Fiber optic,No,Yes,No,No,Yes,Yes,Month-to-month,Yes,Electronic check,100.15,1415.0,0
2233,SanBernardino,92407,34.250069,-117.393949,Male,No,No,No,72,Yes,Yes,DSL,No,Yes,Yes,Yes,Yes,Yes,Two year,Yes,Bank transfer (automatic),88.6,6201.95,0
2234,SanBernardino,92408,34.084909,-117.258107,Female,No,Yes,No,0,No,No phone service,DSL,Yes,No,Yes,Yes,Yes,No,Two year,Yes,Bank transfer (automatic),52.55,,0
2235,SanBernardino,92411,34.122501,-117.320138,Male,No,Yes,Yes,63,Yes,Yes,Fiber optic,No,Yes,Yes,No,Yes,Yes,Two year,Yes,Bank transfer (automatic),104.8,6597.25,0
2236,Riverside,92501,33.994676,-117.372498,Female,No,No,No,2,Yes,No,DSL,No,Yes,No,No,Yes,No,Month-to-month,Yes,Electronic check,59.0,114.15,0
2237,Riverside,92504,33.9108,-117.398153,Male,Yes,Yes,No,61,Yes,Yes,DSL,No,Yes,No,No,No,Yes,One year,No,Bank transfer (automatic),64.05,3902.6,0
2238,Riverside,92505,33.920907,-117.489426,Male,No,No,No,1,Yes,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Month-to-month,No,Mailed check,20.4,20.4,0
2239,Riverside,92507,33.976328,-117.319786,Male,No,Yes,No,28,Yes,No,DSL,Yes,Yes,No,Yes,No,No,Month-to-month,No,Mailed check,60.9,1785.65,0


#### ... and we get an error. As we can see in row 2234, there is a single space (" ") instead of a value in that column; that's why the datatype is object and why the column cannot be converted to numeric. Let's see how many of those "missing values" there are in the dataset.
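
#### The error message reports only the first position that failed to parse. One way to find every such row at once, without knowing in advance what the bad value looks like, is `pd.to_numeric` with `errors="coerce"`. This is a sketch on a hypothetical toy frame (`toy` and its values are made up for illustration).

```python
import pandas as pd

# Hypothetical toy frame with two unparseable cells
toy = pd.DataFrame({
    "TenureMonths": [72, 0, 2, 0],
    "TotalCharges": [6201.95, " ", 114.15, " "],
})

# errors="coerce" turns unparseable values into NaN instead of raising
as_number = pd.to_numeric(toy["TotalCharges"], errors="coerce")

# Rows that had a value in the raw column but failed to parse
bad_rows = toy[as_number.isna() & toy["TotalCharges"].notna()]

print(bad_rows)
```

#### Here `bad_rows` contains the rows with index 1 and 3, the two cells holding a single space.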

In [16]:
len(df_eda.loc[df_eda['TotalCharges'] == ' '])

11

In [17]:
df_eda[df_eda['TotalCharges'] == ' ']

Unnamed: 0,City,ZipCode,Latitude,Longitude,Gender,SeniorCitizen,Partner,Dependents,TenureMonths,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,ChurnValue
2234,SanBernardino,92408,34.084909,-117.258107,Female,No,Yes,No,0,No,No phone service,DSL,Yes,No,Yes,Yes,Yes,No,Two year,Yes,Bank transfer (automatic),52.55,,0
2438,Independence,93526,36.869584,-118.189241,Male,No,No,No,0,Yes,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,20.25,,0
2568,SanMateo,94401,37.590421,-122.306467,Female,No,Yes,No,0,Yes,No,DSL,Yes,Yes,Yes,No,Yes,Yes,Two year,No,Mailed check,80.85,,0
2667,Cupertino,95014,37.306612,-122.080621,Male,No,Yes,Yes,0,Yes,Yes,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,25.75,,0
2856,Redcrest,95569,40.363446,-123.835041,Female,No,Yes,No,0,No,No phone service,DSL,Yes,Yes,Yes,Yes,Yes,No,Two year,No,Credit card (automatic),56.05,,0
4331,LosAngeles,90029,34.089953,-118.294824,Male,No,Yes,Yes,0,Yes,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,19.85,,0
4687,SunCity,92585,33.739412,-117.173334,Male,No,Yes,Yes,0,Yes,Yes,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,25.35,,0
5104,BenLomond,95005,37.078873,-122.090386,Female,No,Yes,Yes,0,Yes,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,20.0,,0
5719,LaVerne,91750,34.144703,-117.770299,Male,No,Yes,Yes,0,Yes,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,One year,Yes,Mailed check,19.7,,0
6772,Bell,90201,33.970343,-118.171368,Female,No,Yes,Yes,0,Yes,Yes,DSL,No,Yes,Yes,Yes,Yes,No,Two year,No,Mailed check,73.35,,0


#### Because there are only 11 such rows we could simply remove them, probably with no harm to the results. But if we take a closer look at the other columns we can notice that these rows have 0 in the 'TenureMonths' column. That means those people have just subscribed and have not been charged for anything yet. That's why we can change all those blank spaces into 0's.
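
#### Filling the blanks with 0 is only justified if the blank rows are exactly the zero-tenure rows. The sketch below checks that assumption on a hypothetical toy frame (`toy` and its values are made up) before imputing and converting the column.

```python
import pandas as pd

# Hypothetical toy frame: blanks appear only where tenure is 0
toy = pd.DataFrame({
    "TenureMonths": [0, 15, 0, 28],
    "MonthlyCharges": [52.55, 100.15, 20.25, 60.9],
    "TotalCharges": [" ", 1415.0, " ", 1785.65],
})

blank = toy["TotalCharges"] == " "
zero_tenure = toy["TenureMonths"] == 0

# The imputation with 0 only makes sense if the two masks coincide
masks_match = bool((blank == zero_tenure).all())

if masks_match:
    toy.loc[blank, "TotalCharges"] = 0
    toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"])

print(masks_match)
print(toy["TotalCharges"].tolist())
```

#### If `masks_match` were False, some blank cells would belong to customers who had already been billed, and 0 would be the wrong replacement for them.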

In [18]:
# We can replace blanks like we did before:
# df_eda['TotalCharges'].replace(' ', 0, regex=True, inplace=True)

# or we can do it another way:
df_eda.loc[(df_eda['TotalCharges'] == ' '), 'TotalCharges'] = 0

In [19]:
df_eda[df_eda['TenureMonths'] == 0]

Unnamed: 0,City,ZipCode,Latitude,Longitude,Gender,SeniorCitizen,Partner,Dependents,TenureMonths,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,ChurnValue
2234,SanBernardino,92408,34.084909,-117.258107,Female,No,Yes,No,0,No,No phone service,DSL,Yes,No,Yes,Yes,Yes,No,Two year,Yes,Bank transfer (automatic),52.55,0,0
2438,Independence,93526,36.869584,-118.189241,Male,No,No,No,0,Yes,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,20.25,0,0
2568,SanMateo,94401,37.590421,-122.306467,Female,No,Yes,No,0,Yes,No,DSL,Yes,Yes,Yes,No,Yes,Yes,Two year,No,Mailed check,80.85,0,0
2667,Cupertino,95014,37.306612,-122.080621,Male,No,Yes,Yes,0,Yes,Yes,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,25.75,0,0
2856,Redcrest,95569,40.363446,-123.835041,Female,No,Yes,No,0,No,No phone service,DSL,Yes,Yes,Yes,Yes,Yes,No,Two year,No,Credit card (automatic),56.05,0,0
4331,LosAngeles,90029,34.089953,-118.294824,Male,No,Yes,Yes,0,Yes,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,19.85,0,0
4687,SunCity,92585,33.739412,-117.173334,Male,No,Yes,Yes,0,Yes,Yes,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,25.35,0,0
5104,BenLomond,95005,37.078873,-122.090386,Female,No,Yes,Yes,0,Yes,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,20.0,0,0
5719,LaVerne,91750,34.144703,-117.770299,Male,No,Yes,Yes,0,Yes,No,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,One year,Yes,Mailed check,19.7,0,0
6772,Bell,90201,33.970343,-118.171368,Female,No,Yes,Yes,0,Yes,Yes,DSL,No,Yes,Yes,Yes,Yes,No,Two year,No,Mailed check,73.35,0,0


In [20]:
df_eda['TotalCharges'] = pd.to_numeric(df_eda['TotalCharges'])
df_eda.dtypes

City                 object
ZipCode               int64
Latitude            float64
Longitude           float64
Gender               object
SeniorCitizen        object
Partner              object
Dependents           object
TenureMonths          int64
PhoneService         object
MultipleLines        object
InternetService      object
OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges      float64
TotalCharges        float64
ChurnValue            int64
dtype: object

#### Now the dataset looks OK. The last thing we have to take care of is the whitespace between words in the values of all the other columns. Again, this is not necessary for XGBoost itself, because whitespace is not important there, but our graph at the end will look better after doing this.
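
#### A small sketch of what a frame-wide regex replace does, on a hypothetical toy frame (`toy` is made up; the values are taken from the tables above): spaces inside string cells become underscores, while numeric columns are left untouched.

```python
import pandas as pd

# Hypothetical toy frame with two string columns and one numeric column
toy = pd.DataFrame({
    "PaymentMethod": ["Mailed check", "Bank transfer (automatic)"],
    "Contract": ["Month-to-month", "Two year"],
    "MonthlyCharges": [53.85, 103.7],
})

# regex=True replaces every space inside each string, not only whole-cell matches
cleaned = toy.replace(" ", "_", regex=True)

print(cleaned)
```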

In [21]:
df_eda.replace(' ', '_', regex=True, inplace=True)
df_eda.head()

Unnamed: 0,City,ZipCode,Latitude,Longitude,Gender,SeniorCitizen,Partner,Dependents,TenureMonths,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,ChurnValue
0,LosAngeles,90003,33.964131,-118.272783,Male,No,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed_check,53.85,108.15,1
1,LosAngeles,90005,34.059281,-118.30742,Female,No,No,Yes,2,Yes,No,Fiber_optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic_check,70.7,151.65,1
2,LosAngeles,90006,34.048013,-118.293953,Female,No,No,Yes,8,Yes,Yes,Fiber_optic,No,No,Yes,No,Yes,Yes,Month-to-month,Yes,Electronic_check,99.65,820.5,1
3,LosAngeles,90010,34.062125,-118.315709,Female,No,Yes,Yes,28,Yes,Yes,Fiber_optic,No,No,Yes,Yes,Yes,Yes,Month-to-month,Yes,Electronic_check,104.8,3046.05,1
4,LosAngeles,90015,34.039224,-118.266293,Male,No,No,Yes,49,Yes,Yes,Fiber_optic,No,Yes,Yes,No,Yes,Yes,Month-to-month,Yes,Bank_transfer_(automatic),103.7,5036.3,1


## Splitting the data into independent and dependent variables

In [22]:
X = df_eda.drop('ChurnValue', axis = 1).copy()
X.head()

Unnamed: 0,City,ZipCode,Latitude,Longitude,Gender,SeniorCitizen,Partner,Dependents,TenureMonths,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges
0,LosAngeles,90003,33.964131,-118.272783,Male,No,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed_check,53.85,108.15
1,LosAngeles,90005,34.059281,-118.30742,Female,No,No,Yes,2,Yes,No,Fiber_optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic_check,70.7,151.65
2,LosAngeles,90006,34.048013,-118.293953,Female,No,No,Yes,8,Yes,Yes,Fiber_optic,No,No,Yes,No,Yes,Yes,Month-to-month,Yes,Electronic_check,99.65,820.5
3,LosAngeles,90010,34.062125,-118.315709,Female,No,Yes,Yes,28,Yes,Yes,Fiber_optic,No,No,Yes,Yes,Yes,Yes,Month-to-month,Yes,Electronic_check,104.8,3046.05
4,LosAngeles,90015,34.039224,-118.266293,Male,No,No,Yes,49,Yes,Yes,Fiber_optic,No,Yes,Yes,No,Yes,Yes,Month-to-month,Yes,Bank_transfer_(automatic),103.7,5036.3


In [23]:
y = df_eda['ChurnValue'].copy()
y.head()

0    1
1    1
2    1
3    1
4    1
Name: ChurnValue, dtype: int64

## One-hot encoding
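
#### One-hot encoding turns each category of a column into its own indicator column. The sketch below shows the effect of `pd.get_dummies` on a hypothetical toy frame (`toy` is made up; the category names are taken from the dataset).

```python
import pandas as pd

# Hypothetical toy frame with one numeric and one categorical column
toy = pd.DataFrame({
    "TenureMonths": [2, 8, 28],
    "PaymentMethod": ["Mailed_check", "Electronic_check", "Mailed_check"],
})

# Only the listed columns are encoded; the others are passed through unchanged
encoded = pd.get_dummies(toy, columns=["PaymentMethod"])

print(encoded.columns.tolist())
# ['TenureMonths', 'PaymentMethod_Electronic_check', 'PaymentMethod_Mailed_check']
```

#### Each category produces one new column, so a column such as 'City' with 1129 unique values produces 1129 indicator columns, which is why the encoded table is so wide.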

In [24]:
X_encoded = pd.get_dummies(X,
                           columns = ['City',
                                      'Gender',
                                      'SeniorCitizen',
                                      'Partner',
                                      'Dependents',
                                      'PhoneService',
                                      'MultipleLines',
                                      'InternetService',
                                      'OnlineSecurity',
                                      'OnlineBackup',
                                      'DeviceProtection',
                                      'TechSupport',
                                      'StreamingTV',
                                      'StreamingMovies',
                                      'Contract',
                                      'PaperlessBilling',
                                      'PaymentMethod'])
X_encoded.head()

Unnamed: 0,ZipCode,Latitude,Longitude,TenureMonths,MonthlyCharges,TotalCharges,City_Acampo,City_Acton,City_Adelanto,City_Adin,City_AgouraHills,City_Aguanga,City_Ahwahnee,City_Alameda,City_Alamo,City_Albany,City_Albion,City_Alderpoint,City_Alhambra,City_AlisoViejo,City_Alleghany,City_Alpaugh,City_Alpine,City_Alta,City_Altadena,City_Alturas,City_Alviso,City_AmadorCity,City_Amboy,City_Anaheim,City_Anderson,City_AngelsCamp,City_AngelusOaks,City_Angwin,City_Annapolis,City_Antelope,City_Antioch,City_Anza,City_AppleValley,City_Applegate,City_Aptos,City_Arbuckle,City_Arcadia,City_Arcata,City_Armona,City_Arnold,City_Aromas,City_ArroyoGrande,City_Artesia,City_Arvin,City_Atascadero,City_Atherton,City_Atwater,City_Auberry,City_Auburn,City_Avalon,City_Avenal,City_Avery,City_AvilaBeach,City_Azusa,City_Badger,City_Baker,City_Bakersfield,City_BaldwinPark,City_Ballico,City_Bangor,City_Banning,City_Barstow,City_BassLake,City_Bayside,City_BealeAfb,City_Beaumont,City_Bell,City_BellaVista,City_Bellflower,City_Belmont,City_BelvedereTiburon,City_BenLomond,City_Benicia,City_Benton,City_Berkeley,City_BerryCreek,City_BethelIsland,City_BeverlyHills,City_Bieber,City_BigBar,City_BigBearCity,City_BigBearLake,City_BigBend,City_BigCreek,City_BigOakFlat,City_BigPine,City_BigSur,City_Biggs,City_Biola,City_BirdsLanding,City_Bishop,City_BlairsdenGraeagle,City_Blocksburg,City_Bloomington,City_BlueLake,City_Blythe,City_Bodega,City_BodegaBay,City_Bodfish,City_Bolinas,City_Bonita,City_Bonsall,City_Boonville,City_Boron,City_BorregoSprings,City_BoulderCreek,City_Boulevard,City_Bradley,City_Branscomb,City_Brawley,City_Brea,City_Brentwood,City_Bridgeport,City_Bridgeville,City_Brisbane,City_Brookdale,City_Brooks,City_BrownsValley,City_Brownsville,City_Buellton,City_BuenaPark,City_Burbank,City_Burlingame,City_Burney,City_BurntRanch,City_Burson,City_ButteCity,City_Buttonwillow,City_Byron,City_Cabazon,City_Calabasas,City_Calexico,City_Caliente,City_CaliforniaCity,City_CaliforniaHotSprings,City_Calimesa,City_Calip
atria,City_Calistoga,City_Callahan,City_Calpine,City_Camarillo,City_Cambria,City_Camino,City_CampNelson,City_Campbell,City_Campo,City_CampoSeco,City_Camptonville,City_Canby,City_CanogaPark,City_CantuaCreek,City_CanyonCountry,City_CanyonDam,City_Capay,City_CapistranoBeach,City_Capitola,City_CardiffByTheSea,City_Carlotta,City_Carlsbad,City_Carmel,City_CarmelByTheSea,City_CarmelValley,City_Carmichael,City_CarnelianBay,City_Carpinteria,City_Carson,City_Caruthers,City_Casmalia,City_Caspar,City_Cassel,City_Castaic,City_Castella,City_CastroValley,City_Castroville,City_CathedralCity,City_CatheysValley,City_Cayucos,City_Cazadero,City_CedarGlen,City_Cedarville,City_Ceres,City_Cerritos,City_Challenge,City_Chatsworth,City_Chester,City_Chico,City_Chilcoot,City_Chino,City_ChinoHills,City_Chowchilla,City_Chualar,City_ChulaVista,City_CitrusHeights,City_Claremont,City_Clarksburg,City_Clayton,City_Clearlake,City_ClearlakeOaks,City_Clements,City_Clio,City_ClipperMills,City_Cloverdale,City_Clovis,City_Coachella,City_Coalinga,City_Coarsegold,City_Cobb,City_Coleville,City_Colfax,City_Colton,City_Columbia,City_Colusa,City_Comptche,City_Compton,City_Concord,City_Cool,City_Copperopolis,City_Corcoran,City_Corning,City_Corona,City_CoronaDelMar,City_Coronado,City_CorteMadera,City_CostaMesa,City_Cotati,City_Cottonwood,City_Coulterville,City_Courtland,City_Covelo,City_Covina,City_CrescentCity,City_CrescentMills,City_Cressey,City_Crestline,City_Creston,City_Crockett,City_CrowsLanding,City_CulverCity,City_Cupertino,City_Cutler,City_Cypress,City_Daggett,City_DalyCity,City_DanaPoint,City_Danville,City_Darwin,City_Davenport,City_Davis,City_DavisCreek,City_DeathValley,City_DeerPark,City_DelMar,City_DelRey,City_Delano,City_Delhi,City_Denair,City_Descanso,City_DesertCenter,City_DesertHotSprings,City_DiamondBar,City_DiamondSprings,City_DillonBeach,City_Dinuba,City_Dixon,City_Dobbins,City_Dorris,City_DosPalos,City_DosRios,City_DouglasCity,City_Downey,City_Downieville,City_Doyle,City_Duarte,City_Dublin,Cit
y_Ducor,City_Dulzura,City_DuncansMills,City_Dunlap,City_Dunnigan,City_Dunsmuir,City_Durham,City_DutchFlat,City_Eagleville,City_Earlimart,City_Earp,City_EchoLake,City_Edwards,City_ElCajon,City_ElCentro,City_ElCerrito,City_ElDorado,City_ElDoradoHills,City_ElMonte,City_ElNido,City_ElPortal,City_ElSegundo,City_ElSobrante,City_Eldridge,City_Elk,City_ElkCreek,City_ElkGrove,City_Elmira,City_Elverta,City_Emeryville,City_EmigrantGap,City_Encinitas,City_Encino,City_Escalon,City_Escondido,City_Esparto,City_Essex,City_Etna,City_Eureka,City_Exeter,City_FairOaks,City_Fairfax,City_Fairfield,City_FallRiverMills,City_Fallbrook,City_Farmersville,City_Farmington,City_Fawnskin,City_Fellows,City_Felton,City_Ferndale,City_Fiddletown,City_FieldsLanding,City_Fillmore,City_Firebaugh,City_FishCamp,City_FivePoints,City_Flournoy,City_Folsom,City_Fontana,City_FoothillRanch,City_Forbestown,City_ForestFalls,City_ForestKnolls,City_ForestRanch,City_Foresthill,City_Forestville,City_ForksOfSalmon,City_FortBidwell,City_FortBragg,City_FortIrwin,City_FortJones,City_Fortuna,City_FountainValley,City_Fowler,City_FrazierPark,City_Freedom,City_Fremont,City_FrenchCamp,City_FrenchGulch,City_Fresno,City_Friant,City_Fullerton,City_Fulton,City_Galt,City_Garberville,City_GardenGrove,City_GardenValley,City_Gardena,City_Gasquet,City_Gazelle,City_Georgetown,City_Gerber,City_Geyserville,City_Gilroy,City_GlenEllen,City_Glencoe,City_Glendale,City_Glendora,City_Glenhaven,City_Glenn,City_Glennville,City_GoldRun,City_Goleta,City_Gonzales,City_GoodyearsBar,City_GranadaHills,City_GrandTerrace,City_GraniteBay,City_GrassValley,City_Graton,City_GreenValleyLake,City_Greenbrae,City_Greenfield,City_Greenview,City_Greenville,City_Greenwood,City_Grenada,City_Gridley,City_Grimes,City_GrizzlyFlats,City_Groveland,City_GroverBeach,City_Guadalupe,City_Gualala,City_Guatay,City_Guerneville,City_Guinda,City_Gustine,City_HaciendaHeights,City_HalfMoonBay,City_HamiltonCity,City_Hanford,City_HappyCamp,City_HarborCity,City_HatCreek,City_Hathaway
Pines,City_HawaiianGardens,City_Hawthorne,City_Hayfork,City_Hayward,City_Healdsburg,City_Heber,City_Helendale,City_Helm,City_Hemet,City_Herald,City_Hercules,City_Herlong,City_HermosaBeach,City_Hesperia,City_Hickman,City_Highland,City_Hilmar,City_Hinkley,City_Hollister,City_Holtville,City_Homeland,City_Homewood,City_Honeydew,City_Hood,City_Hoopa,City_Hopland,City_Hornbrook,City_Hornitos,City_Hughson,City_Hume,City_HuntingtonBeach,City_HuntingtonPark,City_Huron,City_Hyampom,City_Hydesville,City_Idyllwild,City_Igo,City_Imperial,City_ImperialBeach,City_Independence,City_IndianWells,City_Indio,City_Inglewood,City_Inverness,City_Inyokern,City_Ione,City_Irvine,City_Isleton,City_Ivanhoe,City_Jackson,City_Jacumba,City_Jamestown,City_Jamul,City_Janesville,City_Jenner,City_Johannesburg,City_Jolon,City_JoshuaTree,City_Julian,City_JunctionCity,City_JuneLake,City_Keeler,City_Keene,City_Kelseyville,City_Kenwood,City_Kerman,City_Kernville,City_KettlemanCity,City_Keyes,City_KingCity,City_KingsBeach,City_Kingsburg,City_Kirkwood,City_Klamath,City_KlamathRiver,City_Kneeland,City_KnightsLanding,City_Korbel,City_Kyburz,City_LaCanadaFlintridge,City_LaCrescenta,City_LaGrange,City_LaHabra,City_LaHonda,City_LaJolla,City_LaMesa,City_LaMirada,City_LaPalma,City_LaPuente,City_LaQuinta,City_LaVerne,City_LaderaRanch,City_Lafayette,City_LagunaBeach,City_LagunaHills,City_LagunaNiguel,City_Lagunitas,City_LakeArrowhead,City_LakeCity,City_LakeElsinore,City_LakeForest,City_LakeHughes,City_LakeIsabella,City_Lakehead,City_Lakeport,City_Lakeshore,City_Lakeside,City_Lakewood,City_Lamont,City_Lancaster,City_Landers,City_Larkspur,City_Lathrop,City_Laton,City_Lawndale,City_Laytonville,City_LeGrand,City_Lebec,City_LeeVining,City_Leggett,City_LemonCove,City_LemonGrove,City_Lemoore,City_Lewiston,City_Likely,City_Lincoln,City_Linden,City_Lindsay,City_Litchfield,City_LittleRiver,City_Littlerock,City_LiveOak,City_Livermore,City_Livingston,City_Llano,City_Lockeford,City_Lockwood,City_Lodi,City_Loleta,City_LomaLinda,C
ity_LomaMar,City_Lomita,City_Lompoc,City_LonePine,City_LongBarn,City_LongBeach,City_Lookout,City_Loomis,City_LosAlamitos,City_LosAlamos,City_LosAltos,City_LosAngeles,City_LosBanos,City_LosGatos,City_LosMolinos,City_LosOlivos,City_LosOsos,City_LostHills,City_Lotus,City_LowerLake,City_Loyalton,City_Lucerne,City_LucerneValley,City_Ludlow,City_Lynwood,City_LytleCreek,City_Macdoel,City_MadRiver,City_Madeline,City_Madera,City_Madison,City_Magalia,City_Malibu,City_MammothLakes,City_Manchester,City_ManhattanBeach,City_Manteca,City_Manton,City_MarchAirReserveBase,City_Maricopa,City_Marina,City_MarinaDelRey,City_Mariposa,City_Markleeville,City_Marshall,City_Martinez,City_Marysville,City_Mather,City_Maxwell,City_Maywood,City_McFarland,City_McKittrick,City_Mcarthur,City_Mccloud,City_Mckinleyville,City_MeadowValley,City_MeadowVista,City_Mecca,City_Mendocino,City_Mendota,City_Menifee,City_MenloPark,City_Mentone,City_Merced,City_Meridian,City_MiWukVillage,City_Middletown,City_Midpines,City_MidwayCity,City_Milford,City_MillCreek,City_MillValley,City_Millbrae,City_Millville,City_Milpitas,City_Mineral,City_MiraLoma,City_Miramonte,City_Miranda,City_MissionHills,City_MissionViejo,City_Modesto,City_Mojave,City_MokelumneHill,City_Monrovia,City_Montague,City_Montara,City_Montclair,City_MonteRio,City_Montebello,City_Monterey,City_MontereyPark,City_MontgomeryCreek,City_Montrose,City_Moorpark,City_Moraga,City_MorenoValley,City_MorganHill,City_MorongoValley,City_MorroBay,City_MossBeach,City_MossLanding,City_MountHamilton,City_MountHermon,City_MountLaguna,City_MountShasta,City_MountainCenter,City_MountainRanch,City_MountainView,City_MtBaldy,City_Murphys,City_Murrieta,City_MyersFlat,City_Napa,City_NationalCity,City_Navarro,City_Needles,City_NevadaCity,City_NewCuyama,City_Newark,City_NewberrySprings,City_NewburyPark,City_Newcastle,City_Newhall,City_Newman,City_NewportBeach,City_NewportCoast,City_Nicasio,City_Nice,City_Nicolaus,City_Niland,City_Nipomo,City_Nipton,City_Norco,City_NorthFork,City_No
rthHighlands,City_NorthHills,City_NorthHollywood,City_NorthPalmSprings,City_NorthSanJuan,City_Northridge,City_Norwalk,City_Novato,City_Nubieber,City_Nuevo,City_ONeals,City_OakPark,City_OakRun,City_OakView,City_Oakdale,City_Oakhurst,City_Oakland,City_Oakley,City_Occidental,City_Oceano,City_Oceanside,City_Ocotillo,City_Ojai,City_Olancha,City_OldStation,City_Olema,City_Olivehurst,City_OlympicValley,City_Ontario,City_Onyx,City_Orange,City_OrangeCove,City_Orangevale,City_OregonHouse,City_Orick,City_Orinda,City_Orland,City_Orleans,City_OroGrande,City_Orosi,City_Oroville,City_Oxnard,City_PacificGrove,City_PacificPalisades,City_Pacifica,City_Pacoima,City_Paicines,City_Pala,City_Palermo,City_PalmDesert,City_PalmSprings,City_Palmdale,City_PaloAlto,City_PaloCedro,City_PaloVerde,City_PalomarMountain,City_PalosVerdesPeninsula,City_PanoramaCity,City_Paradise,City_Paramount,City_ParkerDam,City_Parlier,City_Pasadena,City_Paskenta,City_PasoRobles,City_Patterson,City_PaumaValley,City_PaynesCreek,City_Pearblossom,City_PebbleBeach,City_PennValley,City_Penngrove,City_Penryn,City_Perris,City_Pescadero,City_Petaluma,City_Petrolia,City_Phelan,City_Phillipsville,City_Philo,City_PicoRivera,City_Piercy,City_PilotHill,City_PineGrove,City_PineValley,City_Pinecrest,City_Pinole,City_PinonHills,City_Pioneer,City_Pioneertown,City_Piru,City_PismoBeach,City_Pittsburg,City_Pixley,City_Placentia,City_Placerville,City_Planada,City_Platina,City_PlayaDelRey,City_PleasantGrove,City_PleasantHill,City_Pleasanton,City_Plymouth,City_PointArena,City_PointReyesStation,City_PollockPines,City_Pomona,City_PopeValley,City_PortCosta,City_PortHueneme,City_PorterRanch,City_Porterville,City_Portola,City_PortolaValley,City_Posey,City_Potrero,City_PotterValley,City_Poway,City_Prather,City_Princeton,City_Quincy,City_RaisinCity,City_Ramona,City_Ranchita,City_RanchoCordova,City_RanchoCucamonga,City_RanchoMirage,City_RanchoPalosVerdes,City_RanchoSantaFe,City_RanchoSantaMargarita,City_Randsburg,City_Ravendale,City_Raymond,City
_RedBluff,City_Redcrest,City_Redding,City_Redlands,City_RedondoBeach,City_Redway,City_RedwoodCity,City_RedwoodValley,City_Reedley,City_Rescue,City_Reseda,City_Rialto,City_Richgrove,City_Richmond,City_Richvale,City_Ridgecrest,City_RioDell,City_RioLinda,City_RioNido,City_RioOso,City_RioVista,City_Ripon,City_RiverPines,City_Riverbank,City_Riverdale,City_Riverside,City_Rocklin,City_Rodeo,City_RohnertPark,City_Rosamond,City_Rosemead,City_Roseville,City_RoughAndReady,City_RoundMountain,City_RowlandHeights,City_RunningSprings,City_Sacramento,City_SaintHelena,City_Salida,City_Salinas,City_SaltonCity,City_Salyer,City_Samoa,City_SanAndreas,City_SanAnselmo,City_SanArdo,City_SanBernardino,City_SanBruno,City_SanCarlos,City_SanClemente,City_SanDiego,City_SanDimas,City_SanFernando,City_SanFrancisco,City_SanGabriel,City_SanGeronimo,City_SanGregorio,City_SanJacinto,City_SanJoaquin,City_SanJose,City_SanJuanBautista,City_SanJuanCapistrano,City_SanLeandro,City_SanLorenzo,City_SanLucas,City_SanLuisObispo,City_SanMarcos,City_SanMarino,City_SanMartin,City_SanMateo,City_SanMiguel,City_SanPablo,City_SanPedro,City_SanQuentin,City_SanRafael,City_SanRamon,City_SanSimeon,City_SanYsidro,City_Sanger,City_SantaAna,City_SantaBarbara,City_SantaClara,City_SantaClarita,City_SantaCruz,City_SantaFeSprings,City_SantaMargarita,City_SantaMaria,City_SantaMonica,City_SantaPaula,City_SantaRosa,City_SantaYnez,City_SantaYsabel,City_Santee,City_Saratoga,City_Sausalito,City_Scotia,City_ScottBar,City_ScottsValley,City_SealBeach,City_Seaside,City_Sebastopol,City_Seeley,City_SeiadValley,City_Selma,City_SequoiaNationalPark,City_Shafter,City_Shandon,City_Shasta,City_ShastaLake,City_ShaverLake,City_SheepRanch,City_Sheridan,City_ShermanOaks,City_ShingleSprings,City_Shingletown,City_Shoshone,City_SierraCity,City_SierraMadre,City_Sierraville,City_Silverado,City_SimiValley,City_Sloughhouse,City_Smartville,City_SmithRiver,City_Snelling,City_SodaSprings,City_SolanaBeach,City_Soledad,City_Solvang,City_Somerset,City_SomesBar,C
ity_Somis,City_Sonoma,City_Sonora,City_Soquel,City_Soulsbyville,City_SouthDosPalos,City_SouthElMonte,City_SouthGate,City_SouthLakeTahoe,City_SouthPasadena,City_SouthSanFrancisco,City_Spreckels,City_SpringValley,City_Springville,City_SquawValley,City_Standish,City_Stanford,City_Stanton,City_StevensonRanch,City_Stevinson,City_StinsonBeach,City_StirlingCity,City_Stockton,City_Stonyford,City_Stratford,City_Strathmore,City_StrawberryValley,City_StudioCity,City_Sugarloaf,City_SuisunCity,City_Sultana,City_Summerland,City_SunCity,City_SunValley,City_Sunland,City_Sunnyvale,City_Sunol,City_SunsetBeach,City_Surfside,City_Susanville,City_Sutter,City_SutterCreek,City_Sylmar,City_Taft,City_TahoeCity,City_TahoeVista,City_Tahoma,City_Tarzana,City_Taylorsville,City_Tecate,City_Tecopa,City_Tehachapi,City_Tehama,City_Temecula,City_TempleCity,City_Templeton,City_Termo,City_TerraBella,City_TheSeaRanch,City_Thermal,City_Thornton,City_ThousandOaks,City_ThousandPalms,City_ThreeRivers,City_Tipton,City_Tollhouse,City_Tomales,City_Topanga,City_Topaz,City_Torrance,City_TrabucoCanyon,City_Tracy,City_Tranquillity,City_Traver,City_TravisAfb,City_Trinidad,City_TrinityCenter,City_Trona,City_Truckee,City_Tujunga,City_Tulare,City_Tulelake,City_Tuolumne,City_Tupman,City_Turlock,City_Tustin,City_Twain,City_TwainHarte,City_TwentyninePalms,City_TwinBridges,City_Ukiah,City_UnionCity,City_Upland,City_UpperLake,City_Vacaville,City_Valencia,City_Vallecito,City_Vallejo,City_ValleyCenter,City_ValleyFord,City_ValleySprings,City_ValleyVillage,City_Valyermo,City_VanNuys,City_Venice,City_Ventura,City_Vernalis,City_Victorville,City_Vidal,City_VillaPark,City_Vina,City_Visalia,City_Vista,City_Volcano,City_Wallace,City_Walnut,City_WalnutCreek,City_WalnutGrove,City_WarnerSprings,City_Wasco,City_Washington,City_Waterford,City_Watsonville,City_Weaverville,City_Weed,City_Weimar,City_Weldon,City_Wendel,City_Weott,City_WestCovina,City_WestHills,City_WestHollywood,City_WestPoint,City_WestSacramento,City_WestlakeVillage,City_
Westley,City_Westminster,City_Westmorland,City_Westport,City_Westwood,City_Wheatland,City_WhiteWater,City_Whitethorn,City_Whitmore,City_Whittier,City_Wildomar,City_Williams,City_Willits,City_WillowCreek,City_Willows,City_Wilmington,City_Wilseyville,City_Wilton,City_Winchester,City_Windsor,City_Winnetka,City_Winterhaven,City_Winters,City_Winton,City_Wishon,City_WitterSprings,City_WoffordHeights,City_Woodacre,City_Woodbridge,City_Woodlake,City_Woodland,City_WoodlandHills,City_Woody,City_Wrightwood,City_Yermo,City_YorbaLinda,City_Yorkville,City_YosemiteNationalPark,City_Yountville,City_Yreka,City_YubaCity,City_Yucaipa,City_YuccaValley,City_Zenia,Gender_Female,Gender_Male,SeniorCitizen_No,SeniorCitizen_Yes,Partner_No,Partner_Yes,Dependents_No,Dependents_Yes,PhoneService_No,PhoneService_Yes,MultipleLines_No,MultipleLines_No_phone_service,MultipleLines_Yes,InternetService_DSL,InternetService_Fiber_optic,InternetService_No,OnlineSecurity_No,OnlineSecurity_No_internet_service,OnlineSecurity_Yes,OnlineBackup_No,OnlineBackup_No_internet_service,OnlineBackup_Yes,DeviceProtection_No,DeviceProtection_No_internet_service,DeviceProtection_Yes,TechSupport_No,TechSupport_No_internet_service,TechSupport_Yes,StreamingTV_No,StreamingTV_No_internet_service,StreamingTV_Yes,StreamingMovies_No,StreamingMovies_No_internet_service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One_year,Contract_Two_year,PaperlessBilling_No,PaperlessBilling_Yes,PaymentMethod_Bank_transfer_(automatic),PaymentMethod_Credit_card_(automatic),PaymentMethod_Electronic_check,PaymentMethod_Mailed_check
0,90003,33.964131,-118.272783,2,53.85,108.15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,1,0,0,1,1,0,0,1,0,0,0,0,1,0,0,1,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,0,1
1,90005,34.059281,-118.30742,2,70.7,151.65,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,1,0,1,1,0,0,0,1,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,0
2,90006,34.048013,-118.293953,8,99.65,820.5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,1,0,1,0,0,1,0,1,0,1,0,0,1,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,0,0,1,0,0,1,0
3,90010,34.062125,-118.315709,28,104.8,3046.05,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0,1,0,0,1,0,1,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,1,0,0,1,1,0,0,0,1,0,0,1,0
4,90015,34.039224,-118.266293,49,103.7,5036.3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,1,0,1,0,0,1,0,1,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0,0,0,1,1,0,0,0


## Splitting the dataset using train_test_split

In [25]:
y.value_counts()

0    5174
1    1869
Name: ChurnValue, dtype: int64

Here about 26.5% of the labels are 1 (customers who left the company) and about 73.5% are 0 (customers who stayed). Because the data is imbalanced, we need the training and testing sets to contain the same proportion of 0's and 1's. We do that by passing `stratify` to `train_test_split`.
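
The class shares quoted above can be read directly off the label counts. This is a minimal sketch that rebuilds a stand-in label series from the counts printed earlier; the names `y_demo` and `class_share` are hypothetical and not part of the notebook.

```python
import pandas as pd

# Stand-in for the notebook's `y`, rebuilt from the printed counts
# (5174 zeros, 1869 ones); `y_demo` is a hypothetical name.
y_demo = pd.Series([0] * 5174 + [1] * 1869, name="ChurnValue")

# normalize=True returns class fractions instead of raw counts
class_share = y_demo.value_counts(normalize=True)
print(class_share.round(4))
```

In the notebook itself the same call would presumably be `y.value_counts(normalize=True)`.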

In [26]:
X_train, X_test, y_train, y_test = train_test_split(X_encoded, y, random_state = 42, stratify = y)

In [27]:
# Because y contains only 0's and 1's, it's easy to check whether we succeeded
sum(y_train)/len(y_train), sum(y_test)/len(y_test)

(0.2654297614539947, 0.26519023282226006)
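
To see what `stratify` contributes, the sketch below compares a stratified split with a plain random one on synthetic labels that have the same class counts as above. The feature matrix is a dummy placeholder, and every `_demo`, `_s` and `_r` name is hypothetical rather than taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the same class counts as the notebook's y
y_demo = np.array([0] * 5174 + [1] * 1869)
# Dummy single-column feature matrix, only needed so there is something to split
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# Stratified split: the share of 1's is preserved in both subsets
_, _, y_tr_s, y_te_s = train_test_split(
    X_demo, y_demo, random_state=42, stratify=y_demo
)

# Plain random split: the share of 1's is free to drift between subsets
_, _, y_tr_r, y_te_r = train_test_split(X_demo, y_demo, random_state=42)

print("stratified:", y_tr_s.mean(), y_te_s.mean())
print("random:    ", y_tr_r.mean(), y_te_r.mean())
```

With the default `test_size`, a quarter of the rows go to the test set. The stratified shares should agree almost exactly, while the random ones may differ somewhat more.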