<div class="alert alert-success">
<b>Reviewer's comment V2</b>
  
The project is accepted. Good luck on the next sprint!
  
</div>

**Review**

Hi, my name is Dmitry and I will be reviewing your project.
  
You can find my comments in colored markdown cells:
  
<div class="alert alert-success">
  If everything is done successfully.
</div>
  
<div class="alert alert-warning">
  If I have some (optional) suggestions, or questions to think about, or general comments.
</div>
  
<div class="alert alert-danger">
  If a section requires some corrections. Work can't be accepted with red comments.
</div>
  
Please don't remove my comments, as it will make further review iterations much harder for me.
  
Feel free to reply to my comments or ask questions using the following template:
  
<div class="alert alert-info">
  For your comments and questions.
</div>
  
First of all, thank you for turning in the project! You did an excellent job! There's just one small problem that needs to be fixed before the project is accepted. It should be very straightforward though!

# Project description

Beta Bank customers are leaving: little by little, chipping away every month. The bankers figured out it’s cheaper to save the existing customers rather than to attract new ones.

We need to predict whether a customer will leave the bank soon. You have the data on clients’ past behavior and termination of contracts with the bank.

Build a model with the maximum possible F1 score. To pass the project, you need an F1 score of at least **0.59**. Check the F1 for the test set.

Additionally, measure the AUC-ROC metric and compare it with the F1.
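For reference, F1 is the harmonic mean of precision P and recall R, F1 = 2·P·R / (P + R), and it is computed from hard 0/1 predictions. AUC-ROC is computed from predicted probabilities and does not depend on a decision threshold. Below is a minimal sketch that checks the F1 formula against scikit-learn. The labels and scores are made up for illustration and are not project data.

```python
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

# Toy labels and predicted probabilities, invented for illustration only
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.3, 0.2]
# Hard predictions with a 0.5 threshold
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1_manual = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R

print('precision:', round(precision, 3), 'recall:', round(recall, 3))
print('F1 (formula):', round(f1_manual, 3), 'F1 (sklearn):', round(f1_score(y_true, y_pred), 3))
# AUC-ROC uses the scores themselves, not the thresholded labels
print('AUC-ROC:', round(roc_auc_score(y_true, y_score), 3))
```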


## Data description
The data can be found in the /datasets/Churn.csv file.

**Features**

* *RowNumber* — row index in the data
* *CustomerId* — unique customer identifier
* *Surname* — surname
* *CreditScore* — credit score
* *Geography* — country of residence
* *Gender* — gender
* *Age* — age
* *Tenure* — period of maturation for a customer’s fixed deposit (years)
* *Balance* — account balance
* *NumOfProducts* — number of banking products used by the customer
* *HasCrCard* — customer has a credit card
* *IsActiveMember* — customer’s activeness
* *EstimatedSalary* — estimated salary

**Target**

* *Exited* — customer has left



## Data Preprocessing

In [41]:
! pip install sidetable

Defaulting to user installation because normal site-packages is not writeable


### Imports

In [42]:
import pandas as pd
import sidetable as stb
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score, roc_curve
from sklearn.utils import shuffle
import sys
import warnings
if not sys.warnoptions:
    warnings.simplefilter("ignore")

### Data samples

In [43]:
data = pd.read_csv('/datasets/Churn.csv')
data.sample(5)

| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 191 | 192 | 15771086 | Graham | 512 | France | Female | 36 | 3.0 | 84327.77 | 2 | 1 | 0 | 17675.36 | 0 |
| 6957 | 6958 | 15802274 | Waters | 686 | France | Female | 44 | 7.0 | 55053.62 | 1 | 1 | 0 | 181757.19 | 0 |
| 4025 | 4026 | 15640769 | Hobbs | 660 | France | Male | 63 | 8.0 | 137841.53 | 1 | 1 | 1 | 42790.29 | 0 |
| 2269 | 2270 | 15613097 | Kao | 605 | France | Female | 33 | 4.0 | 0.0 | 2 | 0 | 1 | 83700.66 | 0 |
| 2581 | 2582 | 15634719 | Chinwendu | 704 | France | Male | 31 | NaN | 0.0 | 2 | 1 | 0 | 183038.33 | 0 |


A first look at the data.

### Info

In [44]:
data.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           9091 non-null   float64
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(3), int64(8), object(3)
memory usage: 2.6 MB


The data types look correct. The Tenure column has missing values (9,091 non-null entries out of 10,000).

### Describe

In [45]:
data.describe(include='all')

| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 10000.0 | 10000.0 | 10000 | 10000.0 | 10000 | 10000 | 10000.0 | 9091.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 |
| unique | | | 2932 | | 3 | 2 | | | | | | | | |
| top | | | Smith | | France | Male | | | | | | | | |
| freq | | | 32 | | 5014 | 5457 | | | | | | | | |
| mean | 5000.5 | 15690940.0 | | 650.5288 | | | 38.9218 | 4.99769 | 76485.889288 | 1.5302 | 0.7055 | 0.5151 | 100090.239881 | 0.2037 |
| std | 2886.89568 | 71936.19 | | 96.653299 | | | 10.487806 | 2.894723 | 62397.405202 | 0.581654 | 0.45584 | 0.499797 | 57510.492818 | 0.402769 |
| min | 1.0 | 15565700.0 | | 350.0 | | | 18.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 11.58 | 0.0 |
| 25% | 2500.75 | 15628530.0 | | 584.0 | | | 32.0 | 2.0 | 0.0 | 1.0 | 0.0 | 0.0 | 51002.11 | 0.0 |
| 50% | 5000.5 | 15690740.0 | | 652.0 | | | 37.0 | 5.0 | 97198.54 | 1.0 | 1.0 | 1.0 | 100193.915 | 0.0 |
| 75% | 7500.25 | 15753230.0 | | 718.0 | | | 44.0 | 7.0 | 127644.24 | 2.0 | 1.0 | 1.0 | 149388.2475 | 0.0 |


The summary statistics look plausible. For example, there are no negative balances and ages start at 18, so there are no problems here.

<div class="alert alert-success">
<b>Reviewer's comment</b>

Alright, the data was loaded and inspected.

</div>

### Missing values

In [46]:
data.stb.missing(style=True)

| | missing | total | percent |
|---|---|---|---|
| Tenure | 909 | 10000 | 9.09% |
| RowNumber | 0 | 10000 | 0.00% |
| CustomerId | 0 | 10000 | 0.00% |
| Surname | 0 | 10000 | 0.00% |
| CreditScore | 0 | 10000 | 0.00% |
| Geography | 0 | 10000 | 0.00% |
| Gender | 0 | 10000 | 0.00% |
| Age | 0 | 10000 | 0.00% |
| Balance | 0 | 10000 | 0.00% |
| NumOfProducts | 0 | 10000 | 0.00% |


9.09% of the Tenure values are missing. One plausible cause is new clients who have no tenure yet.

In [47]:
data = data[data.Tenure.notnull()]

The share of rows with missing values is below 10%, so we can drop them.
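The two steps above can also be expressed in plain pandas. This is a sketch on a made-up frame, not the churn data. It builds a missing-value summary similar to the one `stb.missing` prints, and it shows that `dropna(subset=[...])` keeps the same rows as the boolean filter used above. The names `toy`, `summary`, `kept_filter` and `kept_dropna` are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up frame: one of the five Tenure values is missing
toy = pd.DataFrame({'Tenure': [2.0, np.nan, 8.0, 1.0, 5.0],
                    'Age': [42, 41, 42, 39, 43]})

# Count, total and percentage of missing values per column
missing = toy.isna().sum()
summary = pd.DataFrame({'missing': missing,
                        'total': len(toy),
                        'percent': missing / len(toy) * 100})
print(summary.sort_values('missing', ascending=False))

# Two equivalent ways to keep only the rows where Tenure is present
kept_filter = toy[toy.Tenure.notnull()]
kept_dropna = toy.dropna(subset=['Tenure'])
print(kept_filter.equals(kept_dropna))
```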

In [48]:
data.stb.missing()

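| | missing | total | percent |
|---|---|---|---|
| RowNumber | 0 | 9091 | 0.0 |
| CustomerId | 0 | 9091 | 0.0 |
| Surname | 0 | 9091 | 0.0 |
| CreditScore | 0 | 9091 | 0.0 |
| Geography | 0 | 9091 | 0.0 |
| Gender | 0 | 9091 | 0.0 |
| Age | 0 | 9091 | 0.0 |
| Tenure | 0 | 9091 | 0.0 |
| Balance | 0 | 9091 | 0.0 |
| NumOfProducts | 0 | 9091 | 0.0 |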


No missing values here.

<div class="alert alert-success">
<b>Reviewer's comment</b>

Missing values were dealt with reasonably

</div>

### Duplicates

In [49]:
data.duplicated().sum()

0

No duplicates.

### Dropping unnecessary columns

Before training, we drop the identifier columns (RowNumber, CustomerId, Surname), because they add no predictive value for our model.

In [50]:
data = data.drop(['RowNumber', 'CustomerId', 'Surname'], axis = 1)

In [51]:
data.head()

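| | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | France | Female | 42 | 2.0 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 608 | Spain | Female | 41 | 1.0 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 502 | France | Female | 42 | 8.0 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 699 | France | Female | 39 | 1.0 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 850 | Spain | Female | 43 | 2.0 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |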


Ok. The identifier columns are gone, and only the features and the target remain.

<div class="alert alert-success">
<b>Reviewer's comment</b>

Great!

</div>

### Conclusion
In this section, we:
* had a first look into the data.
* checked the info and the description of the data.
* handled missing values.
* looked for duplicates.
* dropped unnecessary columns.

## Features Preparation

Let's use One-Hot Encoding to handle our categorical columns (Geography, Gender): 

### One-Hot Encoding

In [52]:
data_ohe = pd.get_dummies(data, drop_first=True)

In [53]:
data_ohe.head()

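| | CreditScore | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited | Geography_Germany | Geography_Spain | Gender_Male |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | 42 | 2.0 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 | 0 | 0 | 0 |
| 1 | 608 | 41 | 1.0 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 | 0 | 1 | 0 |
| 2 | 502 | 42 | 8.0 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 | 0 | 0 | 0 |
| 3 | 699 | 39 | 1.0 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 | 0 | 0 | 0 |
| 4 | 850 | 43 | 2.0 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 | 0 | 1 | 0 |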


Done.
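`drop_first=True` removes the first category of each categorical column. For Geography this is France and for Gender it is Female, because categories are sorted. The dropped category is implied when all the remaining dummies of that column are 0, which avoids a perfectly collinear extra column. The sketch below shows this on a tiny made-up frame (`toy` is a hypothetical name, not project data).

```python
import pandas as pd

# Tiny made-up frame with the same categorical columns as the project data
toy = pd.DataFrame({
    'Geography': ['France', 'Spain', 'Germany', 'France'],
    'Gender': ['Female', 'Male', 'Female', 'Male'],
    'Age': [42, 41, 39, 43],
})

# France and Female get no column: a row with all-zero dummies means France / Female
toy_ohe = pd.get_dummies(toy, drop_first=True)
print(toy_ohe.columns.tolist())
print(toy_ohe.astype(int))  # newer pandas returns bool dummies; cast for readability
```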

<div class="alert alert-success">
<b>Reviewer's comment</b>

Categorical features were encoded!

</div>

### Features & Target Split

Let's separate the features from the target and split the data into 3 sets: train (60%), valid (20%) and test (20%).

In [54]:
target = data_ohe.Exited
features = data_ohe.drop('Exited', axis=1)

In [55]:
features_train, features_valid_test, target_train, target_valid_test = train_test_split(features, target, test_size=0.4, random_state=12345)
features_valid, features_test, target_valid, target_test = train_test_split(features_valid_test, target_valid_test, test_size=0.5, random_state=12345)

In [56]:
print('train set:', round(features_train.shape[0]/data.shape[0], 3), 
     '\nvalid set:', round(features_valid.shape[0]/data.shape[0], 3),
     '\ntest set:', round(features_test.shape[0]/data.shape[0], 3))

train set: 0.6 
valid set: 0.2 
test set: 0.2


Ok. The proportions are as intended: 60/20/20.
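The two-stage split works as follows. The first call holds out 40% of the rows, and the second call halves that 40%, which gives 20% for validation and 20% for test. This sketch uses a synthetic array of the same length as the cleaned data (9,091 rows) to show the resulting sizes. `X`, `y` and the other variable names are illustrative stand-ins.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with as many rows as the data left after dropping missing Tenure
X = np.arange(9091).reshape(-1, 1)
y = np.zeros(9091, dtype=int)

# Step 1: hold out 40% of the rows; step 2: split that 40% in half
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=12345)
X_valid, X_test, y_valid, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=12345)

print(len(X_train), len(X_valid), len(X_test))
```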

<div class="alert alert-success">
<b>Reviewer's comment</b>

The split into train, validation and test looks good

</div>

### Feature Scaling

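Let's standardize our features. The scaler is fitted on the training set only and then applied to all three sets, so no information from the validation or test sets leaks into training: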

In [57]:
numeric = features.columns  # every column is scaled, including the binary and one-hot columns

scaler = StandardScaler()
scaler.fit(features_train[numeric])
features_train[numeric] = scaler.transform(features_train[numeric])
features_valid[numeric] = scaler.transform(features_valid[numeric])
features_test[numeric] = scaler.transform(features_test[numeric])

features_train.head()

| | CreditScore | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Geography_Germany | Geography_Spain | Gender_Male |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 9344 | 0.809075 | -1.039327 | -1.025995 | 0.554904 | -0.908179 | 0.663468 | -1.024127 | 0.019508 | -0.58229 | -0.572128 | -1.107304 |
| 3796 | -1.152518 | -1.227561 | 0.696524 | 0.480609 | -0.908179 | -1.507231 | -1.024127 | 0.056167 | -0.58229 | -0.572128 | 0.903094 |
| 7462 | -0.398853 | 0.090079 | 1.385532 | -1.23783 | -0.908179 | 0.663468 | 0.976442 | 0.848738 | -0.58229 | -0.572128 | 0.903094 |
| 1508 | -0.749875 | -0.286389 | 0.35202 | -1.23783 | 0.8093 | 0.663468 | 0.976442 | -0.894953 | -0.58229 | -0.572128 | 0.903094 |
| 4478 | -1.028628 | -0.756975 | -0.336987 | -1.23783 | 0.8093 | -1.507231 | 0.976442 | -1.284516 | -0.58229 | -0.572128 | 0.903094 |


Done.
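Fitting the scaler on the training set only has a simple effect. The training features end up with mean 0 and standard deviation 1 exactly, while the validation features are transformed with the *training* mean and standard deviation, so they are only approximately standardized. The sketch below uses made-up normally distributed data instead of the churn features.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(12345)
# Made-up numeric data standing in for the train and validation features
X_train = rng.normal(loc=650, scale=96, size=(600, 2))
X_valid = rng.normal(loc=650, scale=96, size=(200, 2))

scaler = StandardScaler()
scaler.fit(X_train)                         # mean and std come from the training set only
X_train_scaled = scaler.transform(X_train)
X_valid_scaled = scaler.transform(X_valid)  # reuses the training mean and std

print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
print(X_valid_scaled.mean(axis=0).round(3))  # close to 0, but not exactly 0
```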

<div class="alert alert-success">
<b>Reviewer's comment</b>

Feature scaling was applied correctly

</div>

### Conclusion
In this section, we:
* used One-Hot Encoding for the categorical parameters.
* split the data into 3 sets: train, valid and test.
* scaled the features.

## Classes Imbalance

Let's check the percentage of the classes:

### Classes percentage

In [58]:
data_ohe.stb.freq(['Exited'], style=True,  cum_cols=False)

| | Exited | count | percent |
|---|---|---|---|
| 0 | 0 | 7237 | 79.61% |
| 1 | 1 | 1854 | 20.39% |


There are about 4 times as many zeros (80%) as ones (20%), so the classes are imbalanced.
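The percentages and the "about 4 times" figure follow directly from the class counts in the frequency table above. The arithmetic is:

```python
# Class counts taken from the frequency table above
n_zeros, n_ones = 7237, 1854
n_total = n_zeros + n_ones

share_zeros = n_zeros / n_total
share_ones = n_ones / n_total
ratio = n_zeros / n_ones  # how many zeros there are per one

print(f'share of 0: {share_zeros:.2%}, share of 1: {share_ones:.2%}, ratio: {ratio:.3f}')
```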

<div class="alert alert-success">
<b>Reviewer's comment</b>

Class imbalance was noted

</div>

### Decision Tree

In [59]:
f1_scores = []

for depth in range(1, 21):
    model = DecisionTreeClassifier(random_state=12345, max_depth=depth)
    model.fit(features_train, target_train)
    predicted_valid = model.predict(features_valid)
    f1_scores.append({'max_depth': depth,
                     'f1_score': round(f1_score(target_valid, predicted_valid), 3)})

f1_df = pd.DataFrame(f1_scores)
f1_df = f1_df.sort_values(by='f1_score', ascending=False)
f1_df.head(10)

| | max_depth | f1_score |
|---|---|---|
| 6 | 7 | 0.576 |
| 5 | 6 | 0.561 |
| 8 | 9 | 0.545 |
| 3 | 4 | 0.541 |
| 7 | 8 | 0.541 |
| 1 | 2 | 0.53 |
| 4 | 5 | 0.514 |
| 10 | 11 | 0.513 |
| 9 | 10 | 0.512 |
| 18 | 19 | 0.486 |


The best F1 score for the decision tree trained on the imbalanced data is 0.576 (max_depth=7).

This is below the required 0.59, so it's not enough.

In [60]:
probabilities_valid = model.predict_proba(features_valid)
probabilities_one_valid = probabilities_valid[:, 1]
auc_roc = roc_auc_score(target_valid, probabilities_one_valid)

print('AUC-ROC:', round(auc_roc, 3))

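AUC-ROC: 0.671

Note: after the loop, `model` holds the last tree fitted (max_depth=20), not the best one (max_depth=7), so this AUC-ROC describes the max_depth=20 tree.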
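To measure AUC-ROC for the best tree rather than for whichever tree the loop fitted last, refit the best depth first. Below is a minimal sketch on synthetic, imbalanced data from `make_classification`. This is a stand-in for the churn features, so the numbers it prints are not project results. `scores`, `best_depth` and `best_model` are illustrative names.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score, roc_auc_score

# Synthetic, imbalanced stand-in data (about 80% zeros)
X, y = make_classification(n_samples=2000, weights=[0.8], random_state=12345)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25, random_state=12345)

scores = []
for depth in range(1, 21):
    model = DecisionTreeClassifier(random_state=12345, max_depth=depth)
    model.fit(X_train, y_train)
    scores.append({'max_depth': depth,
                   'f1_score': f1_score(y_valid, model.predict(X_valid))})

# Take the depth with the highest validation F1 and refit that tree
best_depth = int(pd.DataFrame(scores).sort_values('f1_score', ascending=False).iloc[0]['max_depth'])
best_model = DecisionTreeClassifier(random_state=12345, max_depth=best_depth)
best_model.fit(X_train, y_train)

# AUC-ROC is now computed for the best tree, not the last one from the loop
auc_roc = roc_auc_score(y_valid, best_model.predict_proba(X_valid)[:, 1])
print('best max_depth:', best_depth, 'AUC-ROC:', round(auc_roc, 3))
```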


### Random Forest

In [61]:
f1_scores = []

for n_estimator in range(1, 21):
    for depth in range(1, 21):
        model = RandomForestClassifier(random_state=12345, n_estimators=n_estimator, max_depth=depth)
        model.fit(features_train, target_train)
        predicted_valid = model.predict(features_valid)
        f1_scores.append({'n_estimators':n_estimator,
                         'max_depth': depth,
                         'f1_score': round(f1_score(target_valid, predicted_valid), 3)})

f1_df = pd.DataFrame(f1_scores)
f1_df = f1_df.sort_values(by='f1_score', ascending=False)
f1_df.head(10)

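| | n_estimators | max_depth | f1_score |
|---|---|---|---|
| 258 | 13 | 19 | 0.595 |
| 298 | 15 | 19 | 0.59 |
| 253 | 13 | 14 | 0.589 |
| 218 | 11 | 19 | 0.589 |
| 278 | 14 | 19 | 0.589 |
| 134 | 7 | 15 | 0.587 |
| 339 | 17 | 20 | 0.586 |
| 296 | 15 | 17 | 0.586 |
| 394 | 20 | 15 | 0.585 |
| 396 | 20 | 17 | 0.585 |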


The best F1 score for the random forest trained on the imbalanced data is 0.595 (n_estimators=13, max_depth=19).

That's better, and just above the 0.59 threshold :)

In [62]:
probabilities_valid = model.predict_proba(features_valid)
probabilities_one_valid = probabilities_valid[:, 1]
auc_roc = roc_auc_score(target_valid, probabilities_one_valid)

print('AUC-ROC:', round(auc_roc, 3))

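AUC-ROC: 0.85

Note: after the loop, `model` holds the last forest fitted (n_estimators=20, max_depth=20), not the best one (n_estimators=13, max_depth=19), so this AUC-ROC describes the n_estimators=20, max_depth=20 forest.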
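Another approach is to keep the best fitted forest while the grid loop runs, so that `predict_proba` is called on the model that actually achieved the best validation F1. This sketch runs on synthetic, imbalanced data from `make_classification` (not the churn data) and uses a smaller grid than the notebook to stay fast. `best_f1`, `best_params` and `best_model` are illustrative names.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score

# Synthetic, imbalanced stand-in data (about 80% zeros)
X, y = make_classification(n_samples=2000, weights=[0.8], random_state=12345)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25, random_state=12345)

best_f1, best_params, best_model = -1.0, None, None
# A smaller grid than in the notebook, to keep the sketch fast
for n_estimators in range(5, 21, 5):
    for depth in range(2, 13, 2):
        model = RandomForestClassifier(random_state=12345, n_estimators=n_estimators, max_depth=depth)
        model.fit(X_train, y_train)
        f1 = f1_score(y_valid, model.predict(X_valid))
        if f1 > best_f1:  # keep the fitted model itself, not only its score
            best_f1, best_params, best_model = f1, (n_estimators, depth), model

auc_roc = roc_auc_score(y_valid, best_model.predict_proba(X_valid)[:, 1])
print('best (n_estimators, max_depth):', best_params,
      'F1:', round(best_f1, 3), 'AUC-ROC:', round(auc_roc, 3))
```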


### Logistic Regression

In [63]:
model = LogisticRegression(random_state=12345, solver='liblinear')
model.fit(features_train, target_train)
predicted_valid = model.predict(features_valid)
print('F1:', round(f1_score(target_valid, predicted_valid), 3))

F1: 0.303


Logistic regression trained on the imbalanced data has the lowest F1 score of the three models.

Not enough at all.

In [64]:
probabilities_valid = model.predict_proba(features_valid)
probabilities_one_valid = probabilities_valid[:, 1]
auc_roc = roc_auc_score(target_valid, probabilities_one_valid)

print('AUC-ROC:', round(auc_roc, 3))

AUC-ROC: 0.774
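The gap between this model's low F1 (0.303) and its moderate AUC-ROC (0.774) makes more sense given what each metric uses. `predict` converts probabilities into labels with a fixed 0.5 threshold, and F1 is computed from those labels. AUC-ROC is computed from the probabilities across all thresholds. This sketch on synthetic, imbalanced data (not the churn data) shows that F1 changes with the threshold while AUC-ROC stays fixed. The threshold grid is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score

# Synthetic, imbalanced stand-in data (about 80% zeros)
X, y = make_classification(n_samples=2000, weights=[0.8], random_state=12345)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25, random_state=12345)

model = LogisticRegression(random_state=12345, solver='liblinear')
model.fit(X_train, y_train)
proba_one = model.predict_proba(X_valid)[:, 1]

# AUC-ROC uses the probabilities directly, so there is no threshold to choose
print('AUC-ROC:', round(roc_auc_score(y_valid, proba_one), 3))

# F1 is computed from hard labels, so it depends on the threshold
f1_by_threshold = {}
for threshold in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]:
    predicted = (proba_one >= threshold).astype(int)
    f1_by_threshold[threshold] = f1_score(y_valid, predicted)
    print(f'threshold={threshold:.1f}  F1={f1_by_threshold[threshold]:.3f}')
```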


### Conclusion
In this section, we:
* checked the class percentages and found an imbalance.
* trained a decision tree with imbalanced data (f1=0.576).
* trained a random forest with imbalanced data (f1=0.595).
* trained a logistic regression with imbalanced data (f1=0.303).

<div class="alert alert-success">
<b>Reviewer's comment</b>

Great, you trained three different models without taking class imbalance into account

</div>

## Improving the quality of the model

To improve our models, we will try 3 different class-balancing methods:
* Class Weight Adjustment
* Upsampling
* Downsampling
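For reference, scikit-learn's `class_weight='balanced'` gives each class the weight `n_samples / (n_classes * count_of_class)`. This means the rarer class gets a proportionally larger weight. The sketch below reproduces this with `compute_class_weight`, using the class counts from the frequency table above. In the notebook the weights are computed on the training set, so the actual values there differ slightly.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Labels rebuilt from the class counts in the frequency table (7237 zeros, 1854 ones)
y = np.array([0] * 7237 + [1] * 1854)

weights = compute_class_weight(class_weight='balanced', classes=np.array([0, 1]), y=y)
manual = len(y) / (2 * np.bincount(y))  # n_samples / (n_classes * class count)

print('balanced weights:', weights.round(4))
print('weight ratio (1 vs 0):', round(weights[1] / weights[0], 3))
```

The weight ratio equals the class ratio (about 3.9), so weighting has a similar effect to repeating each positive example about 4 times.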

### Class Weight Adjustment

#### Decision Tree

In [65]:
f1_scores = []

for depth in range(1, 21):
    model = DecisionTreeClassifier(random_state=12345, max_depth=depth, class_weight='balanced')
    model.fit(features_train, target_train)
    predicted_valid = model.predict(features_valid)
    f1_scores.append({'max_depth': depth,
                     'f1_score': round(f1_score(target_valid, predicted_valid), 3)})

f1_df = pd.DataFrame(f1_scores)
f1_df = f1_df.sort_values(by='f1_score', ascending=False)
f1_df.head(10)

| | max_depth | f1_score |
|---|---|---|
| 4 | 5 | 0.574 |
| 5 | 6 | 0.563 |
| 7 | 8 | 0.558 |
| 2 | 3 | 0.549 |
| 3 | 4 | 0.546 |
| 6 | 7 | 0.541 |
| 1 | 2 | 0.53 |
| 10 | 11 | 0.514 |
| 8 | 9 | 0.512 |
| 9 | 10 | 0.51 |


The best F1 score for the decision tree with class weight adjustment is 0.574 (max_depth=5). This is slightly lower than the imbalanced decision tree (0.576), but the difference is not significant.

#### Random Forest

In [66]:
f1_scores = []

for n_estimator in range(1, 21):
    for depth in range(1, 21):
        model = RandomForestClassifier(random_state=12345, n_estimators=n_estimator, max_depth=depth, class_weight='balanced')
        model.fit(features_train, target_train)
        predicted_valid = model.predict(features_valid)
        f1_scores.append({'n_estimators':n_estimator,
                         'max_depth': depth,
                         'f1_score': round(f1_score(target_valid, predicted_valid), 3)})

f1_df = pd.DataFrame(f1_scores)
f1_df = f1_df.sort_values(by='f1_score', ascending=False)
f1_df.head(10)

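| | n_estimators | max_depth | f1_score |
|---|---|---|---|
| 246 | 13 | 7 | 0.644 |
| 249 | 13 | 10 | 0.641 |
| 366 | 19 | 7 | 0.641 |
| 389 | 20 | 10 | 0.641 |
| 269 | 14 | 10 | 0.64 |
| 346 | 18 | 7 | 0.64 |
| 369 | 19 | 10 | 0.639 |
| 329 | 17 | 10 | 0.639 |
| 386 | 20 | 7 | 0.638 |
| 247 | 13 | 8 | 0.638 |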


The best F1 score for the random forest with class weight adjustment is 0.644 (n_estimators=13, max_depth=7), a clear improvement over the imbalanced random forest (0.595).

#### Logistic Regression

In [67]:
model = LogisticRegression(random_state=12345, solver='liblinear', class_weight='balanced')
model.fit(features_train, target_train)
predicted_valid = model.predict(features_valid)
print('F1:', round(f1_score(target_valid, predicted_valid), 3))

F1: 0.51


Logistic regression with class weight adjustment reaches an F1 score of 0.51, a significant improvement over the imbalanced logistic regression (0.303).

### Upsampling

#### Finding the classes ratio

In [68]:
ratio = target[target == 0].count() / target[target == 1].count()
print('classes ratio:', round(ratio, 3))

classes ratio: 3.903


The class ratio is almost 4, so we repeat the ones 4 times in the training set.

In [69]:

def upsample(features, target, repeat):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    features_upsampled = pd.concat([features_zeros] + [features_ones] * repeat)
    target_upsampled = pd.concat([target_zeros] + [target_ones] * repeat)

    features_upsampled, target_upsampled = shuffle(
        features_upsampled, target_upsampled, random_state=12345
    )

    return features_upsampled, target_upsampled


features_upsampled, target_upsampled = upsample(features_train, target_train, 4)

print('ratio:', round(target_upsampled[target_upsampled == 0].shape[0] / target_upsampled[target_upsampled == 1].shape[0], 3))

ratio: 0.961


Now the ratio is almost 1. We got the ratio we wanted.
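To check the upsampling logic on its own, this sketch redeclares a function with the same logic as `upsample` above and runs it on a tiny made-up dataset with 8 zeros and 2 ones. With that input the expected class counts are easy to verify by hand. `toy_features` and `toy_target` are illustrative names.

```python
import pandas as pd
from sklearn.utils import shuffle

def upsample(features, target, repeat):
    # Same logic as the notebook cell: keep all zeros, repeat the ones `repeat` times
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    features_upsampled = pd.concat([features_zeros] + [features_ones] * repeat)
    target_upsampled = pd.concat([target_zeros] + [target_ones] * repeat)

    # shuffle applies the same permutation to both, so rows stay aligned
    return shuffle(features_upsampled, target_upsampled, random_state=12345)

toy_features = pd.DataFrame({'x': range(10)})
toy_target = pd.Series([0] * 8 + [1] * 2)

f_up, t_up = upsample(toy_features, toy_target, 4)
print(t_up.value_counts().to_dict())     # 8 zeros and 8 ones
print((f_up.index == t_up.index).all())  # True: features and target are still aligned
```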

<div class="alert alert-success">
<b>Reviewer's comment</b>

Upsampling function is correct and it's applied only to the train set.

</div>

#### Decision Tree

In [70]:
f1_scores = []

for depth in range(1, 21):
    model = DecisionTreeClassifier(random_state=12345, max_depth=depth)
    model.fit(features_upsampled, target_upsampled)
    predicted_valid = model.predict(features_valid)
    f1_scores.append({'max_depth': depth,
                     'f1_score': round(f1_score(target_valid, predicted_valid), 3)})

f1_df = pd.DataFrame(f1_scores)
f1_df = f1_df.sort_values(by='f1_score', ascending=False)
f1_df.head(10)

| | max_depth | f1_score |
|---|---|---|
| 4 | 5 | 0.574 |
| 5 | 6 | 0.563 |
| 7 | 8 | 0.558 |
| 2 | 3 | 0.549 |
| 3 | 4 | 0.546 |
| 6 | 7 | 0.541 |
| 1 | 2 | 0.53 |
| 8 | 9 | 0.516 |
| 10 | 11 | 0.515 |
| 0 | 1 | 0.507 |


After upsampling, the best decision tree has the same F1 score as with class weight adjustment: 0.574 (max_depth=5). This is slightly lower than the imbalanced decision tree (0.576), but the difference is not significant.

#### Random Forest

In [71]:
f1_scores = []

for n_estimator in range(1, 21):
    for depth in range(1, 21):
        model = RandomForestClassifier(random_state=12345, n_estimators=n_estimator, max_depth=depth)
        model.fit(features_upsampled, target_upsampled)
        predicted_valid = model.predict(features_valid)
        f1_scores.append({'n_estimators':n_estimator,
                         'max_depth': depth,
                         'f1_score': round(f1_score(target_valid, predicted_valid), 3)})

f1_df = pd.DataFrame(f1_scores)
f1_df = f1_df.sort_values(by='f1_score', ascending=False)
f1_df.head(10)

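| | n_estimators | max_depth | f1_score |
|---|---|---|---|
| 375 | 19 | 16 | 0.622 |
| 331 | 17 | 12 | 0.62 |
| 327 | 17 | 8 | 0.62 |
| 354 | 18 | 15 | 0.618 |
| 387 | 20 | 8 | 0.617 |
| 386 | 20 | 7 | 0.616 |
| 367 | 19 | 8 | 0.616 |
| 307 | 16 | 8 | 0.616 |
| 311 | 16 | 12 | 0.616 |
| 287 | 15 | 8 | 0.616 |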


The best random forest after upsampling reaches F1 = 0.622 (n_estimators=19, max_depth=16). This is slightly below the random forest with class weight adjustment (0.644), but still an improvement over the imbalanced random forest (0.595).

#### Logistic Regression

In [72]:
model = LogisticRegression(solver='liblinear', random_state = 12345)
model.fit(features_upsampled, target_upsampled)
predicted_valid = model.predict(features_valid)

print('F1:', round(f1_score(target_valid, predicted_valid), 3))

F1: 0.508


Logistic regression after upsampling reaches F1 = 0.508, very close to logistic regression with class weight adjustment (0.51). Both are a significant improvement over the imbalanced logistic regression (0.303).

### Downsampling

In [73]:
def downsample(features, target, fraction):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    features_downsampled = pd.concat(
        [features_zeros.sample(frac=fraction, random_state=12345)]
        + [features_ones]
    )
    target_downsampled = pd.concat(
        [target_zeros.sample(frac=fraction, random_state=12345)]
        + [target_ones]
    )

    features_downsampled, target_downsampled = shuffle(
        features_downsampled, target_downsampled, random_state=12345
    )

    return features_downsampled, target_downsampled


features_downsampled, target_downsampled = downsample(features_train, target_train, 0.25)

print('ratio:', round(target_downsampled[target_downsampled == 0].shape[0] / target_downsampled[target_downsampled == 1].shape[0], 3))

ratio: 0.961


Now the ratio is almost 1. We got the ratio we wanted.
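Downsampling with `fraction=0.25` changes the zeros-to-ones ratio by the same factor of 4 as upsampling with `repeat=4`. The difference is that it divides the zeros by 4 instead of multiplying the ones by 4. That is why both cells print 0.961, up to rounding of the sampled count. This sketch redeclares a function with the same logic as `downsample` above and runs it on a tiny made-up dataset with 8 zeros and 2 ones. `toy_features` and `toy_target` are illustrative names.

```python
import pandas as pd
from sklearn.utils import shuffle

def downsample(features, target, fraction):
    # Same logic as the notebook cell: keep a `fraction` of the zeros and all of the ones
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    # The same random_state picks the same rows from features and target
    features_downsampled = pd.concat(
        [features_zeros.sample(frac=fraction, random_state=12345), features_ones])
    target_downsampled = pd.concat(
        [target_zeros.sample(frac=fraction, random_state=12345), target_ones])

    return shuffle(features_downsampled, target_downsampled, random_state=12345)

toy_features = pd.DataFrame({'x': range(10)})
toy_target = pd.Series([0] * 8 + [1] * 2)

f_down, t_down = downsample(toy_features, toy_target, 0.25)
print(t_down.value_counts().to_dict())       # 2 zeros and 2 ones
print((f_down.index == t_down.index).all())  # True: features and target are still aligned
```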

<div class="alert alert-success">
<b>Reviewer's comment</b>

Downsampling also looks good

</div>

#### Decision Tree

In [74]:
f1_scores = []

for depth in range(1, 21):
    model = DecisionTreeClassifier(random_state=12345, max_depth=depth)
    model.fit(features_downsampled, target_downsampled)
    predicted_valid = model.predict(features_valid)
    f1_scores.append({'max_depth': depth,
                     'f1_score': round(f1_score(target_valid, predicted_valid), 3)})

f1_df = pd.DataFrame(f1_scores)
f1_df = f1_df.sort_values(by='f1_score', ascending=False)
f1_df.head(10)

| | max_depth | f1_score |
|---|---|---|
| 5 | 6 | 0.564 |
| 6 | 7 | 0.557 |
| 3 | 4 | 0.556 |
| 4 | 5 | 0.554 |
| 7 | 8 | 0.551 |
| 12 | 13 | 0.517 |
| 8 | 9 | 0.515 |
| 11 | 12 | 0.514 |
| 10 | 11 | 0.51 |
| 2 | 3 | 0.51 |


The best decision tree after downsampling reaches F1 = 0.564 (max_depth=6). This is very close to the decision trees after upsampling and class weight adjustment (0.574 each). It is slightly lower than the imbalanced decision tree (0.576), but the difference is not significant.

#### Random Forest

In [75]:
f1_scores = []

for n_estimator in range(1, 21):
    for depth in range(1, 21):
        model = RandomForestClassifier(random_state=12345, n_estimators=n_estimator, max_depth=depth)
        model.fit(features_downsampled, target_downsampled)
        predicted_valid = model.predict(features_valid)
        f1_scores.append({'n_estimators':n_estimator,
                         'max_depth': depth,
                         'f1_score': round(f1_score(target_valid, predicted_valid), 3)})

f1_df = pd.DataFrame(f1_scores)
f1_df = f1_df.sort_values(by='f1_score', ascending=False)
f1_df.head(10)

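| | n_estimators | max_depth | f1_score |
|---|---|---|---|
| 347 | 18 | 8 | 0.604 |
| 387 | 20 | 8 | 0.603 |
| 327 | 17 | 8 | 0.602 |
| 367 | 19 | 8 | 0.601 |
| 287 | 15 | 8 | 0.6 |
| 227 | 12 | 8 | 0.6 |
| 264 | 14 | 5 | 0.6 |
| 307 | 16 | 8 | 0.599 |
| 236 | 12 | 17 | 0.598 |
| 224 | 12 | 5 | 0.597 |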


The best random forest after downsampling reaches F1 = 0.604 (n_estimators=18, max_depth=8). This is below the random forests after upsampling (0.622) and with class weight adjustment (0.644), but still an improvement over the imbalanced random forest (0.595).

#### Logistic Regression

In [76]:
model = LogisticRegression(solver='liblinear', random_state=12345)
model.fit(features_downsampled, target_downsampled)
predicted_valid = model.predict(features_valid) 

print('F1:', round(f1_score(target_valid, predicted_valid), 3))

F1: 0.505


Logistic regression after downsampling reaches F1 = 0.505. This is very close to logistic regression after upsampling (0.508) and with class weight adjustment (0.51), and a significant improvement over the imbalanced logistic regression (0.303).

#### Conclusion
In this section, we tried to improve the F1 scores of our models with three balancing methods:
* Decision trees didn't improve much: F1 = 0.564 after downsampling and 0.574 after both upsampling and class weight adjustment, versus 0.576 for the imbalanced decision tree.
* Random forests did improve: F1 = 0.604 after downsampling, 0.622 after upsampling and 0.644 with class weight adjustment, versus 0.595 for the imbalanced random forest.
* Logistic regressions improved significantly: F1 = 0.505 after downsampling, 0.508 after upsampling and 0.51 with class weight adjustment, versus 0.303 for the imbalanced logistic regression.

<div class="alert alert-success">
<b>Reviewer's comment</b>

Excellent! You tried three different methods of dealing with class imbalance, trained different models and tuned their hyperparameters using the validation set

</div>

## Perform the final testing

### Decision Tree

In [77]:
model = DecisionTreeClassifier(random_state=12345, max_depth=5, class_weight='balanced')
model.fit(features_train, target_train)
predicted_test = model.predict(features_test)
print('F1:', round(f1_score(target_test, predicted_test), 3))

F1: 0.565


In [78]:
probabilities_test = model.predict_proba(features_test)
probabilities_one_test = probabilities_test[:, 1]
auc_roc = roc_auc_score(target_test, probabilities_one_test)

print('AUC-ROC:', round(auc_roc, 3))

AUC-ROC: 0.837


The best decision tree (max_depth=5, class_weight='balanced') falls short on F1 (0.565, below the 0.59 threshold), although its AUC-ROC (0.837) isn't bad.

Validation scores before balancing the decision tree: F1: 0.576 (max_depth=7), AUC-ROC: 0.671 (measured on the last tree fitted in the loop, max_depth=20)

Test scores after balancing the decision tree (max_depth=5): F1: 0.565, AUC-ROC: 0.837

### Random Forest

In [79]:
model = RandomForestClassifier(random_state=12345, n_estimators=13, max_depth=7, class_weight='balanced')
model.fit(features_train, target_train)
predicted_test = model.predict(features_test)
print('F1:', round(f1_score(target_test, predicted_test), 3))

F1: 0.596


In [80]:
probabilities_test = model.predict_proba(features_test)
probabilities_one_test = probabilities_test[:, 1]
auc_roc = roc_auc_score(target_test, probabilities_one_test)

print('AUC-ROC:', round(auc_roc, 3))

AUC-ROC: 0.849


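The best random forest (n_estimators=13, max_depth=7, class_weight='balanced') performed well and passed the test! Its F1 and AUC-ROC scores are the highest of the three models on the test set :) Note, however, that its test F1 (0.596) is noticeably lower than its validation F1 (0.644), so it clears the 0.59 threshold only narrowly.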

Validation scores before balancing the random forest: F1: 0.595 (n_estimators=13, max_depth=19), AUC-ROC: 0.85 (measured on the last forest fitted in the loop, n_estimators=20, max_depth=20)

Test scores after balancing the random forest (n_estimators=13, max_depth=7): F1: 0.596, AUC-ROC: 0.849
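`roc_curve` is imported at the top of the notebook but never used. It can complement the single AUC-ROC number: it returns the false and true positive rates at every threshold, and the area under those points (computed with `sklearn.metrics.auc`) equals `roc_auc_score`. The sketch below uses synthetic, imbalanced data from `make_classification` (not the churn data) with the same random forest hyperparameters as above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, roc_auc_score, auc

# Synthetic, imbalanced stand-in data (about 80% zeros)
X, y = make_classification(n_samples=2000, weights=[0.8], random_state=12345)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=12345)

model = RandomForestClassifier(random_state=12345, n_estimators=13, max_depth=7, class_weight='balanced')
model.fit(X_train, y_train)
proba_one = model.predict_proba(X_test)[:, 1]

# One (FPR, TPR) point per distinct threshold, from (0, 0) to (1, 1)
fpr, tpr, thresholds = roc_curve(y_test, proba_one)
print('points on the curve:', len(fpr))
print('AUC from the curve:', round(auc(fpr, tpr), 3))
print('roc_auc_score:     ', round(roc_auc_score(y_test, proba_one), 3))
```

To draw the curve, pass `fpr` and `tpr` to `plt.plot`, and add the diagonal `[0, 1], [0, 1]` as the random-guess baseline.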

### Logistic Regression

In [81]:
model = LogisticRegression(random_state=12345, solver='liblinear', class_weight='balanced')
model.fit(features_train, target_train)
predicted_test = model.predict(features_test)
print('F1:', round(f1_score(target_test, predicted_test), 3))

F1: 0.503


In [82]:
probabilities_test = model.predict_proba(features_test)
probabilities_one_test = probabilities_test[:, 1]
auc_roc = roc_auc_score(target_test, probabilities_one_test)

print('AUC-ROC:', round(auc_roc, 3))

AUC-ROC: 0.782


The best logistic regression (class_weight='balanced') didn't perform well and got the lowest scores.

Validation scores before balancing the logistic regression: F1: 0.303, AUC-ROC: 0.774

Test scores after balancing the logistic regression: F1: 0.503, AUC-ROC: 0.782

<div class="alert alert-success">
<b>Reviewer's comment</b>

Great, the final model was evaluated on the test set!

</div>

<div class="alert alert-danger">
<s><b>Reviewer's comment</b>

Could you please add calculation of ROC-AUC score for all models? The idea is to compare how our balancing techniques impact the two metrics (F1 score and ROC-AUC)</s>

</div>

<div class="alert alert-success">
<b>Reviewer's comment V2</b>
  
Ok, great!
  
</div>

### General Conclusion
Our task was to build a model that can predict if a client will leave the bank or not. 

In order to do this, we:
* preprocessed the data: missing values, duplicates etc.
* prepared the features for the model: One-Hot Encoding, feature scaling, splitting the data into sets.
* examined the class imbalance: trained three models (decision tree, random forest, logistic regression) on the imbalanced data and checked their F1 scores.
* improved the quality of our models: balanced the data with three different methods (class weight adjustment, upsampling, downsampling) and trained the models with each of them.
* chose the best model and performed the final testing on the test set.

Our random forest model got good scores on the test set (F1: 0.596, AUC-ROC: 0.849). To predict which clients are likely to leave the bank soon, we recommend a random forest with the following hyperparameters: n_estimators=13, max_depth=7, class_weight='balanced'.

<div class="alert alert-success">
<b>Reviewer's comment</b>

Conclusions look good!

</div>

# Project evaluation
We’ve put together the evaluation criteria for the project. Read this carefully before moving on to the task.

Here’s what the reviewers will look at when reviewing your project:

* How did you prepare the data for training? Have you processed all of the feature types?
* Have you explained the preprocessing steps well enough?
* How did you investigate the balance of classes?
* Did you study the model without taking into account the imbalance of classes?
* What are your findings about the task research?
* Have you correctly split the data into sets?
* How have you worked with the imbalance of classes?
* Did you use at least two techniques for imbalance fixing?
* Have you performed the model training, validation, and final testing correctly?
* How high is your F1 score?
* Did you examine the AUC-ROC values?
* Have you kept to the project structure and kept the code neat?

You have your takeaway sheets and chapter summaries, so you are ready to proceed to the project.

Good luck!