# Chapter 11 Assignment: Neural nets (NN)

### Read Chapter 11 of DMBA and review relevant resources in Module - Chapter 11 Neural Nets before starting this assignment. Provide your answers to all problems below, save this Jupyter notebook (.ipynb file), and then submit it along with your Excel worksheet in Canvas by the due date.

In [1]:
# Import required packages for this chapter
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPClassifier, MLPRegressor

import matplotlib.pylab as plt

from dmba import classificationSummary, regressionSummary

%matplotlib inline

In [2]:
# Working directory:
# If you keep your data in a different folder, replace the argument of `Path`, e.g.
# DATA = Path('/Users/user/data/dmba/')
DATA = Path('E:/Aliit/School/MSBA/206/MSBA-206/dmba/')
# and then load data using
# pd.read_csv(DATA / 'filename.csv')

# 1: Credit Card Use.

Consider the hypothetical bank data in Table 11.7 of the DMBA textbook on consumers’ use of credit card credit facilities. Create a small worksheet in Excel to illustrate one pass through a simple neural network (randomly generate the initial weight values).

_Years: number of years the customer has been with the bank_

_Salary: customer’s salary (in thousands of dollars)_

_Used Credit:<br> 
1 = customer has left an unpaid credit card balance at the end of at least one month in the prior year, <br>
0 = balance was paid off at the end of each month_
<p>
Upload your Excel worksheet via canvas submission.

### Please see included file 'MSBA 207 Chapter 11 Question 1.xlsx'
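As a cross-check on the worksheet, one forward pass can also be sketched in Python. This is only an illustration under stated assumptions: the record `x`, the three hidden nodes, and the weight range of ±0.05 are hypothetical choices, not values taken from Table 11.7 or from the Excel file.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def logistic(z):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical record, already scaled to [0, 1]: (Years, Salary)
x = np.array([0.2, 0.5])

# Randomly generated initial weights and biases in a small range around zero
W_hidden = rng.uniform(-0.05, 0.05, size=(2, 3))  # 2 inputs -> 3 hidden nodes
b_hidden = rng.uniform(-0.05, 0.05, size=3)
w_out = rng.uniform(-0.05, 0.05, size=3)          # 3 hidden nodes -> 1 output
b_out = rng.uniform(-0.05, 0.05)

# One forward pass: weighted sum plus bias, then activation, layer by layer
hidden = logistic(x @ W_hidden + b_hidden)
output = logistic(hidden @ w_out + b_out)

print('hidden node outputs:', hidden)
print('predicted P(Used Credit = 1):', output)
```

With weights this small, the output stays close to 0.5, which is why the first prediction is essentially uninformative.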

# 2: Neural Net Evolution. 

A neural net typically starts out with random coefficients; hence, it produces essentially random predictions when presented with its first case. What is the key ingredient by which the net evolves to produce a more accurate prediction?

#### The networks covered here are feedforward, so they evolve through the prediction error: the error produced by the initial random weights is used to compute a correction that is added to the previous weights, and this update is repeated iteratively, case by case, until the desired accuracy is reached.
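The error-driven update described above can be sketched for a single logistic output node. This is a minimal illustration, assuming a standard gradient step on squared error; the inputs, starting weights, and `learning_rate` are hypothetical, and the exact update formula in DMBA may differ in detail.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.2, 0.5])      # hypothetical scaled inputs
y = 1.0                       # hypothetical actual outcome
w = np.array([0.01, -0.02])   # initial "random" weights
b = 0.03                      # initial bias
learning_rate = 0.5           # hypothetical learning rate

errors = []
for _ in range(20):
    y_hat = logistic(x @ w + b)              # forward pass
    errors.append(abs(y - y_hat))            # track the prediction error
    delta = (y - y_hat) * y_hat * (1 - y_hat)  # error scaled by the logistic slope
    w = w + learning_rate * delta * x        # add the correction to the old weights
    b = b + learning_rate * delta

print('first error:', errors[0])
print('last error: ', errors[-1])
```

Each pass nudges the weights in the direction that shrinks the error, so the recorded errors decrease from one iteration to the next.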

# 3: Direct Mailing to Airline Customers.

East-West Airlines has entered into a partnership with the wireless phone company Telcon to sell the latter’s service via direct mail. The file _EastWestAirlinesNN.csv_ contains a subset of a data sample of customers who have already received a test offer. About 13% accepted.

You are asked to develop a model to classify East–West customers as to whether they purchase a wireless phone service contract (outcome variable Phone_Sale). This model will be used to classify additional customers.

Review the <a href="https://www.thecasesolutions.com/project-data-mining-on-east-west-airlines-65598">Data Dictionary</a> first to understand the data.

You will need <a href="https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html?highlight=mlpclassifier#sklearn.neural_network.MLPClassifier">sklearn.neural_network.MLPClassifier</a> so review this documentation first. Try both ‘logistic’ and ‘relu’ activation functions for the hidden layer.<p>


In [3]:
# load the data
airline_df = pd.read_csv(DATA / 'EastWestAirlinesNN.csv')

__a.__ Run a neural net model on these data, using a single hidden layer with five nodes. Try both ‘logistic’ and ‘relu’ activation functions for the hidden layer. Remember to first convert categorical variables into dummies and scale numerical predictor variables to a 0–1 scale using the scikit-learn transformer <a href="https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html">MinMaxScaler()</a> (also see Chapter 2.4 of DMBA).<p>
Use the training data to learn the transformation (see Table 7.2 in DMBA), then rescale all of the data (numerical variables only) to [0, 1], setting "clip=True":<p>
scaleInput = MinMaxScaler(feature_range=(0, 1), clip=True)<p>
Setting clip=True clips the transformed values of held-out data to the provided feature range.<p>
Do not scale binary dummy variables. Create a decile-wise lift chart for the training and validation sets. Interpret the meaning (in business terms) of the leftmost bar of the validation decile-wise lift chart.
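The preprocessing and fitting steps in part a can be sketched as follows. This is a hedged illustration on synthetic data: `demo_df`, its three predictors, the 60/40 split, and the `lbfgs` solver are assumptions made for the example, not the assignment's data or required settings.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
n = 400

# Hypothetical stand-in for airline_df: two numeric predictors, one binary dummy
demo_df = pd.DataFrame({
    'Balance': rng.gamma(2.0, 30000.0, n),
    'Bonus_trans': rng.integers(0, 40, n).astype(float),
    'Email': rng.integers(0, 2, n).astype(float),
})
demo_df['Phone_sale'] = (rng.random(n) < 0.13).astype(int)

numeric_cols = ['Balance', 'Bonus_trans']  # the binary dummy 'Email' is not scaled
X = demo_df.drop(columns='Phone_sale')
y = demo_df['Phone_sale']
train_X, valid_X, train_y, valid_y = train_test_split(
    X, y, test_size=0.4, random_state=1, stratify=y)
train_X, valid_X = train_X.copy(), valid_X.copy()

# Learn the scaling on the training data only, then apply it to both partitions
scaleInput = MinMaxScaler(feature_range=(0, 1), clip=True)
train_X[numeric_cols] = scaleInput.fit_transform(train_X[numeric_cols])
valid_X[numeric_cols] = scaleInput.transform(valid_X[numeric_cols])

# Single hidden layer with five nodes; swap activation to 'relu' to compare
clf = MLPClassifier(hidden_layer_sizes=(5,), activation='logistic',
                    solver='lbfgs', max_iter=1000, random_state=1)
clf.fit(train_X, train_y)

# Predicted probability of class 1, which is what a lift chart is built from
valid_prob = clf.predict_proba(valid_X)[:, 1]
print(valid_prob[:5])
```

Fitting the scaler on the training partition alone keeps information from the validation set out of the model; `clip=True` then keeps any validation values outside the training range inside [0, 1].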

In [4]:
airline_df.dtypes

ID#                  float64
Topflight            float64
Balance              float64
Qual_miles           float64
cc1_miles?           float64
cc2_miles?           float64
cc3_miles?           float64
Bonus_miles          float64
Bonus_trans          float64
Flight_miles_12mo    float64
Flight_trans_12      float64
Online_12            float64
Email                float64
Club_member          float64
Any_cc_miles_12mo    float64
Phone_sale           float64
dtype: object

__b.__ Comment on the difference between the training and validation lift charts.
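To compare the two charts numerically, the bar heights of a decile-wise lift chart can be computed directly. The helper below is a sketch: `decile_lift` is a hypothetical function written for this illustration, and the actual and predicted values are synthetic.

```python
import numpy as np
import pandas as pd

def decile_lift(actual, predicted_prob, n_groups=10):
    """Mean response per decile (ranked by descending probability)
    divided by the overall mean response."""
    df = pd.DataFrame({'actual': np.asarray(actual),
                       'prob': np.asarray(predicted_prob)})
    df = df.sort_values('prob', ascending=False).reset_index(drop=True)
    # Assign records to n_groups equal-sized groups in ranked order
    df['decile'] = np.arange(len(df)) * n_groups // len(df) + 1
    return df.groupby('decile')['actual'].mean() / df['actual'].mean()

# Synthetic example: higher predicted probability means a higher response rate
rng = np.random.default_rng(1)
prob = rng.random(1000)
actual = (rng.random(1000) < prob * 0.26).astype(int)

lift = decile_lift(actual, prob)
print(lift.round(2))
```

The first value is the height of the leftmost bar: how many times more responders the top 10% of ranked customers contain than a random 10% would. Running the helper on the training and validation predictions gives the two sets of bars to compare.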

__c.__ Run a second neural net model on the data, this time setting the number of hidden nodes to 1. Comment now on the difference between this model and the model you ran earlier, and how overfitting might have affected results.

__d.__ What sort of information, if any, is provided about the effects of the various variables?
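When considering part d, it can help to look at what a fitted network actually exposes. The sketch below uses synthetic data from `make_classification` (an assumption for illustration) and prints the shapes of the `coefs_` and `intercepts_` attributes of a fitted `MLPClassifier`.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic data: 4 predictors, binary outcome
X, y = make_classification(n_samples=200, n_features=4, random_state=1)

clf = MLPClassifier(hidden_layer_sizes=(5,), activation='logistic',
                    solver='lbfgs', max_iter=500, random_state=1)
clf.fit(X, y)

# One weight matrix per layer: inputs -> hidden, then hidden -> output
weight_shapes = [w.shape for w in clf.coefs_]
bias_shapes = [b.shape for b in clf.intercepts_]
print('weight matrices:', weight_shapes)
print('bias vectors:   ', bias_shapes)
```

Each predictor feeds every hidden node, so no single weight summarizes one variable's effect the way a regression coefficient does.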

__e.__ Use GridSearchCV() to search for the number of nodes with the best score in a single layer of hidden nodes. 
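A minimal sketch of such a search is shown below, assuming synthetic data, a small hypothetical grid of node counts, 5-fold cross-validation, and accuracy as the score; the grid and scoring for the assignment may differ.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the scaled training data
X, y = make_classification(n_samples=300, n_features=6, random_state=1)

# Candidate sizes for a single hidden layer (hypothetical grid)
param_grid = {'hidden_layer_sizes': [(1,), (2,), (3,), (5,)]}

search = GridSearchCV(
    MLPClassifier(activation='logistic', solver='lbfgs',
                  max_iter=500, random_state=1),
    param_grid, cv=5, scoring='accuracy', n_jobs=1)
search.fit(X, y)

print('best parameters:', search.best_params_)
print('best CV score:  ', search.best_score_)
```

The search should be run on the training partition only, so that the validation set remains available for an unbiased comparison afterwards.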

# 4: Car Sales.

Consider the data on used cars (_ToyotaCorolla.csv_) with 1436 records and details on 38 attributes, including Price, Age, KM, HP, and other specifications. The goal is to predict the price of a used Toyota Corolla based on its specifications. You will need <a href="https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html">sklearn.neural_network.MLPRegressor</a> so review this documentation first. Try both ‘logistic’ and ‘relu’ activation functions for the hidden layer.<p>
__a.__ Fit a neural network model to the data. Use a single hidden layer with 2 nodes. Use predictors Age_08_04, KM, Fuel_Type, HP, Automatic, Doors, Quarterly_Tax, Mfr_Guarantee, Guarantee_Period, Airco, Automatic_airco, CD_Player, Powered_Windows, Sport_Model, and Tow_Bar. Use the scikit-learn transformer _MinMaxScaler()_ to scale numerical variables to the range [0, 1]. Use separate transformers for the input and output data. Try both ‘logistic’ and ‘relu’ activation functions for the hidden layer.<p>
<pre>    
# Use the training data to learn the transformation (see Table 7.2 in DMBA) rescaling the entire data (numerical variables only) to [0, 1]. 
scaleInput = MinMaxScaler(feature_range=(0, 1), clip=True)
scaleOutput = MinMaxScaler(feature_range=(0, 1), clip=True)
# clip=True to clip transformed values of held-out data to provided feature range
# Do not scale binary dummy variables.
</pre>
<p>    
To create the dummy variables, use the pandas function pd.get_dummies(). Record the RMS error for the training data and the validation data. Repeat the process, changing the number of hidden layers and nodes to {single layer with 5 nodes}, {two layers, 5 nodes in each layer}.
<p>
    
<pre>
From the textbook: "Using the Output for Prediction and Classification - When the neural network is used for predicting a numerical outcome variable, MLPRegressor() uses an identity activation function (i.e., no activation function). Both predictor and outcome variables should be scaled to a [0, 1] interval before training the network. The output will therefore also be on a [0, 1] scale. To transform the prediction back to the original y units, which were in the range [a, b], we multiply the network output by (b − a) and add a."
To transform the prediction back to the original y units, use <a href="https://stackoverflow.com/questions/59771061/using-inverse-transform-minmaxscaler-from-scikit-learn-to-force-a-dataframe-be-i">inverse_transform</a>.

Example:

#Create new data
new_data = pd.DataFrame(np.array([[8,20],[11,2],[5,3]]))
new_data

# Create a Scaler for the new data
scaler_new_data = MinMaxScaler() 
# Transform new data in the [0-1] range
scaled_new_data = scaler_new_data.fit_transform(new_data)
scaled_new_data

# Inverse transform new data from [0-1] to [min, max] of data
inver_new_data = scaler_new_data.inverse_transform(scaled_new_data)
inver_new_data

</pre>
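Putting the pieces above together, the sketch below fits an `MLPRegressor` on scaled inputs and a scaled outcome, then reports the RMS error back in price units via `inverse_transform`. It is an illustration on synthetic data: `demo_df`, its price formula, the helper `rmse_in_price_units`, and the `lbfgs` solver are assumptions, and only three of the listed predictors are imitated.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
n = 300

# Hypothetical stand-in for car_df
demo_df = pd.DataFrame({
    'Age_08_04': rng.integers(1, 80, n).astype(float),
    'KM': rng.uniform(0, 200000, n),
    'Fuel_Type': rng.choice(['Petrol', 'Diesel', 'CNG'], n),
})
demo_df['Price'] = (20000 - 150 * demo_df['Age_08_04']
                    - 0.02 * demo_df['KM'] + rng.normal(0, 500, n))

# Dummy-code the categorical predictor; dummies are left unscaled
X = pd.get_dummies(demo_df.drop(columns='Price'), columns=['Fuel_Type'],
                   drop_first=True, dtype=float)
y = demo_df['Price']
train_X, valid_X, train_y, valid_y = train_test_split(
    X, y, test_size=0.4, random_state=1)
train_X, valid_X = train_X.copy(), valid_X.copy()

numeric_cols = ['Age_08_04', 'KM']
scaleInput = MinMaxScaler(feature_range=(0, 1), clip=True)
scaleOutput = MinMaxScaler(feature_range=(0, 1), clip=True)
train_X[numeric_cols] = scaleInput.fit_transform(train_X[numeric_cols])
valid_X[numeric_cols] = scaleInput.transform(valid_X[numeric_cols])
train_y_scaled = scaleOutput.fit_transform(
    train_y.to_numpy().reshape(-1, 1)).ravel()

# Single hidden layer with 2 nodes; change hidden_layer_sizes to (5,) or (5, 5)
reg = MLPRegressor(hidden_layer_sizes=(2,), activation='logistic',
                   solver='lbfgs', max_iter=2000, random_state=1)
reg.fit(train_X, train_y_scaled)

def rmse_in_price_units(model, X_part, y_true):
    """Predict on the [0, 1] scale, convert back to price units, return RMSE."""
    pred_scaled = model.predict(X_part).reshape(-1, 1)
    pred = scaleOutput.inverse_transform(pred_scaled).ravel()
    return float(np.sqrt(np.mean((y_true.to_numpy() - pred) ** 2)))

train_rmse = rmse_in_price_units(reg, train_X, train_y)
valid_rmse = rmse_in_price_units(reg, valid_X, valid_y)
print('training RMSE:  ', train_rmse)
print('validation RMSE:', valid_rmse)
```

Refitting with each architecture and recording both values gives the training and validation RMS errors to compare.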



In [None]:
# load the data
car_df = pd.read_csv(DATA / 'ToyotaCorolla.csv')

i. What happens to the RMS error for the training data as the number of layers and nodes increases? 

ii. What happens to the RMS error for the validation data?

iii. Comment on the appropriate number of layers and nodes for this application.

__b.__ Use GridSearchCV() to search for the number of nodes with the best score in a single layer of hidden nodes.