# LSE Machine Learning: Practical Applications

## Module 7 Unit 2 IDE Activity (Assessment) | Fit a neural network onto a suitable data set 
### In this activity, you are required to follow the different steps introduced in the practice activity to fit a neural network onto a suitable data set in R to calculate a prediction.
The instructions for this IDE activity are positioned throughout this notebook as text cells before each step. As a result, you are required to first read the text cells above a code cell, familiarise yourself with the required step, and execute the step. You are encouraged to refer back to the practice IDE activity to familiarise yourself with the steps.

1. Load the relevant packages.

The packages you will need to load include tidyverse and caret.

In [1]:
# Load the required packages and set the random seed for reproducibility
library(tidyverse)
library(caret)
set.seed(1)

“running command 'timedatectl' had status 1”
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.3.0     ✔ purrr   0.3.4
✔ tibble  3.1.2     ✔ dplyr   1.0.4
✔ tidyr   1.1.3     ✔ stringr 1.4.0
✔ readr   1.4.0     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Loading required package: lattice

Attaching package: ‘caret’

The following object is masked from ‘package:purrr’:

    lift

2. Load the data set.

In [2]:
# Load the data
data <- read_csv("Term Life Insurance.csv")

── Column specification ────────────────────────────────────────────────────────
cols(
  GENDER = col_double(),
  AGE = col_double(),
  MARSTAT = col_double(),
  EDUCATION = col_double(),
  ETHNICITY = col_double(),
  SMARSTAT = col_double(),
  SGENDER = col_double(),
  SAGE = col_double(),
  SEDUCATION = col_double(),
  NUMHH = col_double(),
  INCOME = col_double(),
  TOTINCOME = col_double(),
  CHARITY = col_double(),
  FACE = col_double(),
  FACECVLIFEPOLICIES = col_double(),
  CASHCVLIFEPOLICIES = col_double(),
  BORROWCVLIFEPOL = col_double(),
  NETVALUE = col_double()
)




3. Split the data into training and testing sets using the `createDataPartition` function. The training set should consist of 70% of the data.

**Hint:** Create a training index, `trainIndex`, then create the subsets, `dataTrain` and `dataTest`, for later use.

In [3]:
trainIndex <- createDataPartition(data$FACE, p = 0.7, list = FALSE)
dataTrain <- data[trainIndex, ]
dataTest <- data[-trainIndex, ]
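
A quick sanity check on a 70/30 split can be sketched as follows. So that it runs without the course data, the example uses a small synthetic data frame (`toy`, a hypothetical name) and a plain random split with base R's `sample`. This is a simplification: `createDataPartition` also tries to balance the distribution of the outcome across the two subsets.

```r
# Hypothetical toy data standing in for the course data set
set.seed(1)
toy <- data.frame(x = rnorm(100), y = rnorm(100))

# Simple random 70/30 split (assumption: no stratification on the outcome)
toyIndex <- sample(seq_len(nrow(toy)), size = round(0.7 * nrow(toy)))
toyTrain <- toy[toyIndex, ]
toyTest  <- toy[-toyIndex, ]

# The subsets should have the expected sizes and share no rows
nrow(toyTrain)
nrow(toyTest)
length(intersect(rownames(toyTrain), rownames(toyTest)))
```

The same three checks can be applied to `dataTrain` and `dataTest` once they have been created.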

4. Fit the regression model onto the training data, and use the model to calculate a prediction on the test data.

**Hint:** Use the `lm` function, and create a model called `trainReg` to predict the _**face**_ value using all the other variables as input variables. Next, use the `predict` function to create the `testRegPred` variable, which holds the model's predictions for the `dataTest` subset.

In [4]:
trainReg <- lm(FACE ~ ., data = dataTrain)
testRegPred <- predict(trainReg, newdata = dataTest)

5. Calculate the RMSE value.

In [5]:
regRMSE <- sqrt(mean((testRegPred - dataTest$FACE)^2))
regRMSE
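
Because the same RMSE formula is used again for the neural network, it could be wrapped in a small helper. The function name `rmse` below is a hypothetical choice, not something defined by the course material, and the input values are made up purely to show the arithmetic.

```r
# Hypothetical helper: root mean squared error between predictions and observations
rmse <- function(predicted, actual) {
  sqrt(mean((predicted - actual)^2))
}

# Toy values: the errors are 0, 0 and 2, so the mean squared error is 4/3
rmse(c(1, 2, 5), c(1, 2, 3))
```

With such a helper, `rmse(testRegPred, dataTest$FACE)` would give the same value as the expression above.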

6. Set the model parameters of the neural network.

**Note:** First apply cross-validation with 10 folds, then set up the grid parameters as follows:
*   Use a neural network with a single hidden layer (in addition to the input and output layers), testing hidden layers of 6, 8, 10, 12, and 24 nodes.
*   Use weight decay (regularisation) values of 0.1, 0.01, and 0.001 for the `decay` parameter.

In [6]:
tuneCtrl <- trainControl(method = "cv", number = 10)

In [7]:
nnetGrid <- expand.grid(size = c(6, 8, 10, 12, 24),
                        decay = c(0.01, 0.001, 0.1))
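
To see what the tuning grid contains, it can be rebuilt and inspected with base R alone. The name `exampleGrid` is hypothetical and is used only so that this block runs on its own.

```r
# Same candidate values as the grid above, rebuilt so the block is self-contained
exampleGrid <- expand.grid(size = c(6, 8, 10, 12, 24),
                           decay = c(0.01, 0.001, 0.1))

# Every size is paired with every decay value: 5 x 3 = 15 combinations
nrow(exampleGrid)
head(exampleGrid, 3)
```

With 10-fold cross-validation, each of these 15 combinations is fitted 10 times during resampling, which is why the training step can take a while.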

7. Train the model to estimate the face value, and use RMSE as the performance metric.

**Note:** The `linout` parameter needs to be set to `TRUE` (or 1), as this is a regression problem. _**Face**_ is the predicted variable. Apply the `nnet` method, and aim to minimise the RMSE value. Use the `nnetGrid` and `tuneCtrl` objects created in the previous step. Remember to use the training data set when building the model.

In [8]:
#set.seed(1)
nnetFit <- train(FACE ~ ., 
                data = dataTrain,
                method = "nnet",
                metric = "RMSE",
                tuneGrid = nnetGrid,
                trControl = tuneCtrl,
                maxit = 200,
                linout = TRUE, 
                trace = FALSE)
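
To make concrete what 10-fold cross-validation does for each candidate in the grid, here is a minimal base R sketch. It is an illustration under stated assumptions rather than a reproduction of caret's internals: it uses a synthetic data frame (`toy`), a linear model in place of the neural network, and a simple random fold assignment. All names in it are hypothetical.

```r
set.seed(1)

# Hypothetical toy data: y depends linearly on x plus noise
toy <- data.frame(x = rnorm(200))
toy$y <- 2 * toy$x + rnorm(200)

k <- 10
# Randomly assign each row to one of k folds of equal size
fold <- sample(rep(seq_len(k), length.out = nrow(toy)))

foldRMSE <- sapply(seq_len(k), function(i) {
  fit  <- lm(y ~ x, data = toy[fold != i, ])         # train on the other k - 1 folds
  pred <- predict(fit, newdata = toy[fold == i, ])   # predict the held-out fold
  sqrt(mean((pred - toy$y[fold == i])^2))            # RMSE on the held-out fold
})

# The cross-validated estimate is the average of the k held-out RMSE values
mean(foldRMSE)
```

`train` reports an averaged resampled RMSE of this kind for every row of the tuning grid and keeps the combination with the lowest value.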

8. Use the `predict` function to generate an output based on the test data.

In [9]:
nnetPredict <- predict(nnetFit, newdata = dataTest)

9. Calculate the RMSE value to compare this model's RMSE value with that of the regression model.

In [10]:
nnetRMSE <- sqrt(mean((nnetPredict - dataTest$FACE)^2))
nnetRMSE

10. Calculate the difference between the RMSE values of the regression model and the neural network.

In [11]:
regRMSE - nnetRMSE

11. Convert the difference in the RMSE values into a percentage.

In [12]:
# Relative difference, expressed as a percentage of the regression RMSE
(regRMSE - nnetRMSE) / regRMSE * 100
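
As a worked illustration of this calculation, the sketch below uses made-up RMSE values (they are not results from this data set) and hypothetical variable names.

```r
# Hypothetical RMSE values, chosen only to illustrate the arithmetic
exampleRegRMSE  <- 1000
exampleNnetRMSE <- 900

# Difference as a proportion of the regression RMSE, then as a percentage
proportion <- (exampleRegRMSE - exampleNnetRMSE) / exampleRegRMSE
percentage <- proportion * 100
percentage
```

A positive result means the neural network has the lower RMSE, while a negative result means the regression model has the lower RMSE.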

**Pause and reflect:** Based on the difference in RMSE values between the regression model and the neural network, do you think the neural network performs better than the regression model?

12. Make a prediction.

**Note:** To demonstrate how the model would be used, you can manually provide the input values for a new observation. You would typically generate random data, but in this case, there are three hard-coded observations included that can serve as test cases.


In [13]:
# Add test cases to a new data frame
predData <- data.frame(
  "GENDER" = 1,
  "AGE" = 44,
  "MARSTAT" = 1,
  "EDUCATION" = c(16,18,15),
  "ETHNICITY" = 2,
  "SMARSTAT" = 1,
  "SGENDER" = 1,
  "SAGE" = 33,
  "SEDUCATION" = 10,
  "NUMHH" = c(5,6,3),
  "INCOME" = c(100000,110000,90000),
  "TOTINCOME" = 200000,
  "CHARITY" = 500,
  "FACECVLIFEPOLICIES" = 0,
  "CASHCVLIFEPOLICIES" = 0,
  "BORROWCVLIFEPOL" = 0,
  "NETVALUE" = 0)

In [14]:
# Predict the face values for the test cases using the neural network
nnetPredict <- predict(nnetFit, newdata = predData)
nnetPredict
predData

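| GENDER | AGE | MARSTAT | EDUCATION | ETHNICITY | SMARSTAT | SGENDER | SAGE | SEDUCATION | NUMHH | INCOME | TOTINCOME | CHARITY | FACECVLIFEPOLICIES | CASHCVLIFEPOLICIES | BORROWCVLIFEPOL | NETVALUE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | 44 | 1 | 16 | 2 | 1 | 1 | 33 | 10 | 5 | 100000 | 200000.0 | 500 | 0 | 0 | 0 | 0 |
| 1 | 44 | 1 | 18 | 2 | 1 | 1 | 33 | 10 | 6 | 110000 | 200000.0 | 500 | 0 | 0 | 0 | 0 |
| 1 | 44 | 1 | 15 | 2 | 1 | 1 | 33 | 10 | 3 | 90000 | 200000.0 | 500 | 0 | 0 | 0 | 0 |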


In [15]:
# Predict the face values for the test cases using the regression model
regPredict <- predict(trainReg, newdata = predData)
regPredict

**Note:** Remember to submit this IDE notebook after completion and complete the written part of this assessment in the activity submission that follows.