# Neural Networks

In this example we will use a neural network to perform regression rather than classification. The dataset poses one problem: it contains categorical values, which a neural network cannot accept directly, so they must be converted to numbers first.

## Predicting Medical Insurance Costs

We have a dataset related to medical costs, with the following features:
- age: age of the beneficiary
- sex: gender of the beneficiary (female or male)
- bmi: body mass index, a measure of whether body weight is high or low relative to height
- children: number of children covered by health insurance
- smoker: whether the beneficiary is a smoker
- region: the beneficiary's residential area in the US (northeast, southeast, southwest, or northwest)

The target is:
- charges: Individual medical costs billed by health insurance

### Loading the data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv(fr'C:\Users\ivane\Desktop\ACI-3\data\insurance.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1338 entries, 0 to 1337
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1338 non-null   int64  
 1   sex       1338 non-null   object 
 2   bmi       1338 non-null   float64
 3   children  1338 non-null   int64  
 4   smoker    1338 non-null   object 
 5   region    1338 non-null   object 
 6   charges   1338 non-null   float64
dtypes: float64(2), int64(2), object(3)
memory usage: 73.3+ KB


### Check data

The next step is to check the state of the data. We can start with its shape and basic statistics for the numerical columns.

(1338, 7)
           count          mean           std        min         25%       50%           75%          max
age       1338.0     39.207025     14.049960    18.0000    27.00000    39.000     51.000000     64.00000
bmi       1338.0     30.663397      6.098187    15.9600    26.29625    30.400     34.693750     53.13000
children  1338.0      1.094918      1.205493     0.0000     0.00000     1.000      2.000000      5.00000
charges   1338.0  13270.422265  12110.011237  1121.8739  4740.28715  9382.033  16639.912515  63770.42801
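The shape and statistics above are consistent with `df.shape` and a transposed `df.describe()`, although the original cell is not shown. A minimal sketch, assuming those calls and using a small stand-in frame (the `charges` values are made up) so that it runs without `insurance.csv`:

```python
import pandas as pd

# Stand-in for the insurance data; the charges values are hypothetical.
df = pd.DataFrame({
    "age": [19, 18, 28, 33],
    "sex": ["female", "male", "male", "male"],
    "bmi": [27.9, 33.77, 33.0, 22.705],
    "charges": [100.0, 200.0, 300.0, 400.0],
})

print(df.shape)            # (number of rows, number of columns)

# describe() summarises numerical columns only; .T puts one column per row.
stats = df.describe().T
print(stats)
```

Categorical columns such as `sex` are left out of the summary by default, which is why only the numerical features appear in the table.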


We can also check if there are any nulls.

age         0
sex         0
bmi         0
children    0
smoker      0
region      0
charges     0
dtype: int64
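A per-column count like the one above can be produced with `isnull().sum()`. The sketch below assumes that call and uses a stand-in frame that deliberately contains missing values, so the counts are non-zero, unlike in the insurance data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column.
df = pd.DataFrame({
    "age": [19, 18, np.nan],
    "smoker": ["yes", None, "no"],
})

# isnull() marks missing cells as True; sum() counts them per column.
null_counts = df.isnull().sum()
print(null_counts)
```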


### Selecting Labels and Features

Now we can define the features and the target label. First, though, we have to identify the numerical and categorical features and convert the categorical ones.

The easiest way to convert categorical values is to assign each one an integer, e.g. male = 1 and female = 0. This is misleading, however, because it imposes an order on the categories: the model would treat male > female, or one region as greater than another, even though no such ordering exists.

A more viable solution is One-Hot Encoding (or Dummy Encoding). It creates a column for every unique value of a categorical feature; each observation has a 1 in the column that matches its value and a 0 in the others.

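| | sex_female | sex_male | smoker_no | smoker_yes | region_northeast | region_northwest | region_southeast | region_southwest |
|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 1 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 2 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 4 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |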
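The encoding step itself is not shown, so the following is only one possible way to obtain a matrix like the one above. It assumes `pandas.get_dummies` with `dtype=float` (scikit-learn's `OneHotEncoder` would be an equally plausible choice) and a three-row stand-in frame:

```python
import pandas as pd

# Stand-in rows mirroring the categorical columns of the dataset.
df = pd.DataFrame({
    "sex": ["female", "male", "male"],
    "smoker": ["yes", "no", "no"],
    "region": ["southwest", "southeast", "northwest"],
})

categorical_cols = ["sex", "smoker", "region"]

# One new column per unique value, named <feature>_<value>, holding 1.0 or 0.0.
encoded = pd.get_dummies(df[categorical_cols], dtype=float)
print(encoded)
```

Because every row has exactly one value per categorical feature, each row of `encoded` contains exactly one 1.0 per original feature.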


The categorical values are now converted to a binary matrix. The next step is to join it with the numerical features.

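| | sex_female | sex_male | smoker_no | smoker_yes | region_northeast | region_northwest | region_southeast | region_southwest | age | bmi | children |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 19 | 27.9 | 0 |
| 1 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 18 | 33.77 | 1 |
| 2 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 28 | 33.0 | 3 |
| 3 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 33 | 22.705 | 0 |
| 4 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 32 | 28.88 | 0 |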


Next, we take the names of all the columns that will be used as features.
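Joining the encoded columns with the numerical ones and collecting the feature names could look like the sketch below. The variable names (`encoded`, `X`, `feature_names`, `y`) and the use of `pd.concat` are assumptions, and the `charges` values are made up:

```python
import pandas as pd

# Stand-in frame; the charges values are hypothetical.
df = pd.DataFrame({
    "age": [19, 18, 28],
    "sex": ["female", "male", "male"],
    "bmi": [27.9, 33.77, 33.0],
    "children": [0, 1, 3],
    "smoker": ["yes", "no", "no"],
    "charges": [100.0, 200.0, 300.0],
})

categorical_cols = ["sex", "smoker"]
numerical_cols = ["age", "bmi", "children"]

# Encode the categorical columns, then append the numerical ones side by side.
encoded = pd.get_dummies(df[categorical_cols], dtype=float)
X = pd.concat([encoded, df[numerical_cols]], axis=1)

feature_names = X.columns.tolist()   # columns used as features
y = df["charges"]                    # target label
print(feature_names)
```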

### Data Scaling

Neural networks train best when all features share a similar range. This can be achieved with the MinMaxScaler from sklearn, which rescales each feature so that its values lie between 0 and 1.

[[1.         0.         0.         ... 0.02173913 0.3212268  0.        ]
 [0.         1.         1.         ... 0.         0.47914985 0.2       ]
 [0.         1.         1.         ... 0.2173913  0.45843422 0.6       ]
 ...
 [1.         0.         1.         ... 0.         0.56201238 0.        ]
 [1.         0.         1.         ... 0.06521739 0.26472962 0.        ]
 [1.         0.         0.         ... 0.93478261 0.35270379 0.        ]]
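MinMaxScaler maps each value x of a feature to (x - min) / (max - min). For instance, with ages ranging from 18 to 64, an age of 19 becomes 1/46 ≈ 0.0217, which matches the first row of the output above. A minimal sketch on a few stand-in rows (age, bmi, children), assuming the default feature range:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Stand-in rows: age, bmi, children. Age spans 18..64 and children 0..5,
# the same extremes as in the statistics table.
X = np.array([
    [19, 27.9, 0],
    [18, 33.77, 1],
    [28, 33.0, 3],
    [64, 22.705, 5],
])

scaler = MinMaxScaler()              # default feature_range is (0, 1)
X_scaled = scaler.fit_transform(X)   # learns min/max per column, then rescales
print(X_scaled)
```

Here the scaler is fitted on all rows, as in the workflow described in this section. A common alternative is to fit it on the training split only and then apply `transform` to the test split.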


### Data Splitting

Now we can split the data into training and test sets, making sure to use the scaled data for the features.

Training Instances :  (936, 11)
Test Instances :  (402, 11)
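The 936/402 split above is what `train_test_split` yields for 1338 rows with `test_size=0.3`, so that value is assumed here; the `random_state` is arbitrary. The sketch uses random stand-in arrays of the same shape as the real data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random stand-ins with the same shape as the scaled features and the target.
rng = np.random.default_rng(0)
X_scaled = rng.random((1338, 11))
y = rng.random(1338)

# test_size=0.3 is inferred from the reported sizes; random_state is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42
)

print("Training Instances : ", X_train.shape)
print("Test Instances : ", X_test.shape)
```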


### Training the Model

In this case we will use a neural network regressor named MLPRegressor. It accepts a number of parameters; here we set:
- hidden_layer_sizes to (11,11,11), which means 3 hidden layers with 11 neurons in each layer
- solver to 'lbfgs', the algorithm used for weight optimization; it was chosen because the other solvers did not converge
- max_iter to 5000, the maximum number of iterations the solver may run; training stops earlier if it converges

For more information: **[MLPRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html)**

After the model is trained, we can predict using the test data set.
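Training and prediction with the parameters listed above might look like the following sketch. The data is a synthetic stand-in with 11 features, and `random_state` is an added assumption to make the run repeatable:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data: 11 features and a target that depends on them linearly.
rng = np.random.default_rng(0)
X_train = rng.random((200, 11))
y_train = X_train @ rng.random(11)
X_test = rng.random((50, 11))

model = MLPRegressor(
    hidden_layer_sizes=(11, 11, 11),  # 3 hidden layers, 11 neurons each
    solver="lbfgs",                   # weight optimization algorithm
    max_iter=5000,                    # upper bound on solver iterations
    random_state=0,                   # assumption: fixed seed for repeatability
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)        # one predicted value per test row
print(y_pred[:5])
```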

### Evaluating the Model

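After predicting the values, we can use any suitable metric to evaluate the model. Because this is a regression task, we report the mean squared error (MSE), the root mean squared error (RMSE), and the coefficient of determination (R2) rather than classification metrics.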

MSE:  27579926.379803084
RMSE:  5251.659392973147
R2:  0.827051991672209
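These three metrics can be computed with `sklearn.metrics`; RMSE is simply the square root of MSE, which is why 5251.66 squared gives roughly the MSE reported above. A small sketch with made-up true and predicted values:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true and predicted charges.
y_test = np.array([1000.0, 2000.0, 3000.0, 4000.0])
y_pred = np.array([1100.0, 1900.0, 3200.0, 3800.0])

mse = mean_squared_error(y_test, y_pred)   # mean of squared errors
rmse = np.sqrt(mse)                        # same units as the target
r2 = r2_score(y_test, y_pred)              # 1.0 would be a perfect fit

print("MSE: ", mse)
print("RMSE: ", rmse)
print("R2: ", r2)
```

An RMSE in the units of the target is often the easiest of the three to interpret: here it is the typical size of a prediction error in charges.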
