<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 1. Introduction
*in Machine Learning*

----
A neural network, just like any machine learning method, learns how to perform tasks by processing data and adjusting its model to best predict the desired outcome. The most popular machine learning tasks are:
- *Classification:* given data and true labels or categories for each data point, train a model that predicts for each data example what its label should be. For example, given data of previous fire hazards, our model can learn how to predict whether a fire will occur for a given day in the future, with all the factors taken into account.
- *Regression:* given data and a true continuous value for each data point, train a model that can predict values for each data example. For example, given previous stock market data, we can build a regression model that forecasts what the stock market price will be at a specific point in time for which the input data is available.

<br/>Parametric models such as neural networks are described by *parameters:* configuration variables representing the model’s knowledge. We can tweak the parameters using the training data and we can evaluate the performance of the model using hold-out test data the model has not seen during training.

<br/>Take a look at the main components of a neural network learning pipeline depicted below:
- *Input data:* to train a neural network model, you need to provide it with some training data.
- *An optimizer:* this is an algorithm that, based on the training data, adjusts the parameters of the network in order to perform the task at hand.
- *Loss or cost function:* this informs the optimizer whether it is doing a good job on the training data and how to adjust the parameters in the right direction.
- *Evaluation metrics:* these tell us how well the current model performs on validation data. For example, mean absolute error for regression tells us how far the predictions are on average from the true values.
<img src="Images/introduction_diagram.png" style="width:1000px">
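
The four components listed above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a neural network: it assumes a synthetic one-feature dataset, a linear model with two parameters (`w`, `b`), and plain gradient descent as the optimizer. All names and values here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input data: a synthetic regression problem, y = 2x + 1 plus a little noise
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, size=200)
x_train, y_train = x[:150], y[:150]   # used to adjust the parameters
x_test, y_test = x[150:], y[150:]     # held out for evaluation

# Parameters of a one-feature linear model
w, b = 0.0, 0.0
learning_rate = 0.1

for step in range(500):
    pred = w * x_train + b
    error = pred - y_train
    # Loss function: mean squared error on the training data
    loss = np.mean(error ** 2)
    # Optimizer: one plain gradient-descent update of each parameter
    w -= learning_rate * 2.0 * np.mean(error * x_train)
    b -= learning_rate * 2.0 * np.mean(error)

# Evaluation metric: mean absolute error on data not seen during training
test_mae = np.mean(np.abs(w * x_test + b - y_test))
print(f"w={w:.2f}, b={b:.2f}, final training loss={loss:.4f}, test MAE={test_mae:.3f}")
```

The learned `w` and `b` should land close to the values 2 and 1 used to generate the data, and the test MAE should be on the order of the noise that was added.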

<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 2. Predicting medical costs: loading the data
*in Machine Learning*

----
Every machine learning pipeline starts with data and a task. Let’s take a look at the [Medical Cost Personal Datasets dataset](https://www.kaggle.com/datasets/mirichoi0218/insurance), which consists of seven columns with the following descriptions:
<img src="Images/insurance_data.png" style="width:800px">

<br/>We would like to predict the individual medical costs (charges) given the rest of the columns/features. Since charges represent continuous values (in dollars), we’re performing a regression task. 

<br/>Our data is in the `.csv` format and we load it with pandas:

In [1]:
import pandas as pd
dataset = pd.read_csv('Data/insurance.csv')
#view the first 5 entries of the dataset
print(dataset.head()) 

   age     sex     bmi  children smoker     region      charges
0   19  female  27.900         0    yes  southwest  16884.92400
1   18    male  33.770         1     no  southeast   1725.55230
2   28    male  33.000         3     no  southeast   4449.46200
3   33    male  22.705         0     no  northwest  21984.47061
4   32    male  28.880         0     no  northwest   3866.85520


Next, we split the data into *features* (independent variables) and the *target* variable (dependent variable):

In [2]:
# DataFrame slicing with iloc: keep all rows and the first 6 columns as features (independent variables)
features = dataset.iloc[:, 0:6]
# Select the last column with -1 as the target variable (dependent variable)
labels = dataset.iloc[:, -1]
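
To see what the two `iloc` expressions select without loading the CSV file, here is a small sketch on a hypothetical three-row table (`toy`) that copies the column layout and first rows printed above. The `toy_` names are illustrative only.

```python
import pandas as pd

# Hypothetical three-row table with the same column layout as the insurance data
toy = pd.DataFrame({
    "age": [19, 18, 28],
    "sex": ["female", "male", "male"],
    "bmi": [27.9, 33.77, 33.0],
    "children": [0, 1, 3],
    "smoker": ["yes", "no", "no"],
    "region": ["southwest", "southeast", "southeast"],
    "charges": [16884.924, 1725.5523, 4449.462],
})

# iloc[rows, columns]: ":" keeps every row, "0:6" keeps column positions 0 to 5
toy_features = toy.iloc[:, 0:6]
# -1 picks the last column and returns it as a Series
toy_labels = toy.iloc[:, -1]

print(list(toy_features.columns))
print(toy_features.shape, toy_labels.shape)
print(toy_labels.name)
```

Note that the slice `0:6` excludes position 6, which is exactly the `charges` column.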

The pandas `shape` property tells us the shape of our data as a tuple of two values: the number of samples and the number of features. To check the shape of our dataset, we can do:

In [3]:
print(features.shape)

(1338, 6)


Or, to make things clearer:

In [4]:
print("Number of features: ", features.shape[1])
print("Number of samples: ", features.shape[0])

Number of features:  6
Number of samples:  1338


To see useful summary statistics of the dataset, we do:

In [5]:
print(features.describe())

               age          bmi     children
count  1338.000000  1338.000000  1338.000000
mean     39.207025    30.663397     1.094918
std      14.049960     6.098187     1.205493
min      18.000000    15.960000     0.000000
25%      27.000000    26.296250     0.000000
50%      39.000000    30.400000     1.000000
75%      51.000000    34.693750     2.000000
max      64.000000    53.130000     5.000000


*Exercise:*
1. Use the `.shape` property of pandas DataFrames to print the number of samples of the `labels` Series.

In [6]:
print(labels.shape)

(1338,)


2. Use the `.describe()` method to print the summary statistics of the `labels` Series.

In [7]:
print(labels.describe())

count     1338.000000
mean     13270.422265
std      12110.011237
min       1121.873900
25%       4740.287150
50%       9382.033000
75%      16639.912515
max      63770.428010
Name: charges, dtype: float64


<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 3. Data preprocessing: one-hot encoding and standardization
*in Machine Learning*

----
### A. One-hot encoding of categorical features:
Since neural networks cannot work with string data directly, we need to convert our categorical features (“sex”, “smoker”, and “region”) into numerical ones. *One-hot encoding* creates a binary column for each category. For example, since the “region” variable has four categories, one-hot encoding results in four binary columns: “northeast”, “northwest”, “southeast”, and “southwest”, as shown in the table below.
<img src="Images/one_hot_encoding.png" style="width:500px">

<br/>One-hot encoding can be accomplished by using the pandas `get_dummies()` function:

In [8]:
features = pd.get_dummies(features)
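
As a rough illustration of what `get_dummies()` produces, the sketch below encodes a hypothetical three-row frame (`toy`). Passing `dtype=int` is an assumption made here so that the new columns print as 0/1; the call in the cell above uses the default dtype.

```python
import pandas as pd

toy = pd.DataFrame({
    "age": [19, 18, 33],
    "smoker": ["yes", "no", "no"],
    "region": ["southwest", "southeast", "northwest"],
})

# dtype=int gives 0/1 columns; numeric columns such as "age" are left untouched
encoded = pd.get_dummies(toy, dtype=int)
print(encoded)
```

Each new column is named `<original column>_<category>`, which is why names such as `smoker_yes` and `region_northwest` appear in the summaries later on.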

### B. Split data into train and test sets:
In machine learning, we train a model on training data, and we evaluate its performance on a held-out set of data, our test set, which is not seen during learning:

In [9]:
from sklearn.model_selection import train_test_split
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.33, random_state=42)

Here we chose the test size to be 33% of the total data, and random state controls the shuffling applied to the data before applying the split.
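
The effect of `test_size` and `random_state` can be checked on stand-in data. The sketch below assumes simple integer arrays (`X`, `y`) with the same number of samples as the insurance dataset; it does not use the real features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same number of samples as the insurance dataset
X = np.arange(1338).reshape(-1, 1)
y = np.arange(1338)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
print(len(X_train), len(X_test))

# The same random_state reproduces exactly the same shuffle and split
X_train_again, _, _, _ = train_test_split(X, y, test_size=0.33, random_state=42)
print(np.array_equal(X_train, X_train_again))
```

With 1338 samples this gives 896 training rows and 442 test rows, and each feature row stays paired with its own label after shuffling.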

### C. Standardize/normalize numerical features:
The usual preprocessing step for numerical variables, among others, is *standardization* that rescales features to zero mean and unit variance. Why do we want to do that? Well, our features have different scales or units: “age” has an interval of [18, 64] and the “children” column’s interval is much smaller, [0, 5]. By having features with differing scales, the optimizer might update some weights faster than the others.

<br/>*Normalization* is another way of preprocessing numerical data: it scales the numerical features to a fixed range - usually between 0 and 1.

<br/>So which should you use? Well, there isn’t always a clear answer, but you can try them all out and choose the one method that gives the best results.
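
To make the two rescaling formulas concrete, here is a minimal NumPy sketch applied to a handful of hypothetical age values. It computes both transformations by hand instead of using a library scaler.

```python
import numpy as np

# A few "age" values spanning the range quoted above
age = np.array([18.0, 27.0, 39.0, 51.0, 64.0])

# Standardization: subtract the mean, divide by the standard deviation
age_standardized = (age - age.mean()) / age.std()

# Min-max normalization: map the smallest value to 0 and the largest to 1
age_minmax = (age - age.min()) / (age.max() - age.min())

print(age_standardized.round(3))
print(age_minmax.round(3))
```

After standardization the values have mean 0 and standard deviation 1; after min-max scaling they lie between 0 and 1.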

<br/>To normalize the numerical features we use scikit-learn’s `ColumnTransformer` in the following way:

In [10]:
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import Normalizer
from sklearn.compose import ColumnTransformer

ct = ColumnTransformer([('normalize', Normalizer(), ['age', 'bmi', 'children'])], remainder='passthrough')
features_train_norm = ct.fit_transform(features_train)
features_test_norm = ct.transform(features_test)

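The transformer step is named `'normalize'`; it applies a `Normalizer()` to the `age`, `bmi`, and `children` columns and passes the rest of the columns through unchanged. Note that scikit-learn’s `Normalizer` rescales each sample (row) to unit norm rather than mapping each feature onto a fixed range; `MinMaxScaler` is the scikit-learn class that performs the [0, 1] range scaling described above. `ColumnTransformer` returns NumPy arrays, and we convert them back to a pandas DataFrame so we can see some useful summaries of the scaled data.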
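
The behavior of scikit-learn's `Normalizer` is easy to confuse with range scaling, so here is a small sketch on a hypothetical 2x2 array comparing it with `MinMaxScaler`. The array `X` is made up for the example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer

# Two samples (rows) with two features (columns)
X = np.array([[3.0, 4.0],
              [6.0, 8.0]])

# Normalizer works row by row: each sample is divided by its own L2 norm
row_normalized = Normalizer().fit_transform(X)

# MinMaxScaler works column by column: each feature is mapped onto [0, 1]
column_scaled = MinMaxScaler().fit_transform(X)

print(row_normalized)
print(column_scaled)
```

Both rows become `[0.6, 0.8]` under `Normalizer`, whereas `MinMaxScaler` maps the first row to zeros and the second to ones.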

<br/>To convert a NumPy array back into a pandas DataFrame, we can do:

In [11]:
features_train_norm = pd.DataFrame(features_train_norm, columns = features_train.columns)
features_test_norm = pd.DataFrame(features_test_norm, columns = features_test.columns)

Note that we fit the scaler to the training data only, and then we apply the fitted scaler to the test data. This way we avoid “information leakage” from the test set into the training process. These two datasets should be completely unaware of each other!
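
The fit-on-train, transform-on-test pattern can be seen in isolation with a tiny sketch. It assumes a `StandardScaler` and hypothetical one-column arrays (`train`, `test`).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[18.0], [30.0], [42.0]])
test = np.array([[64.0]])

scaler = StandardScaler()
scaler.fit(train)                       # learns mean and standard deviation from train only
test_scaled = scaler.transform(test)    # reuses those statistics, learns nothing new

print(scaler.mean_)
print(test_scaled)
```

The test value is rescaled with the training mean (30), so it does not come out as 0 even though it is the only test sample.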

<br/>*Exercise:*
1. Create a new `ColumnTransformer` instance called `my_ct` that uses `StandardScaler()` and `'scale'` (instead of `Normalizer()` and `'normalize'`) with the same numerical features (`age`, `bmi`, `children`). Make sure to pass through the remainder of the columns. The `StandardScaler` class is already imported for you.

In [12]:
my_ct = ColumnTransformer([('scale', StandardScaler(), ['age', 'bmi', 'children'])], remainder='passthrough')

2. Use the `.fit_transform()` method of `my_ct` to fit the column transformer to the `features_train` DataFrame and at the same time transform it. Assign the result to a variable called `features_train_scale`.

In [13]:
features_train_scale = my_ct.fit_transform(features_train)

3. Use the `.transform()` method of the trained column transformer `my_ct` to transform the `features_test` DataFrame. Assign the result to a variable called `features_test_scale`.

In [14]:
features_test_scale = my_ct.transform(features_test)

4. Transform the `features_train_scale` NumPy array back to a DataFrame using `pd.DataFrame()` and assign the result back to a variable called `features_train_scale`. For the `columns` attribute use the `.columns` property of `features_train`.

In [15]:
features_train_scale = pd.DataFrame(features_train_scale, columns = features_train.columns)

5. Transform the `features_test_scale` NumPy array back to DataFrame using `pd.DataFrame()` and assign the result back to a variable called `features_test_scale`. For the `columns` attribute use the `.columns` property of `features_test`.

In [16]:
features_test_scale = pd.DataFrame(features_test_scale, columns = features_test.columns)

6. Print the statistics summary of the resulting train and test DataFrames, `features_train_scale` and `features_test_scale`. Observe the statistics of the numeric columns (mean, variance).

In [17]:
print(features_train_scale.describe())
print(features_test_scale.describe())

                age           bmi      children  sex_female    sex_male  \
count  8.960000e+02  8.960000e+02  8.960000e+02  896.000000  896.000000   
mean  -1.189525e-17  6.819941e-16 -3.965082e-17    0.487723    0.512277   
std    1.000559e+00  1.000559e+00  1.000559e+00    0.500128    0.500128   
min   -1.494934e+00 -2.438281e+00 -9.126072e-01    0.000000    0.000000   
25%   -8.613199e-01 -7.139833e-01 -9.126072e-01    0.000000    0.000000   
50%   -1.650038e-02 -5.227104e-02 -8.245892e-02    0.000000    1.000000   
75%    8.987207e-01  6.598116e-01  7.476894e-01    1.000000    1.000000   
max    1.743540e+00  3.776715e+00  3.238134e+00    1.000000    1.000000   

        smoker_no  smoker_yes  region_northeast  region_northwest  \
count  896.000000  896.000000        896.000000        896.000000   
mean     0.790179    0.209821          0.256696          0.252232   
std      0.407408    0.407408          0.437054          0.434536   
min      0.000000    0.000000          0.000000 

<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 4. Neural network model: tf.keras.Sequential
*in Machine Learning*

----
Now that we have our data preprocessed we can start building the neural network model. The most frequently used model in TensorFlow is Keras *Sequential*. A sequential model, as the name suggests, allows us to create models layer-by-layer in a step-by-step fashion. This model can have only one input tensor and only one output tensor.

<br/>To design a sequential model, we first need to import `Sequential` from `tensorflow.keras.models`:

In [18]:
from tensorflow.keras.models import Sequential




To improve readability, we will design the model in a separate Python function called `design_model()`. The following command initializes a Sequential model instance `my_model`:

In [19]:
def design_model(features):
    model = Sequential(name="my first model")
    return model

my_model = Sequential(name="my first model")

`name` is an optional argument to any model in Keras.

<br/>Finally, we invoke our function in the main program with:

In [20]:
my_model = design_model(features_train)

The model’s `layers` are accessed via the layers attribute:

In [21]:
print(my_model.layers)

[]


As expected, the list of layers is empty. In the next exercise, we will start adding layers to our model.

<br/>*Exercise:*
1. In the `design_model()` function: initialize an instance of `Sequential()` and assign it to a variable called `model`. Then return the model instance `model` from the `design_model()` function.

In [22]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers

def design_model(features):
    model = Sequential()
    return model

2. In the main program, using the `layers` attribute, print the layers of the model instance `model`.

In [23]:
dataset = pd.read_csv('Data/insurance.csv') # load the dataset
features = dataset.iloc[:,0:6] # Choose the first 6 columns as features
labels = dataset.iloc[:,-1] # Choose the final column for prediction

features = pd.get_dummies(features) # One-hot encoding for categorical variables
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.33, random_state=42) # Split the data into training (67%) and test (33%) data

# Standardize
ct = ColumnTransformer([('standardize', StandardScaler(), ['age', 'bmi', 'children'])], remainder='passthrough')
features_train = ct.fit_transform(features_train)
features_test = ct.transform(features_test)

# Invoke the function for our model design
model = design_model(features_train)

# Print the layers
print(model.layers)


[]


<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 5. Neural network model: layers
*in Machine Learning*

----
Layers are the building blocks of neural networks and can contain one or more neurons. Each layer is associated with parameters, weights and biases, that are tuned during learning. A fully-connected layer, in which every neuron connects to all neurons in the next layer, is created the following way in TensorFlow:

In [24]:
import tensorflow as tf
from tensorflow.keras import layers
# We chose 3 neurons here
layer = layers.Dense(3)

This layer looks like this graphically:
<img src="Images/layers_diagram.svg" style="width:800px">

<br/>Pay attention to the dimensions of the weight and bias parameters. Since we chose to create a layer with three neurons, the number of outputs of this layer is 3. Hence, the bias parameter is a vector with 3 entries, one per neuron. But what is the first dimension of the weights matrix? Without knowing how many features or input nodes are in the previous layer, we have no way of knowing! For that reason, with the following code:

In [25]:
print(layer.weights)

[]


We get an empty list since the layer has not seen any input yet. However, if we write:

In [26]:
# 1338 samples, 11 features as in our dataset
input = tf.ones((1338, 11))
# A fully-connected layer with 3 neurons
layer = layers.Dense(3)
# Calculate the outputs
output = layer(input)
# Print the weights
print(layer.weights)

[<tf.Variable 'dense_1/kernel:0' shape=(11, 3) dtype=float32, numpy=
array([[ 0.5956639 ,  0.5126096 ,  0.5785978 ],
       [ 0.1813289 , -0.46000773,  0.4351108 ],
       [ 0.49746227,  0.32362473,  0.1388455 ],
       [ 0.30135095, -0.6340486 ,  0.4041859 ],
       [ 0.3742993 , -0.02449501,  0.03209305],
       [-0.09133315,  0.6518723 , -0.09996384],
       [ 0.31146115, -0.34994245, -0.5222039 ],
       [ 0.31551486, -0.59917915, -0.04777688],
       [-0.08655363,  0.27628505, -0.44196674],
       [ 0.45963335,  0.44013715, -0.26298845],
       [-0.6037773 ,  0.41678703, -0.20069456]], dtype=float32)>, <tf.Variable 'dense_1/bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>]


We get that the weight matrix has `shape=(11, 3)` and the bias vector has `shape=(3,)`. Compare these weights with the diagram above to make sure you can associate the resulting shapes to it.

<br/>Fortunately, we don’t have to worry about this. TensorFlow will determine the shapes of the weight matrix and bias vector automatically the moment it encounters the first input.
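
The shapes follow directly from the matrix arithmetic a fully-connected layer performs. Below is a minimal NumPy sketch of that forward pass; the random `kernel` values are hypothetical and are not produced by the Keras initializer.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples, num_features, num_neurons = 1338, 11, 3

inputs = np.ones((num_samples, num_features))
# Hypothetical random weights; Keras uses its own initializer
kernel = rng.uniform(-0.5, 0.5, size=(num_features, num_neurons))
bias = np.zeros(num_neurons)

# Each sample (row) is multiplied by the same kernel; the bias broadcasts over rows
outputs = inputs @ kernel + bias

print(kernel.shape, bias.shape, outputs.shape)
```

The number of samples only shows up in the shape of `outputs`; the kernel and bias shapes depend solely on the number of features and neurons.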

<br/>*Exercise:*
1. Change the number of samples in the `input` tensor from 1338 to 5000. How does this change affect the shape of the weight and bias vectors?

In [27]:
# 5000 samples, 11 features
input = tf.ones((5000, 11))
# A fully-connected layer with 3 neurons
layer = layers.Dense(3)
# Calculate the outputs
output = layer(input)
# Print the weights
print(layer.weights)

[<tf.Variable 'dense_2/kernel:0' shape=(11, 3) dtype=float32, numpy=
array([[ 0.48526013, -0.03079021,  0.20876396],
       [ 0.46494317, -0.17629805,  0.47531736],
       [-0.04008222, -0.5142191 ,  0.5109639 ],
       [-0.4639427 , -0.20485678,  0.06226873],
       [ 0.00150168,  0.07655787,  0.20752954],
       [ 0.34711754,  0.30834746,  0.3538165 ],
       [-0.4878307 , -0.5460418 ,  0.5268897 ],
       [-0.55937195,  0.54024696, -0.20742199],
       [-0.29072195,  0.28665823, -0.12120688],
       [-0.20033634, -0.03908801, -0.51500547],
       [-0.5081461 ,  0.44304228, -0.08830643]], dtype=float32)>, <tf.Variable 'dense_2/bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>]


2. Now, change the number of features in input from 11 to 21. How does this change affect the shape of the weight and bias vectors?

In [28]:
# 5000 samples, 21 features
input = tf.ones((5000, 21))
# A fully-connected layer with 3 neurons
layer = layers.Dense(3)
# Calculate the outputs
output = layer(input)
# Print the weights
print(layer.weights)

[<tf.Variable 'dense_3/kernel:0' shape=(21, 3) dtype=float32, numpy=
array([[ 0.0862571 ,  0.41380572,  0.49643922],
       [-0.05824327,  0.49197423,  0.20641243],
       [ 0.21893144, -0.24017906, -0.43543732],
       [ 0.14845884,  0.2034111 , -0.24715662],
       [-0.49297845, -0.3737409 , -0.3927033 ],
       [-0.24269176,  0.07486439, -0.3072039 ],
       [-0.16982019, -0.47522783, -0.24347746],
       [ 0.35459077, -0.17449772, -0.1040051 ],
       [ 0.19645607,  0.07279503, -0.0255338 ],
       [-0.3176787 , -0.27472854, -0.34658146],
       [-0.03967273, -0.46081388,  0.12320101],
       [-0.1772039 ,  0.20319295, -0.27084708],
       [ 0.23490953, -0.25558138, -0.14912665],
       [-0.31256497, -0.17188573,  0.2698233 ],
       [-0.32975376,  0.39006472,  0.12028968],
       [-0.36244082,  0.34840143,  0.25267684],
       [ 0.14136767, -0.02619708, -0.29262817],
       [-0.09993172, -0.23336887, -0.17277384],
       [-0.47303462,  0.25248313, -0.04553413],
       [ 0.16226089

3. Change the number of neurons in `layer` (below where `input` is defined) from 3 to 10. How does this change affect the shape of the weight and bias vectors?

In [29]:
# 5000 samples, 21 features
input = tf.ones((5000, 21))
# A fully-connected layer with 10 neurons
layer = layers.Dense(10)
# Calculate the outputs
output = layer(input)
# Print the weights
print(layer.weights)

[<tf.Variable 'dense_4/kernel:0' shape=(21, 10) dtype=float32, numpy=
array([[ 1.74634814e-01,  3.02489698e-02, -3.85932028e-02,
         1.43705547e-01,  8.27887654e-03, -4.08338249e-01,
        -2.80947387e-02, -3.60466987e-01,  1.32441461e-01,
        -1.37825191e-01],
       [-3.22229177e-01, -8.34048092e-02, -1.77137911e-01,
        -4.04979229e-01, -3.28722179e-01, -3.20237398e-01,
        -3.59046668e-01,  4.48316336e-02,  1.76968515e-01,
        -3.92362416e-01],
       [ 1.13584399e-02, -3.49011511e-01, -2.81731188e-01,
         2.13400960e-01,  2.21962452e-01, -4.71977592e-02,
         3.68815184e-01, -1.36629850e-01, -3.27083498e-01,
         9.42098498e-02],
       [-4.35595214e-01, -3.75793755e-01, -2.29545832e-02,
        -3.77012670e-01, -2.21576750e-01,  2.92596519e-02,
        -4.15438682e-01,  4.44765985e-02,  2.80404329e-01,
        -4.21756119e-01],
       [ 3.39443982e-01, -1.84488744e-01,  1.17498040e-02,
        -3.09018254e-01, -1.77058518e-01,  2.53757596e-01,


<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 6. Neural network model: input layer
*in Machine Learning*

----
Inputs to a neural network are usually not considered the actual transformative layers. They are merely placeholders for data. In Keras, an input for a neural network can be specified with a `tf.keras.layers.InputLayer` object.

<br/>The following code initializes an input layer for a `DataFrame` `my_data` that has 15 columns:

In [30]:
from tensorflow.keras.layers import InputLayer
my_input = InputLayer(input_shape=(15,))

Notice that `input_shape` describes a single sample, so its first dimension must equal the number of features in the data. You don’t need to specify the number of samples or the batch size; Keras adds that batch dimension itself.

<br/>The following code avoids hard-coding by using the `.shape` property of the `features` DataFrame:

In [31]:
# Get the number of features/dimensions in the data
num_features = features.shape[1]
# Without hard-coding
my_input = tf.keras.layers.InputLayer(input_shape=(num_features,))

The following code adds this input layer to a model instance `my_model`:

In [32]:
my_model.add(my_input)

The following code prints a useful summary of a model instance `my_model`:

In [33]:
print(my_model.summary())

Model: "my first model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
Total params: 0 (0.00 Byte)
Trainable params: 0 (0.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
None


As you can see, the summary shows that the total number of parameters is 0. This shows you that the input layer has no trainable parameters and is just a placeholder for data.

<br/>*Exercise:*
1. In the `design_model()` function, create a variable called `num_features` and assign it the number of columns in the `features` `DataFrame` using the `.shape` property.

In [34]:
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers

def design_model(features):
    model = Sequential(name = "my_first_model")
    # Your code here
    num_features = features.shape[1]

2. In the `design_model()` function: create a variable called input, assign `input` an instance of `InputLayer`, set the first dimension of the `input_shape` parameter equal to `num_features`. Then add the `input` layer to the model.

In [35]:
def design_model(features):
    model = Sequential(name = "my_first_model")
    #your code here
    num_features = features.shape[1]
    input = layers.InputLayer(input_shape=(num_features,))
    model.add(input)
    return model

3. Use the `.summary()` method to print the summary of the model instance model.

In [36]:
dataset = pd.read_csv('Data/insurance.csv') # Load the dataset
features = dataset.iloc[:,0:6] # Choose the first 6 columns as features
labels = dataset.iloc[:,-1] # Choose the final column for prediction

features = pd.get_dummies(features) # One-hot encoding for categorical variables
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.33, random_state=42) # Split the data into training and test data

# Standardize
ct = ColumnTransformer([('standardize', StandardScaler(), ['age', 'bmi', 'children'])], remainder='passthrough')
features_train = ct.fit_transform(features_train)
features_test = ct.transform(features_test)

# Invoke the function for our model design
model = design_model(features_train)
# Your code here
print(model.summary())

Model: "my_first_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
Total params: 0 (0.00 Byte)
Trainable params: 0 (0.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
None


<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 7. Neural network model: output layer
*in Machine Learning*

----
The output layer shape depends on your task. In the case of regression, we need one output for each sample, or, *one output for each prediction required.* For example, if your data has 100 samples, you would expect your output to be a vector with 100 entries - a numerical prediction for each sample.

<br/>In our case, we are doing regression and wish to predict one number for each data point: the medical cost billed by health insurance indicated in the `charges` column in our data. Hence, our output layer has only one neuron.

<br/>The following command adds a layer with one neuron to a model instance `my_model`:

In [37]:
from tensorflow.keras.layers import Dense
my_model.add(Dense(1))

Notice that you don’t need to specify the input shape of this layer, since TensorFlow with Keras can automatically infer it from the previous layer.

<br/>*Exercise:*
<br/>In a single command, create and add an output layer to the model instance model as an instance of `tensorflow.keras.layers.Dense`.

In [38]:
def design_model(features):
    model = Sequential(name = "my_first_model")
    num_features = features.shape[1]
    input = InputLayer(input_shape=(num_features,))
    model.add(input) # Add the input layer
    # Your code
    model.add(Dense(1))
    return model

# Invoke the function for our model design
model = design_model(features_train)
print(model.summary())

Model: "my_first_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_6 (Dense)             (None, 1)                 12        
                                                                 
Total params: 12 (48.00 Byte)
Trainable params: 12 (48.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
None


<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 8. Neural network model: hidden layers
*in Machine Learning*

----
So far we have added one input layer and one output layer to our model. If you think about it, our model currently represents a linear regression. To capture more complex or non-linear interactions between the inputs and outputs, we’ll need to incorporate hidden layers.

<br/>The following command adds a hidden layer to a model instance `my_model`:

In [39]:
from tensorflow.keras.layers import Dense
my_model.add(Dense(64, activation='relu'))

We chose 64 (2<sup>6</sup>) as the number of neurons. Powers of two are a common convention for layer sizes that can map well onto hardware, but the number of neurons is ultimately a hyperparameter that can be tuned.

<br/>With the `activation` parameter, we specify which activation function is applied to the output of our hidden layer. There are a number of activation functions, such as `softmax` and `sigmoid`, but ReLU (`relu`, the Rectified Linear Unit) is very effective in many applications and we’ll use it here.
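
For reference, the three activation functions named above can be written out by hand. This is a plain NumPy sketch of their standard definitions, not the Keras implementations, and the input vector `z` is made up.

```python
import numpy as np

def relu(z):
    # Negative inputs become 0, positive inputs pass through unchanged
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes every input into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    shifted = z - np.max(z)   # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()    # outputs are positive and sum to 1

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))
print(sigmoid(z).round(3))
print(softmax(z).round(3))
```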

<br/>Adding more layers to a neural network naturally increases the number of parameters to be tuned. With every layer, there are associated weight and bias vectors.

<br/>In the diagram below, we show the size of parameter vectors with each layer. In our case, the 1st layer’s weight matrix (red) has shape (11, 64) because we feed 11 features to 64 hidden neurons. The output layer (purple) has the weight matrix of shape (64, 1) because we have 64 input units and 1 neuron in the final layer.
<img src="Images/hidden_layers_diagram.svg" style="width:800px">
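
The parameter counts implied by these shapes can be verified with a little arithmetic. The helper `dense_param_count` below is a hypothetical function written for this illustration, not part of Keras.

```python
def dense_param_count(num_inputs, num_neurons):
    """Weights (num_inputs x num_neurons) plus one bias per neuron."""
    return num_inputs * num_neurons + num_neurons

hidden = dense_param_count(11, 64)   # 11 features into 64 hidden neurons
output = dense_param_count(64, 1)    # 64 hidden units into 1 output neuron
print(hidden, output, hidden + output)

# Without a hidden layer, 11 features feed the single output neuron directly
print(dense_param_count(11, 1))
```

The last line gives 12, which matches the `Param #` reported by `model.summary()` for the model that had only an output layer.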

<br/>*Exercise:*
<br/>In the `design_model()` function, in a single command, add a new hidden layer to the model instance model with the following parameters: 128 hidden units, a `relu` activation function.

In [40]:
def design_model(features):
    model = Sequential(name = "my_first_model")
    input = InputLayer(input_shape=(features.shape[1],))
    # Add the input layer
    model.add(input) 
    # Add the hidden layer here
    model.add(Dense(128, activation='relu'))
    # Adding an output layer to our model
    model.add(Dense(1)) 
    return model

<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 9. Optimizers
*in Machine Learning*

----
As we mentioned, our goal is for the network to effectively adjust its weights or parameters in order to reach the best performance. We do this with an optimization algorithm such as *gradient descent*, which updates the parameters using the gradients of the loss computed by *backpropagation*. Keras offers a variety of optimizers such as `SGD` (Stochastic Gradient Descent optimizer), `Adam`, `RMSprop`, and others.

<br/>We’ll start by introducing the Adam optimizer:

In [41]:
from tensorflow.keras.optimizers import Adam
opt = Adam(learning_rate=0.01)

The learning rate determines how large a step the optimizer takes in the parameter space (weights and biases), and it is considered a *hyperparameter* that can also be tuned. While model parameters are the values the model uses to make predictions, hyperparameters control the learning process (learning rate, number of iterations, optimizer type).

<br/>If the learning rate is set too high, the optimizer will make large jumps and possibly miss the solution. On the other hand, if it is set too low, learning is slow and might not converge to a desirable solution within the allotted time. Here we’ll use a value of 0.01, which is often used.
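
The trade-off can be observed on a toy problem. The sketch below uses plain gradient descent (not Adam) on the one-dimensional function f(w) = (w - 3)², and the three learning rates are chosen purely for illustration.

```python
def gradient_descent(learning_rate, steps=50, start=0.0):
    """Minimize f(w) = (w - 3)**2, whose gradient is 2 * (w - 3)."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2.0 * (w - 3.0)
    return w

for lr in (0.001, 0.1, 1.1):
    w = gradient_descent(lr)
    print(f"learning_rate={lr}: w={w:.4g}, distance from minimum={abs(w - 3.0):.4g}")
```

With the smallest rate the optimizer has barely moved after 50 steps, with the middle rate it sits almost exactly at the minimum, and with the largest rate every step overshoots further so the distance keeps growing.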

<br/>Once the optimizer algorithm is chosen, a model instance `my_model` is compiled with the following code:

In [42]:
my_model.compile(loss='mse',  metrics=['mae'], optimizer=opt)

`loss` denotes the measure of learning success and the lower the loss the better the performance. In the case of regression, the most often used loss function is the Mean Squared Error `mse` (the average squared difference between the estimated values and the actual value).

<br/>Additionally, we want to observe the progress of the Mean Absolute Error (`mae`) while training the model because MAE can give us a better idea than `mse` of how far off we are from the true values in the units we are predicting. In our case, we are predicting `charges` in dollars and MAE will tell us how many dollars we’re off, on average, from the actual values as the network is being trained.
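
The difference in units between the two measures shows up clearly in a small worked example. The charges below are hypothetical numbers, not values from the dataset.

```python
import numpy as np

# Hypothetical true and predicted charges, in dollars
y_true = np.array([1000.0, 2000.0, 3000.0])
y_pred = np.array([1100.0, 1900.0, 3300.0])

errors = y_pred - y_true
mse = np.mean(errors ** 2)      # units: dollars squared
mae = np.mean(np.abs(errors))   # units: dollars

print(f"MSE = {mse:.1f}, MAE = {mae:.1f}")
```

The MAE of about 166.7 reads directly as "off by roughly 167 dollars on average", while the MSE of about 36666.7 is in squared dollars and is dominated by the single 300-dollar error.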

<br/>*Exercise:*
<br/>In the `design_model()` function, create an instance of `Adam` optimizer with 0.01 learning rate and assign the result to a variable called `opt`. Then, in the `design_model()` function, use the `.compile()` method to compile the model instance model with: the `mse` loss, `mae` metrics, `opt` as the optimizer.

In [43]:
def design_model(features):
    model = Sequential(name = "my_first_model")
    # Add the input layer
    input = InputLayer(input_shape=(features.shape[1],))
    model.add(input)
    # Add a hidden layer with 128 neurons
    model.add(Dense(128, activation='relu'))
    # Add an output layer
    model.add(Dense(1))
    # Add the Adam optimizer
    opt = Adam(learning_rate=0.01)
    model.compile(loss='mse', metrics=['mae'], optimizer=opt)
    return model

<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 10. Training and evaluating the model
*in Machine Learning*

----
Now that we have built the model, we are ready to train it using the training data.

<br/>The following command trains a model instance `my_model` using training data `my_data` and training labels `my_labels`:

In [None]:
my_model.fit(my_data, my_labels, epochs=50, batch_size=3, verbose=1)

`model.fit()` takes the following parameters:
- `my_data` is the training data set.
- `my_labels` are the true labels for the training data points.
- `epochs` is the number of passes through the full training dataset. Since training a neural network is an iterative process, you need multiple passes through the data. Here we chose 50 epochs, but how do you pick the number of epochs? There is no single answer, since it depends on your dataset. Like the learning rate, it is a hyperparameter that can be tuned, which we’ll cover later.
- `batch_size` is the number of data points to work through before updating the model parameters. It is also a hyperparameter that can be tuned.
- `verbose = 1` shows a progress bar during training.
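
<br/>Together, `epochs` and `batch_size` determine how many times the parameters are updated. The sketch below works this out for the two settings used in this section. The helper `count_updates` and the training-set size of 1000 are hypothetical, and the sketch assumes that a smaller final batch still triggers an update.

```python
import math

def count_updates(num_samples, batch_size, epochs):
    """Return (updates per epoch, total updates) for mini-batch training."""
    # The last batch of an epoch may be smaller, so round up.
    updates_per_epoch = math.ceil(num_samples / batch_size)
    return updates_per_epoch, updates_per_epoch * epochs

num_samples = 1000  # hypothetical training-set size, for illustration only

per_epoch_a, total_a = count_updates(num_samples, batch_size=3, epochs=50)
per_epoch_b, total_b = count_updates(num_samples, batch_size=1, epochs=40)

print("batch_size=3, epochs=50:", per_epoch_a, "updates per epoch,", total_a, "in total")
print("batch_size=1, epochs=40:", per_epoch_b, "updates per epoch,", total_b, "in total")
```

A smaller batch size means more updates per epoch, which is one reason training with `batch_size=1` takes noticeably longer per epoch.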

<br/>When training is finished, we use the trained model to predict values for samples that the training procedure hasn’t seen: the *test set*.

<br/>The following command evaluates the model instance `my_model` using the test data `my_data` and test labels `my_labels`:

In [None]:
val_mse, val_mae = my_model.evaluate(my_data, my_labels, verbose = 0)

In our case, `model.evaluate()` returns the value of our chosen loss (`mse`) and of the additional metric (`mae`).

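<br/>So what is the final result? We should get an MAE of about $3884.21, although the exact value varies from run to run because the network’s weights are initialized randomly. This means that, on average, our predictions are off by around 3800 dollars. Is that a good result or a bad result?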

<br/>Often you need an expert or domain knowledge to decide this. What is an acceptable error for the application? Is $3800 a big error when deciding on insurance charges? Can you do better and how? As you see, the process doesn’t stop here.
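
<br/>One simple way to put an error into context, alongside domain knowledge, is to compare it with a naive baseline that always predicts the mean of the training labels. The sketch below is only an illustration of that idea: the label values are made up, and `model_mae` is set to the rounded $3800 figure discussed above rather than computed from a real model.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical charges in dollars, for illustration only.
train_labels = [2000.0, 4000.0, 9000.0, 13000.0, 32000.0]
test_labels = [3000.0, 8000.0, 20000.0]

# Naive baseline: predict the training mean for every test sample.
baseline_prediction = sum(train_labels) / len(train_labels)
baseline_mae = mean_absolute_error(
    test_labels, [baseline_prediction] * len(test_labels)
)

model_mae = 3800.0  # assumed value, taken from the rounded figure in the text

# Fraction by which the model reduces the baseline error.
improvement = 1 - model_mae / baseline_mae

print("Baseline MAE:", baseline_mae)
print(f"Model improves on the baseline by {improvement:.0%}")
```

If the model’s error is not clearly lower than such a baseline, the network has learned little beyond the average charge.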

<br/>*Exercise:*
1. Using the `.fit()` method, train the model instance `model` with the training data `features_train`, the training labels `labels_train`, 40 epochs, a batch size of 1, and `verbose` set to true (1).

In [45]:
model = design_model(features_train)
# Fit the model using 40 epochs and batch size 1
model.fit(features_train, labels_train, epochs=40, batch_size=1, verbose=1)

Epoch 1/40
...
Epoch 40/40


<keras.src.callbacks.History at 0x2d4ff76fc10>

2. Using the `.evaluate()` method, evaluate the model instance `model` with the test data `features_test`, the test labels `labels_test`, and the `verbose` parameter set to false (0). Assign the result to the variables `val_mse` and `val_mae`, respectively.

In [46]:
# Evaluate the model on the test data
val_mse, val_mae = model.evaluate(features_test, labels_test, verbose=0)
print("MAE: ", val_mae)

MAE:  2632.363037109375


<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 11. Summary
*of implementing Neural Networks*

----
Congrats! You have built your neural network, trained it, and evaluated it using TensorFlow with Keras. To remind you, these are the concepts you learned in this lesson:

<br/>A. Preparing the data for learning:
- separating features from labels using array slicing
- determining the shape of your data
- preprocessing the categorical variables using one-hot encoding
- splitting the data into training and test sets
- scaling the numerical features

<br/>B. Designing a Sequential model by chaining `InputLayer()` and `tf.keras.layers.Dense` layers. `InputLayer()` was used as a placeholder for the input data. The output layer needed one neuron, since a regression predicts a single value. Finally, a hidden layer with the `relu` activation function was added to handle complex dependencies in the data.

<br/>C. Choosing an optimizer using `keras.optimizers` with a specific learning rate hyperparameter.

<br/>D. Training the model - using `model.fit()` to train the model on the training data and training labels.

<br/>E. Setting the values for the learning hyperparameters: number of epochs and batch sizes.

<br/>F. Evaluating the model using `model.evaluate()` on the test data.

<br/>You might be wondering: what do I do with the plethora of hyperparameters? Why do I get different results when I use different random states? And how can I guarantee that my good performance isn’t just good luck? You are right to ask! This is not the full story. In machine learning, we tune the hyperparameters using a better evaluation methodology, which we’ll cover next.
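
<br/>A first step towards telling skill from luck is to look at several runs instead of one. As a rough sketch, the snippet below summarizes the two test MAE values printed in this lesson. Keep in mind that those two runs also used different epoch counts (40 and 50), so this only illustrates the bookkeeping and is not a proper repeated experiment.

```python
import statistics

# Test-set MAE values printed by the two runs in this lesson.
# The runs differ in epoch count as well as in random initialization,
# so treat this as an illustration only.
mae_runs = [2632.363037109375, 2718.997802734375]

mean_mae = statistics.mean(mae_runs)
spread = statistics.stdev(mae_runs)  # sample standard deviation

print(f"MAE over {len(mae_runs)} runs: {mean_mae:.2f} +/- {spread:.2f}")
```

Reporting a mean together with a spread over repeated runs gives a more honest picture than any single number.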

<br/>*Exercise: put it all together!*

In [3]:
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import Normalizer, StandardScaler
from sklearn.compose import ColumnTransformer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer, Dense
from tensorflow.keras.optimizers import Adam

'''A. IMPORT DATA'''
dataset = pd.read_csv('Data/insurance.csv') # 1. Import the data as a pandas dataframe.
features = dataset.iloc[:,0:6]  # 2. DataFrame slicing using iloc: this designates the first 6 columns as features (independent variables).
labels = dataset.iloc[:,-1] # 3. We select the last column with -1, which is the target (dependent) variable.

'''B. DATA PRE-PROCESSING'''
features = pd.get_dummies(features) # 4. One-hot encoding for categorical variables.
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.33, random_state=42) # 5. Split the data into training (67%) and test (33%) data.
# ct = ColumnTransformer([('normalize', Normalizer(), ['age', 'bmi', 'children'])], remainder='passthrough') # 6a. Create a new ColumnTransformer instance and normalize the data. Note that scikit-learn's Normalizer rescales each sample (row) to unit norm; to scale each feature to a fixed range such as 0 to 1, use MinMaxScaler instead.
ct = ColumnTransformer([('standardize', StandardScaler(), ['age', 'bmi', 'children'])], remainder='passthrough') # 6b. You may also standardize the data (rescale features to zero mean and unit variance).
features_train = ct.fit_transform(features_train) # 7. Fit the column transformer to the training features and transform them.
features_test = ct.transform(features_test) # 8. Apply the fitted column transformer ct to the test features, without refitting it.

'''C. CREATE THE MODEL'''
def design_model(features):
    model = Sequential(name = "My_Sequential_Model") # 9. The most frequently used model in TensorFlow is Keras "Sequential". A sequential model allows us to create models layer-by-layer in a step-by-step fashion. 
    model.add(InputLayer(input_shape=(features.shape[1],))) # 10. Add the input layer. Notice that input_shape must equal the number of features (features.shape[1]) in the data. You don’t need to specify the number of samples or the batch size.
    model.add(Dense(128, activation='relu')) # 11. Add a hidden layer with 128 neurons. With the activation parameter, we specify which activation function (ReLU) is applied to the output of our hidden layer.
    model.add(Dense(1)) # 12. Add an output layer. The output layer shape depends on your task. In the case of regression, we need one output for each prediction required.
    opt = Adam(learning_rate=0.01) # 13. Create the Adam optimizer with a learning rate of 0.01. The optimizer updates the model's weights using gradients computed by backpropagation. Keras offers a variety of optimizers: SGD, Adam, RMSprop, and others.
    model.compile(loss='mse', metrics=['mae'], optimizer=opt) # 14. Compile the model. "loss" denotes the measure of learning success: the lower the loss, the better the performance. For regression, the most commonly used loss function is the Mean Squared Error (mse).
    return model

'''D. TRAIN THE MODEL'''
model = design_model(features_train) # 15. Construct the model
model.fit(features_train, labels_train, epochs=50, batch_size=1, verbose=0) # 16. Fit the model using 50 epochs and batch size 1.

'''E. TEST THE MODEL'''
val_mse, val_mae = model.evaluate(features_test, labels_test, verbose=0) # 17. Evaluate the model on the test data.
print("MAE: ", val_mae)  # 18. Print the test MAE. MAE gives a better idea than mse of how far off we are from the true values, in the units we are predicting.



MAE:  2718.997802734375
