<a href="https://colab.research.google.com/github/zelal-Eizaldeen/deeplearning_course/blob/main/3_12tf_regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

- In this programming example, in a Google Colab notebook, we will implement a network that solves a **regression problem** using **TensorFlow**.

In a regression problem, we are **predicting a numerical value** instead of predicting a class, and we will use the **California housing dataset**. Each training example represents a house in California (strictly speaking, in the scikit-learn version of the dataset, a block group of houses), and there are a number of variables about that house, such as **its size and location**. The **target value is the price of the house**.

The idea here is, given these variables for a house, to predict what that house costs. That is a **regression problem** because we are predicting a numerical value.

In [None]:
"""
The MIT License (MIT)
Copyright (c) 2021 NVIDIA
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
"""

We are going to train for **256 epochs** and use a batch size of 128. We start with some imports, including the California housing dataset. It is not included in TensorFlow, so we get it from sklearn instead.

In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
import numpy as np
import logging
tf.get_logger().setLevel(logging.ERROR)

EPOCHS = 256
BATCH_SIZE = 128



We load the dataset, get the data values and the target values, and then use a function from sklearn to split them into a training set and a test set. We specify that we want **to use 20% of the examples as test examples**.

In [2]:
# Read dataset and split into train and test.
california_housing = fetch_california_housing()
data = california_housing.get('data')
target = california_housing.get('target')
raw_x_train, raw_x_test, y_train, y_test = train_test_split(
    data, target, test_size=0.2, random_state=0)



We want to **standardize the data**. We pass the argument `axis=0` to `np.mean` and `np.std`:
- `axis=0` means "average down the rows for each column," so you get a 1-D array of length D: the mean of each feature across all samples. This means that we standardize each variable individually. **There are eight input variables, and each one is standardized individually.**
- Without this argument, you get one single number: the average of all elements in the table (all columns combined).
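To make the difference concrete, here is a minimal sketch on a made-up table with three samples and two features. The array `table` and its values are purely illustrative and are not taken from the housing data.

```python
import numpy as np

# Toy table: 3 samples (rows) x 2 features (columns). Values are made up.
table = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0]])

# axis=0: average down the rows, giving one mean per column (feature).
per_feature_mean = np.mean(table, axis=0)

# No axis: average over all six numbers, giving a single value.
overall_mean = np.mean(table)

print(per_feature_mean)  # [ 2. 20.]
print(overall_mean)      # 11.0
```

With the per-feature version, a feature on a large scale (the second column) does not distort the mean used for a feature on a small scale (the first column).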


In [3]:
# Standardize the data.
x_mean = np.mean(raw_x_train, axis=0)
x_stddev = np.std(raw_x_train, axis=0)


And then we calculate the **standardized training data and test data**.

In [4]:
x_train = (raw_x_train - x_mean) / x_stddev
x_test = (raw_x_test - x_mean) / x_stddev
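Note that the test data is standardized with the mean and standard deviation of the *training* data. A common reason for doing it this way is that the test set should not influence any preprocessing. The sketch below illustrates the effect on synthetic stand-in arrays (`toy_train`, `toy_test`, and `scales` are assumptions for illustration, not the housing data): the training columns end up with mean 0 and standard deviation 1, while the test columns are only approximately so.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for raw_x_train / raw_x_test (NOT the housing data):
# 8 features on deliberately different scales.
scales = np.array([1, 10, 100, 1, 5, 50, 2, 20], dtype=float)
toy_train = rng.normal(loc=3.0, scale=1.0, size=(1000, 8)) * scales
toy_test = rng.normal(loc=3.0, scale=1.0, size=(200, 8)) * scales

# Statistics come from the training data only.
mean = np.mean(toy_train, axis=0)
stddev = np.std(toy_train, axis=0)

toy_train_std = (toy_train - mean) / stddev
toy_test_std = (toy_test - mean) / stddev  # reuse the training statistics

# Training columns are centered and scaled (up to floating-point rounding).
print(np.allclose(toy_train_std.mean(axis=0), 0.0))  # True
print(np.allclose(toy_train_std.std(axis=0), 1.0))   # True

# Test columns are close to 0, but not exactly 0.
print(toy_test_std.mean(axis=0).round(2))
```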

Then we are going to **create the model**. This is a simple model.

We use the **sequential API**. We have a **first layer with 32 neurons**. We use **the ReLU activation function**, and we specify that we have **eight input** variables. Then **the second layer, the output layer, is a single neuron** because we just **predict a single value**. Its activation is a **linear activation function**.
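As a rough picture of what this two-layer network computes, here is a NumPy sketch of the forward pass. The weights `w1`, `b1`, `w2`, `b2` are hypothetical random values with the shapes the two layers would have, not trained weights, and `forward` is an illustrative helper, not a Keras function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical random weights with the shapes of the two layers.
w1 = rng.normal(size=(8, 32))   # 8 inputs -> 32 neurons
b1 = np.zeros(32)
w2 = rng.normal(size=(32, 1))   # 32 neurons -> 1 output neuron
b2 = np.zeros(1)

def forward(x):
    """Forward pass: ReLU layer followed by a linear output neuron."""
    hidden = np.maximum(0.0, x @ w1 + b1)  # ReLU clips negatives to 0
    return hidden @ w2 + b2                # linear: output is unbounded

batch = rng.normal(size=(4, 8))  # 4 made-up examples with 8 variables each
out = forward(batch)
print(out.shape)  # (4, 1)
```

The linear output is what lets the network produce any real number, which is what we need when the target is a price rather than a class probability.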

When we are doing regression, we want to use **the mean squared error loss function**. We are using the **Adam optimizer**. When we talk about regression, we look at **how far off we are from the true value**. So we will print out the **mean absolute error to see how close we are getting**.
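The two quantities can be computed by hand on a few made-up numbers. This is only a sketch of the definitions; `y_true` and `y_pred` are illustrative values, not model output.

```python
import numpy as np

# Made-up targets and predictions, just to show the arithmetic.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.0, 6.0])

errors = y_pred - y_true          # [ 0.5  0.  -1.   2. ]

# Mean squared error: (0.25 + 0 + 1 + 4) / 4
mse = np.mean(errors ** 2)

# Mean absolute error: (0.5 + 0 + 1 + 2) / 4
mae = np.mean(np.abs(errors))

print(mse, mae)  # 1.3125 0.875
```

The squared error punishes the one large miss (2.0) much harder than the small ones, which makes it a good training loss, while the absolute error stays in the same units as the target and is easier to read.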

In [5]:
# Create and train model.
model = Sequential()
model.add(Dense(32, activation='relu', input_shape=[8]))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam',
              metrics=['mean_absolute_error'])




Then we call `model.summary()`. That is a way of printing out the model so we can see what it looks like before we do the training.

In [6]:
model.summary()


We then do the **training, where we call the fit function** with the training data and the validation data, for the given number of epochs and batch size.

In [7]:
history = model.fit(x_train, y_train, validation_data=(
    x_test, y_test), epochs=EPOCHS, batch_size=BATCH_SIZE,
    verbose=2, shuffle=True)

Epoch 1/256
129/129 - 1s - 10ms/step - loss: 2.2366 - mean_absolute_error: 1.0739 - val_loss: 2.5377 - val_mean_absolute_error: 0.7400
Epoch 2/256
129/129 - 0s - 2ms/step - loss: 0.9779 - mean_absolute_error: 0.6604 - val_loss: 1.6867 - val_mean_absolute_error: 0.6346
Epoch 3/256
129/129 - 0s - 3ms/step - loss: 0.7683 - mean_absolute_error: 0.5983 - val_loss: 1.0988 - val_mean_absolute_error: 0.5864
Epoch 4/256
129/129 - 0s - 3ms/step - loss: 0.6306 - mean_absolute_error: 0.5557 - val_loss: 0.7400 - val_mean_absolute_error: 0.5449
Epoch 5/256
129/129 - 0s - 4ms/step - loss: 0.5401 - mean_absolute_error: 0.5213 - val_loss: 0.5808 - val_mean_absolute_error: 0.5167
Epoch 6/256
129/129 - 1s - 5ms/step - loss: 0.4849 - mean_absolute_error: 0.4971 - val_loss: 0.4813 - val_mean_absolute_error: 0.4957
Epoch 7/256
129/129 - 1s - 4ms/step - loss: 0.4521 - mean_absolute_error: 0.4819 - val_loss: 0.4476 - val_mean_absolute_error: 0.4824
Epoch 8/256
129/129 - 0s - 2ms/step - loss: 0.4314 - mean_abs
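The `129/129` at the start of each log line is the number of batches processed per epoch. As a sanity check, it can be reproduced with a little arithmetic, assuming the dataset holds 20,640 examples (the size of scikit-learn's California housing dataset; that number does not appear in the notebook itself).

```python
import math

n_samples = 20640     # assumed dataset size (not stated in the notebook)
test_percent = 20     # test_size=0.2
batch_size = 128      # BATCH_SIZE

n_test = math.ceil(n_samples * test_percent / 100)
n_train = n_samples - n_test
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_test, steps_per_epoch)  # 16512 4128 129
```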

Here we can see the model summary that was printed out: the first layer has 32 neurons and 288 trainable parameters, and the second layer has a single neuron with 33 parameters. Then we can see how the training process has proceeded.
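The parameter counts follow from the layer sizes: each neuron has one weight per input plus one bias. A small sketch (the helper `dense_params` is a hypothetical name for illustration, not a Keras function):

```python
def dense_params(n_inputs, n_neurons):
    """Weights (n_inputs per neuron) plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons

first_layer = dense_params(8, 32)   # 8 * 32 weights + 32 biases
output_layer = dense_params(32, 1)  # 32 * 1 weights + 1 bias

print(first_layer)                 # 288
print(output_layer)                # 33
print(first_layer + output_layer)  # 321
```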

After all 256 epochs, we have a training loss of 0.2845 and a validation loss of 0.2923.

Now we can use this **trained network to do some predictions**. **We take the entire test set and use the model to predict.** That gives us a prediction for all of the examples, but we will just print out the first 3. We print both the **prediction and the true value** to see how close they are to each other. We can see that the predicted value somewhat resembles the true value. It is not perfect.

In [8]:
# Print first 3 predictions.
predictions = model.predict(x_test)
for i in range(0, 3):
    print('Prediction: ', predictions[i],
          ', true value: ', y_test[i])

[1m129/129[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step  
Prediction:  [1.5840333] , true value:  1.369
Prediction:  [2.5435379] , true value:  2.413
Prediction:  [1.378815] , true value:  2.007
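To put a number on "not perfect", we can compute the absolute error for the three printed examples. The values below are copied from the output above. Each prediction is printed in brackets because `model.predict` returns one row per example with a single column, so the sketch flattens that column first. As an assumption taken from the scikit-learn dataset description, the target is expressed in units of $100,000, so an error of 0.215 would correspond to roughly $21,500.

```python
import numpy as np

# Values copied from the printed output above.
predictions = np.array([[1.5840333], [2.5435379], [1.378815]])  # shape (3, 1)
y_true = np.array([1.369, 2.413, 2.007])                        # shape (3,)

# Flatten the (3, 1) column so it lines up with the 1-D targets.
abs_errors = np.abs(predictions.flatten() - y_true)

print(abs_errors.round(3))                 # [0.215 0.131 0.628]
print(round(float(abs_errors.mean()), 3))  # 0.325
```

Three examples are far too few to judge the model; the mean absolute error over the whole test set, reported during training, is the more meaningful figure.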


# Modified Version



Let's modify this network a little bit and see if we can get to something better.

The changes that I have made here are to **add some more layers** and **increase the size of the layers.** The first layer now has 256 neurons instead of 32. Then we **have another hidden layer with 256 neurons**, **and then the output layer**. So it is a three-layer network instead of a two-layer network.
- When I ran it the first time, I saw that we got overfitting. In other words, I saw that the **training error was much lower than the test error**. To mitigate overfitting, **I added dropout regularization in between the layers as well**.
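To give a feel for what a dropout layer with rate 0.3 does, here is a NumPy sketch of the idea. It is an illustration under stated assumptions, not the Keras implementation: the function `dropout` and the array `acts` are hypothetical. During training, each activation is zeroed with probability 0.3 and the survivors are scaled up by 1 / (1 - 0.3) so the expected sum stays the same; at prediction time nothing is dropped.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate, training):
    """Illustrative dropout: zero a fraction `rate`, rescale the rest."""
    if not training:
        return activations  # no units are dropped at prediction time
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

# Made-up activations: 1000 examples, 256 neurons, all equal to 1.
acts = np.ones((1000, 256))

dropped = dropout(acts, rate=0.3, training=True)
print((dropped == 0).mean())  # close to 0.3
print(dropped.mean())         # close to 1.0

unchanged = dropout(acts, rate=0.3, training=False)
print(np.array_equal(unchanged, acts))  # True
```

Because a different random subset of neurons is silenced in every training step, the network cannot rely too heavily on any single neuron, which is why dropout tends to reduce overfitting.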


In [10]:
from tensorflow.keras.layers import Dropout

In [11]:
# Create and train model.
model = Sequential()
model.add(Dense(256, activation='relu', input_shape=[8]))
model.add(Dropout(0.3))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam',
              metrics=['mean_absolute_error'])
model.summary()
history = model.fit(x_train, y_train, validation_data=(
    x_test, y_test), epochs=EPOCHS, batch_size=BATCH_SIZE,
    verbose=2, shuffle=True)

Epoch 1/256
129/129 - 2s - 15ms/step - loss: 1.5829 - mean_absolute_error: 0.7014 - val_loss: 0.5654 - val_mean_absolute_error: 0.4989
Epoch 2/256
129/129 - 1s - 5ms/step - loss: 0.5165 - mean_absolute_error: 0.5128 - val_loss: 0.3990 - val_mean_absolute_error: 0.4450
Epoch 3/256
129/129 - 1s - 5ms/step - loss: 0.4689 - mean_absolute_error: 0.4871 - val_loss: 0.3771 - val_mean_absolute_error: 0.4320
Epoch 4/256
129/129 - 1s - 5ms/step - loss: 0.4302 - mean_absolute_error: 0.4698 - val_loss: 0.3802 - val_mean_absolute_error: 0.4322
Epoch 5/256
129/129 - 1s - 9ms/step - loss: 0.4353 - mean_absolute_error: 0.4632 - val_loss: 0.3889 - val_mean_absolute_error: 0.4196
Epoch 6/256
129/129 - 1s - 9ms/step - loss: 0.4070 - mean_absolute_error: 0.4542 - val_loss: 0.5329 - val_mean_absolute_error: 0.4207
Epoch 7/256
129/129 - 1s - 6ms/step - loss: 0.4103 - mean_absolute_error: 0.4468 - val_loss: 0.3582 - val_mean_absolute_error: 0.4089
Epoch 8/256
129/129 - 1s - 5ms/step - loss: 0.3923 - mean_abs

- It is going to train for 256 epochs, and that will be **a little bit slower** because we have more layers, but it still goes pretty fast because it is a simple problem with a small dataset. We can now look at the **loss**, which, remember, was previously about 0.28.


So making it **a deeper network improved the network accuracy a little bit.** We can then see that the predicted values are now a little bit closer to what we wanted. We can experiment with this. One thing to try would be to remove the dropout and see what happens. I would expect the training loss to go down further, but the validation loss would not be helped by that, so we would see more overfitting in that case.

# Reference
Learning Deep Learning Book