## Download dataset

The MovieLens 1M (ml-1m) dataset is cloned together with the repository and can be found under /content/autoencoder/data/ml-1m.

In [1]:
!git clone https://github.com/tonylaioffer/autoencoder.git

Cloning into 'autoencoder'...
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 174 (delta 0), reused 1 (delta 0), pack-reused 168
Receiving objects: 100% (174/174), 17.58 MiB | 14.71 MiB/s, done.
Resolving deltas: 100% (132/132), done.


## Define data processing methods

---



In [0]:
import tensorflow as tf
import os


def _get_training_data(FLAGS):  
    ''' Build the input pipelines for training and inference from TFRecord files.

    The pipeline is lazy: records are streamed from the files when a session runs the
    iterator, not loaded into memory up front. It chains these steps:
    - create a TFRecordDataset that reads the files
    - map each serialized record to its 'movie_ratings' feature
    - shuffle, drawing each element at random from a buffer of 500 records
    - repeat indefinitely (the dataset never ends; train() decides how many batches form an epoch)
    - group consecutive records into batches of batch_size
    - prefetch one batch so the next batch is prepared while the current one is used
    Because these are graph operations, train() creates an iterator and runs it in a session.

    @return dataset for training
    @return dataset for inference
    '''
    
    filenames = [os.path.join(FLAGS['tf_records_train_path'], f) for f in os.listdir(FLAGS['tf_records_train_path'])]

    # Create a TFRecordDataset that reads one or more TFRecord files.
    dataset = tf.data.TFRecordDataset(filenames)
    # Apply `parse` to every element; the output keeps the input order.
    dataset = dataset.map(parse)
    # Shuffle the elements: each output is drawn at random from a buffer of 500 records.
    dataset = dataset.shuffle(buffer_size=500)
    # Repeat indefinitely, so the iterator never raises OutOfRangeError;
    # train() runs a fixed number of batches per epoch instead.
    dataset = dataset.repeat()
    # Combine consecutive elements into batches of size batch_size.
    dataset = dataset.batch(FLAGS['batch_size'])
    # Prepare the next batch while the current one is being consumed.
    dataset = dataset.prefetch(buffer_size=1)
    
    # dataset2 feeds validation (called "inference" here). shuffle(buffer_size=1) does not
    # reorder anything and batch(1) yields one user per step, so the train-infer and test
    # iterators advance in lockstep; presumably this keeps the i-th training record paired
    # with the i-th test record of the same user, assuming both directories list users in
    # the same order. Training, in contrast, uses larger shuffled batches for speed.
    dataset2 = tf.data.TFRecordDataset(filenames)
    dataset2 = dataset2.map(parse)
    dataset2 = dataset2.shuffle(buffer_size=1)  # a one-element buffer keeps file order
    dataset2 = dataset2.repeat()
    dataset2 = dataset2.batch(1)  # one user per step
    dataset2 = dataset2.prefetch(buffer_size=1)

    return dataset, dataset2


def _get_test_data(FLAGS):
    ''' Build the input pipeline for the test data.'''

    filenames = [os.path.join(FLAGS['tf_records_test_path'], f) for f in os.listdir(FLAGS['tf_records_test_path'])]

    dataset = tf.data.TFRecordDataset(filenames)
    dataset = dataset.map(parse)
    dataset = dataset.shuffle(buffer_size=1)
    dataset = dataset.repeat()
    dataset = dataset.batch(1)
    dataset = dataset.prefetch(buffer_size=1)

    return dataset


def parse(serialized):
    ''' Parser for the TFRecords file.'''

    # 3952 is the largest movie ID in MovieLens 1M (IDs range from 1 to 3952): each record
    # is one user's dense rating vector, with 0 for movies the user did not rate.
    features = {'movie_ratings': tf.FixedLenFeature([3952], tf.float32)}
    parsed_example = tf.parse_single_example(serialized,
                                           features=features,
                                           )
    movie_ratings = tf.cast(parsed_example['movie_ratings'], tf.float32)
    
    return movie_ratings
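The 3952-wide `movie_ratings` feature is easiest to picture as one dense row per user. The sketch below builds such a row from (movie_id, rating) pairs in plain Python. It assumes, as the comment in `parse` suggests, that movie ID i is stored at index i - 1 and that unrated movies are 0; the helper name `ratings_to_vector` is hypothetical, and the code that actually wrote the TFRecords is not shown in this notebook.

```python
def ratings_to_vector(user_ratings, num_movies=3952):
    """Hypothetical helper: build one user's dense rating vector.

    user_ratings: list of (movie_id, rating) pairs with movie_id in 1..num_movies.
    Movies the user did not rate stay 0.0, which the model treats as "missing".
    """
    vector = [0.0] * num_movies
    for movie_id, rating in user_ratings:
        # Movie IDs are 1-based, list indices are 0-based.
        vector[movie_id - 1] = float(rating)
    return vector


v = ratings_to_vector([(1, 5), (3952, 3)])
print(len(v), v[0], v[1], v[-1])  # 3952 5.0 0.0 3.0
```

Only two of the 3952 entries are non-zero here, which is typical: most users rate a small fraction of the catalogue, so the vectors are very sparse.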

In [0]:
#filenames = [os.path.join(FLAGS['tf_records_train_path'], f) for f in os.listdir(FLAGS['tf_records_train_path'])]
    
#dataset = tf.data.TFRecordDataset(filenames)

In [0]:
#dataset.map(parse)
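To see why the inference and test pipelines use `shuffle(buffer_size=1)`, here is a simplified pure-Python model of a buffered shuffle in the spirit of `tf.data.Dataset.shuffle`: fill a buffer with `buffer_size` records, then repeatedly emit a random buffered record and replace it with the next input. It is an approximation written for illustration (`buffered_shuffle` is a hypothetical helper), not TensorFlow's implementation. With a buffer of 1 the only choice is the single buffered record, so the output order equals the input order.

```python
import random


def buffered_shuffle(items, buffer_size, seed=0):
    """Hypothetical model of a buffered shuffle (approximates tf.data's shuffle)."""
    rng = random.Random(seed)
    it = iter(items)
    buffer = []
    # Fill the buffer with the first `buffer_size` elements.
    for x in it:
        buffer.append(x)
        if len(buffer) == buffer_size:
            break
    out = []
    # Emit a random buffered element and put the next input in its slot.
    for x in it:
        idx = rng.randrange(len(buffer))
        out.append(buffer[idx])
        buffer[idx] = x
    # Input exhausted: drain the rest of the buffer in random order.
    while buffer:
        out.append(buffer.pop(rng.randrange(len(buffer))))
    return out


print(buffered_shuffle(list(range(8)), buffer_size=1))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

With `buffer_size=500`, as in the training pipeline, a record can move at most 499 positions earlier than where it appears in the files, so the mixing is only partial unless the buffer covers the whole dataset.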

## Define autoencoder architecture

### 1. Sigmoid Function

In [0]:
# import model_helper


def _get_bias_initializer():
    return tf.zeros_initializer()


def _get_weight_initializer():
    return tf.random_normal_initializer(mean=0.0, stddev=0.05)


class DAE:  # DAE = deep autoencoder
    
    def __init__(self, FLAGS):
        ''' Implementation of deep autoencoder class.'''
        
        self.FLAGS = FLAGS
        self.weight_initializer = _get_weight_initializer()
        self.bias_initializer = _get_bias_initializer()
        self.init_parameters()
        

    def init_parameters(self):
        '''Initialize network weights and biases.'''

        with tf.name_scope('weights'):
            # tf.name_scope only prefixes the names of ops created inside it;
            # variables created with tf.get_variable are not affected by it.
            # tf.get_variable returns an existing variable with this name or creates a new one.
            self.W_1 = tf.get_variable(name='weight_1', shape=(self.FLAGS['num_v'], self.FLAGS['num_h']),
                                       initializer=self.weight_initializer)
            self.W_2 = tf.get_variable(name='weight_2', shape=(self.FLAGS['num_h'], self.FLAGS['num_h']),
                                       initializer=self.weight_initializer)
            self.W_3 = tf.get_variable(name='weight_3', shape=(self.FLAGS['num_h'], self.FLAGS['num_h']),
                                       initializer=self.weight_initializer)
            self.W_4 = tf.get_variable(name='weight_4', shape=(self.FLAGS['num_h'], self.FLAGS['num_v']),
                                       initializer=self.weight_initializer)
        
        with tf.name_scope('biases'):
            self.b1 = tf.get_variable(name='bias_1', shape=(self.FLAGS['num_h']),
                                      initializer=self.bias_initializer)
            self.b2 = tf.get_variable(name='bias_2', shape=(self.FLAGS['num_h']),
                                      initializer=self.bias_initializer)
            self.b3 = tf.get_variable(name='bias_3', shape=(self.FLAGS['num_h']),
                                      initializer=self.bias_initializer)
        
    def _inference(self, x):
        ''' Make one forward pass and predict the network's outputs.
        @param x: input ratings

        @return : network predictions
        '''

        with tf.name_scope('inference'):
            a1 = tf.nn.sigmoid(tf.nn.bias_add(tf.matmul(x, self.W_1), self.b1))  # sigmoid(x W_1 + b1)
            a2 = tf.nn.sigmoid(tf.nn.bias_add(tf.matmul(a1, self.W_2), self.b2))
            a3 = tf.nn.sigmoid(tf.nn.bias_add(tf.matmul(a2, self.W_3), self.b3))
            a4 = tf.matmul(a3, self.W_4)  # linear output layer: no activation, no bias
        return a4
    
    def _compute_loss(self, predictions, labels, num_labels):
        ''' Computing the Mean Squared Error loss between the input and output of the network.
            
          @param predictions: predictions of the stacked autoencoder
          @param labels: input values of the stacked autoencoder which serve as labels at the same time
          @param num_labels: number of labels !=0 in the data set to compute the mean
            
          @return mean squared error loss tf-operation
          '''
            
        with tf.name_scope('loss'):
            loss_op = tf.div(tf.reduce_sum(tf.square(tf.subtract(predictions,labels))),num_labels)
            return loss_op
          
        

    def _optimizer(self, x):
        '''Optimize the network parameters with Adam, a variant of stochastic gradient descent.

            @param x: input values for the stacked autoencoder.

            @return: tensorflow training operation
            @return: root mean squared error (RMSE) over the observed ratings
        '''

        outputs = self._inference(x)
        # mask holds the same values as x (zeros stay zero, all other entries are copied),
        # so only its non-zero pattern matters: it marks which movies the user rated.
        # tf.where(cond, a, b) takes each element from a where cond is True and from b otherwise.
        mask = tf.where(tf.equal(x,0.0), tf.zeros_like(x), x)
        num_train_labels = tf.cast(tf.count_nonzero(mask),dtype=tf.float32) # number of observed (non-zero) ratings in the batch
        bool_mask = tf.cast(mask,dtype=tf.bool) # True where a rating exists
        # Zero the predictions for unrated movies: they have no ground truth, so their squared
        # error becomes 0 and only observed ratings contribute to the loss. This only changes
        # the tensor passed to the loss; the model's predictions themselves are unaffected.
        outputs = tf.where(bool_mask, outputs, tf.zeros_like(outputs))

        MSE_loss = self._compute_loss(outputs,x,num_train_labels)
        
        if self.FLAGS['l2_reg']:
            # tf.trainable_variables() returns all variables created with trainable=True,
            # so the penalty covers both weights and biases.
            l2_loss = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])
            MSE_loss = MSE_loss + self.FLAGS['lambda_'] * l2_loss

        # Adam op that updates all trainable variables to minimize the loss
        train_op = tf.train.AdamOptimizer(self.FLAGS['learning_rate']).minimize(MSE_loss)
        # With l2_reg enabled, this value includes the penalty, so it is not a pure rating RMSE.
        RMSE_loss = tf.sqrt(MSE_loss)

        return train_op, RMSE_loss
    
    def _validation_loss(self, x_train, x_test):
        ''' Compute the loss at validation time.
            
          @param x_train: training data samples
          @param x_test: test data samples
            
          @return networks predictions
          @return root mean squared error loss between the predicted and actual ratings
          '''
        
        outputs = self._inference(x_train) # predict from the user's training ratings
        mask = tf.where(tf.equal(x_test,0.0), tf.zeros_like(x_test), x_test) # marks the ratings observed in the test set
        num_test_labels = tf.cast(tf.count_nonzero(mask),dtype=tf.float32) # count the number of non zero values
        bool_mask = tf.cast(mask,dtype=tf.bool) 
        outputs = tf.where(bool_mask, outputs, tf.zeros_like(outputs))
    
        MSE_loss = self._compute_loss(outputs, x_test, num_test_labels)
        RMSE_loss = tf.sqrt(MSE_loss)
            
        return outputs, RMSE_loss
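The masking in `_optimizer` and `_validation_loss` amounts to an RMSE computed over observed ratings only. The plain-Python sketch below (`masked_rmse` is a hypothetical helper) reproduces that arithmetic for one flattened rating vector; in the TensorFlow code the sum and the count run over the whole batch at once.

```python
import math


def masked_rmse(predictions, ratings):
    """Hypothetical helper: RMSE over non-zero ratings, mirroring the TF masking."""
    sq_err = 0.0
    n_observed = 0
    for p, r in zip(predictions, ratings):
        if r != 0.0:  # 0 means "not rated": no ground truth, so skip it
            sq_err += (p - r) ** 2
            n_observed += 1
    if n_observed == 0:
        return float('nan')  # the TF version would also give NaN (0 / 0)
    return math.sqrt(sq_err / n_observed)


print(masked_rmse([4.5, 2.0, 3.5, 1.0], [5, 0, 3, 0]))  # 0.5
```

Only the two observed ratings count: (0.25 + 0.25) / 2 = 0.25, so the RMSE is 0.5. Averaging over all four positions instead would give sqrt(5.5 / 4), about 1.17, and would penalize the model for "predictions" on movies the user never rated.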

## Train model

In [0]:
import numpy as np


def train(FLAGS):
    '''Build the graph, open a session and train the neural network.'''
    
    num_batches = int(FLAGS['num_samples']/FLAGS['batch_size'])

    with tf.Graph().as_default():

        train_data, train_data_infer = _get_training_data(FLAGS)
        test_data = _get_test_data(FLAGS)
        
        iter_train = train_data.make_initializable_iterator()
        #Creates a tf.data.Iterator for enumerating the elements of a dataset.
        iter_train_infer = train_data_infer.make_initializable_iterator()
        iter_test = test_data.make_initializable_iterator()
        
        x_train = iter_train.get_next() #Returns a nested structure of tf.Tensors representing the next element.
        x_train_infer = iter_train_infer.get_next()
        x_test = iter_test.get_next()

        model = DAE(FLAGS)

        train_op, train_loss_op = model._optimizer(x_train)
        pred_op, test_loss_op = model._validation_loss(x_train_infer, x_test)
       
        with tf.Session() as sess: #A class for running TensorFlow operations
            
            sess.run(tf.global_variables_initializer())
            train_loss = 0
            test_loss = 0

            for epoch in range(FLAGS['num_epoch']):
                
                sess.run(iter_train.initializer) #The returned iterator will be in an uninitialized state, 
                                                 #and you must run the iterator.initializer operation before using it
                
                for batch_nr in range(num_batches):
                    
                    _, loss_ = sess.run((train_op, train_loss_op))
                    train_loss += loss_
              
                sess.run(iter_train_infer.initializer)
                sess.run(iter_test.initializer)

                for i in range(FLAGS['num_samples']):
                    pred, loss_ = sess.run((pred_op, test_loss_op))
                    test_loss += loss_
                    
                print('epoch_nr: %i, train_loss: %.3f, test_loss: %.3f'%(epoch,(train_loss/num_batches), (test_loss/FLAGS['num_samples'])))
                train_loss = 0
                test_loss = 0
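Because both training datasets repeat forever, `train()` defines an epoch by a step count rather than by reaching the end of the data. With the values used in the FLAGS cells below, the arithmetic works out as follows (plain Python, mirroring the expression in `train()`):

```python
num_samples, batch_size = 5953, 16  # values from the FLAGS cells below

num_batches = int(num_samples / batch_size)   # same expression as in train(): 372
records_per_epoch = num_batches * batch_size  # 5952 training records seen per epoch
leftover = num_samples - records_per_epoch    # 1 record not reached in a given epoch

print(num_batches, records_per_epoch, leftover)  # 372 5952 1
```

Since the training iterator is re-initialized at the start of each epoch and the data is reshuffled, the record that is left out typically varies from epoch to epoch. The evaluation loop, by contrast, runs `num_samples` single-user steps and averages the per-user RMSE values.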

### 1.1. Sigmoid Function without L2 Regularization

In [45]:
FLAGS = {'tf_records_train_path': '/content/autoencoder/data/ml-1m/train/',  # Path of the training data
         'tf_records_test_path': '/content/autoencoder/data/ml-1m/test/',  # Path of the test data
         'num_epoch': 100,  # Number of training epochs
         'batch_size': 16,  # Size of the training batch
         'learning_rate': 5e-4,  # Learning rate
         'l2_reg': False,  # L2 regularization
         'lambda_': 0.01,  # Weight decay factor
         'num_v': 3952,  # Number of visible neurons (number of movies)
         'num_h': 128,  # Number of hidden neurons
         'num_samples': 5953}  # Number of training samples (number of users who gave a rating)


train(FLAGS)

epoch_nr: 0, train_loss: 1.339, test_loss: 0.978
epoch_nr: 1, train_loss: 0.989, test_loss: 0.975
epoch_nr: 2, train_loss: 0.991, test_loss: 0.973
epoch_nr: 3, train_loss: 0.990, test_loss: 0.976
epoch_nr: 4, train_loss: 0.992, test_loss: 0.972
epoch_nr: 5, train_loss: 0.984, test_loss: 0.979
epoch_nr: 6, train_loss: 0.968, test_loss: 0.985
epoch_nr: 7, train_loss: 0.955, test_loss: 0.994
epoch_nr: 8, train_loss: 0.941, test_loss: 1.006
epoch_nr: 9, train_loss: 0.931, test_loss: 1.008
epoch_nr: 10, train_loss: 0.926, test_loss: 1.012
epoch_nr: 11, train_loss: 0.924, test_loss: 1.014
epoch_nr: 12, train_loss: 0.920, test_loss: 1.017
epoch_nr: 13, train_loss: 0.918, test_loss: 1.020
epoch_nr: 14, train_loss: 0.918, test_loss: 1.019
epoch_nr: 15, train_loss: 0.915, test_loss: 1.021
epoch_nr: 16, train_loss: 0.908, test_loss: 1.029
epoch_nr: 17, train_loss: 0.901, test_loss: 1.036
epoch_nr: 18, train_loss: 0.895, test_loss: 1.043
epoch_nr: 19, train_loss: 0.891, test_loss: 1.047
epoch_nr: 

### 1.2. Sigmoid Function with L2 Regularization

In [0]:
FLAGS = {'tf_records_train_path': '/content/autoencoder/data/ml-1m/train/',  # Path of the training data
         'tf_records_test_path': '/content/autoencoder/data/ml-1m/test/',  # Path of the test data
         'num_epoch': 100,  # Number of training epochs
         'batch_size': 16,  # Size of the training batch
         'learning_rate': 5e-4,  # Learning rate
         'l2_reg': True,  # L2 regularization
         'lambda_': 0.01,  # Weight decay factor
         'num_v': 3952,  # Number of visible neurons (number of movies)
         'num_h': 128,  # Number of hidden neurons
         'num_samples': 5953}  # Number of training samples (number of users who gave a rating)


train(FLAGS)

epoch_nr: 0, train_loss: 2.441, test_loss: 1.092
epoch_nr: 1, train_loss: 1.563, test_loss: 1.088
epoch_nr: 2, train_loss: 1.504, test_loss: 1.088
epoch_nr: 3, train_loss: 1.498, test_loss: 1.094
epoch_nr: 4, train_loss: 1.497, test_loss: 1.090
epoch_nr: 5, train_loss: 1.496, test_loss: 1.089
epoch_nr: 6, train_loss: 1.496, test_loss: 1.083
epoch_nr: 7, train_loss: 1.498, test_loss: 1.093
epoch_nr: 8, train_loss: 1.497, test_loss: 1.088
epoch_nr: 9, train_loss: 1.497, test_loss: 1.097
epoch_nr: 10, train_loss: 1.498, test_loss: 1.086
epoch_nr: 11, train_loss: 1.497, test_loss: 1.091
epoch_nr: 12, train_loss: 1.500, test_loss: 1.083
epoch_nr: 13, train_loss: 1.496, test_loss: 1.087
epoch_nr: 14, train_loss: 1.497, test_loss: 1.086
epoch_nr: 15, train_loss: 1.496, test_loss: 1.095
epoch_nr: 16, train_loss: 1.497, test_loss: 1.080
epoch_nr: 17, train_loss: 1.497, test_loss: 1.086
epoch_nr: 18, train_loss: 1.497, test_loss: 1.083
epoch_nr: 19, train_loss: 1.498, test_loss: 1.083
epoch_nr: 
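The train losses around 1.5 in this run are not directly comparable with the test losses: with `l2_reg` set to True, `_optimizer` takes the square root of the masked MSE plus λ times the summed `tf.nn.l2_loss` terms, while `_validation_loss` reports the plain masked RMSE. The sketch below reproduces that formula in plain Python; `reported_train_loss` and the toy numbers are hypothetical and only illustrate the effect of the penalty.

```python
import math


def l2_loss(w):
    # Same quantity as tf.nn.l2_loss: half the sum of squares.
    return sum(x * x for x in w) / 2.0


def reported_train_loss(mse, tensors, lam):
    # Mirrors _optimizer with l2_reg=True: sqrt(MSE + lambda * sum of l2_loss over all tensors).
    penalty = lam * sum(l2_loss(w) for w in tensors)
    return math.sqrt(mse + penalty)


mse = 0.64              # toy masked MSE, i.e. an RMSE of 0.8 on the ratings
tensors = [[6.0, 6.0]]  # toy "weights": l2_loss = (36 + 36) / 2 = 36
print(math.sqrt(mse))                           # ~0.8, what the rating error alone gives
print(reported_train_loss(mse, tensors, 0.01))  # ~1.0, what train_loss would print
```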

### 2. ReLU Activation Function

In [0]:
# import model_helper


def _get_bias_initializer():
    return tf.zeros_initializer()


def _get_weight_initializer():
    return tf.random_normal_initializer(mean=0.0, stddev=0.05)


class DAE_ReLU:  # deep autoencoder with ReLU hidden layers
    
    def __init__(self, FLAGS):
        ''' Implementation of deep autoencoder class.'''
        
        self.FLAGS = FLAGS
        self.weight_initializer = _get_weight_initializer()
        self.bias_initializer = _get_bias_initializer()
        self.init_parameters()
        

    def init_parameters(self):
        '''Initialize network weights and biases.'''

        with tf.name_scope('weights'):
            # tf.name_scope only prefixes the names of ops created inside it;
            # variables created with tf.get_variable are not affected by it.
            # tf.get_variable returns an existing variable with this name or creates a new one.
            self.W_1 = tf.get_variable(name='weight_1', shape=(self.FLAGS['num_v'], self.FLAGS['num_h']),
                                       initializer=self.weight_initializer)
            self.W_2 = tf.get_variable(name='weight_2', shape=(self.FLAGS['num_h'], self.FLAGS['num_h']),
                                       initializer=self.weight_initializer)
            self.W_3 = tf.get_variable(name='weight_3', shape=(self.FLAGS['num_h'], self.FLAGS['num_h']),
                                       initializer=self.weight_initializer)
            self.W_4 = tf.get_variable(name='weight_4', shape=(self.FLAGS['num_h'], self.FLAGS['num_v']),
                                       initializer=self.weight_initializer)
        
        with tf.name_scope('biases'):
            self.b1 = tf.get_variable(name='bias_1', shape=(self.FLAGS['num_h']),
                                      initializer=self.bias_initializer)
            self.b2 = tf.get_variable(name='bias_2', shape=(self.FLAGS['num_h']),
                                      initializer=self.bias_initializer)
            self.b3 = tf.get_variable(name='bias_3', shape=(self.FLAGS['num_h']),
                                      initializer=self.bias_initializer)
        
    def _inference(self, x):
        ''' Make one forward pass and predict the network's outputs.
        @param x: input ratings

        @return : network predictions
        '''

        with tf.name_scope('inference'):
            a1 = tf.nn.relu(tf.nn.bias_add(tf.matmul(x, self.W_1), self.b1))  # relu(x W_1 + b1)
            a2 = tf.nn.relu(tf.nn.bias_add(tf.matmul(a1, self.W_2), self.b2))
            a3 = tf.nn.relu(tf.nn.bias_add(tf.matmul(a2, self.W_3), self.b3))
            a4 = tf.matmul(a3, self.W_4)  # linear output layer: no activation, no bias
        return a4
    
    def _compute_loss(self, predictions, labels, num_labels):
        ''' Computing the Mean Squared Error loss between the input and output of the network.
            
          @param predictions: predictions of the stacked autoencoder
          @param labels: input values of the stacked autoencoder which serve as labels at the same time
          @param num_labels: number of labels !=0 in the data set to compute the mean
            
          @return mean squared error loss tf-operation
          '''
            
        with tf.name_scope('loss'):
            loss_op = tf.div(tf.reduce_sum(tf.square(tf.subtract(predictions,labels))),num_labels)
            return loss_op
          
        

    def _optimizer(self, x):
        '''Optimize the network parameters with Adam, a variant of stochastic gradient descent.

            @param x: input values for the stacked autoencoder.

            @return: tensorflow training operation
            @return: root mean squared error (RMSE) over the observed ratings
        '''

        outputs = self._inference(x)
        # mask holds the same values as x (zeros stay zero, all other entries are copied),
        # so only its non-zero pattern matters: it marks which movies the user rated.
        # tf.where(cond, a, b) takes each element from a where cond is True and from b otherwise.
        mask = tf.where(tf.equal(x,0.0), tf.zeros_like(x), x)
        num_train_labels = tf.cast(tf.count_nonzero(mask),dtype=tf.float32) # number of observed (non-zero) ratings in the batch
        bool_mask = tf.cast(mask,dtype=tf.bool) # True where a rating exists
        # Zero the predictions for unrated movies: they have no ground truth, so their squared
        # error becomes 0 and only observed ratings contribute to the loss. This only changes
        # the tensor passed to the loss; the model's predictions themselves are unaffected.
        outputs = tf.where(bool_mask, outputs, tf.zeros_like(outputs))

        MSE_loss = self._compute_loss(outputs,x,num_train_labels)
        
        if self.FLAGS['l2_reg']:
            # tf.trainable_variables() returns all variables created with trainable=True,
            # so the penalty covers both weights and biases.
            l2_loss = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])
            MSE_loss = MSE_loss + self.FLAGS['lambda_'] * l2_loss

        # Adam op that updates all trainable variables to minimize the loss
        train_op = tf.train.AdamOptimizer(self.FLAGS['learning_rate']).minimize(MSE_loss)
        # With l2_reg enabled, this value includes the penalty, so it is not a pure rating RMSE.
        RMSE_loss = tf.sqrt(MSE_loss)

        return train_op, RMSE_loss
    
    def _validation_loss(self, x_train, x_test):
        ''' Compute the loss at validation time.
            
          @param x_train: training data samples
          @param x_test: test data samples
            
          @return networks predictions
          @return root mean squared error loss between the predicted and actual ratings
          '''
        
        outputs = self._inference(x_train) # predict from the user's training ratings
        mask = tf.where(tf.equal(x_test,0.0), tf.zeros_like(x_test), x_test) # marks the ratings observed in the test set
        num_test_labels = tf.cast(tf.count_nonzero(mask),dtype=tf.float32) # count the number of non zero values
        bool_mask = tf.cast(mask,dtype=tf.bool) 
        outputs = tf.where(bool_mask, outputs, tf.zeros_like(outputs))
    
        MSE_loss = self._compute_loss(outputs, x_test, num_test_labels)
        RMSE_loss = tf.sqrt(MSE_loss)
            
        return outputs, RMSE_loss
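`DAE_ReLU` differs from `DAE` only in the activation of the three hidden layers; in both classes the output layer `a4` stays linear, so predictions are not squashed into (0, 1). The small plain-Python comparison below shows the two activations and their derivatives, which is what changes in how gradients flow through the hidden layers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # at most 0.25, reached at z = 0


def relu(z):
    return max(0.0, z)


def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # TensorFlow's ReLU gradient is also 0 at z == 0


for z in (-4.0, 0.0, 4.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.4f} (grad {sigmoid_grad(z):.4f})  "
          f"relu={relu(z):.1f} (grad {relu_grad(z):.1f})")
```

Sigmoid saturates: its derivative is at most 0.25 and decays toward 0 for large |z|, whereas ReLU passes the gradient through unchanged for every positive input and blocks it entirely for negative ones.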

In [0]:
import numpy as np


def train_with_ReLU(FLAGS):
    '''Build the graph, open a session and train the neural network.'''
    
    num_batches = int(FLAGS['num_samples']/FLAGS['batch_size'])

    with tf.Graph().as_default():

        train_data, train_data_infer = _get_training_data(FLAGS)
        test_data = _get_test_data(FLAGS)
        
        iter_train = train_data.make_initializable_iterator()
        #Creates a tf.data.Iterator for enumerating the elements of a dataset.
        iter_train_infer = train_data_infer.make_initializable_iterator()
        iter_test = test_data.make_initializable_iterator()
        
        x_train = iter_train.get_next() #Returns a nested structure of tf.Tensors representing the next element.
        x_train_infer = iter_train_infer.get_next()
        x_test = iter_test.get_next()

        model = DAE_ReLU(FLAGS)

        train_op, train_loss_op = model._optimizer(x_train)
        pred_op, test_loss_op = model._validation_loss(x_train_infer, x_test)
       
        with tf.Session() as sess: #A class for running TensorFlow operations
            
            sess.run(tf.global_variables_initializer())
            train_loss = 0
            test_loss = 0

            for epoch in range(FLAGS['num_epoch']):
                
                sess.run(iter_train.initializer) #The returned iterator will be in an uninitialized state, 
                                                 #and you must run the iterator.initializer operation before using it
                
                for batch_nr in range(num_batches):
                    
                    _, loss_ = sess.run((train_op, train_loss_op))
                    train_loss += loss_
              
                sess.run(iter_train_infer.initializer)
                sess.run(iter_test.initializer)

                for i in range(FLAGS['num_samples']):
                    pred, loss_ = sess.run((pred_op, test_loss_op))
                    test_loss += loss_
                    
                print('epoch_nr: %i, train_loss: %.3f, test_loss: %.3f'%(epoch,(train_loss/num_batches), (test_loss/FLAGS['num_samples'])))
                train_loss = 0
                test_loss = 0
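`DAE_ReLU` and `train_with_ReLU` are copies of `DAE` and `train` with only the activation changed. One way to avoid that duplication would be to pass the activation in as a parameter. The plain-Python sketch below shows the idea on the same four-layer layout (three activated hidden layers with biases, then a linear output layer without bias); `forward` and `matvec` are hypothetical helpers, not part of the notebook.

```python
def matvec(x, W):
    # Row vector times matrix: x has length n, W is n x m, result has length m (x @ W).
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def forward(x, weights, biases, activation):
    # weights = [W_1, W_2, W_3, W_4], biases = [b1, b2, b3], as in init_parameters.
    a = x
    for W, b in zip(weights[:3], biases):
        # Hidden layer: activation(a @ W + b)
        a = [activation(z + bj) for z, bj in zip(matvec(a, W), b)]
    # Output layer: a @ W_4 with no bias and no activation, like a4 in _inference.
    return matvec(a, weights[3])


def relu(z):
    return max(0.0, z)


identity2 = [[1.0, 0.0], [0.0, 1.0]]
weights = [identity2] * 4
biases = [[0.0, 0.0]] * 3

print(forward([1.0, -2.0], weights, biases, relu))         # [1.0, 0.0]
print(forward([1.0, -2.0], weights, biases, lambda z: z))  # [1.0, -2.0]
```

In the TensorFlow classes the same idea would mean storing something like `tf.nn.sigmoid` or `tf.nn.relu` as an attribute and calling it in `_inference`; that refactoring is a suggestion, not what the notebook does.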

### 2.1. ReLU Activation Function without L2 Regularization

In [16]:
FLAGS = {'tf_records_train_path': '/content/autoencoder/data/ml-1m/train/',  # Path of the training data
         'tf_records_test_path': '/content/autoencoder/data/ml-1m/test/',  # Path of the test data
         'num_epoch': 100,  # Number of training epochs
         'batch_size': 16,  # Size of the training batch
         'learning_rate': 5e-4,  # Learning rate
         'l2_reg': False,  # L2 regularization
         'lambda_': 0.01,  # Weight decay factor
         'num_v': 3952,  # Number of visible neurons (number of movies)
         'num_h': 128,  # Number of hidden neurons
         'num_samples': 5953}  # Number of training samples (number of users who gave a rating)


train_with_ReLU(FLAGS)

epoch_nr: 0, train_loss: 1.493, test_loss: 1.180
epoch_nr: 1, train_loss: 1.085, test_loss: 1.109
epoch_nr: 2, train_loss: 1.018, test_loss: 1.079
epoch_nr: 3, train_loss: 0.987, test_loss: 1.076
epoch_nr: 4, train_loss: 0.952, test_loss: 1.065
epoch_nr: 5, train_loss: 0.930, test_loss: 1.069
epoch_nr: 6, train_loss: 0.920, test_loss: 1.079
epoch_nr: 7, train_loss: 0.910, test_loss: 1.078
epoch_nr: 8, train_loss: 0.903, test_loss: 1.082
epoch_nr: 9, train_loss: 0.893, test_loss: 1.082
epoch_nr: 10, train_loss: 0.883, test_loss: 1.093
epoch_nr: 11, train_loss: 0.870, test_loss: 1.091
epoch_nr: 12, train_loss: 0.862, test_loss: 1.097
epoch_nr: 13, train_loss: 0.854, test_loss: 1.105
epoch_nr: 14, train_loss: 0.854, test_loss: 1.104
epoch_nr: 15, train_loss: 0.846, test_loss: 1.123
epoch_nr: 16, train_loss: 0.836, test_loss: 1.131
epoch_nr: 17, train_loss: 0.826, test_loss: 1.124
epoch_nr: 18, train_loss: 0.810, test_loss: 1.116
epoch_nr: 19, train_loss: 0.797, test_loss: 1.133
epoch_nr: 

### 2.2. ReLU Activation Function with L2 Regularization

In [17]:
FLAGS = {'tf_records_train_path': '/content/autoencoder/data/ml-1m/train/',  # Path of the training data
         'tf_records_test_path': '/content/autoencoder/data/ml-1m/test/',  # Path of the test data
         'num_epoch': 100,  # Number of training epochs
         'batch_size': 16,  # Size of the training batch
         'learning_rate': 5e-4,  # Learning rate
         'l2_reg': True,  # L2 regularization
         'lambda_': 0.01,  # Weight decay factor
         'num_v': 3952,  # Number of visible neurons (number of movies)
         'num_h': 128,  # Number of hidden neurons
         'num_samples': 5953}  # Number of training samples (number of users who gave a rating)


train_with_ReLU(FLAGS)

epoch_nr: 0, train_loss: 2.808, test_loss: 1.247
epoch_nr: 1, train_loss: 2.013, test_loss: 1.177
epoch_nr: 2, train_loss: 1.754, test_loss: 1.139
epoch_nr: 3, train_loss: 1.592, test_loss: 1.140
epoch_nr: 4, train_loss: 1.496, test_loss: 1.165
epoch_nr: 5, train_loss: 1.442, test_loss: 1.126
epoch_nr: 6, train_loss: 1.395, test_loss: 1.114
epoch_nr: 7, train_loss: 1.342, test_loss: 1.127
epoch_nr: 8, train_loss: 1.342, test_loss: 1.112
epoch_nr: 9, train_loss: 1.308, test_loss: 1.119
epoch_nr: 10, train_loss: 1.293, test_loss: 1.113
epoch_nr: 11, train_loss: 1.347, test_loss: 1.136
epoch_nr: 12, train_loss: 1.306, test_loss: 1.108
epoch_nr: 13, train_loss: 1.266, test_loss: 1.112
epoch_nr: 14, train_loss: 1.241, test_loss: 1.103
epoch_nr: 15, train_loss: 1.237, test_loss: 1.100
epoch_nr: 16, train_loss: 1.237, test_loss: 1.111
epoch_nr: 17, train_loss: 1.342, test_loss: 1.132
epoch_nr: 18, train_loss: 1.260, test_loss: 1.102
epoch_nr: 19, train_loss: 1.220, test_loss: 1.095
epoch_nr: 

### 2.3. ReLU Activation Function with L1 Regularization
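The L1 variant below replaces the `tf.nn.l2_loss` penalty with the sum of absolute values. The difference shows up in the gradients: the L1 term pulls every non-zero weight toward 0 with the same strength, λ·sign(w), while the L2 gradient λ·w fades as the weight shrinks, which is why L1 is commonly associated with sparser weights. A plain-Python sketch of both penalties and their (sub)gradients, with hypothetical helper names:

```python
def l1_penalty(w):
    # Same quantity as _l1_loss: sum of absolute values.
    return sum(abs(x) for x in w)


def l2_penalty(w):
    # Same quantity as tf.nn.l2_loss: half the sum of squares.
    return sum(x * x for x in w) / 2.0


def l1_grad(w):
    # Subgradient of sum(|w|): sign(w), using 0 at w == 0 as the gradient of tf.abs does.
    return [float((x > 0) - (x < 0)) for x in w]


def l2_grad(w):
    # Gradient of sum(w ** 2) / 2 is w itself.
    return [float(x) for x in w]


w = [0.05, -0.02, 0.0, 0.3]
print(l1_penalty(w), l1_grad(w))  # ~0.37, [1.0, -1.0, 0.0, 1.0]
print(l2_penalty(w), l2_grad(w))  # ~0.04645, [0.05, -0.02, 0.0, 0.3]
```

For the small weight 0.05, the L2 gradient is 0.05 while the L1 subgradient is still 1.0, so the L1 penalty keeps pushing it toward zero long after the L2 push has become negligible.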

In [0]:
# import model_helper


def _get_bias_initializer():
    return tf.zeros_initializer()


def _get_weight_initializer():
    return tf.random_normal_initializer(mean=0.0, stddev=0.05)


class DAE_ReLU_L1:  # deep autoencoder with ReLU hidden layers and an L1 penalty
    
    def __init__(self, FLAGS):
        ''' Implementation of deep autoencoder class.'''
        
        self.FLAGS = FLAGS
        self.weight_initializer = _get_weight_initializer()
        self.bias_initializer = _get_bias_initializer()
        self.init_parameters()
        

    def init_parameters(self):
        '''Initialize network weights and biases.'''

        with tf.name_scope('weights'):
            # tf.name_scope only prefixes the names of ops created inside it;
            # variables created with tf.get_variable are not affected by it.
            # tf.get_variable returns an existing variable with this name or creates a new one.
            self.W_1 = tf.get_variable(name='weight_1', shape=(self.FLAGS['num_v'], self.FLAGS['num_h']),
                                       initializer=self.weight_initializer)
            self.W_2 = tf.get_variable(name='weight_2', shape=(self.FLAGS['num_h'], self.FLAGS['num_h']),
                                       initializer=self.weight_initializer)
            self.W_3 = tf.get_variable(name='weight_3', shape=(self.FLAGS['num_h'], self.FLAGS['num_h']),
                                       initializer=self.weight_initializer)
            self.W_4 = tf.get_variable(name='weight_4', shape=(self.FLAGS['num_h'], self.FLAGS['num_v']),
                                       initializer=self.weight_initializer)
        
        with tf.name_scope('biases'):
            self.b1 = tf.get_variable(name='bias_1', shape=(self.FLAGS['num_h']),
                                      initializer=self.bias_initializer)
            self.b2 = tf.get_variable(name='bias_2', shape=(self.FLAGS['num_h']),
                                      initializer=self.bias_initializer)
            self.b3 = tf.get_variable(name='bias_3', shape=(self.FLAGS['num_h']),
                                      initializer=self.bias_initializer)
        
    def _inference(self, x):
        ''' Make one forward pass and predict the network's outputs.
        @param x: input ratings

        @return : network predictions
        '''

        with tf.name_scope('inference'):
            a1 = tf.nn.relu(tf.nn.bias_add(tf.matmul(x, self.W_1), self.b1))  # relu(x W_1 + b1)
            a2 = tf.nn.relu(tf.nn.bias_add(tf.matmul(a1, self.W_2), self.b2))
            a3 = tf.nn.relu(tf.nn.bias_add(tf.matmul(a2, self.W_3), self.b3))
            a4 = tf.matmul(a3, self.W_4)  # linear output layer: no activation, no bias
        return a4
    
    def _compute_loss(self, predictions, labels, num_labels):
        ''' Computing the Mean Squared Error loss between the input and output of the network.
            
          @param predictions: predictions of the stacked autoencoder
          @param labels: input values of the stacked autoencoder which serve as labels at the same time
          @param num_labels: number of labels !=0 in the data set to compute the mean
            
          @return mean squared error loss tf-operation
          '''
            
        with tf.name_scope('loss'):
            loss_op = tf.div(tf.reduce_sum(tf.square(tf.subtract(predictions,labels))),num_labels)
            return loss_op
          
    def _l1_loss(self, params):
        '''
        L1 penalty: the sum of the absolute values of `params`. TensorFlow has no
        tf.nn.l1_loss counterpart to tf.nn.l2_loss, so it is computed directly here.
        '''
        return tf.reduce_sum(tf.abs(params))

    def _optimizer(self, x):
        '''Optimize the network parameters with Adam, a stochastic gradient-based optimizer.

            @param x: input values for the stacked autoencoder.

            @return: tensorflow training operation
            @return: root mean squared error
        '''

        outputs = self._inference(x)
        # mask holds the same values as x (unrated entries are already 0);
        # it is only used to locate the observed ratings.
        mask = tf.where(tf.equal(x, 0.0), tf.zeros_like(x), x)
        num_train_labels = tf.cast(tf.count_nonzero(mask), dtype=tf.float32)  # number of observed ratings in the batch
        bool_mask = tf.cast(mask, dtype=tf.bool)  # True where a rating exists
        # Zero the predictions for unrated movies so they add nothing to the loss.
        # This only affects the loss; _inference still predicts every movie.
        outputs = tf.where(bool_mask, outputs, tf.zeros_like(outputs))

        MSE_loss = self._compute_loss(outputs, x, num_train_labels)

        if self.FLAGS['l1_reg']:
            # L1 penalty over all trainable variables (weights and biases)
            l1_loss = tf.add_n([self._l1_loss(v) for v in tf.trainable_variables()])
            MSE_loss = MSE_loss + self.FLAGS['lambda_'] * l1_loss

        train_op = tf.train.AdamOptimizer(self.FLAGS['learning_rate']).minimize(MSE_loss)  # op that updates all trainable variables
        # With l1_reg on, this value includes the L1 penalty, so it is not
        # directly comparable with the test RMSE from _validation_loss.
        RMSE_loss = tf.sqrt(MSE_loss)

        return train_op, RMSE_loss
    
    def _validation_loss(self, x_train, x_test):
        ''' Computing the loss during the validation time.
            
          @param x_train: training data samples
          @param x_test: test data samples
            
          @return networks predictions
          @return root mean squared error loss between the predicted and actual ratings
          '''
        
        outputs = self._inference(x_train)  # predict from the user's training ratings
        mask = tf.where(tf.equal(x_test, 0.0), tf.zeros_like(x_test), x_test)  # same values as x_test; marks the observed test ratings
        num_test_labels = tf.cast(tf.count_nonzero(mask), dtype=tf.float32)  # number of observed test ratings
        bool_mask = tf.cast(mask, dtype=tf.bool)
        outputs = tf.where(bool_mask, outputs, tf.zeros_like(outputs))  # score only movies rated in the test set
    
        MSE_loss = self._compute_loss(outputs, x_test, num_test_labels)
        RMSE_loss = tf.sqrt(MSE_loss)
            
        return outputs, RMSE_loss
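The masking in `_optimizer` and `_validation_loss` can be checked with a small numpy sketch on a hypothetical 2 × 3 rating matrix (0 means "not rated"). `masked_rmse` is an illustrative helper, not part of the notebook. The mask carries the same values as the input; its only job is to mark rated cells. Zeroing the predictions at unrated cells makes their squared error exactly zero, so both the loss and the divisor (the number of observed ratings) reflect only movies the user actually rated.

```python
import numpy as np

def masked_rmse(predictions, ratings):
    """RMSE over observed ratings only (0 = not rated), mirroring the tf.where masking."""
    mask = ratings != 0                             # True where a rating exists
    masked_pred = np.where(mask, predictions, 0.0)  # zero the predictions for unrated movies
    num_labels = np.count_nonzero(mask)             # number of observed ratings
    mse = np.sum((masked_pred - ratings) ** 2) / num_labels
    return float(np.sqrt(mse))

# Hypothetical toy batch: 2 users x 3 movies, 0 means "not rated".
ratings = np.array([[5.0, 0.0, 3.0],
                    [0.0, 4.0, 0.0]])
preds = np.array([[4.0, 2.5, 3.0],
                  [1.0, 4.0, 5.0]])

print(masked_rmse(preds, ratings))  # sqrt(1/3), about 0.577: only the three rated cells count

# Without the mask, the guesses for unrated movies (2.5, 1.0, 5.0) would be punished as errors.
unmasked = float(np.sqrt(np.sum((preds - ratings) ** 2) / 3))
print(unmasked)  # about 3.33
```

The unmasked predictions are exactly what a recommender needs for unseen movies, which is why they are excluded from the loss rather than trained toward zero.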

In [0]:
import numpy as np


def train_with_ReLU_L1(FLAGS):
    '''Build the graph, open a session and train the neural network.'''

    num_batches = FLAGS['num_samples'] // FLAGS['batch_size']

    with tf.Graph().as_default():

        train_data, train_data_infer = _get_training_data(FLAGS)
        test_data = _get_test_data(FLAGS)
        
        iter_train = train_data.make_initializable_iterator()
        #Creates a tf.data.Iterator for enumerating the elements of a dataset.
        iter_train_infer = train_data_infer.make_initializable_iterator()
        iter_test = test_data.make_initializable_iterator()
        
        x_train = iter_train.get_next() #Returns a nested structure of tf.Tensors representing the next element.
        x_train_infer = iter_train_infer.get_next()
        x_test = iter_test.get_next()

        model = DAE_ReLU_L1(FLAGS)

        train_op, train_loss_op = model._optimizer(x_train)
        pred_op, test_loss_op = model._validation_loss(x_train_infer, x_test)
       
        with tf.Session() as sess: #A class for running TensorFlow operations
            
            sess.run(tf.global_variables_initializer())
            train_loss = 0
            test_loss = 0

            for epoch in range(FLAGS['num_epoch']):
                
                sess.run(iter_train.initializer) #The returned iterator will be in an uninitialized state, 
                                                 #and you must run the iterator.initializer operation before using it
                
                for batch_nr in range(num_batches):
                    
                    _, loss_ = sess.run((train_op, train_loss_op))
                    train_loss += loss_
              
                sess.run(iter_train_infer.initializer)
                sess.run(iter_test.initializer)

                for i in range(FLAGS['num_samples']):
                    pred, loss_ = sess.run((pred_op, test_loss_op))
                    test_loss += loss_
                    
                print('epoch_nr: %i, train_loss: %.3f, test_loss: %.3f'%(epoch,(train_loss/num_batches), (test_loss/FLAGS['num_samples'])))
                train_loss = 0
                test_loss = 0
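The loop above prints the mean of per-batch RMSE values rather than the RMSE over all ratings seen in the epoch; the two differ because the square root is taken before averaging. A pure-Python sketch with hypothetical helpers shows the batch count implied by the FLAGS used below and the size of that difference.

```python
import math

def mean_of_batch_rmse(batch_mse):
    # What train_with_ReLU_L1 prints: the average of per-batch RMSE values.
    return sum(math.sqrt(m) for m in batch_mse) / len(batch_mse)

def pooled_rmse(batch_mse, batch_counts):
    # RMSE over every observed rating in the epoch, weighting each batch by its rating count.
    total_squared_error = sum(m * n for m, n in zip(batch_mse, batch_counts))
    return math.sqrt(total_squared_error / sum(batch_counts))

num_samples, batch_size = 5953, 16
num_batches = num_samples // batch_size
print(num_batches, num_batches * batch_size)  # 372 5952

print(mean_of_batch_rmse([1.0, 9.0]))         # 2.0
print(pooled_rmse([1.0, 9.0], [10, 10]))      # sqrt(5), about 2.236
```

Assuming each dataset element is one user, one example per epoch is never drawn (5,953 vs 5,952), which is harmless when the data is reshuffled every epoch.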

In [37]:
FLAGS = {'tf_records_train_path': '/content/autoencoder/data/ml-1m/train/',  # Path of the training data
         'tf_records_test_path': '/content/autoencoder/data/ml-1m/test/',  # Path of the test data
         'num_epoch': 100,  # Number of training epochs
         'batch_size': 16,  # Size of the training batch
         'learning_rate': 5e-4,  # Learning rate
         'l1_reg': True,  # L1 regularization
         'lambda_': 0.01,  # Regularization strength (lambda) of the L1 penalty
         'num_v': 3952,  # Number of visible neurons (number of movies the users rated)
         'num_h': 128,  # Number of hidden neurons
         'num_samples': 5953}  # Number of training samples (number of users who gave a rating)


train_with_ReLU_L1(FLAGS)

epoch_nr: 0, train_loss: 7.396, test_loss: 2.374
epoch_nr: 1, train_loss: 2.715, test_loss: 2.210
epoch_nr: 2, train_loss: 2.565, test_loss: 2.096
epoch_nr: 3, train_loss: 2.484, test_loss: 2.055
epoch_nr: 4, train_loss: 2.434, test_loss: 2.036
epoch_nr: 5, train_loss: 2.391, test_loss: 1.892
epoch_nr: 6, train_loss: 2.345, test_loss: 1.832
epoch_nr: 7, train_loss: 2.314, test_loss: 1.738
epoch_nr: 8, train_loss: 2.259, test_loss: 1.619
epoch_nr: 9, train_loss: 2.213, test_loss: 1.501
epoch_nr: 10, train_loss: 2.182, test_loss: 1.388
epoch_nr: 11, train_loss: 2.130, test_loss: 1.271
epoch_nr: 12, train_loss: 2.095, test_loss: 1.225
epoch_nr: 13, train_loss: 2.045, test_loss: 1.214
epoch_nr: 14, train_loss: 2.051, test_loss: 1.190
epoch_nr: 15, train_loss: 1.984, test_loss: 1.111
epoch_nr: 16, train_loss: 1.966, test_loss: 1.097
epoch_nr: 17, train_loss: 1.975, test_loss: 1.101
epoch_nr: 18, train_loss: 1.925, test_loss: 1.107
epoch_nr: 19, train_loss: 1.920, test_loss: 1.091
epoch_nr: 

### 3. Add more hidden neurons

In [38]:
FLAGS = {'tf_records_train_path': '/content/autoencoder/data/ml-1m/train/',  # Path of the training data
         'tf_records_test_path': '/content/autoencoder/data/ml-1m/test/',  # Path of the test data
         'num_epoch': 100,  # Number of training epochs
         'batch_size': 16,  # Size of the training batch
         'learning_rate': 5e-4,  # Learning rate
         'l1_reg': True,  # L1 regularization
         'lambda_': 0.01,  # Regularization strength (lambda) of the L1 penalty
         'num_v': 3952,  # Number of visible neurons (number of movies the users rated)
         'num_h': 228,  # Number of hidden neurons: 100 more than the 128 in the baseline run
         'num_samples': 5953}  # Number of training samples (number of users who gave a rating)


train_with_ReLU_L1(FLAGS)

epoch_nr: 0, train_loss: 9.771, test_loss: 2.413
epoch_nr: 1, train_loss: 3.018, test_loss: 2.167
epoch_nr: 2, train_loss: 2.773, test_loss: 2.110
epoch_nr: 3, train_loss: 2.640, test_loss: 2.005
epoch_nr: 4, train_loss: 2.546, test_loss: 1.861
epoch_nr: 5, train_loss: 2.482, test_loss: 1.806
epoch_nr: 6, train_loss: 2.417, test_loss: 1.670
epoch_nr: 7, train_loss: 2.357, test_loss: 1.569
epoch_nr: 8, train_loss: 2.310, test_loss: 1.444
epoch_nr: 9, train_loss: 2.264, test_loss: 1.354
epoch_nr: 10, train_loss: 2.234, test_loss: 1.252
epoch_nr: 11, train_loss: 2.212, test_loss: 1.233
epoch_nr: 12, train_loss: 2.156, test_loss: 1.169
epoch_nr: 13, train_loss: 2.147, test_loss: 1.161
epoch_nr: 14, train_loss: 2.138, test_loss: 1.131
epoch_nr: 15, train_loss: 2.137, test_loss: 1.181
epoch_nr: 16, train_loss: 2.091, test_loss: 1.113
epoch_nr: 17, train_loss: 2.046, test_loss: 1.132
epoch_nr: 18, train_loss: 2.046, test_loss: 1.091
epoch_nr: 19, train_loss: 2.045, test_loss: 1.119
epoch_nr: 
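Changing `num_h` changes the model size mainly through `W_1` and `W_4`, which are both `num_v × num_h`. A small helper (hypothetical, pure Python) counts the trainable parameters of the architecture defined in `init_parameters`: four weight matrices and three bias vectors, with no bias on the output layer.

```python
def count_parameters(num_v, num_h):
    weights = num_v * num_h + 2 * num_h * num_h + num_h * num_v  # W_1, W_2 and W_3, W_4
    biases = 3 * num_h  # b1, b2, b3 (the output layer has no bias)
    return weights + biases

for num_h in (78, 128, 228):
    print(num_h, count_parameters(3952, num_h))
# 78 628914
# 128 1044864
# 228 1906764
```

The 228-neuron model therefore has about 1.8 times as many parameters as the 128-neuron baseline, yet its logged RMSE after 20 epochs is not lower.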

### 4. Drop some hidden neurons

In [39]:
FLAGS = {'tf_records_train_path': '/content/autoencoder/data/ml-1m/train/',  # Path of the training data
         'tf_records_test_path': '/content/autoencoder/data/ml-1m/test/',  # Path of the test data
         'num_epoch': 100,  # Number of training epochs
         'batch_size': 16,  # Size of the training batch
         'learning_rate': 5e-4,  # Learning rate
         'l1_reg': True,  # L1 regularization
         'lambda_': 0.01,  # Regularization strength (lambda) of the L1 penalty
         'num_v': 3952,  # Number of visible neurons (number of movies the users rated)
         'num_h': 78,  # Number of hidden neurons: 50 fewer than the 128 in the baseline run
         'num_samples': 5953}  # Number of training samples (number of users who gave a rating)


train_with_ReLU_L1(FLAGS)

epoch_nr: 0, train_loss: 6.666, test_loss: 3.839
epoch_nr: 1, train_loss: 3.816, test_loss: 3.839
epoch_nr: 2, train_loss: 3.817, test_loss: 3.839
epoch_nr: 3, train_loss: 3.820, test_loss: 3.839
epoch_nr: 4, train_loss: 3.816, test_loss: 3.839
epoch_nr: 5, train_loss: 3.815, test_loss: 3.839
epoch_nr: 6, train_loss: 3.815, test_loss: 3.839
epoch_nr: 7, train_loss: 3.818, test_loss: 3.839
epoch_nr: 8, train_loss: 3.818, test_loss: 3.839
epoch_nr: 9, train_loss: 3.813, test_loss: 3.839
epoch_nr: 10, train_loss: 3.815, test_loss: 3.839
epoch_nr: 11, train_loss: 3.815, test_loss: 3.839
epoch_nr: 12, train_loss: 3.810, test_loss: 3.839
epoch_nr: 13, train_loss: 3.817, test_loss: 3.839
epoch_nr: 14, train_loss: 3.816, test_loss: 3.839
epoch_nr: 15, train_loss: 3.815, test_loss: 3.839
epoch_nr: 16, train_loss: 3.814, test_loss: 3.839
epoch_nr: 17, train_loss: 3.814, test_loss: 3.839
epoch_nr: 18, train_loss: 3.816, test_loss: 3.839
epoch_nr: 19, train_loss: 3.817, test_loss: 3.839
epoch_nr: 
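In the 78-neuron run the test RMSE is 3.839 at every logged epoch, so the masked predictions apparently stopped changing. One possible explanation, not verified in this notebook, is that every ReLU unit in the last hidden layer became inactive: since the output layer has no bias, all predictions are then 0, gradients through the ReLUs vanish, and the masked RMSE collapses to a constant, the root mean square of the observed ratings. The numpy helpers below are hypothetical and use toy data; they compute that constant and the fraction of units that output 0 for a whole batch.

```python
import numpy as np

def rmse_of_zero_predictions(ratings):
    # Masked RMSE when the network predicts 0 everywhere: only observed (non-zero) ratings count.
    observed = ratings[ratings != 0]
    return float(np.sqrt(np.mean(observed ** 2)))

def fraction_dead_units(hidden_activations):
    # A unit counts as dead on this batch if its ReLU output is 0 for every example.
    return float(np.mean(np.all(hidden_activations == 0, axis=0)))

ratings = np.array([[4.0, 0.0, 5.0],
                    [3.0, 0.0, 0.0]])
print(rmse_of_zero_predictions(ratings))  # sqrt(50/3), about 4.082

activations = np.array([[0.0, 1.2, 0.0],
                        [0.0, 0.3, 0.0]])
print(fraction_dead_units(activations))   # about 0.667: two of three units are dead
```

Comparing `rmse_of_zero_predictions` on the real test ratings with 3.839, or applying `fraction_dead_units` to hidden activations exposed from `_inference`, would confirm or rule out this explanation.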

### 5. Batch Normalization (BN)

In [0]:
# import model_helper


def _get_bias_initializer():
    return tf.zeros_initializer()


def _get_weight_initializer():
    return tf.random_normal_initializer(mean=0.0, stddev=0.05)


class DAE_ReLU_L1:  # DAE = deep autoencoder with ReLU activations and optional L1 regularization
    
    def __init__(self, FLAGS):
        ''' Implementation of deep autoencoder class.'''
        
        self.FLAGS = FLAGS
        self.weight_initializer = _get_weight_initializer()
        self.bias_initializer = _get_bias_initializer()
        self.init_parameters()
        

    def init_parameters(self):
        '''Initialize the network's weights and biases.'''
        
        with tf.name_scope('weights'):
            # tf.name_scope groups the ops created in this block under 'weights' in the graph.
            # tf.get_variable ignores name scopes (only tf.variable_scope changes its names),
            # so the variables below keep plain names such as 'weight_1'.
            self.W_1 = tf.get_variable(name='weight_1', shape=(self.FLAGS['num_v'], self.FLAGS['num_h']),
                                       initializer=self.weight_initializer)  # get an existing variable with this name or create a new one
            self.W_2 = tf.get_variable(name='weight_2', shape=(self.FLAGS['num_h'], self.FLAGS['num_h']),
                                       initializer=self.weight_initializer)
            self.W_3 = tf.get_variable(name='weight_3', shape=(self.FLAGS['num_h'], self.FLAGS['num_h']),
                                       initializer=self.weight_initializer)
            self.W_4 = tf.get_variable(name='weight_4', shape=(self.FLAGS['num_h'], self.FLAGS['num_v']),
                                       initializer=self.weight_initializer)
        
        with tf.name_scope('biases'):
            self.b1 = tf.get_variable(name='bias_1', shape=(self.FLAGS['num_h'],),
                                      initializer=self.bias_initializer)
            self.b2 = tf.get_variable(name='bias_2', shape=(self.FLAGS['num_h'],),
                                      initializer=self.bias_initializer)
            self.b3 = tf.get_variable(name='bias_3', shape=(self.FLAGS['num_h'],),
                                      initializer=self.bias_initializer)
        
    def _inference(self, x):
        ''' Make one forward pass and predict the network's outputs.
        @param x: input ratings

        @return: network predictions
        '''

        with tf.name_scope('inference'):
            a1 = tf.nn.relu(tf.nn.bias_add(tf.matmul(x, self.W_1), self.b1))  # ReLU(x·W_1 + b1)
            a2 = tf.nn.relu(tf.nn.bias_add(tf.matmul(a1, self.W_2), self.b2))  # ReLU(a1·W_2 + b2)
            a3 = tf.nn.relu(tf.nn.bias_add(tf.matmul(a2, self.W_3), self.b3))  # ReLU(a2·W_3 + b3)
            a4 = tf.matmul(a3, self.W_4)  # linear output layer without bias
        return a4
    
    def _compute_loss(self, predictions, labels, num_labels):
        ''' Computing the Mean Squared Error loss between the input and output of the network.
            
          @param predictions: predictions of the stacked autoencoder
          @param labels: input values of the stacked autoencoder which serve as labels at the same time
          @param num_labels: number of labels !=0 in the data set to compute the mean
            
          @return mean squared error loss tf-operation
          '''
            
        with tf.name_scope('loss'):
            loss_op = tf.div(tf.reduce_sum(tf.square(tf.subtract(predictions,labels))),num_labels)
            return loss_op
          
    def _l1_loss(self, params):
        '''
        TensorFlow 1.x has tf.nn.l2_loss but no tf.nn.l1_loss, so the L1 norm (sum of absolute values) is computed by hand and used instead of an L2 penalty.
        '''
        return tf.reduce_sum(tf.abs(params))

    def _optimizer(self, x):
        '''Optimize the network parameters with Adam, a stochastic gradient-based optimizer.

            @param x: input values for the stacked autoencoder.

            @return: tensorflow training operation
            @return: root mean squared error
        '''

        outputs = self._inference(x)
        # mask holds the same values as x (unrated entries are already 0);
        # it is only used to locate the observed ratings.
        mask = tf.where(tf.equal(x, 0.0), tf.zeros_like(x), x)
        num_train_labels = tf.cast(tf.count_nonzero(mask), dtype=tf.float32)  # number of observed ratings in the batch
        bool_mask = tf.cast(mask, dtype=tf.bool)  # True where a rating exists
        # Zero the predictions for unrated movies so they add nothing to the loss.
        # This only affects the loss; _inference still predicts every movie.
        outputs = tf.where(bool_mask, outputs, tf.zeros_like(outputs))

        MSE_loss = self._compute_loss(outputs, x, num_train_labels)

        if self.FLAGS['l1_reg']:
            # L1 penalty over all trainable variables (weights and biases)
            l1_loss = tf.add_n([self._l1_loss(v) for v in tf.trainable_variables()])
            MSE_loss = MSE_loss + self.FLAGS['lambda_'] * l1_loss

        train_op = tf.train.AdamOptimizer(self.FLAGS['learning_rate']).minimize(MSE_loss)  # op that updates all trainable variables
        # With l1_reg on, this value includes the L1 penalty, so it is not
        # directly comparable with the test RMSE from _validation_loss.
        RMSE_loss = tf.sqrt(MSE_loss)

        return train_op, RMSE_loss
    
    def _validation_loss(self, x_train, x_test):
        ''' Computing the loss during the validation time.
            
          @param x_train: training data samples
          @param x_test: test data samples
            
          @return networks predictions
          @return root mean squared error loss between the predicted and actual ratings
          '''
        
        outputs = self._inference(x_train)  # predict from the user's training ratings
        mask = tf.where(tf.equal(x_test, 0.0), tf.zeros_like(x_test), x_test)  # same values as x_test; marks the observed test ratings
        num_test_labels = tf.cast(tf.count_nonzero(mask), dtype=tf.float32)  # number of observed test ratings
        bool_mask = tf.cast(mask, dtype=tf.bool)
        outputs = tf.where(bool_mask, outputs, tf.zeros_like(outputs))  # score only movies rated in the test set
    
        MSE_loss = self._compute_loss(outputs, x_test, num_test_labels)
        RMSE_loss = tf.sqrt(MSE_loss)
            
        return outputs, RMSE_loss
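The class above re-declares `DAE_ReLU_L1`, and its `_inference` contains no batch-normalization step. As a hedged sketch of what BN would add (numpy, training-mode batch statistics only, hypothetical names and toy data), each hidden pre-activation is normalized over the batch, then rescaled by learnable `gamma` and shifted by `beta` before the ReLU. In TensorFlow 1.x the usual route would be `tf.layers.batch_normalization(..., training=...)`, whose moving-average updates are collected in `tf.GraphKeys.UPDATE_OPS` and must run together with the training op.

```python
import numpy as np

def batch_norm(z, gamma, beta, eps=1e-5):
    # Normalize each unit (column) over the batch, then scale and shift.
    mean = z.mean(axis=0)
    var = z.var(axis=0)
    z_hat = (z - mean) / np.sqrt(var + eps)
    return gamma * z_hat + beta

rng = np.random.default_rng(0)
z = rng.normal(loc=2.0, scale=3.0, size=(16, 4))  # toy pre-activations: batch of 16, 4 hidden units
gamma, beta = np.ones(4), np.zeros(4)             # would be trainable variables in a real network
h = np.maximum(batch_norm(z, gamma, beta), 0.0)   # BN applied before the ReLU
print(h.shape)  # (16, 4)
```

Because `beta` already shifts each unit, a separate bias such as `b1` becomes redundant after BN. At inference time, running averages of the mean and variance would replace the batch statistics; this sketch omits that part.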

## Technical Analysis

1)  In this project, we used the MovieLens 1M movie-rating dataset (ml-1m.zip, downloadable from https://grouplens.org/datasets/movielens), which was mounted and copied into a Google Colab environment. The goal of the project is to build an encoder-decoder (autoencoder) deep learning model for movie recommendations.

2)  **Steps followed:**
1. Define the network structure
2. Choose the activation function: sigmoid or ReLU
3. Define the cost function
4. Use RMSE as the model evaluation metric
5. Set up the optimizer, applying L1/L2 regularization terms
6. Define the input data
7. Extract batches from the input data
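Steps 1 and 2 can be sketched in numpy with a switchable activation, assuming the sigmoid and ReLU variants differ only in the hidden-layer activation. The function names and toy sizes are hypothetical; the layout (three hidden layers, linear output without bias, N(0, 0.05²) weights, zero biases) follows the `DAE_ReLU_L1` class.

```python
import numpy as np

ACTIVATIONS = {
    "relu": lambda z: np.maximum(z, 0.0),
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
}

def forward(x, weights, biases, activation="relu"):
    """Three hidden layers with the chosen activation, then a linear output layer (no bias)."""
    act = ACTIVATIONS[activation]
    h = x
    for W, b in zip(weights[:-1], biases):
        h = act(h @ W + b)
    return h @ weights[-1]

rng = np.random.default_rng(0)
num_v, num_h = 10, 4  # toy sizes; the notebook uses num_v=3952, num_h=128
shapes = [(num_v, num_h), (num_h, num_h), (num_h, num_h), (num_h, num_v)]
weights = [rng.normal(0.0, 0.05, size=s) for s in shapes]  # like _get_weight_initializer
biases = [np.zeros(num_h) for _ in range(3)]                # like _get_bias_initializer
x = rng.integers(0, 6, size=(2, num_v)).astype(float)       # 2 hypothetical users, ratings 0..5

for name in ACTIVATIONS:
    print(name, forward(x, weights, biases, name).shape)  # (2, 10) for both
```

The output has the same width as the input, which is what lets the network's own input serve as the reconstruction target.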



3)  **Key Observations:**
*   Without L1/L2 regularization, the ReLU model overfits: training error is low while test error is high.
*   TensorFlow 1.x offers `tf.nn.l2_loss` but no matching L1 op, so we wrote our own `_l1_loss` helper, which sums the absolute values of a parameter tensor, and used it in place of an L2 penalty.
*   Overall, the sigmoid activation gave better results than ReLU.
*   However, ReLU with L1 regularization reached the lowest test error after 100 epochs.
*   We evaluated model performance using both training and test errors. Sigmoid without a regularization term and ReLU with L1 regularization seem the most appropriate choices here, because they generalize better to new data.
*  Increasing the hidden layer from 128 to 228 neurons did not noticeably reduce either the training or the test error (at epoch 19: train 2.045, test 1.119, versus 1.920 and 1.091 with 128 neurons). Reducing it to 78 neurons instead gave a high RMSE that did not improve at all (test RMSE 3.839 at every logged epoch), which points to underfitting or to a network that has stopped learning. In all runs the reported training RMSE includes the L1 penalty added in `_optimizer`, which at least partly explains why it stays above the test RMSE.
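To make the L1/L2 distinction concrete, here is a numpy sketch of both penalties on a hypothetical pair of parameter tensors. `l1_penalty` computes the same quantity as the notebook's `_l1_loss` summed over variables; `l2_penalty` follows the `tf.nn.l2_loss` convention of half the sum of squares. Both helpers are illustrative; note that `_optimizer` applies its penalty to every trainable variable, biases included.

```python
import numpy as np

def l1_penalty(params):
    # Same quantity as the notebook's _l1_loss, summed over all tensors: sum of |w|.
    return sum(np.sum(np.abs(p)) for p in params)

def l2_penalty(params):
    # Same convention as tf.nn.l2_loss, summed over all tensors: sum(w**2) / 2.
    return sum(np.sum(p ** 2) / 2 for p in params)

params = [np.array([[0.5, -2.0], [0.0, 1.0]]),  # hypothetical weight matrix
          np.array([-0.5, 0.0])]                # hypothetical bias vector

print(l1_penalty(params))  # 4.0
print(l2_penalty(params))  # 2.75
```

The L1 (sub)gradient has constant magnitude lambda·sign(w) however small w is, so it tends to push small weights to exactly zero; the L2 gradient lambda·w shrinks weights proportionally and rarely produces exact zeros.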





## Reference

Regularization with TensorFlow: http://www.godeep.ml/regularization-using-tensorflow

Why are deep neural networks hard to train? http://neuralnetworksanddeeplearning.com/chap5.html