## Regularization

- Regularization means applying artificial constraints on the network that implicitly reduce the number of free parameters without making it more difficult to optimize.

-  Using **L2 Regularization**, we add another term to the loss that penalizes large weights. This is typically done by adding the squared L2 norm of the weights, multiplied by a small constant, to the loss.
-  The **L2 norm** of a vector is the square root of the sum of the squares of its elements. The penalty uses the squared norm, which is simply the sum of the squares.
-  Another regularization technique is **Dropout**: for every training example, we randomly pick a subset of the activation values being passed between two layers (typically half of them) and set them to 0. Essentially, we take half of the data flowing through the network and destroy it.
-  Using dropout, the network can never rely on any given activation being present, so it is forced to learn a redundant representation for everything to make sure at least some of the information remains. This helps prevent overfitting and improves performance: the network behaves as if it takes a consensus over an ensemble of networks.
-  When evaluating a network that has been trained with dropout, we do not want the randomness. Instead, we take the consensus over the redundant models by averaging the activations. In practice: during training, zero out the dropped activations and scale the remaining ones by a factor of 2 (for a drop rate of one half). During evaluation, remove both the dropout and the scaling factor, so that the evaluation activations match the training-time average.
-  If dropout does not work, use a bigger network.
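
The L2 penalty described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a definitive implementation: the helper names `l2_penalty` and `regularized_loss`, the constant `beta`, and the sample numbers are all assumptions made for the example, and the weights are treated as one flat list.

```python
def l2_penalty(weights):
    """Sum of the squared weights (the squared L2 norm)."""
    return sum(w * w for w in weights)


def regularized_loss(data_loss, weights, beta=0.01):
    """Data loss plus the L2 penalty scaled by a small constant `beta` (hypothetical name)."""
    return data_loss + beta * l2_penalty(weights)


weights = [0.5, -1.0, 2.0]              # illustrative values only
print(l2_penalty(weights))              # 0.25 + 1.0 + 4.0 = 5.25
print(regularized_loss(1.0, weights))   # roughly 1.0 + 0.01 * 5.25
```

Because the penalty grows with the square of each weight, large weights are punished much more than small ones, which is what nudges the optimizer toward smaller weights.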

#### Apply dropout to a neural network

-  In the TensorFlow 1.x API used here, the `tf.nn.dropout()` function takes two main parameters:

    `hidden_layer`: the tensor to which you would like to apply dropout <br />
    `keep_prob`: the probability of keeping (i.e. not dropping) any given unit
    

-  `keep_prob` allows you to adjust the number of units to drop. In order to compensate for dropped units, `tf.nn.dropout()` multiplies all units that are kept (i.e. not dropped) by `1/keep_prob`.

-  During training, a good starting value for `keep_prob` is 0.5.

-  During testing, use a keep_prob value of 1.0 to keep all units and maximize the power of the model.
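
The keep-and-rescale behaviour described above can be sketched without TensorFlow. The `dropout` helper below is a hypothetical stand-in written for illustration, not the library's implementation: it keeps each unit with probability `keep_prob` and multiplies the survivors by `1/keep_prob`. The `rng` parameter is an assumption added so the run can be seeded.

```python
import random


def dropout(activations, keep_prob, rng=random):
    """Keep each unit with probability `keep_prob`; scale kept units by 1/keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)  # fixed seed so the run is repeatable
hidden = [1.0, 2.0, 3.0, 4.0]

# Training-style call: each entry is either 0.0 or twice its original value.
print(dropout(hidden, keep_prob=0.5, rng=rng))

# Evaluation-style call: nothing is dropped and nothing is rescaled.
print(dropout(hidden, keep_prob=1.0, rng=rng))  # [1.0, 2.0, 3.0, 4.0]
```

Scaling by `1/keep_prob` keeps the expected value of each unit the same with and without dropout, which is why `keep_prob = 1.0` can be used at test time without any further correction.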

```python
# Uses the TensorFlow 1.x API (tf.placeholder, tf.Session).
import tensorflow as tf

hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]

# Weights and biases
weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]

# Input: three examples with four features each
features = tf.Variable([[0.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4], [11.0, 12.0, 13.0, 14.0]])

# Model with dropout: keep_prob is fed at run time so it can differ
# between training (e.g. 0.5) and testing (1.0)
keep_prob = tf.placeholder(tf.float32)
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

# Print logits from a session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(logits, feed_dict={keep_prob: 0.5}))
```



Example output (dropout is random, so the values change from run to run):

```
[[  8.45999908   9.39999866]
 [  0.11200001   0.67200011]
 [ 43.30000305  48.15999985]]
```
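
The logits printed above come from one particular random dropout mask. As a sanity check, the sketch below redoes the forward pass in plain Python with a fixed mask. The mask is not part of the original run's output: it is inferred by working backwards from the printed numbers. The `matmul` helper is a hypothetical convenience written for this example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


features = [[0.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4], [11.0, 12.0, 13.0, 14.0]]
hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]

keep_prob = 0.5
# Assumed mask (1 = kept, 0 = dropped), inferred from the printed logits.
mask = [[0, 1, 1], [1, 0, 0], [0, 1, 1]]

# Hidden layer with ReLU (biases are zero, so they are omitted here).
hidden = [[max(h, 0.0) for h in row]
          for row in matmul(features, hidden_layer_weights)]

# Dropout: zero the dropped units and scale the kept ones by 1/keep_prob.
dropped = [[h * m / keep_prob for h, m in zip(h_row, m_row)]
           for h_row, m_row in zip(hidden, mask)]

logits = matmul(dropped, out_weights)
for row in logits:
    print([round(v, 3) for v in row])
# [8.46, 9.4]
# [0.112, 0.672]
# [43.3, 48.16]
```

For comparison, setting every mask entry to 1 and `keep_prob` to 1.0 gives the deterministic logits that a test-time run would produce.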
