Source: https://www.tensorflow.org/guide/eager

In this notebook, we look at a high level at how layers are composed into a model.

**Do not worry** much for now about internals such as how the data is processed.
We will cover them in more detail later.

In [1]:
#### Setting up eager
import tensorflow as tf
tf.enable_eager_execution()

#### MLP Model

This Model has three layers: f1, f2 and f3

* f1 and f2 are **hidden layers**
* f3 is the **output layer**


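* A Keras `Dense` layer computes the function $y = g(W^T \boldsymbol{x} + b)$, where $g$ is the activation function. The layers below are created without an `activation` argument, so $g$ is the identity.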
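
To make the formula concrete, here is a minimal sketch of the same computation in plain Python. The helper names `dense` and `relu` and the toy numbers are illustrative assumptions, not part of Keras; `W` is stored as inputs by outputs, matching the `(784, 20)` kernel shape that Keras reports for `f1` further down.

```python
def dense(x, W, b, g=lambda z: z):
    """Compute y = g(W^T x + b) for a single input vector x.

    W has len(x) rows and len(b) columns; g is applied element-wise.
    The default g is the identity, i.e. no activation.
    """
    n_in, n_out = len(W), len(b)
    return [g(sum(W[i][j] * x[i] for i in range(n_in)) + b[j])
            for j in range(n_out)]


def relu(z):
    # Illustrative activation; the notebook's layers do not use one.
    return max(0.0, z)


x = [1.0, 2.0]                 # one input with 2 features
W = [[1.0, -1.0, 0.5],         # 2 inputs -> 3 units
     [0.0, 1.0, -2.0]]
b = [0.5, 0.0, 1.0]

linear_out = dense(x, W, b)            # identity activation
relu_out = dense(x, W, b, g=relu)      # negative values clipped to 0
print(linear_out, relu_out)
```

Stacking several such calls, each feeding its output to the next, is what `call` does in the model class that follows.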

In [2]:
class MLPModel(tf.keras.Model):
    def __init__(self):
        
        #Call init of base class
        super(MLPModel, self).__init__()
        
        #Define the first hidden layer, it outputs 20 units
        self.f1 = tf.keras.layers.Dense(units=20, name='f1')
        
        #Define the second hidden layer, it outputs 100 units
        self.f2 = tf.keras.layers.Dense(units=100, name='f2')
        
        #Define the final layer (output layer), it outputs 10 units
        #We eventually want probabilities, but the layer returns logits because the loss function expects logits
        self.f3 = tf.keras.layers.Dense(units=10, name='f3')
        
    def call(self, inputs):
        result = self.f1(inputs)
        result = self.f2(result)
        result = self.f3(result)
        return result

In [3]:
import dataset
dataset_train = dataset.train('./datasets').shuffle(60000).repeat(4).batch(32)

In [4]:
sample_inputs, sample_labels = next(iter(dataset_train))

In [5]:
sample_labels

<tf.Tensor: id=36, shape=(32,), dtype=int32, numpy=
array([9, 9, 3, 3, 3, 6, 0, 7, 1, 6, 1, 2, 4, 0, 1, 2, 4, 3, 4, 6, 6, 3,
       4, 2, 4, 9, 4, 7, 6, 4, 6, 6], dtype=int32)>

Next, we define the **loss** and **gradient** computations.
Again, don't worry much about these for now. We will cover them in more detail later!

In [6]:
def loss(model, x, y):
  prediction = model(x)
  return tf.losses.sparse_softmax_cross_entropy(labels=y, logits=prediction)

def grad(model, inputs, targets):
  with tf.GradientTape() as tape:
    loss_value = loss(model, inputs, targets)
  return tape.gradient(loss_value, model.variables)
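
For intuition about what the loss measures, the following is a rough plain-Python sketch of sparse softmax cross-entropy: for each example, the negative log of the softmax probability assigned to the true class, averaged over the batch. It is an illustration rather than TensorFlow's implementation, and it assumes the default reduction averages over the batch. The function name and toy inputs are hypothetical.

```python
import math


def sparse_softmax_cross_entropy_sketch(logits, labels):
    """Mean over the batch of -log(softmax(logits)[label])."""
    total = 0.0
    for row, label in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum_exp - row[label]
    return total / len(labels)


# Ten equal logits: every class gets probability 1/10.
uniform = sparse_softmax_cross_entropy_sketch([[0.0] * 10], [3])
# A large logit on the true class gives a loss close to zero.
confident = sparse_softmax_cross_entropy_sketch([[0.0, 10.0, 0.0]], [1])
print(uniform, confident)
```

With ten classes and uninformative logits, the loss equals ln 10, roughly 2.30, which is close to the value an untrained model reports at the start of the training run later in this notebook.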

In [7]:
model = MLPModel()

In [8]:
model.variables

[]

In [9]:
initial_output = model(sample_inputs).numpy()

In [10]:
model.variables[0]

<tf.Variable 'mlp_model/f1/kernel:0' shape=(784, 20) dtype=float32, numpy=
array([[-0.05226948, -0.03380045, -0.08593247, ...,  0.04665427,
         0.07489154, -0.08402053],
       [-0.00887366, -0.04801061,  0.04629155, ...,  0.07592633,
         0.04981589, -0.02024387],
       [-0.06243209,  0.01732839, -0.03583066, ...,  0.04256839,
         0.01026747, -0.06153925],
       ...,
       [-0.07873483, -0.07596295,  0.04450297, ...,  0.03286304,
        -0.07224698, -0.05948465],
       [-0.08518442, -0.04949057, -0.03061088, ..., -0.06696609,
        -0.05911696,  0.00808229],
       [-0.00111673,  0.00836813, -0.04163161, ..., -0.03101798,
         0.0459934 , -0.0301648 ]], dtype=float32)>

In [11]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)

In [12]:
for (i, (x, y)) in enumerate(dataset_train):
  # Compute the gradients of the loss with respect to the model's variables.
  grads = grad(model, x, y)
  # Apply the gradients to update the model's variables.
  optimizer.apply_gradients(zip(grads, model.variables),
                            global_step=tf.train.get_or_create_global_step())
  if i % 200 == 0:
    print("Loss at step {:04d}: {:.3f}".format(i, loss(model, x, y)))

print("Final loss: {:.3f}".format(loss(model, x, y)))

Loss at step 0000: 2.487
Loss at step 0200: 2.256
Loss at step 0400: 1.944
Loss at step 0600: 1.825
Loss at step 0800: 1.947
Loss at step 1000: 1.655
Loss at step 1200: 1.523
Loss at step 1400: 1.227
Loss at step 1600: 1.267
Loss at step 1800: 1.090
Loss at step 2000: 1.132
Loss at step 2200: 1.079
Loss at step 2400: 1.023
Loss at step 2600: 0.699
Loss at step 2800: 0.817
Loss at step 3000: 0.838
Loss at step 3200: 0.850
Loss at step 3400: 0.669
Loss at step 3600: 0.744
Loss at step 3800: 0.613
Loss at step 4000: 0.661
Loss at step 4200: 0.645
Loss at step 4400: 0.457
Loss at step 4600: 0.667
Loss at step 4800: 0.643
Loss at step 5000: 0.571
Loss at step 5200: 0.531
Loss at step 5400: 0.609
Loss at step 5600: 0.853
Loss at step 5800: 0.645
Loss at step 6000: 0.682
Loss at step 6200: 0.659
Loss at step 6400: 0.713
Loss at step 6600: 0.411
Loss at step 6800: 0.333
Loss at step 7000: 0.841
Loss at step 7200: 0.353
Loss at step 7400: 0.461
Final loss: 0.701
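
The update applied at each step of the loop above is plain gradient descent: every variable moves a small step against its gradient. Here is a minimal sketch of that rule on a toy one-parameter problem. The names `gradient_descent_step`, `toy_loss`, and `toy_grad`, the quadratic objective, and the learning rate of 0.1 are all assumptions made for illustration.

```python
def gradient_descent_step(params, grads, learning_rate):
    """Return new parameters: p - learning_rate * g for each pair."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


def toy_loss(w):
    # Simple quadratic with its minimum at w = 3.
    return (w[0] - 3.0) ** 2


def toy_grad(w):
    # Derivative of toy_loss with respect to w.
    return [2.0 * (w[0] - 3.0)]


w = [0.0]
history = [toy_loss(w)]
for step in range(50):
    w = gradient_descent_step(w, toy_grad(w), learning_rate=0.1)
    history.append(toy_loss(w))

print("start loss: {:.3f}, end loss: {:.6f}, w: {:.4f}".format(
    history[0], history[-1], w[0]))
```

Unlike this deterministic toy, the losses printed during training go up and down from one report to the next, because each value is computed on a different mini-batch.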


In [13]:
model.f1(sample_inputs)

<tf.Tensor: id=669252, shape=(32, 20), dtype=float32, numpy=
array([[ 2.18208611e-01,  4.28503871e-01,  4.53877926e-01,
         2.94955564e+00,  2.19059587e+00,  4.45727021e-01,
         1.06557643e+00,  2.18106091e-01, -8.03283677e-02,
         2.56836265e-01,  1.60441983e+00,  6.09335124e-01,
        -7.75609672e-01,  8.64541292e-01,  1.10543418e+00,
        -3.99508566e-01, -5.24423242e-01,  8.73880208e-01,
        -4.72689047e-02, -1.09459773e-01],
       [-7.80479684e-02,  6.24355614e-01,  1.16547632e+00,
         3.12190580e+00,  1.31616664e+00,  4.43734348e-01,
         7.20943213e-01,  2.35968903e-01, -1.38449764e+00,
         2.38115400e-01,  1.51751077e+00,  8.83800805e-01,
        -3.04454193e-02,  9.83679414e-01,  4.82652068e-01,
         1.25105828e-01, -6.61685467e-01, -2.43675057e-02,
        -1.32365793e-01, -2.10726097e-01],
       [-2.24857569e+00,  5.82521617e-01,  1.15649486e+00,
        -1.20614827e+00,  5.86040735e-01,  7.23572731e-01,
        -1.63251948e+00, -9