
```
Copyright (C) 2019 Software Platform Lab, Seoul National University

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```

# TF High Level APIs
TensorFlow provides high-level APIs that make it easier to define and execute neural networks. In this session, we will take a look at two of them: Keras and eager execution.

## 1. Keras (tf.keras)
Keras is a high-level API to build and train deep learning models. It's used for fast prototyping, advanced research, and production, with three key advantages:

- **User friendly**: Keras has a simple, consistent interface optimized for common use cases. It provides clear and actionable feedback for user errors.
- **Modular and composable**: Keras models are made by connecting configurable building blocks together, with few restrictions.
- **Easy to extend**: Write custom building blocks to express new ideas for research. Create new layers, loss functions, and develop state-of-the-art models.



### Defining a model
The Keras API is accessible in the `tf.keras` namespace. Let's take a look at how we can define a model using the Keras API.


In [0]:
import tensorflow as tf
from tensorflow.keras import layers

tf.enable_eager_execution()

In [2]:

# Let's build a stack of *sequential* layers, which is
# the most common form of neural network graphs.
model = tf.keras.Sequential() 

# Adds a densely-connected layer with 64 units to the model
model.add(layers.Dense(units=64, activation='relu', input_shape=(32,)))

# Adds another layer, which has L2 regularization applied to the kernel matrix
model.add(layers.Dense(units=64, kernel_regularizer=tf.keras.regularizers.l2(0.01)))

# Adds a softmax layer with 10 output units
model.add(layers.Dense(units=10, activation='softmax'))

Instructions for updating:
Colocations handled automatically by placer.


If you visit https://www.tensorflow.org/api_docs/python/tf/keras/layers, you can find the supported layers, for example, Conv2D, BatchNormalization, LSTM, MaxPool, etc.
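To make concrete what each `Dense` layer in the stack above computes, here is a minimal pure-Python sketch of the forward pass `activation(inputs · kernel + bias)` for a single input vector. The names `dense_forward` and `relu` and the toy sizes (3 inputs, 2 units) are illustrative assumptions and not part of the Keras API; the real layer also handles batches and creates its weights for you.

```python
def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]


def dense_forward(inputs, kernel, bias, activation=None):
    """Compute activation(inputs . kernel + bias) for one input vector.

    kernel is a list of rows with shape (input_dim, units);
    bias is a list of length units.
    """
    outputs = [
        sum(inputs[i] * kernel[i][j] for i in range(len(inputs))) + bias[j]
        for j in range(len(bias))
    ]
    return activation(outputs) if activation else outputs


# Hypothetical toy sizes: 3 inputs -> 2 units (the tutorial uses 32 -> 64).
x = [1.0, -2.0, 0.5]
kernel = [[0.5, -1.0],
          [0.25, 0.5],
          [2.0, 0.0]]
bias = [0.1, 0.2]

print(dense_forward(x, kernel, bias))        # pre-activation values
print(dense_forward(x, kernel, bias, relu))  # negative values clipped to 0
```

Stacking layers in `Sequential` then amounts to feeding the output list of one such function into the next.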

### Setting up training
After the model is constructed, the `compile` method configures how the model is trained. It lets us specify the following:
* `optimizer`: The optimizer to use. We can pass an optimizer instance (e.g., `tf.train.AdamOptimizer` or `tf.train.RMSPropOptimizer`) defined in the `tf.train` module.
* `loss`: The function to minimize during optimization. Common choices include mean squared error (`mse`), `categorical_crossentropy`, and `binary_crossentropy`. Loss functions are specified by name or by passing a callable object from the `tf.keras.losses` module.
* `metrics`: Used to monitor training. We can pass string names (e.g., `'accuracy'`) or callables defined in the `tf.keras.metrics` module.

In [0]:
model.compile(optimizer=tf.train.AdamOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
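For intuition about what the `categorical_crossentropy` loss and the `'accuracy'` metric named above measure, the following pure-Python sketch computes both for a single example. The function names and the `eps` clipping value are assumptions made for this illustration; it is a simplified stand-in, not the Keras implementation.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(target, predicted, eps=1e-7):
    """-sum(target * log(predicted)); eps avoids log(0) (assumed value)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


probs = softmax([2.0, 1.0, 0.1])   # model output for one example
one_hot = [1.0, 0.0, 0.0]          # true class is index 0

loss_value = categorical_crossentropy(one_hot, probs)
# The prediction counts as correct when the most likely class matches the label.
is_correct = probs.index(max(probs)) == one_hot.index(max(one_hot))

print(loss_value, is_correct)
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1, which is what the optimizer tries to achieve.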

### Preparing Input data
We need datasets to train models. In this example, we will create a small dataset using in-memory NumPy arrays to train and evaluate a model.

In [0]:
import numpy as np

data = np.random.random((1000, 32))
labels = np.random.random((1000, 10))

### Training a model
Finally, we can train the model using the `fit` method, which "fits" the model to the training data. We can specify the training data to use (`data` and `labels`), the number of epochs to run (`epochs`), and the number of examples processed in each batch (`batch_size`).

In [5]:
model.fit(data, labels, epochs=10, batch_size=32)

Instructions for updating:
Use tf.cast instead.
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f9bbac19710>
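To make the roles of `epochs` and `batch_size` concrete, the sketch below counts how the 1000 examples used above would be split into batches of 32. The helper `iterate_batches` is a hypothetical name introduced here; the sketch ignores shuffling and only illustrates the bookkeeping, not what `fit` does internally.

```python
def iterate_batches(num_examples, batch_size):
    """Yield (start, end) index pairs that cover the dataset once (one epoch)."""
    for start in range(0, num_examples, batch_size):
        yield start, min(start + batch_size, num_examples)


num_examples, batch_size, epochs = 1000, 32, 10

batches = list(iterate_batches(num_examples, batch_size))
steps_per_epoch = len(batches)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)  # number of weight updates per epoch
print(batches[-1])      # the last batch is smaller than the others
print(total_steps)      # number of weight updates over the whole run
```

With these numbers, each epoch consists of 31 full batches plus one final batch holding the remaining 8 examples.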

### Quiz 1.
First, define a multi-layer model with 
  - 1 Dense layer with 10 units and `softmax` activation, taking input tensor of `(32,)` shape (hint: [`tf.keras.layers.Dense`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)).
  - Another Dense layer with 10 units and `softmax` activation.

In [0]:
from tensorflow.keras import layers

############# Write here. #############
# Build the model layer by layer; both layers use softmax, as the quiz specifies.
model = tf.keras.Sequential()
model.add(layers.Dense(10, activation='softmax', input_shape=(32,)))
model.add(layers.Dense(10, activation='softmax'))
#######################################
# Equivalent one-step form: pass the list of layers to the constructor.
# model = tf.keras.Sequential([
#     layers.Dense(10, activation='softmax', input_shape=(32,)),
#     layers.Dense(10, activation='softmax')])

Using the model and `(data, labels)` above, let's train the model using the following configuration:
* optimizer: `tf.train.RMSPropOptimizer`
* learning rate: 0.001
* loss: `categorical_crossentropy`
* metrics: `accuracy`
* batch size: 32
* epochs: 20

In [7]:
############# Write here. #############
# model.compile(...)
model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(...) 
model.fit(data, labels, epochs=20, batch_size=32)
#######################################
    

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7f9bbaded278>

## 2. Eager execution

Eager execution is a flexible machine learning platform for research and experimentation, providing:

* **An intuitive interface** Structure your code naturally and use Python data structures. Quickly iterate on small models and small data.
* **Easier debugging** Call ops directly to inspect running models and test changes. Use standard Python debugging tools for immediate error reporting.
* **Natural control flow** Use Python control flow instead of graph control flow, simplifying the specification of dynamic models.

Eager execution supports most TensorFlow operations and GPU acceleration. For a collection of examples running in eager execution, see: tensorflow/contrib/eager/python/examples.






### Setup

To start eager execution, add `tf.enable_eager_execution()` to the **beginning** of the program or console session. In a Jupyter (or Colab) environment, you need to restart the runtime if the session has already executed any command; otherwise the call raises a `ValueError`.

In [11]:
import tensorflow as tf
tf.enable_eager_execution()

ValueError: ignored

In [10]:
# Checks whether the eager mode is used (True if enabled)
tf.executing_eagerly()

True

Unlike in graph mode, the results of all operations (e.g., constants, matrix multiplication) are returned directly. Recall that in graph mode the operators are not executed until `Session.run()` is called, so the values are not available even after the corresponding Python statement has run.

### Constants
In eager execution, by contrast, the values are returned immediately. Let's start with the constants that we defined in the previous sessions.

In [11]:
#constant of 1d tensor, or a vector
a = tf.constant([2,2], name = 'vector')

#constant of 2x2 tensor, or a matrix
b = tf.constant([[0,2], [1,3]], name = 'matrix')

print(a)
print(b)

# See what happens if you uncomment the last line
#print(a.name)

tf.Tensor([2 2], shape=(2,), dtype=int32)
tf.Tensor(
[[0 2]
 [1 3]], shape=(2, 2), dtype=int32)


We can see that the values are printed. The `name` field is no longer useful because the values can be accessed directly in Python code; in fact, uncommenting the last line raises an error when `Tensor.name` is accessed.

As you might notice, this way is simpler and more intuitive than in graph mode, where the value can be retrieved only when `session.run()` is called as follows:

In [19]:
# In the graph mode, print(aa) prints only the metadata of aa
graph = tf.Graph()
with graph.as_default():
  aa = tf.constant([2,2], name = 'vector')
  print(aa.name)
print(aa)
  
# The value of the aa can be obtained through sess.run(aa)
with tf.Session(graph=graph) as sess:
  print(sess.run(aa))

vector:0
Tensor("vector:0", shape=(2,), dtype=int32)
[2 2]
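The contrast between the two modes can be imitated in a few lines of plain Python. This is only a toy analogy: `DeferredConstant`, `ToySession`, and `eager_constant` are hypothetical names invented for this sketch and do not correspond to TensorFlow classes.

```python
class DeferredConstant:
    """Toy stand-in for a graph-mode tensor: it only describes a value."""

    def __init__(self, value, name):
        self._value = value
        self.name = name

    def __repr__(self):
        return "DeferredConstant(name={!r})".format(self.name)


class ToySession:
    """Toy stand-in for a session: evaluation happens only inside run()."""

    def run(self, node):
        return node._value


def eager_constant(value):
    """Toy stand-in for eager mode: the value itself comes back right away."""
    return list(value)


node = DeferredConstant([2, 2], name="vector")
print(node)                    # shows only metadata, not the value
print(ToySession().run(node))  # the value appears only after run()
print(eager_constant([2, 2]))  # the value is available immediately
```

In the deferred style, the name is the only handle for identifying a node, which is why it matters in graph mode and loses its purpose in eager mode.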


### Math Ops
Eager execution also allows mathematical operations to be computed immediately. Let's take the same examples that we saw in graph mode.

In [15]:
div = tf.div(b, a)
print(div)

# See what happens if you uncomment the last line
# print(div.op)

Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
tf.Tensor(
[[0 1]
 [0 1]], shape=(2, 2), dtype=int32)


Even better, TF eager execution supports operator overloading and NumPy operations. See how much easier the computation becomes.

In [16]:
# Operator overloading
print(b/a)

# Numpy integration
import numpy as np
print(np.divide(b, a))

tf.Tensor(
[[0.  1. ]
 [0.5 1.5]], shape=(2, 2), dtype=float64)
[[0.  1. ]
 [0.5 1.5]]
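Note that the outputs above differ: `tf.div` on `int32` inputs returned integers, while `b/a` returned `float64` values. The pure-Python sketch below reproduces both results for these non-negative inputs, using `//` and `/` and applying the vector `a` to every row of `b`. The helper name `apply_rowwise` is an assumption of this sketch, and `//` is only claimed to match `tf.div` for the non-negative values used here.

```python
def apply_rowwise(matrix, vector, op):
    """Apply op(m, v) entry by entry, reusing the vector for every row."""
    return [[op(m, v) for m, v in zip(row, vector)] for row in matrix]


a = [2, 2]
b = [[0, 2], [1, 3]]

integer_result = apply_rowwise(b, a, lambda m, v: m // v)  # stays integer
true_result = apply_rowwise(b, a, lambda m, v: m / v)      # becomes float

print(integer_result)
print(true_result)
```

Reusing `a` for each row mirrors the broadcasting of a shape `(2,)` tensor against a shape `(2, 2)` tensor.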


### Quiz 2.
Define three 2x2 matrices (A, B, C) with the following values, and print the value of A x B + C (use eager execution)
* A: [[3, 4], [2, 1]]
* B: [[1,2], [1,2]]
* C: [[-1,1],[4,1]]

In [18]:
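########## Write here. ##########
A = tf.constant([[3, 4], [2, 1]])
B = tf.constant([[1, 2], [1, 2]])
C = tf.constant([[-1, 1], [4, 1]])

# Element-wise product: `*` multiplies matching entries
res = A * B + C
# Matrix product: tf.matmul performs row-by-column multiplication
res2 = tf.add(tf.matmul(A, B), C)

print(res)   # element-wise result
print(res2)  # matrix-product result
##############################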

tf.Tensor(
[[2 9]
 [6 3]], shape=(2, 2), dtype=int32)
tf.Tensor(
[[ 6 15]
 [ 7  7]], shape=(2, 2), dtype=int32)
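The two printed results differ because "A x B" can be read either as an element-wise product or as a matrix product. As a cross-check that does not need TensorFlow, here is a small pure-Python sketch of both readings; the helper names are assumptions introduced for this illustration.

```python
def elementwise_product(x, y):
    """Multiply matching entries of two equally shaped matrices."""
    return [[p * q for p, q in zip(row_x, row_y)] for row_x, row_y in zip(x, y)]


def matrix_product(x, y):
    """Row-by-column product of two matrices."""
    columns = list(zip(*y))
    return [[sum(p * q for p, q in zip(row, col)) for col in columns] for row in x]


def matrix_add(x, y):
    """Add matching entries of two equally shaped matrices."""
    return [[p + q for p, q in zip(row_x, row_y)] for row_x, row_y in zip(x, y)]


A = [[3, 4], [2, 1]]
B = [[1, 2], [1, 2]]
C = [[-1, 1], [4, 1]]

elementwise_result = matrix_add(elementwise_product(A, B), C)
matmul_result = matrix_add(matrix_product(A, B), C)

print(elementwise_result)
print(matmul_result)
```

These correspond to `A*B + C` and `tf.add(tf.matmul(A, B), C)` in the quiz solution, respectively.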


### Dynamic control flow

A major benefit of eager execution is that all the functionality of the host language is available while your model is executing. For example, it is easy to write the FizzBuzz game, in which any number divisible by three is replaced with the word "Fizz", any number divisible by five is replaced with the word "Buzz", and any number divisible by both is replaced with "FizzBuzz" (similar to the 3-6-9 game in Korea).

In [20]:
# Native python code
def fizzbuzz(max_num):
  counter = 0
  for num in range(1, max_num+1):
    if int(num % 3) == 0 and int(num % 5) == 0:
      print('FizzBuzz')
    elif int(num % 3) == 0:
      print('Fizz')
    elif int(num % 5) == 0:
      print('Buzz')
    else:
      print(num)
    counter += 1
    
fizzbuzz(20)

1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz


In eager execution, we only need to make minor changes to a few lines.

In [21]:
import tensorflow as tf
def fizzbuzz_eager(max_num):
  counter = tf.constant(0) # counter = 0
  max_num = tf.convert_to_tensor(max_num) #
  for num in range(1, max_num.numpy()+1): #
    num = tf.constant(num) # 
    if int(num % 3) == 0 and int(num % 5) == 0:
      print('FizzBuzz')
    elif int(num % 3) == 0:
      print('Fizz')
    elif int(num % 5) == 0:
      print('Buzz')
    else:
      print(num.numpy()) # print(num)
    counter += 1
    
fizzbuzz_eager(20)

1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz


I searched *fizzbuzz in tensorflow graph* in Google and found a blog post [Tensorflow FizzBuzz Revisited (Ricky Han blog)](https://rickyhan.com/jekyll/update/2018/02/16/tensorflow-fizzbuzz-revisited.html) that shows how the above code can be written in TF graph mode:

In [22]:
import tensorflow as tf
def fizzbuzz_graph(max_num):
  # Define variable and while_loop
  graph = tf.Graph()
  with graph.as_default():
    arr = tf.Variable([str(i) for i in range(1, max_num+1)])
    while_op = tf.while_loop(
      (lambda i, _: tf.less(i, max_num+1)), 
      (lambda i, _: (tf.add(i,1), tf.cond(
          tf.logical_and(tf.equal(tf.mod(i, 3), 0), tf.equal(tf.mod(i, 5), 0)),
          (lambda : tf.assign(arr[(i - 1)], 'FizzBuzz')),
          (lambda : tf.cond(tf.equal(tf.mod(i, 3), 0),
              (lambda : tf.assign(arr[(i - 1)], 'Fizz')),
              (lambda : tf.cond(tf.equal(tf.mod(i, 5), 0),
                  (lambda : tf.assign(arr[(i - 1)], 'Buzz')),
                  (lambda : arr)))))))),
      [1, arr])
  # Call Session.run()
  with tf.Session(graph = graph) as sess:
      sess.run(tf.global_variables_initializer())
      idx, array = sess.run(while_op)
      print(array)
      
fizzbuzz_graph(100)

[b'1' b'2' b'Fizz' b'4' b'Buzz' b'Fizz' b'7' b'8' b'Fizz' b'Buzz' b'11'
 b'Fizz' b'13' b'14' b'FizzBuzz' b'16' b'17' b'Fizz' b'19' b'Buzz' b'Fizz'
 b'22' b'23' b'Fizz' b'Buzz' b'26' b'Fizz' b'28' b'29' b'FizzBuzz' b'31'
 b'32' b'Fizz' b'34' b'Buzz' b'Fizz' b'37' b'38' b'Fizz' b'Buzz' b'41'
 b'Fizz' b'43' b'44' b'FizzBuzz' b'46' b'47' b'Fizz' b'49' b'Buzz' b'Fizz'
 b'52' b'53' b'Fizz' b'Buzz' b'56' b'Fizz' b'58' b'59' b'FizzBuzz' b'61'
 b'62' b'Fizz' b'64' b'Buzz' b'Fizz' b'67' b'68' b'Fizz' b'Buzz' b'71'
 b'Fizz' b'73' b'74' b'FizzBuzz' b'76' b'77' b'Fizz' b'79' b'Buzz' b'Fizz'
 b'82' b'83' b'Fizz' b'Buzz' b'86' b'Fizz' b'88' b'89' b'FizzBuzz' b'91'
 b'92' b'Fizz' b'94' b'Buzz' b'Fizz' b'97' b'98' b'Fizz' b'Buzz']


### Model Subclassing
We can build a fully customizable model by subclassing [tf.keras.Model](https://www.tensorflow.org/api_docs/python/tf/keras/models/Model) and defining our own forward pass. Layers are created in the `__init__` method and set as attributes of the class instance. The forward pass is defined in the `call` method. Model subclassing is particularly useful when eager execution is enabled, since the forward pass can be written imperatively.



In [0]:
class MNISTModel(tf.keras.Model):
  def __init__(self):
    """Define layers"""
    super(MNISTModel, self).__init__()
    self.dense1 = tf.keras.layers.Dense(units=10)
    self.dense2 = tf.keras.layers.Dense(units=10)

  def call(self, inputs):
    """Define forward pass."""
    # `inputs` (not `input`) avoids shadowing the Python builtin
    result = self.dense1(inputs)
    result = self.dense2(result)
    result = self.dense2(result)  # reuse variables from dense2 layer
    return result

model = MNISTModel()

It's not required to set an input shape for the `tf.keras.Model` class since the parameters are set the first time input is passed to the layer.

tf.keras.layers classes create and contain their own model variables that are tied to the lifetime of their layer objects. To share layer variables, share their objects.
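The two points above (weights are created on the first call, and sharing a layer object shares its variables) can be imitated with a toy class in plain Python. `TinyDense` is a hypothetical name made up for this sketch; it has no bias or activation and is not how Keras implements layers.

```python
import random


class TinyDense:
    """Hypothetical toy layer whose kernel is created on the first call."""

    def __init__(self, units, seed=0):
        self.units = units
        self.kernel = None  # unknown until the input size is seen
        self._rng = random.Random(seed)

    def __call__(self, inputs):
        if self.kernel is None:
            # Build step: the input length decides the kernel shape.
            self.kernel = [
                [self._rng.uniform(-1.0, 1.0) for _ in range(self.units)]
                for _ in range(len(inputs))
            ]
        return [
            sum(x * row[j] for x, row in zip(inputs, self.kernel))
            for j in range(self.units)
        ]


layer = TinyDense(units=3)
kernel_before = layer.kernel          # nothing has been created yet

hidden = layer([1.0, 2.0, 3.0])       # first call creates a 3 x 3 kernel
kernel_after_first_call = layer.kernel

output = layer(hidden)                # same object, so the same kernel is reused
print(layer.kernel is kernel_after_first_call)
```

Calling the same object twice works here because the layer maps 3 values to 3 values, just as `dense2` in `MNISTModel` maps 10 units to 10 units.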

### Eager training

Automatic differentiation is useful for implementing machine learning algorithms such as backpropagation for training neural networks. During eager execution, use tf.GradientTape to trace operations for computing gradients later.

`tf.GradientTape` is an opt-in feature to provide maximal performance when not tracing. Since different operations can occur during each call, all forward-pass operations get recorded to a "tape". To compute the gradient, play the tape backwards and then discard it. By default, a particular `tf.GradientTape` can only compute one gradient; subsequent calls throw a runtime error (unless the tape is created with `persistent=True`).



In [27]:
w = tf.Variable([[1.0]])
with tf.GradientTape() as tape:
  loss = w * w

grad = tape.gradient(loss, w)
print(grad)  # => tf.Tensor([[ 2.]], shape=(1, 1), dtype=float32)

tf.Tensor([[2.]], shape=(1, 1), dtype=float32)
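The "record, then play backwards" idea can be sketched in plain Python. The classes `Value` and `ToyTape` below are hypothetical and support only multiplication; they are meant to illustrate the concept under these assumptions, not to describe how `tf.GradientTape` is implemented.

```python
class Value:
    """Hypothetical scalar wrapper so the tape can tell objects apart."""

    def __init__(self, data):
        self.data = data


class ToyTape:
    """Hypothetical toy tape that records multiplications in forward order."""

    def __init__(self):
        self.records = []  # (output, left, right) triples

    def multiply(self, left, right):
        out = Value(left.data * right.data)
        self.records.append((out, left, right))
        return out

    def gradient(self, target, source):
        if self.records is None:
            raise RuntimeError("this toy tape has already been used")
        grads = {id(target): 1.0}  # d(target)/d(target) = 1
        # Play the tape backwards, applying the product rule at each step.
        for out, left, right in reversed(self.records):
            upstream = grads.get(id(out), 0.0)
            grads[id(left)] = grads.get(id(left), 0.0) + upstream * right.data
            grads[id(right)] = grads.get(id(right), 0.0) + upstream * left.data
        self.records = None  # discard the tape after one use
        return grads.get(id(source), 0.0)


w = Value(1.0)
tape = ToyTape()
loss = tape.multiply(w, w)     # loss = w * w
grad = tape.gradient(loss, w)  # d(w * w)/dw = 2 * w
print(grad)

# A second call on the same tape fails, mirroring the one-shot behavior.
try:
    tape.gradient(loss, w)
    second_call_failed = False
except RuntimeError:
    second_call_failed = True
print(second_call_failed)
```

Because `w` appears as both operands, its gradient receives two contributions of `w` each, which add up to `2 * w`.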


Below, a linear regression model is defined as a subclass of `tf.keras.Model` and then trained with loss and gradient functions that are written imperatively.

In [0]:
class Model(tf.keras.Model):
  def __init__(self):
    """Define layers"""
    super(Model, self).__init__()
    self.W = tf.Variable(5., name='weight')
    self.B = tf.Variable(10., name='bias')
  
  def call(self, inputs):
    """Define forward pass"""
    return inputs * self.W + self.B

# The loss function to be optimized
def loss(model, inputs, targets):
  error = model(inputs) - targets
  return tf.reduce_mean(tf.square(error))

def grad(model, inputs, targets):
  with tf.GradientTape() as tape:
    loss_value = loss(model, inputs, targets)
  return tape.gradient(loss_value, [model.W, model.B])



In [29]:
# Instantiate the model and an optimizer to use.
model = Model()
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)


# Let's train with a toy dataset of points around 3 * x + 2
NUM_EXAMPLES = 2000
training_inputs = tf.random_normal([NUM_EXAMPLES])
noise = tf.random_normal([NUM_EXAMPLES])
training_outputs = training_inputs * 3 + 2 + noise
print("Initial loss: {:.3f}".format(loss(model, training_inputs, training_outputs)))

# Training loop
for i in range(300):
  grads = grad(model, training_inputs, training_outputs)
  optimizer.apply_gradients(zip(grads, [model.W, model.B]),
                            global_step=tf.train.get_or_create_global_step())
  if i % 20 == 0:
    print("Loss at step {:03d}: {:.3f}".format(i, loss(model, training_inputs, training_outputs)))

print("Final loss: {:.3f}".format(loss(model, training_inputs, training_outputs)))
print("W = {}, B = {}".format(model.W.numpy(), model.B.numpy()))

Initial loss: 68.693
Loss at step 000: 66.015
Loss at step 020: 29.999
Loss at step 040: 13.932
Loss at step 060: 6.765
Loss at step 080: 3.567
Loss at step 100: 2.141
Loss at step 120: 1.504
Loss at step 140: 1.220
Loss at step 160: 1.094
Loss at step 180: 1.037
Loss at step 200: 1.012
Loss at step 220: 1.001
Loss at step 240: 0.996
Loss at step 260: 0.994
Loss at step 280: 0.993
Final loss: 0.992
W = 3.047919988632202, B = 2.0220162868499756
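To see what the tape and the optimizer do together in the loop above, the sketch below runs the same kind of gradient descent in plain Python, with the gradients of the mean squared error written out by hand. The tiny noise-free dataset, the learning rate of 0.1, and the 200 steps are assumptions chosen so that the example is deterministic; they are not the settings used in the tutorial cell.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the line w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradients(w, b, xs, ys):
    """Hand-derived gradients of the mean squared error w.r.t. w and b."""
    n = len(xs)
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2.0 * sum(errors) / n
    return grad_w, grad_b


# Assumed toy data: points exactly on the line 3 * x + 2 (no noise).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [3.0 * x + 2.0 for x in xs]

w, b = 5.0, 10.0      # same starting values as the tutorial model
learning_rate = 0.1   # assumed value for this toy dataset

for _ in range(200):
    grad_w, grad_b = mse_gradients(w, b, xs, ys)
    # Plain gradient descent: step against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b, mse(w, b, xs, ys))
```

Without noise the loss can approach zero, whereas in the tutorial run it levels off near 1 because the targets include unit-variance noise.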


## Wrap-up

So far, we have learned how to use two high-level APIs: Keras and eager execution. For more information about Keras and eager execution, you can visit https://www.tensorflow.org/guide/keras and https://www.tensorflow.org/guide/eager, respectively (and many other blog posts as well!).

Next week, we will build a Deep Reinforcement Learning application using these APIs.
