# Eager Execution
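- Eager execution: describe models in a define-by-run style, as in Chainer
- An imperative environment that evaluates operations immediately, without building a computational graph
- Eager execution is a flexible platform for experimentation
    - Intuitive
    - Easy debugging
    - Natural control flow: write it the same way as ordinary Python code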

https://www.tensorflow.org/guide/eager
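
To make the contrast between graph building and immediate evaluation concrete, here is a small framework-free sketch in plain Python. The `Node` class is hypothetical and only imitates the idea of a deferred graph; it is not how TensorFlow represents graphs internally.

```python
import operator


class Node:
    """Hypothetical deferred operation: stores what to compute, not the result."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def run(self):
        # Evaluate the inputs first, then apply this node's operation.
        args = [i.run() if isinstance(i, Node) else i for i in self.inputs]
        return self.op(*args)


# Define-and-run: building the graph computes nothing yet.
graph = Node(operator.mul, Node(operator.add, 1.0, 2.0), 4.0)
deferred_result = graph.run()  # evaluation happens only here

# Define-by-run (eager): every operation returns its value immediately.
eager_result = (1.0 + 2.0) * 4.0
```

Both styles produce the same number; the difference is when the work happens, and that timing is what makes eager code easy to inspect with ordinary Python tools.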

# Basic usage

In [1]:
from __future__ import absolute_import, division, print_function

import tensorflow as tf

tf.enable_eager_execution()  # enable eager execution
# Operations now return values immediately instead of building a computational graph



In [2]:
tf.executing_eagerly()

True

In [3]:
x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m)) 

hello, [[4.]]


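- NumPy operations accept tf.Tensor arguments
- TensorFlow math operations convert Python objects and NumPy arrays to tf.Tensor; conversely, the tf.Tensor.numpy method returns the value as a NumPy array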

In [4]:
a = tf.constant([[1, 2],
                 [3, 4]])
print(a)

tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32)


In [5]:
b = tf.add(a, 1)
print(b)

tf.Tensor(
[[2 3]
 [4 5]], shape=(2, 2), dtype=int32)


In [6]:
import numpy as np
# a and b are tf.Tensor objects
c = np.multiply(a, b)
print(c)

[[ 2  6]
 [12 20]]


In [7]:
print(type(a))

<class 'EagerTensor'>


In [8]:
# Convert to a NumPy array
print(a.numpy())

[[1 2]
 [3 4]]


# Dynamic control flow
- A major benefit of eager execution is that the features of the host language (Python) are available while the model runs

In [9]:
def fizzbuzz(max_num):
    counter = tf.constant(0)
    max_num = tf.convert_to_tensor(max_num)
    for num in range(max_num.numpy()):
        num = tf.constant(num)
        if int(num % 3) == 0 and int(num % 5) == 0:
            print('FizzBuzz')
        elif int(num % 3) == 0:
            print('Fizz')
        elif int(num % 5) == 0:
            print('Buzz')
        else:
            print(num)
        counter += 1
    return counter

In [10]:
fizzbuzz(10)

FizzBuzz
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
Fizz
tf.Tensor(4, shape=(), dtype=int32)
Buzz
Fizz
tf.Tensor(7, shape=(), dtype=int32)
tf.Tensor(8, shape=(), dtype=int32)
Fizz


<tf.Tensor: id=98, shape=(), dtype=int32, numpy=10>

# Build a model
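- A model is expressed as a stack of layers
- With eager execution, either write your own layers or use the layers provided by Keras
- When writing your own layer
    - it is convenient to inherit from tf.keras.layers.Layer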

In [11]:
class MySimpleLayer(tf.keras.layers.Layer):
    def __init__(self, output_units):
        super(MySimpleLayer, self).__init__()
        self.output_units = output_units

    def build(self, input_shape):
        # The build method gets called the first time your layer is used.
        # Creating variables on build() allows you to make their shape depend
        # on the input shape and hence removes the need for the user to specify
        # full shapes. It is possible to create variables during __init__() if
        # you already know their full shapes.
        self.kernel = self.add_variable(
            "kernel", [input_shape[-1], self.output_units])
    
    def call(self, input):
        # Override call() instead of __call__ so we can perform some bookkeeping.
        return tf.matmul(input, self.kernel)
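
The comments in `build()` and `call()` above describe a deferred-build pattern: `__call__` does the bookkeeping, creates variables once on first use, and then delegates to `call()`. The following plain-Python sketch illustrates only that pattern; `ToyLayer` and `ToyScale` are hypothetical names, not the Keras implementation.

```python
class ToyLayer:
    """Hypothetical stand-in for a layer base class."""

    def __init__(self):
        self.built = False

    def __call__(self, inputs):
        # Bookkeeping lives here: create variables once, on first use.
        if not self.built:
            self.build(len(inputs))
            self.built = True
        return self.call(inputs)


class ToyScale(ToyLayer):
    def build(self, input_dim):
        # The variable's shape depends on the input seen at the first call.
        self.kernel = [1.0] * input_dim

    def call(self, inputs):
        # Element-wise scaling by the kernel.
        return [x * k for x, k in zip(inputs, self.kernel)]


layer = ToyScale()
built_before = layer.built        # no variables exist yet
output = layer([1.0, 2.0, 3.0])   # first call triggers build(3)
```

Subclasses override `build()` and `call()` only, so the base class keeps control over when variables are created.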


- Below, tf.keras.layers.Dense is used in place of the MySimpleLayer defined above.
- tf.keras.Sequential can be used to stack layers linearly

In [12]:
model = tf.keras.Sequential([
  tf.keras.layers.Dense(10, input_shape=(784,)),  # must declare input shape
  tf.keras.layers.Dense(10)
])

- Alternatively, a model can be built by subclassing tf.keras.Model

In [13]:
class MNISTModel(tf.keras.Model):
    def __init__(self):
        super(MNISTModel, self).__init__()
        self.dense1 = tf.keras.layers.Dense(units=10)
        self.dense2 = tf.keras.layers.Dense(units=10)

    def call(self, input):
        """Run the model."""
        result = self.dense1(input)
        result = self.dense2(result)
        result = self.dense2(result)  # reuse variables from dense2 layer
        return result


model = MNISTModel()

# Eager training
## Computing gradients
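- Automatic differentiation is very useful in machine learning
- With eager execution, gradients can be computed with tf.GradientTape
- Because different operations can occur on each call, all forward-pass operations are recorded on a "tape"
- To compute the gradient, the tape is replayed backwards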
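
The record-then-replay idea can be sketched without TensorFlow. The toy `Tape` below handles only scalar addition and multiplication; `Value`, `Tape`, and their methods are hypothetical names chosen for illustration and do not reflect how tf.GradientTape is implemented.

```python
class Value:
    """Hypothetical scalar wrapper so operations can be tracked by identity."""

    def __init__(self, data):
        self.data = data


class Tape:
    """Toy reverse-mode tape for scalars."""

    def __init__(self):
        # Each record: (output, [(input, local derivative d(output)/d(input)), ...])
        self.records = []

    def mul(self, a, b):
        out = Value(a.data * b.data)
        self.records.append((out, [(a, b.data), (b, a.data)]))
        return out

    def add(self, a, b):
        out = Value(a.data + b.data)
        self.records.append((out, [(a, 1.0), (b, 1.0)]))
        return out

    def gradient(self, target, source):
        grads = {id(target): 1.0}
        # Replay the tape in reverse, accumulating chain-rule products.
        for out, parents in reversed(self.records):
            upstream = grads.get(id(out), 0.0)
            for parent, local in parents:
                grads[id(parent)] = grads.get(id(parent), 0.0) + upstream * local
        return grads.get(id(source), 0.0)


tape = Tape()
w = Value(3.0)
loss = tape.mul(w, w)            # forward pass: recorded on the tape
grad = tape.gradient(loss, w)    # backward pass: d(w*w)/dw = 2*w
```

Because the tape stores whatever operations actually ran, Python control flow in the forward pass needs no special handling.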


In [14]:
w = tf.contrib.eager.Variable([[1.0]])
with tf.GradientTape() as tape:
    loss = w * w

grad = tape.gradient(loss, w)
print(grad)  # => tf.Tensor([[ 2.]], shape=(1, 1), dtype=float32)

tf.Tensor([[2.]], shape=(1, 1), dtype=float32)


- An example of training a simple model

In [15]:
# A toy dataset of points around 3 * x + 2
NUM_EXAMPLES = 1000
training_inputs = tf.random_normal([NUM_EXAMPLES])
noise = tf.random_normal([NUM_EXAMPLES])
training_outputs = training_inputs * 3 + 2 + noise


def prediction(input, weight, bias):
    return input * weight + bias


# A loss function using mean-squared error
def loss(weights, biases):
    error = prediction(training_inputs, weights, biases) - training_outputs
    # Includes the forward computation
    return tf.reduce_mean(tf.square(error))


# Return the derivative of loss with respect to weight and bias
def grad(weights, biases):
    with tf.GradientTape() as tape:
        # Record the forward computation on the tape
        loss_value = loss(weights, biases)
    # Replay the tape to compute the gradients
    return tape.gradient(loss_value, [weights, biases])



train_steps = 200
learning_rate = 0.01
# Start with arbitrary values for W and B on the same batch of data
W = tf.contrib.eager.Variable(5.)
B = tf.contrib.eager.Variable(10.)

print("Initial loss: {:.3f}".format(loss(W, B)))

for i in range(train_steps):
    dW, dB = grad(W, B)
    W.assign_sub(dW * learning_rate)
    B.assign_sub(dB * learning_rate)
    if i % 20 == 0:
        print("Loss at step {:03d}: {:.3f}".format(i, loss(W, B)))

print("Final loss: {:.3f}".format(loss(W, B)))
print("W = {}, B = {}".format(W.numpy(), B.numpy()))

Initial loss: 69.257
Loss at step 000: 66.539
Loss at step 020: 30.073
Loss at step 040: 13.889
Loss at step 060: 6.707
Loss at step 080: 3.519
Loss at step 100: 2.105
Loss at step 120: 1.477
Loss at step 140: 1.198
Loss at step 160: 1.075
Loss at step 180: 1.020
Final loss: 0.996
W = 3.036214590072632, B = 2.1418139934539795
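
For this linear model the gradients that `tape.gradient` returns can also be derived by hand: with error e = w*x + b - y, the mean-squared-error loss has dL/dw = mean(2*e*x) and dL/db = mean(2*e). The plain-Python sketch below repeats the same training loop with those formulas; the seed and the names `xs`, `ys`, `mse`, and `mse_grad` are assumptions made for illustration, so the printed numbers in the cell above will not be reproduced exactly.

```python
import random

random.seed(0)  # arbitrary seed, chosen only to make the sketch repeatable
NUM_EXAMPLES = 1000
xs = [random.gauss(0.0, 1.0) for _ in range(NUM_EXAMPLES)]
ys = [3.0 * x + 2.0 + random.gauss(0.0, 1.0) for x in xs]


def mse(w, b):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / NUM_EXAMPLES


def mse_grad(w, b):
    # d/dw mean(e^2) = mean(2 * e * x);  d/db mean(e^2) = mean(2 * e)
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    dw = sum(2.0 * e * x for e, x in zip(errors, xs)) / NUM_EXAMPLES
    db = sum(2.0 * e for e in errors) / NUM_EXAMPLES
    return dw, db


w, b = 5.0, 10.0
initial_loss = mse(w, b)
for _ in range(200):
    dw, db = mse_grad(w, b)
    w -= 0.01 * dw  # same update as W.assign_sub(dW * learning_rate)
    b -= 0.01 * db
final_loss = mse(w, b)
```

After 200 steps `w` and `b` should sit close to 3 and 2, mirroring the behaviour of the TensorFlow loop.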


## Train a model


- Even without training, the model can be called to inspect its output

In [16]:
# Create a tensor representing a blank image
batch = tf.zeros([1, 1, 784])
print(batch.shape)  # => (1, 1, 784)

# call MNIST model
result = model(batch)
print(result)

(1, 1, 784)
tf.Tensor([[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]], shape=(1, 1, 10), dtype=float32)


- This example uses the dataset.py module (download it into the current directory beforehand)
- The MNIST data is then fetched with the following command

In [17]:
import dataset  # download dataset.py file
dataset_train = dataset.train('./datasets').shuffle(60000).repeat(4).batch(32)

- To train the model, define a loss function
- Use an optimizer to update the variables

In [19]:
def loss(model, x, y):
    prediction = model(x)
    return tf.losses.sparse_softmax_cross_entropy(labels=y, logits=prediction)


def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        loss_value = loss(model, inputs, targets)
    return tape.gradient(loss_value, model.variables)


optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)

x, y = next(iter(dataset_train))
print("Initial loss: {:.3f}".format(loss(model, x, y)))

# Training loop
for (i, (x, y)) in enumerate(dataset_train):
    # Calculate derivatives of the input function with respect to its parameters.
    grads = grad(model, x, y)
    # Apply the gradient to the model
    optimizer.apply_gradients(zip(grads, model.variables),
                              global_step=tf.train.get_or_create_global_step())
    if i % 200 == 0:
        print("Loss at step {:04d}: {:.3f}".format(i, loss(model, x, y)))

print("Final loss: {:.3f}".format(loss(model, x, y)))

Initial loss: 0.227
Loss at step 0000: 0.722
Loss at step 0200: 0.410
Loss at step 0400: 0.886
Loss at step 0600: 0.532
Loss at step 0800: 0.543
Loss at step 1000: 0.365
Loss at step 1200: 0.421
Loss at step 1400: 0.235
Loss at step 1600: 0.567
Loss at step 1800: 0.414
Loss at step 2000: 0.486
Loss at step 2200: 0.499
Loss at step 2400: 0.809
Loss at step 2600: 0.349
Loss at step 2800: 0.437
Loss at step 3000: 0.617
Loss at step 3200: 0.497
Loss at step 3400: 0.394
Loss at step 3600: 0.730
Loss at step 3800: 0.545
Loss at step 4000: 0.676
Loss at step 4200: 0.458
Loss at step 4400: 0.320
Loss at step 4600: 0.285
Loss at step 4800: 0.327
Loss at step 5000: 0.335
Loss at step 5200: 0.281
Loss at step 5400: 0.314
Loss at step 5600: 0.196
Loss at step 5800: 0.272
Loss at step 6000: 0.522
Loss at step 6200: 0.582
Loss at step 6400: 0.241
Loss at step 6600: 0.585
Loss at step 6800: 0.193
Loss at step 7000: 0.253
Loss at step 7200: 0.328
Loss at step 7400: 0.195
Final loss: 0.376


- To speed up computation with a GPU

In [20]:
dataset_train = dataset.train('./datasets').shuffle(60000).repeat(4).batch(32)

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)

x, y = next(iter(dataset_train))
print("Initial loss: {:.3f}".format(loss(model, x, y)))


with tf.device("/gpu:0"):
    for (i, (x, y)) in enumerate(dataset_train):
        # minimize() is equivalent to the grad() and apply_gradients() calls.
        optimizer.minimize(lambda: loss(model, x, y),
                           global_step=tf.train.get_or_create_global_step())

Initial loss: 0.428


## Variables and optimizers
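- tf.contrib.eager.Variable objects store mutable tf.Tensor values that are accessed during training, which makes automatic differentiation easier
- Model parameters can be encapsulated in classes as variables
- Using tf.contrib.eager.Variable together with tf.GradientTape gives better encapsulation of model parameters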

In [22]:
import tensorflow.contrib.eager as tfe

In [23]:
class Model(tf.keras.Model):
    def __init__(self):
        super(Model, self).__init__()
        self.W = tfe.Variable(5., name='weight')
        self.B = tfe.Variable(10., name='bias')

    def predict(self, inputs):
        return inputs * self.W + self.B


# A toy dataset of points around 3 * x + 2
NUM_EXAMPLES = 2000
training_inputs = tf.random_normal([NUM_EXAMPLES])
noise = tf.random_normal([NUM_EXAMPLES])
training_outputs = training_inputs * 3 + 2 + noise

# The loss function to be optimized

# The loss function is mean-squared error (MSE)
def loss(model, inputs, targets):
    error = model.predict(inputs) - targets
    return tf.reduce_mean(tf.square(error))


def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        loss_value = loss(model, inputs, targets)
    return tape.gradient(loss_value, [model.W, model.B])


# Define:
# 1. A model.
# 2. Derivatives of a loss function with respect to model parameters.
# 3. A strategy for updating the variables based on the derivatives.
model = Model()
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)

print("Initial loss: {:.3f}".format(
    loss(model, training_inputs, training_outputs)))

# Training loop
for i in range(300):
    grads = grad(model, training_inputs, training_outputs)
    optimizer.apply_gradients(zip(grads, [model.W, model.B]),
                              global_step=tf.train.get_or_create_global_step())
    if i % 20 == 0:
        print("Loss at step {:03d}: {:.3f}".format(
            i, loss(model, training_inputs, training_outputs)))

print("Final loss: {:.3f}".format(
    loss(model, training_inputs, training_outputs)))
print("W = {}, B = {}".format(model.W.numpy(), model.B.numpy()))

Initial loss: 69.338
Loss at step 000: 66.639
Loss at step 020: 30.323
Loss at step 040: 14.105
Loss at step 060: 6.863
Loss at step 080: 3.629
Loss at step 100: 2.184
Loss at step 120: 1.539
Loss at step 140: 1.251
Loss at step 160: 1.122
Loss at step 180: 1.065
Loss at step 200: 1.039
Loss at step 220: 1.028
Loss at step 240: 1.022
Loss at step 260: 1.020
Loss at step 280: 1.019
Final loss: 1.019
W = 2.9842734336853027, B = 1.9938817024230957


# Use objects for state during eager execution
- With graph execution (define-and-run), program state such as variables was managed by tf.Session
- With eager execution, in contrast, the lifetime of a variable is determined by the lifetime of its Python object

## Variables are objects
- During eager execution, a variable persists until the last reference to its object is removed

In [24]:
with tf.device("gpu:0"):
    v = tfe.Variable(tf.random_normal([1000, 1000]))
    v = None  # v no longer takes up GPU memory

In [25]:
v
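
The same lifetime rule can be observed with ordinary Python objects. This sketch uses the standard `weakref` module to watch an object disappear once its last strong reference is dropped; `Buffer` is a hypothetical stand-in for something that owns memory, and immediate collection is a CPython behaviour rather than a language guarantee.

```python
import gc
import weakref


class Buffer:
    """Hypothetical stand-in for an object that owns a block of memory."""

    def __init__(self, size):
        self.data = [0.0] * size


v = Buffer(1000)
probe = weakref.ref(v)               # observes the object without keeping it alive
alive_before = probe() is not None

v = None                             # drop the last strong reference
gc.collect()                         # not required in CPython; added for other runtimes
alive_after = probe() is not None
```

No explicit "free" call is involved: rebinding the name is enough to release the object.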

## Object-based saving
- tfe.Checkpoint can save and restore tfe.Variable objects
- tfe.Checkpoint can also save models, optimizers, and similar objects.

In [28]:
x = tfe.Variable(10.)
print(x)

checkpoint = tfe.Checkpoint(x=x)  # save as "x"

x.assign(2.)   # Assign a new value to the variables and save.
save_path = checkpoint.save('./ckpt/')
print(x)

x.assign(11.)  # Change the variable after saving.
print(x)

# Restore values from the checkpoint
checkpoint.restore(save_path)

print(x)

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=10.0>
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.0>
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=11.0>
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.0>
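
The sequence of values printed above (10, 2, 11, 2) follows from one idea: the checkpoint tracks objects by name and writes saved values back into those same objects. The in-memory sketch below imitates only that idea; `Var` and `ToyCheckpoint` are hypothetical, and a real checkpoint writes to disk rather than returning a dictionary.

```python
class Var:
    """Hypothetical mutable value holder."""

    def __init__(self, value):
        self.value = value

    def assign(self, value):
        self.value = value


class ToyCheckpoint:
    def __init__(self, **objects):
        self.objects = objects  # name -> tracked object

    def save(self):
        # Snapshot the current values by name.
        return {name: obj.value for name, obj in self.objects.items()}

    def restore(self, snapshot):
        # Write the saved values back into the same objects, in place.
        for name, value in snapshot.items():
            self.objects[name].assign(value)


x = Var(10.0)
checkpoint = ToyCheckpoint(x=x)
x.assign(2.0)
snapshot = checkpoint.save()     # captures 2.0
x.assign(11.0)                   # change after saving
checkpoint.restore(snapshot)     # x holds 2.0 again
restored_value = x.value
```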


## Object-oriented metrics
- tfe.metrics are stored as objects
- Update a metric by passing it new data, and retrieve the result with the tfe.metrics.result method

In [29]:
m = tfe.metrics.Mean("loss")
m(0)
m(5) # (0+5)/2
print(m.result())
m([8, 9]) # (0+5+8+9)/4
print(m.result())

tf.Tensor(2.5, shape=(), dtype=float64)
tf.Tensor(5.5, shape=(), dtype=float64)
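
The arithmetic in the comments above can be reproduced with a small accumulator object. `RunningMean` is a hypothetical plain-Python class that mimics only the call-then-`result()` usage pattern, not the tfe.metrics implementation.

```python
class RunningMean:
    """Hypothetical metric object that keeps its state between calls."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def __call__(self, values):
        # Accept either a single number or a list of numbers.
        if not isinstance(values, (list, tuple)):
            values = [values]
        self.total += sum(values)
        self.count += len(values)

    def result(self):
        return self.total / self.count


m = RunningMean()
m(0)
m(5)
first = m.result()    # (0 + 5) / 2
m([8, 9])
second = m.result()   # (0 + 5 + 8 + 9) / 4
```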


## Summaries and TensorBoard
- tf.contrib.summary works with both graph execution and eager execution
- It is used as follows:

```
writer = tf.contrib.summary.create_file_writer(logdir)
global_step=tf.train.get_or_create_global_step()  # return global step var

writer.set_as_default()

for _ in range(iterations):
  global_step.assign_add(1)
  # Must include a record_summaries method
  with tf.contrib.summary.record_summaries_every_n_global_steps(100):
    # your model code goes here
    tf.contrib.summary.scalar('loss', loss)
     ...
```

# Advanced automatic differentiation topics
## Dynamic models
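- tf.GradientTape can also handle dynamic models
- Example: the backtracking line search algorithm
    - A method for choosing the step size in gradient-based optimization: start with a large step and shrink it until the function value decreases sufficiently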

In [30]:
def line_search_step(fn, init_x, rate=1.0):
    with tf.GradientTape() as tape:
        # Variables are automatically recorded, but manually watch a tensor
        tape.watch(init_x)
        value = fn(init_x)
    grad = tape.gradient(value, init_x)
    grad_norm = tf.reduce_sum(grad * grad)
    init_value = value
    while value > init_value - rate * grad_norm:
        x = init_x - rate * grad
        value = fn(x)
        rate /= 2.0
    return x, value
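
To see the loop in action without TensorFlow, the sketch below mirrors `line_search_step` for a scalar function whose derivative is supplied by hand. `backtracking_step` and `grad_fn` are hypothetical names, and the extra zero-gradient guard is an addition of this sketch, not part of the code above.

```python
def backtracking_step(fn, grad_fn, x0, rate=1.0):
    """One backtracking line-search step for a scalar function (toy version)."""
    value0 = fn(x0)
    g = grad_fn(x0)
    grad_norm = g * g
    x, value = x0, value0
    # Keep halving the step while the value has not dropped enough.
    # The guard on grad_norm avoids looping at a stationary point.
    while grad_norm > 0.0 and value > value0 - rate * grad_norm:
        x = x0 - rate * g
        value = fn(x)
        rate /= 2.0
    return x, value


def square(x):
    return x * x


def square_grad(x):
    return 2.0 * x


x_new, value_new = backtracking_step(square, square_grad, 3.0)
```

Starting from x = 3 the first trial step overshoots to -3, so the step is halved and the second trial lands on the minimum at 0.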

## Additional functions to compute gradients
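- tf.GradientTape is a powerful interface for computing gradients; tf.contrib.eager also provides the function-based alternatives described below.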

### tfe.gradients_function
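- Returns a function that computes the derivatives of the function passed as its argument.
- The input function must return a scalar value
- When the returned function is called, it returns a list of tf.Tensor objects: one element for each argument of the input function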

### tfe.value_and_gradients_function
- Similar to tfe.gradients_function, but when the returned function is called, it returns the value of the input function in addition to the list of tf.Tensor gradients.

In [31]:
def square(x):
    return tf.multiply(x, x)


grad = tfe.gradients_function(square)

print(square(3.))  # => 9.0
print(grad(3.))    # => [6.0] (x^2)'=2x

# The second-order derivative of square:
gradgrad = tfe.gradients_function(lambda x: grad(x)[0])  # grad(x) returns a list; [0] gives the scalar that gradients_function requires
print(gradgrad(3.))  # => [2.0]

# The third-order derivative is None:
gradgradgrad = tfe.gradients_function(lambda x: gradgrad(x)[0])
print(gradgradgrad(3.))  # => [None]


# With flow control:
def abs(x):
    return x if x > 0. else -x


grad = tfe.gradients_function(abs)

print(grad(3.))   # => [1.0]
print(grad(-3.))  # => [-1.0]

tf.Tensor(9.0, shape=(), dtype=float32)
[<tf.Tensor: id=1978883, shape=(), dtype=float32, numpy=6.0>]
[<tf.Tensor: id=1978895, shape=(), dtype=float32, numpy=2.0>]
[None]
[<tf.Tensor: id=159, shape=(), dtype=float32, numpy=1.0>]
[<tf.Tensor: id=1978920, shape=(), dtype=float32, numpy=-1.0>]
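
The "function in, gradient function out" shape of these two APIs can be imitated in plain Python. The sketch below swaps automatic differentiation for central finite differences, so it is only an approximation and not how tfe computes gradients. Both function names are hypothetical, and the (value, gradients) return order is a choice made for this sketch.

```python
def numeric_gradients_function(f, eps=1e-4):
    """Return a function that approximates df/dx_i for each argument of f."""

    def grad_fn(*args):
        grads = []
        for i in range(len(args)):
            hi = list(args)
            lo = list(args)
            hi[i] += eps
            lo[i] -= eps
            # Central difference: (f(x + eps) - f(x - eps)) / (2 * eps)
            grads.append((f(*hi) - f(*lo)) / (2.0 * eps))
        return grads

    return grad_fn


def numeric_value_and_gradients_function(f, eps=1e-4):
    """Like numeric_gradients_function, but also returns f's value."""
    grad_fn = numeric_gradients_function(f, eps)

    def value_and_grad_fn(*args):
        return f(*args), grad_fn(*args)

    return value_and_grad_fn


def square(x):
    return x * x


grads = numeric_gradients_function(square)(3.0)
value, grads_again = numeric_value_and_gradients_function(square)(3.0)
```

As in the cell above, the gradient function returns a list with one entry per argument, which is why a single-argument function still needs the `[0]` index.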


## Custom gradients
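- Custom gradients are an easy way to override gradients
- Inside the forward function, define the gradient with respect to the inputs, outputs, or intermediate results.
- The following example clips the norm of the gradient in the backward pass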

In [32]:
@tf.custom_gradient
def clip_gradient_by_norm(x, norm):
    y = tf.identity(x)

    def grad_fn(dresult):
        return [tf.clip_by_norm(dresult, norm), None]
    return y, grad_fn
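
What clipping by norm does to the incoming gradient can be written out in a few lines of plain Python. This is a sketch of the rescaling rule only; the helper below is a hypothetical list-based function, not `tf.clip_by_norm` itself.

```python
import math


def clip_by_norm(grad, clip_norm):
    """Rescale grad so its L2 norm is at most clip_norm (direction unchanged)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip_norm:
        return list(grad)
    return [g * clip_norm / norm for g in grad]


clipped = clip_by_norm([3.0, 4.0], 1.0)     # norm 5 is rescaled to norm 1
unchanged = clip_by_norm([0.3, 0.4], 1.0)   # norm 0.5 is left as is
```

Only the magnitude is limited; the direction of the gradient is preserved, which is why this is a common guard against exploding gradients.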

- Custom gradients are commonly used to provide numerically stable gradients

In [33]:
def log1pexp(x):
    return tf.log(1 + tf.exp(x))


grad_log1pexp = tfe.gradients_function(log1pexp)

# The gradient computation works fine at x = 0.
print(grad_log1pexp(0.))  # => [0.5]

# However, x = 100 fails because of numerical instability.
print(grad_log1pexp(100.))  # => [nan]: exp(100.) exceeds the float32 range

[<tf.Tensor: id=1978929, shape=(), dtype=float32, numpy=0.5>]
[<tf.Tensor: id=1978938, shape=(), dtype=float32, numpy=nan>]


- The gradient of log1pexp above can be simplified analytically with a custom gradient
- The implementation below reuses tf.exp(x), which is computed in the forward pass

In [34]:
@tf.custom_gradient
def log1pexp(x):
    e = tf.exp(x)

    def grad(dy):
        return dy * (1 - 1 / (1 + e))
    return tf.log(1 + e), grad


grad_log1pexp = tfe.gradients_function(log1pexp)

# As before, the gradient computation works fine at x = 0.
print(grad_log1pexp(0.))  # => [0.5]

# And the gradient computation also works at x = 100.
print(grad_log1pexp(100.))  # => [1.0]

[<tf.Tensor: id=1978948, shape=(), dtype=float32, numpy=0.5>]
[<tf.Tensor: id=1978958, shape=(), dtype=float32, numpy=1.0>]
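
Why the two formulas behave differently on overflow can be checked in plain Python. Python floats are 64-bit, so this sketch uses x = 1000 instead of 100 to trigger the overflow; `exp_or_inf` is a hypothetical helper that turns Python's `OverflowError` into `inf`, which is an assumption about how a float32 tensor operation behaves.

```python
import math


def exp_or_inf(x):
    # math.exp raises OverflowError on overflow; map that to inf instead.
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf


def naive_grad(x):
    e = exp_or_inf(x)
    # Chain rule for log(1 + e): (1 / (1 + e)) * e, which is 0 * inf = nan on overflow
    return (1.0 / (1.0 + e)) * e


def stable_grad(x):
    e = exp_or_inf(x)
    # Algebraically the same derivative: 1 - 1 / (1 + e), which is 1 - 0 = 1 on overflow
    return 1.0 - 1.0 / (1.0 + e)


x_large = 1000.0  # float64 overflows near x = 710; float32 already near x = 89
naive = naive_grad(x_large)
stable = stable_grad(x_large)
```

Both functions agree at x = 0 and differ only once `exp` overflows, which is the case the custom gradient is written to handle.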
