# Introduction to Dynamic Graphs (Eager Execution)

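Eager Execution is an imperative, define-by-run interface in which operations execute immediately as they are called from Python. This makes it easier to get started with TensorFlow and makes research and development more intuitive.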

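## Using Eager Execution
When you enable Eager Execution, operations execute immediately and return their values to Python, with no need to call `Session.run()`. For example, to multiply two matrices together, we write: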

In [1]:
import tensorflow as tf

tf.enable_eager_execution()

import tensorflow.contrib.eager as tfe


In [3]:
x = [[2.]]
m = tf.matmul(x, x)

# Inspecting intermediate results with print or the Python debugger is straightforward.

print(m)
# The 1x1 matrix [[4.]]

tf.Tensor([[4.]], shape=(1, 1), dtype=float32)


You can build dynamic models with Python control flow. Here is an example of the Collatz conjecture using TensorFlow's arithmetic operations:

In [4]:
a = tf.constant(12)
counter = 0
while not tf.equal(a, 1):
    if tf.equal(a % 2, 0):
        a = a / 2
    else:
        a = 3 * a + 1
    print(a)

tf.Tensor(6.0, shape=(), dtype=float64)
tf.Tensor(3.0, shape=(), dtype=float64)
tf.Tensor(10.0, shape=(), dtype=float64)
tf.Tensor(5.0, shape=(), dtype=float64)
tf.Tensor(16.0, shape=(), dtype=float64)
tf.Tensor(8.0, shape=(), dtype=float64)
tf.Tensor(4.0, shape=(), dtype=float64)
tf.Tensor(2.0, shape=(), dtype=float64)
tf.Tensor(1.0, shape=(), dtype=float64)


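# Gradients
Most TensorFlow users are interested in automatic differentiation. Because different operations can occur during each call, we record all forward operations onto a "tape", which is then played backwards when computing gradients. Once the gradients have been computed, the tape is discarded.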
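The record-then-replay idea can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions (scalars, just `+` and `*`), not how TensorFlow implements its tape; the names `Tape`, `Var`, `gradient`, and `sketch_abs` are hypothetical and do not exist in `tfe`.

```python
class Tape:
    """Records operations in execution order so they can be replayed backwards."""

    def __init__(self):
        # Each entry: (output, [(input, local_derivative), ...])
        self.records = []

    def record(self, output, inputs_and_locals):
        self.records.append((output, inputs_and_locals))


class Var:
    """A scalar that logs every operation it takes part in onto a tape."""

    def __init__(self, value, tape):
        self.value = value
        self.tape = tape

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)
        # d(out)/d(self) = other.value and d(out)/d(other) = self.value
        self.tape.record(out, [(self, other.value), (other, self.value)])
        return out

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)
        self.tape.record(out, [(self, 1.0), (other, 1.0)])
        return out


def gradient(f, x_value):
    """Run f forward while recording, then replay the tape in reverse."""
    tape = Tape()
    x = Var(x_value, tape)
    y = f(x)
    grads = {id(y): 1.0}  # dy/dy = 1
    for output, inputs in reversed(tape.records):
        upstream = grads.get(id(output), 0.0)
        for inp, local in inputs:
            grads[id(inp)] = grads.get(id(inp), 0.0) + upstream * local
    # The tape goes out of scope here, i.e. it is discarded after use.
    return grads.get(id(x), 0.0)


def sketch_abs(x):
    # Data-dependent Python control flow: only the branch taken is recorded.
    return x if x.value > 0 else x * Var(-1.0, x.tape)


print(gradient(lambda x: x * x, 3.0))   # 6.0
print(gradient(sketch_abs, -2.0))       # -1.0
```

Because a fresh tape is built on every call, a function may take a different branch each time and still be differentiated correctly.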

If you are familiar with the `autograd` package, this API is very simple to use. For example:

In [4]:
def square(x):
    return tf.multiply(x, x)

grad = tfe.gradients_function(square)

print('square(3.) = ', square(3.))    # [9.]
print('grad(3.) = ', grad(3.))      # [6.]

square(3.) =  tf.Tensor(9.0, shape=(), dtype=float32)
grad(3.) =  [<tf.Tensor: id=13, shape=(), dtype=float32, numpy=6.0>]


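The `gradients_function` call takes a Python function, `square()`, as an argument and returns a Python callable that computes the partial derivatives of `square()` with respect to its inputs. So, to get the derivative of `square()` at `3.0`, call `grad(3.0)`, which gives `6`.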

The same `gradients_function` call can be used to get the second derivative of `square()`:

In [5]:
gradgrad = tfe.gradients_function(lambda x: grad(x)[0])
print('gradgrad(3.) = ', gradgrad(3.))  # [2.]

gradgrad(3.) =  [<tf.Tensor: id=25, shape=(), dtype=float32, numpy=2.0>]
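To make the shape of this API concrete, here is a rough stand-in written without TensorFlow. It takes a function and returns a callable that yields one partial derivative per argument, and nesting it gives a second derivative. Note the swapped-in technique: it uses central finite differences, not the automatic differentiation that `tfe.gradients_function` performs, and `numeric_gradients_function` and `eps` are hypothetical names chosen for this sketch.

```python
def numeric_gradients_function(f, eps=1e-3):
    """Return a callable that approximates the partial derivatives of f."""

    def grad_fn(*args):
        grads = []
        for i in range(len(args)):
            hi = list(args)
            lo = list(args)
            hi[i] += eps
            lo[i] -= eps
            # Central difference with respect to the i-th argument.
            grads.append((f(*hi) - f(*lo)) / (2 * eps))
        return grads

    return grad_fn


def square(x):
    return x * x


grad = numeric_gradients_function(square)
# Differentiating the first derivative again gives the second derivative.
gradgrad = numeric_gradients_function(lambda x: grad(x)[0])

print(grad(3.0))      # approximately [6.0]
print(gradgrad(3.0))  # approximately [2.0]
```

As in the cells above, the result is a list with one entry per input, which is why the nested lambda indexes `[0]`.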


In [6]:
def abs(x):
    return x if x > 0. else -x

grad = tfe.gradients_function(abs)
print('grad(2.0) = ', grad(2.0))  # [1.]
print('grad(-2.0) = ', grad(-2.0)) # [-1.]

grad(2.0) =  [<tf.Tensor: id=10, shape=(), dtype=float32, numpy=1.0>]
grad(-2.0) =  [<tf.Tensor: id=40, shape=(), dtype=float32, numpy=-1.0>]


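## Custom Gradients

Users may want to define custom gradients for an operation or a function. This can be useful for several reasons, including better computational efficiency and numerical stability.

Here is an example that illustrates the use of custom gradients. Let's start by looking at the function `log(1 + e^x)`, which commonly occurs in the computation of cross entropy and log likelihoods.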

In [7]:
def log1pexp(x):
    return tf.log(1 + tf.exp(x))

grad_log1pexp = tfe.gradients_function(log1pexp)

# The gradient computation works fine at x = 0.
print(grad_log1pexp(0.))
# [0.5]

# However it returns a `nan` at x = 100 due to numerical instability.
print(grad_log1pexp(100.))
# [nan]

[<tf.Tensor: id=49, shape=(), dtype=float32, numpy=0.5>]
[<tf.Tensor: id=58, shape=(), dtype=float32, numpy=nan>]


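We can apply a custom gradient to the function above that simplifies the gradient expression. Note that the gradient function below reuses a value computed during the forward pass (`tf.exp(x)`), which avoids redundant computation and makes the gradient computation more efficient.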

In [8]:
@tfe.custom_gradient
def log1pexp(x):
    e = tf.exp(x)

    def grad(dy):
        return dy * (1 - 1 / (1 + e))

    return tf.log(1 + e), grad


grad_log1pexp = tfe.gradients_function(log1pexp)


# Gradient at x = 0 works as before.
print(grad_log1pexp(0.))
# [0.5]

# And now gradient computation at x=100 works as well.
print(grad_log1pexp(100.))
# [1.0]

[<tf.Tensor: id=68, shape=(), dtype=float32, numpy=0.5>]
[<tf.Tensor: id=78, shape=(), dtype=float32, numpy=1.0>]
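The reason the rewritten gradient survives large inputs can be reproduced with ordinary Python floats. The sketch below is an illustration, not TensorFlow code: `exp_saturating`, `naive_grad`, and `stable_grad` are hypothetical helpers. Python floats are float64, which overflows only above roughly x = 709, so the demo uses x = 1000 where the float32 cells above use x = 100.

```python
import math


def exp_saturating(x):
    """math.exp raises on overflow; return inf instead, as tensor ops would."""
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf


def naive_grad(x):
    # Chain rule applied literally: d/dx log(1 + e^x) = (1 / (1 + e^x)) * e^x.
    e = exp_saturating(x)
    return (1 / (1 + e)) * e        # 0 * inf -> nan once e overflows


def stable_grad(x):
    # Algebraically identical form used by the custom gradient above.
    e = exp_saturating(x)
    return 1 - 1 / (1 + e)          # 1 - 1/inf -> 1.0 once e overflows


print(naive_grad(0.0), stable_grad(0.0))        # 0.5 0.5
print(naive_grad(1000.0), stable_grad(1000.0))  # nan 1.0
```

Both forms agree wherever `e` is finite; they differ only in how an overflowed intermediate value propagates.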


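# Building Models

Models can be organized as classes. Here is a model class that creates a (simple) two-layer network that can classify the standard MNIST handwritten digits:

- We recommend using the classes in `tf.layers`, since they create and contain model parameters (variables). Variable lifetimes are tied to the lifetime of the layer objects, so be sure to keep track of them.

- Why use `tfe.Network`? A Network is a container for layers and is itself a `tf.layers.Layer`, which allows `Network` objects to be embedded in other `Network` objects. It also contains utilities to assist with inspection, saving, and restoring.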

In [10]:
class MNISTModel(tfe.Network):
    def __init__(self):
        super().__init__()
        self.layer1 = self.track_layer(tf.layers.Dense(units=10))
        self.layer2 = self.track_layer(tf.layers.Dense(units=10))

    def call(self, input):
        """Actually runs the model."""
        result = self.layer1(input)
        result = self.layer2(result)
        return result

# Even without training the model, we can call it and inspect the output:
# Let's make up a blank input image


model = MNISTModel()
batch = tf.zeros([1, 1, 784])

print(batch.shape)
# (1, 1, 784)

result = model(batch)
print(result)
# tf.Tensor([[[ 0.  0., ...., 0.]]], shape=(1, 1, 10), dtype=float32)

(1, 1, 784)
tf.Tensor([[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]], shape=(1, 1, 10), dtype=float32)
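Two properties described above, that a network is itself a layer and that layers own variables created when the first input arrives, can be illustrated with a small plain-Python sketch. Everything here is an assumption made for illustration: `SketchLayer`, `SketchDense`, and `SketchNetwork` are hypothetical classes, the kernel is zero-initialized with no bias, and the network simply chains its tracked layers in order, whereas a `tfe.Network` subclass defines its own `call()`.

```python
class SketchLayer:
    """Base class: a callable object that owns its variables."""

    def __call__(self, inputs):
        return self.call(inputs)

    def variables(self):
        return []


class SketchDense(SketchLayer):
    """Fully connected layer whose kernel is created on the first call."""

    def __init__(self, units):
        self.units = units
        self.kernel = None  # shape is unknown until the first input arrives

    def call(self, inputs):
        if self.kernel is None:
            # One row per input feature, one column per output unit.
            self.kernel = [[0.0] * self.units for _ in range(len(inputs))]
        return [sum(x * row[j] for x, row in zip(inputs, self.kernel))
                for j in range(self.units)]

    def variables(self):
        return [] if self.kernel is None else [self.kernel]


class SketchNetwork(SketchLayer):
    """A container of layers that is itself a layer, so networks can nest."""

    def __init__(self):
        self._layers = []

    def track_layer(self, layer):
        self._layers.append(layer)
        return layer

    def call(self, inputs):
        for layer in self._layers:
            inputs = layer(inputs)
        return inputs

    def variables(self):
        return [v for layer in self._layers for v in layer.variables()]


inner = SketchNetwork()
inner.track_layer(SketchDense(4))
outer = SketchNetwork()
outer.track_layer(inner)             # a network is tracked like any other layer
outer.track_layer(SketchDense(2))

before = len(outer.variables())      # no kernels exist yet
output = outer([1.0, 2.0, 3.0])      # first call fixes the shapes: 3x4, then 4x2
after = len(outer.variables())
print(before, output, after)         # 0 [0.0, 0.0] 2
```

Tracking layers in the container is what lets `variables()` reach every parameter, including those of nested networks.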


In [17]:
class Model(object):
    def __init__(self):
        self.W = tfe.Variable(5., name='weight')
        self.B = tfe.Variable(10., name='bias')

    def predict(self, inputs):
        return inputs * self.W + self.B


# The loss function to be optimized
def loss(model, inputs, targets):
    error = model.predict(inputs) - targets
    return tf.reduce_mean(tf.square(error))


# A toy dataset of points around 3 * x + 2
NUM_EXAMPLES = 1000
training_inputs = tf.random_normal([NUM_EXAMPLES])
noise = tf.random_normal([NUM_EXAMPLES])
training_outputs = training_inputs * 3 + 2 + noise

# Define:
# 1. A model
# 2. Derivatives of a loss function with respect to model parameters
# 3. A strategy for updating the variables based on the derivatives
model = Model()
grad = tfe.implicit_gradients(loss)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)

# The training loop
print("Initial loss: %f" %
      loss(model, training_inputs, training_outputs).numpy())
for i in range(201):
    optimizer.apply_gradients(grad(model, training_inputs, training_outputs))
    if i % 20 == 0:
        print("Loss at step %d: %f" %
              (i, loss(model, training_inputs, training_outputs).numpy()))
print("Final loss: %f" % loss(model, training_inputs, training_outputs).numpy())
print("W, B = %s, %s" % (model.W.numpy(), model.B.numpy()))

Initial loss: 67.118286
Loss at step 0: 64.562920
Loss at step 20: 29.936848
Loss at step 40: 14.214672
Loss at step 60: 7.061634
Loss at step 80: 3.800792
Loss at step 100: 2.311368
Loss at step 120: 1.629747
Loss at step 140: 1.317222
Loss at step 160: 1.173664
Loss at step 180: 1.107602
Loss at step 200: 1.077150
Final loss: 1.077150
W, B = 3.069365, 2.1580782


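Note that we do not need any placeholders or sessions (`session`). The shapes of the layers' parameters are set the first time input is passed in.

To train any model, we define a loss function to optimize, compute gradients, and use an optimizer to update the variables. First, here is a loss function; `implicit_gradients()` computes the derivatives of `loss_function` with respect to all the TensorFlow variables used during its computation.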

In [None]:
def loss_function(model, x, y):
    y_ = model(x)
    return tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_)


# Then, our training loop:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)

for (x, y) in tfe.Iterator(dataset):
    grads = tfe.implicit_gradients(loss_function)(model, x, y)
    optimizer.apply_gradients(grads)
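The three-step recipe (define a loss, compute its gradients, apply an update) can also be written out by hand, which may help when reading the `tfe` versions. The following is a minimal sketch in plain Python under stated assumptions: the data is a noise-free line `3 * x + 2`, the gradients of the mean squared error are derived analytically instead of by automatic differentiation, and `predict`, `mse_loss`, and `mse_gradients` are hypothetical helper names.

```python
def predict(w, b, x):
    return w * x + b


def mse_loss(w, b, xs, ys):
    """Step 1: the loss to be minimized (mean squared error)."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradients(w, b, xs, ys):
    """Step 2: derivatives of the loss with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
    return dw, db


# Noise-free toy data on the line 3 * x + 2 (an assumption of this sketch).
xs = [i / 10.0 for i in range(-20, 21)]
ys = [3 * x + 2 for x in xs]

w, b = 5.0, 10.0
learning_rate = 0.1
initial_loss = mse_loss(w, b, xs, ys)

for step in range(200):
    dw, db = mse_gradients(w, b, xs, ys)
    # Step 3: plain gradient descent update of the variables.
    w -= learning_rate * dw
    b -= learning_rate * db

final_loss = mse_loss(w, b, xs, ys)
print("loss: %f -> %f, w = %f, b = %f" % (initial_loss, final_loss, w, b))
```

In the `tfe` code, `implicit_gradients` plays the role of `mse_gradients` and `optimizer.apply_gradients` performs the update step.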

## Using Eager with Graphs

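Eager Execution makes development and debugging far more interactive, but TensorFlow graphs have many advantages for distributed training, performance optimization, and production deployment.
The same code that executes operations when Eager Execution is enabled will construct a graph describing the computation when it is not. To convert your model to a graph, simply run the same code in a new Python session where Eager Execution has not been enabled, as seen, for example, in the MNIST example. The values of model variables can be saved to and restored from checkpoints, which lets us move easily between eager (imperative) and graph (declarative) programming. With this, models developed with Eager Execution enabled can be easily exported for production deployment.

In the near future, we will provide utilities to selectively convert portions of your model to graphs. In this way, you can fuse parts of your computation (such as the internals of a custom RNN cell) for high performance while keeping the flexibility and readability of Eager Execution.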

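## How do I change my code?
Using Eager Execution should be intuitive to current TensorFlow users. Only a handful of APIs are specific to Eager Execution; most existing APIs and operations work with Eager Execution enabled. Some points to keep in mind:
- As with TensorFlow in general, if you have not yet switched from queues to `tf.data` for input processing, we recommend that you do. It is easier to use and usually faster.
- Use object-oriented layers, such as `tf.layers.Conv2D()` or `Keras` layers; these have explicit storage for variables.

For most models, you can write code that works the same under both Eager Execution and graph construction. There are some exceptions, such as dynamic models that use Python control flow to alter the computation based on inputs.

Once you have called `tf.enable_eager_execution()`, it cannot be turned off. To get graph behavior, start a new Python session.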

# GPU Computation

In [12]:
import time


def measure(x):
    # The very first time a GPU is used by TensorFlow, it is initialized.
    # So exclude the first run from timing.
    tf.matmul(x, x)

    start = time.time()
    for i in range(10):
        tf.matmul(x, x)
    end = time.time()

    return "Took %s seconds to multiply a %s matrix by itself 10 times" % (
        end - start, x.shape)


# Run on CPU:
with tf.device("/cpu:0"):
    print("CPU: %s" % measure(tf.random_normal([1000, 1000])))

# If a GPU is available, run on GPU:
if tfe.num_gpus() > 0:
    with tf.device("/gpu:0"):
        print("GPU: %s" % measure(tf.random_normal([1000, 1000])))

CPU: Took 0.1629934310913086 seconds to multiply a (1000, 1000) matrix by itself 10 times
GPU: Took 0.0 seconds to multiply a (1000, 1000) matrix by itself 10 times


In [14]:
x = tf.random_normal([10, 10])

x_gpu = x.gpu()
x_cpu = x.cpu()

_ = tf.matmul(x_cpu, x_cpu)  # Runs on CPU
_ = tf.matmul(x_gpu, x_gpu)  # Runs on GPU:0

In [15]:
def f(x):
    return tf.multiply(x, x)  # Or x * x


assert 9 == f(3.).numpy()

df = tfe.gradients_function(f)
assert 6 == df(3.)[0].numpy()

# Second order derivative.
d2f = tfe.gradients_function(lambda x: df(x)[0])
assert 2 == d2f(3.)[0].numpy()

# Third order derivative: Will be None
d3f = tfe.gradients_function(lambda x: d2f(x)[0])
assert d3f(3.)[0] is None

In [16]:
def prediction(input, weight, bias):
    return input * weight + bias


# A toy dataset of points around 3 * x + 2
NUM_EXAMPLES = 1000
training_inputs = tf.random_normal([NUM_EXAMPLES])
noise = tf.random_normal([NUM_EXAMPLES])
training_outputs = training_inputs * 3 + 2 + noise

# A loss function: Mean-squared error


def loss(weight, bias):
    error = prediction(training_inputs, weight, bias) - training_outputs
    return tf.reduce_mean(tf.square(error))


# Function that returns the derivative of loss with respect to
# weight and bias
grad = tfe.gradients_function(loss)

# Train for 200 steps (starting from some random choice for W and B, on the same
# batch of data).
W = 5.
B = 10.
learning_rate = 0.01
print("Initial loss: %f" % loss(W, B).numpy())
for i in range(200):
    (dW, dB) = grad(W, B)
    W -= dW * learning_rate
    B -= dB * learning_rate
    if i % 20 == 0:
        print("Loss at step %d: %f" % (i, loss(W, B).numpy()))
print("Final loss: %f" % loss(W, B).numpy())
print("W, B = %f, %f" % (W.numpy(), B.numpy()))

Initial loss: 67.932968
Loss at step 0: 65.326385
Loss at step 20: 30.088427
Loss at step 40: 14.177371
Loss at step 60: 6.984028
Loss at step 80: 3.727696
Loss at step 100: 2.251603
Loss at step 120: 1.581550
Loss at step 140: 1.276950
Loss at step 160: 1.138275
Loss at step 180: 1.075044
Final loss: 1.047138
W, B = 3.023534, 2.147858


# Using Keras and the Layers API

In [18]:
class Model(object):
    def __init__(self):
        self.layer = tf.layers.Dense(1)

    def predict(self, inputs):
        return self.layer(inputs)

In [19]:
class MNISTModel(object):
    def __init__(self, data_format):
        # 'channels_first' is typically faster on GPUs
        # while 'channels_last' is typically faster on CPUs.
        # See: https://www.tensorflow.org/performance/performance_guide#data_formats
        if data_format == 'channels_first':
            self._input_shape = [-1, 1, 28, 28]
        else:
            self._input_shape = [-1, 28, 28, 1]
        self.conv1 = tf.layers.Conv2D(32, 5,
                                      padding='same',
                                      activation=tf.nn.relu,
                                      data_format=data_format)
        self.max_pool2d = tf.layers.MaxPooling2D(
            (2, 2), (2, 2), padding='same', data_format=data_format)
        self.conv2 = tf.layers.Conv2D(64, 5,
                                      padding='same',
                                      activation=tf.nn.relu,
                                      data_format=data_format)
        self.dense1 = tf.layers.Dense(1024, activation=tf.nn.relu)
        self.dropout = tf.layers.Dropout(0.5)
        self.dense2 = tf.layers.Dense(10)

    def predict(self, inputs):
        x = tf.reshape(inputs, self._input_shape)
        x = self.max_pool2d(self.conv1(x))
        x = self.max_pool2d(self.conv2(x))
        x = tf.layers.flatten(x)
        x = self.dropout(self.dense1(x))
        return self.dense2(x)


def loss(model, inputs, targets):
    return tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(
            logits=model.predict(inputs), labels=targets))


# Load the training and validation data
from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets("./mnist_data", one_hot=True)

# Train
device = "gpu:0" if tfe.num_gpus() else "cpu:0"
model = MNISTModel('channels_first' if tfe.num_gpus() else 'channels_last')
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
grad = tfe.implicit_gradients(loss)
for i in range(20001):
    with tf.device(device):
        (inputs, targets) = data.train.next_batch(50)
        optimizer.apply_gradients(grad(model, inputs, targets))
        if i % 100 == 0:
            print("Step %d: Loss on training set : %f" %
                  (i, loss(model, inputs, targets).numpy()))
print("Loss on test set: %f" %
      loss(model, data.test.images, data.test.labels).numpy())

NotFoundError: No registered 'LogSoftmax' OpKernel for GPU devices compatible with node LogSoftmax = LogSoftmax[T=DT_FLOAT](dummy_input)
	.  Registered:  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
 [Op:LogSoftmax]

# Checkpointing trained variables

TensorFlow Variables (`tfe.Variable`) provide a way to represent shared, persistent state of your model. The `tfe.Checkpoint` class provides a means to save and restore variables to and from checkpoints.

In [20]:
# Create variables.
x = tfe.Variable(10.)
y = tfe.Variable(5.)

# Indicate that the variables should be saved as "x" and "y".
checkpoint = tfe.Checkpoint(x=x, y=y)

# Assign new values to the variables and save.
x.assign(2.)
save_path = checkpoint.save('/tmp/ckpt')

# Change the variable after saving.
x.assign(11.)
assert 16. == (x + y).numpy()  # 11 + 5

# Restore the values in the checkpoint.
checkpoint.restore(save_path)  # save_path='/tmp/ckpt-1'

assert 7. == (x + y).numpy()  # 2 + 5
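The save, modify, restore round trip above can be mimicked without TensorFlow to show what a checkpoint conceptually stores: a mapping from names to variable values. This is a hedged sketch only. `SketchVariable` and `SketchCheckpoint` are hypothetical stand-ins, JSON is an arbitrary format chosen for the illustration and unrelated to TensorFlow's checkpoint files, and a temporary directory replaces `/tmp`.

```python
import json
import os
import tempfile


class SketchVariable:
    """Mutable state that persists across calls, like a model variable."""

    def __init__(self, value):
        self.value = value

    def assign(self, value):
        self.value = value


class SketchCheckpoint:
    """Saves and restores named variables; a stand-in, not `tfe.Checkpoint`."""

    def __init__(self, **variables):
        self._variables = variables
        self._save_count = 0

    def save(self, prefix):
        # Number each save so that earlier checkpoints are not overwritten.
        self._save_count += 1
        path = "%s-%d" % (prefix, self._save_count)
        with open(path, "w") as f:
            json.dump({name: v.value for name, v in self._variables.items()}, f)
        return path

    def restore(self, path):
        with open(path) as f:
            for name, value in json.load(f).items():
                self._variables[name].assign(value)


x = SketchVariable(10.0)
y = SketchVariable(5.0)
checkpoint = SketchCheckpoint(x=x, y=y)

x.assign(2.0)
save_path = checkpoint.save(os.path.join(tempfile.mkdtemp(), "ckpt"))

x.assign(11.0)                  # change the variable after saving
checkpoint.restore(save_path)   # x goes back to the saved value
print(x.value + y.value)        # 7.0
```

Restoring assigns into the existing variable objects, so any code that already holds a reference to `x` sees the restored value.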