# Using the tf.layers API

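All of the previous examples used TensorFlow's native, low-level API.

The upside is that you have to write and implement most things yourself, so you come to understand the underlying implementation much better. To write a fully connected layer, for example, you create the weight matrix and the bias vector yourself and choose how each one is initialized. That way you always know exactly what the dimensions of every layer's parameters are.

The downside is that it is tedious.

Here we explore TensorFlow's higher-level API, the interfaces in tf.layers, focusing on how to use some common layers:
- Fully connected layer
- Convolutional layer
- Transposed convolution (deconvolution)
- BN layer

For related examples, see: https://github.com/aymericdamien/TensorFlow-Examples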

In [1]:
import warnings
warnings.filterwarnings('ignore')  # suppress warnings

import tensorflow as tf

# Let GPU memory allocation grow on demand
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

import numpy as np

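## 1. Usage of the various layers

#### Initialization methods. Reference: http://www.cnblogs.com/denny402/p/6932956.html

- Biases are usually initialized to a constant.
- Kernels are usually initialized with a Gaussian or a Xavier initializer.

Several other initializers are available as well.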

In [13]:
# Common initializers for kernels and biases
constant_init = tf.constant_initializer(dtype=tf.float32, value=0.01)  # constant initialization, commonly used for biases
zeros_init = tf.zeros_initializer(dtype=tf.float32)
ones_init = tf.ones_initializer(dtype=tf.float32)

truncated_normal_init = tf.truncated_normal_initializer(dtype=tf.float32, mean=0, stddev=0.01)  # truncated Gaussian; setting stddev is usually enough

uniform_init = tf.random_uniform_initializer(minval=0.0, maxval=1.0, seed=None)  # uniform initialization

xavier_init = tf.contrib.layers.xavier_initializer(uniform=True)  # Xavier initialization; with uniform=False it samples from a normal distribution instead
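
To make the Xavier option a little more concrete, here is a minimal pure-Python sketch of the uniform variant. It assumes the usual Glorot bound `sqrt(6 / (fan_in + fan_out))`; the helper names `xavier_uniform_limit` and `xavier_uniform_sample` are made up for illustration and are not part of TensorFlow.

```python
import math
import random


def xavier_uniform_limit(fan_in, fan_out):
    """Half-width of the Xavier/Glorot uniform range (assumed formula)."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def xavier_uniform_sample(fan_in, fan_out, rng):
    """Draw a fan_in x fan_out weight matrix uniformly from [-limit, limit]."""
    limit = xavier_uniform_limit(fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


# Hypothetical example: the bound for a 784 -> 1024 dense layer
limit = xavier_uniform_limit(784, 1024)
# A tiny 4 x 3 matrix, just to show the sampling step
weights = xavier_uniform_sample(4, 3, random.Random(0))
print('limit = {:.4f}'.format(limit))
```

The bound shrinks as the layer gets wider, which is what keeps the scale of activations roughly stable from layer to layer.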

#### Fully connected layer: tf.layers.dense
```python
Signature: tf.layers.dense(inputs, units, activation=None, use_bias=True, kernel_initializer=None, bias_initializer=<tensorflow.python.ops.init_ops.Zeros object at 0x7fad69bae908>, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, trainable=True, name=None, reuse=None)
Docstring:
Functional interface for the densely-connected layer.

Main parameters:
units: Integer or Long, dimensionality of the output space.
activation: Activation function (callable). Set it to None to maintain a
    linear activation. 
```

In [14]:
batch_size = 100
in_dim = 784
X_input = tf.Variable(tf.truncated_normal(dtype=tf.float32, shape=[batch_size, in_dim]))
print(X_input)
print('X_input.shape={}'.format(X_input.get_shape().as_list()))  # get the tensor's shape as a list

fc1 = tf.layers.dense(inputs=X_input, units=1024, activation=tf.nn.relu, kernel_initializer=xavier_init, bias_initializer=constant_init)
print(fc1)
print('fc1.shape={}'.format(fc1.get_shape().as_list()))

<tf.Variable 'Variable_3:0' shape=(100, 784) dtype=float32_ref>
X_input.shape=[100, 784]
Tensor("dense_1/Relu:0", shape=(100, 1024), dtype=float32)
fc1.shape=[100, 1024]


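The tf.layers.dense interface exposes a rich set of parameters: whether to apply an activation function and which one, the initializers, the regularizers, and so on.

So there is no need to spend a lot of effort hand-writing your own fully connected layer.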
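
For reference, the computation a fully connected layer performs is just `activation(x @ W + b)`. The pure-Python sketch below illustrates that arithmetic for a single input row and counts the trainable parameters; `dense_forward` and `dense_param_count` are hypothetical helper names, not TensorFlow functions, and the tiny weights are invented for the example.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def dense_forward(x, weights, biases, activation=None):
    """Compute activation(x @ weights + biases) for one input row."""
    out = [sum(x[i] * weights[i][j] for i in range(len(x))) + biases[j]
           for j in range(len(biases))]
    return activation(out) if activation is not None else out


def dense_param_count(in_dim, units, use_bias=True):
    """Kernel has in_dim * units entries; the bias adds one per unit."""
    return in_dim * units + (units if use_bias else 0)


# Invented toy values: 2 inputs -> 3 units
x = [1.0, -2.0]
W = [[0.5, -1.0, 0.0],
     [0.25, 0.5, 1.0]]
b = [0.1, 0.1, 0.1]
y = dense_forward(x, W, b, activation=relu)

# Parameter count of the 784 -> 1024 layer used in the cell above
n_params = dense_param_count(784, 1024)
print(y, n_params)
```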

#### Convolutional layer: tf.layers.conv2d
```python
Signature: tf.layers.conv2d(inputs, filters, kernel_size, strides=(1, 1), padding='valid', data_format='channels_last', dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer=None, bias_initializer=<tensorflow.python.ops.init_ops.Zeros object at 0x7fad69bf86d8>, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, trainable=True, name=None, reuse=None)
Docstring:
Functional interface for the 2D convolution layer.

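Main parameters:
Arguments: 
  inputs: Tensor input. It should be a tensor of shape [batch_size, img_height, img_width, n_channel].
  filters: Integer, the dimensionality of the output space, i.e. the number of convolution kernels (output channels) in this layer.
  kernel_size: Size of the convolution kernel, e.g. 3 or (3, 3).
  strides: Stride of the sliding window, e.g. (2, 2) means a stride of 2 along both height and width.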
  padding: One of `"valid"` or `"same"` (case-insensitive).
```

In [29]:
batch_size = 100
img_height = img_width = 256
n_channel = 3
X_input = tf.Variable(tf.truncated_normal(shape=[batch_size, img_height, img_width, n_channel]))
print(X_input)
print('X_input.shape={}'.format(X_input.get_shape().as_list()))

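# Parameters for the conv layers; three layers are defined so the transposed-conv example below can mirror them
conv_parms = {
    'depth1': 64,       # number of filters (output depth)
    'k_size1': (5, 5),  # kernel size
    'stride1': (2, 2),  # stride
    'padding1': 'same', # padding mode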
    
    'depth2': 128,      
    'k_size2': (3, 3), 
    'stride2': (2, 2),  
    'padding2': 'same', 
    
    'depth3': 196,      
    'k_size3': (3, 3), 
    'stride3': (2, 2),  
    'padding3': 'same'
}

conv1 = tf.layers.conv2d(X_input, filters=conv_parms['depth1'], kernel_size=conv_parms['k_size1'], 
                         strides=conv_parms['stride1'], padding=conv_parms['padding1'])
print('conv1:', conv1)
conv2 = tf.layers.conv2d(conv1, filters=conv_parms['depth2'], kernel_size=conv_parms['k_size2'], 
                         strides=conv_parms['stride2'], padding=conv_parms['padding2'])
print('conv2:', conv2)
conv3 = tf.layers.conv2d(conv2, filters=conv_parms['depth3'], kernel_size=conv_parms['k_size3'], 
                         strides=conv_parms['stride3'], padding=conv_parms['padding3'])
print('conv3:', conv3)

<tf.Variable 'Variable_10:0' shape=(100, 256, 256, 3) dtype=float32_ref>
X_input.shape=[100, 256, 256, 3]
conv1: Tensor("conv2d_8/BiasAdd:0", shape=(100, 128, 128, 64), dtype=float32)
conv2: Tensor("conv2d_9/BiasAdd:0", shape=(100, 64, 64, 128), dtype=float32)
conv3: Tensor("conv2d_10/BiasAdd:0", shape=(100, 32, 32, 196), dtype=float32)
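
The spatial sizes printed above follow the standard output-size rules for the two padding modes. Here is a small sketch of that arithmetic in plain Python; `conv_output_size` is a hypothetical helper written for this note, not a TensorFlow function.

```python
def conv_output_size(in_size, kernel_size, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    padding = padding.lower()
    if padding == 'same':
        # ceil(in_size / stride), written with integer arithmetic
        return (in_size + stride - 1) // stride
    if padding == 'valid':
        # ceil((in_size - kernel_size + 1) / stride)
        return (in_size - kernel_size + stride) // stride
    raise ValueError('padding must be "same" or "valid"')


# Reproduce the three 'same' conv layers from the cell above
size = 256
sizes = []
for kernel, stride in [(5, 2), (3, 2), (3, 2)]:
    size = conv_output_size(size, kernel, stride, 'same')
    sizes.append(size)
print(sizes)  # [128, 64, 32]
```

Note that with 'same' padding the kernel size does not affect the output size; only the stride does.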


#### Transposed convolution: tf.layers.conv2d_transpose

```python
Signature: tf.layers.conv2d_transpose(inputs, filters, kernel_size, strides=(1, 1), padding='valid', data_format='channels_last', activation=None, use_bias=True, kernel_initializer=None, bias_initializer=<tensorflow.python.ops.init_ops.Zeros object at 0x7fad69bf8c88>, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, trainable=True, name=None, reuse=None)
Docstring:
Functional interface for transposed 2D convolution layer.

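Main parameters: essentially the same as those of conv2d; the example below makes them easy to understand.
Arguments:
  inputs: Input tensor. As with conv2d, a tensor of shape [batch_size, img_height, img_width, n_channel].
  filters: Number of output channels of the transposed convolution.
  kernel_size: Size of the convolution kernel.
  strides: Stride.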
  padding: one of `"valid"` or `"same"` (case-insensitive).
```

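In [30]:
""" Use the output of the convolution stack above as the input to the transposed convolution layers,
then use transposed convolutions to restore the original image size. The generator in DCGAN has this kind of structure.
We used 3 conv2d layers above, so we apply 3 transposed convolutions here.
"""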
tr_conv_input = conv3   
print('tr_conv_input:', tr_conv_input)
tr_conv3 = tf.layers.conv2d_transpose(tr_conv_input, filters=conv_parms['depth2'], kernel_size=conv_parms['k_size3'], 
                         strides=conv_parms['stride3'], padding=conv_parms['padding3'])
print('tr_conv3:', tr_conv3)  # the shape should match conv3's input, i.e. conv2's output
tr_conv2 = tf.layers.conv2d_transpose(tr_conv3, filters=conv_parms['depth1'], kernel_size=conv_parms['k_size2'], 
                         strides=conv_parms['stride2'], padding=conv_parms['padding2'])
print('tr_conv2:', tr_conv2) 
tr_conv1 = tf.layers.conv2d_transpose(tr_conv2, filters=n_channel, kernel_size=conv_parms['k_size1'], 
                         strides=conv_parms['stride1'], padding=conv_parms['padding1'])
print('tr_conv1', tr_conv1)  


tr_conv_input: Tensor("conv2d_10/BiasAdd:0", shape=(100, 32, 32, 196), dtype=float32)
tr_conv3: Tensor("conv2d_transpose_4/BiasAdd:0", shape=(100, 64, 64, 128), dtype=float32)
tr_conv2: Tensor("conv2d_transpose_5/BiasAdd:0", shape=(100, 128, 128, 64), dtype=float32)
tr_conv1 Tensor("conv2d_transpose_6/BiasAdd:0", shape=(100, 256, 256, 3), dtype=float32)
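
The upsampled sizes can be predicted by hand as well. The sketch below assumes the usual size rule for a transposed convolution without explicit output padding: `in * stride` for 'same', and `in * stride + max(kernel - stride, 0)` for 'valid'. `conv_transpose_output_size` is a hypothetical helper, not part of TensorFlow.

```python
def conv_transpose_output_size(in_size, kernel_size, stride, padding):
    """Spatial output size of a transposed convolution along one dimension."""
    padding = padding.lower()
    if padding == 'same':
        return in_size * stride
    if padding == 'valid':
        return in_size * stride + max(kernel_size - stride, 0)
    raise ValueError('padding must be "same" or "valid"')


# Mirror the three conv layers in reverse order: (kernel, stride)
size = 32
sizes = []
for kernel, stride in [(3, 2), (3, 2), (5, 2)]:
    size = conv_transpose_output_size(size, kernel, stride, 'same')
    sizes.append(size)
print(sizes)  # [64, 128, 256]
```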


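The results above should be easy to follow. Still, it is worth writing conv2d_transpose once with the native API, because only then do the parameter dimensions become really clear. For reference: https://github.com/carpedm20/DCGAN-tensorflow/blob/master/ops.py

There is one pitfall, though. With padding 'same', conv computes the output size by rounding up (ceil(input / stride)), so different input sizes can produce the same output size, and the transposed convolutions can then no longer recover the original size exactly. When choosing the image size, make sure the input size of every conv layer is divisible by that layer's stride.

Below is an example where the original image size cannot be recovered.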

In [33]:
batch_size = 100
img_height = img_width = 256
n_channel = 3
X_input = tf.Variable(tf.truncated_normal(shape=[batch_size, img_height, img_width, n_channel]))
print(X_input)
print('X_input.shape={}'.format(X_input.get_shape().as_list()))

conv_parms = {
    'depth1': 64,       
    'k_size1': (5, 5),  
    'stride1': (3, 3),  # stride     ### the only change is here ###
    'padding1': 'same', 
    
    'depth2': 128,      
    'k_size2': (3, 3), 
    'stride2': (2, 2),  
    'padding2': 'same', 
    
    'depth3': 196,      
    'k_size3': (3, 3), 
    'stride3': (2, 2),  
    'padding3': 'same'
}

conv1 = tf.layers.conv2d(X_input, filters=conv_parms['depth1'], kernel_size=conv_parms['k_size1'], 
                         strides=conv_parms['stride1'], padding=conv_parms['padding1'])
print('conv1:', conv1)
conv2 = tf.layers.conv2d(conv1, filters=conv_parms['depth2'], kernel_size=conv_parms['k_size2'], 
                         strides=conv_parms['stride2'], padding=conv_parms['padding2'])
print('conv2:', conv2)
conv3 = tf.layers.conv2d(conv2, filters=conv_parms['depth3'], kernel_size=conv_parms['k_size3'], 
                         strides=conv_parms['stride3'], padding=conv_parms['padding3'])
print('conv3:', conv3)


tr_conv_input = conv3   
print('tr_conv_input:', tr_conv_input)
tr_conv3 = tf.layers.conv2d_transpose(tr_conv_input, filters=conv_parms['depth2'], kernel_size=conv_parms['k_size3'], 
                         strides=conv_parms['stride3'], padding=conv_parms['padding3'])
print('tr_conv3:', tr_conv3)  
tr_conv2 = tf.layers.conv2d_transpose(tr_conv3, filters=conv_parms['depth1'], kernel_size=conv_parms['k_size2'], 
                         strides=conv_parms['stride2'], padding=conv_parms['padding2'])
print('tr_conv2:', tr_conv2) 
tr_conv1 = tf.layers.conv2d_transpose(tr_conv2, filters=n_channel, kernel_size=conv_parms['k_size1'], 
                         strides=conv_parms['stride1'], padding=conv_parms['padding1'])
print('tr_conv1', tr_conv1)  

<tf.Variable 'Variable_13:0' shape=(100, 256, 256, 3) dtype=float32_ref>
X_input.shape=[100, 256, 256, 3]
conv1: Tensor("conv2d_17/BiasAdd:0", shape=(100, 86, 86, 64), dtype=float32)
conv2: Tensor("conv2d_18/BiasAdd:0", shape=(100, 43, 43, 128), dtype=float32)
conv3: Tensor("conv2d_19/BiasAdd:0", shape=(100, 22, 22, 196), dtype=float32)
tr_conv_input: Tensor("conv2d_19/BiasAdd:0", shape=(100, 22, 22, 196), dtype=float32)
tr_conv3: Tensor("conv2d_transpose_13/BiasAdd:0", shape=(100, 44, 44, 128), dtype=float32)
tr_conv2: Tensor("conv2d_transpose_14/BiasAdd:0", shape=(100, 88, 88, 64), dtype=float32)
tr_conv1 Tensor("conv2d_transpose_15/BiasAdd:0", shape=(100, 264, 264, 3), dtype=float32)
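
The mismatch (264 instead of 256) can be checked without building a graph. The sketch below traces a 'same'-padded conv stack down and its mirrored transposed stack back up, and tests the divisibility condition described above. Both helpers are hypothetical names introduced for this note, and the up-path assumes the `in * stride` rule for 'same' transposed convolutions.

```python
def same_padding_round_trip(in_size, strides):
    """Return (sizes going down, sizes coming back up) for 'same' padding."""
    down = []
    size = in_size
    for s in strides:
        size = (size + s - 1) // s  # ceil(size / s)
        down.append(size)
    up = []
    for s in reversed(strides):
        size = size * s
        up.append(size)
    return down, up


def is_recoverable(in_size, strides):
    """True if every layer's input size is divisible by that layer's stride."""
    size = in_size
    for s in strides:
        if size % s != 0:
            return False
        size //= s
    return True


down, up = same_padding_round_trip(256, [3, 2, 2])
print(down, up)  # [86, 43, 22] [44, 88, 264]
print(is_recoverable(256, [3, 2, 2]), is_recoverable(256, [2, 2, 2]))  # False True
```

Here 256 is not divisible by 3, and 43 is not divisible by 2, so rounding up happens twice and the error compounds on the way back.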


#### tf.layers.dropout
```python
Signature: tf.layers.dropout(inputs, rate=0.5, noise_shape=None, seed=None, training=False, name=None)
Docstring:
Applies Dropout to the input.

Note that rate here is the drop rate, i.e. rate = 1.0 - keep_prob
Arguments:
  inputs: Tensor input.
  rate: The dropout rate, between 0 and 1. E.g. "rate=0.1" would drop out 10% of input units.
```

In [41]:
tr_conv1_drop = tf.layers.dropout(tr_conv1, rate=0.2)
print('tr_conv1_drop:', tr_conv1_drop)

tr_conv1_drop: Tensor("dropout_4/Identity:0", shape=(100, 264, 264, 3), dtype=float32)
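
The op name `dropout_4/Identity` in the output reflects the default `training=False`: in that mode the layer simply passes its input through. The sketch below illustrates the two modes in plain Python, assuming the common "inverted dropout" scheme in which kept units are scaled by `1 / (1 - rate)` during training. The function is a hypothetical stand-in, not the TensorFlow implementation.

```python
import random


def dropout(values, rate, training=False, rng=None):
    """Inverted dropout on a flat list of floats (illustrative only)."""
    if not training or rate == 0.0:
        # Inference mode: identity, nothing is dropped or rescaled
        return list(values)
    rng = rng or random.Random()
    keep_prob = 1.0 - rate
    # Drop each unit with probability `rate`; rescale the survivors
    return [v / keep_prob if rng.random() >= rate else 0.0 for v in values]


data = [1.0] * 1000
at_inference = dropout(data, rate=0.2)
at_training = dropout(data, rate=0.2, training=True, rng=random.Random(0))
dropped_fraction = at_training.count(0.0) / float(len(at_training))
print(dropped_fraction)  # close to 0.2, the exact value depends on the seed
```

Rescaling the surviving units keeps the expected value of each activation the same in both modes, which is why nothing needs to change at inference time.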


#### tf.layers.batch_normalization

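batch_normalization is a frequent source of bugs, mainly because training uses the mean and variance of the current mini-batch, while testing uses exponential moving averages of the mean and variance.

For background, see: [Using batch normalization in TensorFlow](https://www.cnblogs.com/hrlnw/p/7227447.html)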


```python
Signature: tf.layers.batch_normalization(inputs, axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer=<tensorflow.python.ops.init_ops.Zeros object at 0x7fad69bce710>, gamma_initializer=<tensorflow.python.ops.init_ops.Ones object at 0x7fad69bce748>, moving_mean_initializer=<tensorflow.python.ops.init_ops.Zeros object at 0x7fad69bce780>, moving_variance_initializer=<tensorflow.python.ops.init_ops.Ones object at 0x7fad69bce7b8>, beta_regularizer=None, gamma_regularizer=None, beta_constraint=None, gamma_constraint=None, training=False, trainable=True, name=None, reuse=None, renorm=False, renorm_clipping=None, renorm_momentum=0.99, fused=None, virtual_batch_size=None, adjustment=None)
Docstring:
Functional interface for the batch normalization layer.

Reference: http://arxiv.org/abs/1502.03167

"Batch Normalization: Accelerating Deep Network Training by Reducing
Internal Covariate Shift"

Sergey Ioffe, Christian Szegedy

Note: when training, the moving_mean and moving_variance need to be updated.
By default the update ops are placed in `tf.GraphKeys.UPDATE_OPS`, so they
need to be added as a dependency to the `train_op`. For example:
```
```python
  update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
  with tf.control_dependencies(update_ops):  # ensures every op in update_ops runs before the ops defined below (train_op)
    train_op = optimizer.minimize(loss)
```
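
To see why those update ops matter, here is a rough pure-Python sketch of the train/test difference for a single feature. It assumes the moving statistics are updated as `moving = momentum * moving + (1 - momentum) * batch_stat`, and it leaves out the learned scale (gamma) and offset (beta). `batch_norm_1d` is a hypothetical helper, not the TensorFlow implementation.

```python
def batch_norm_1d(batch, moving_mean, moving_var, training,
                  momentum=0.99, epsilon=0.001):
    """Normalize a batch of scalars; returns (outputs, new_mean, new_var)."""
    if training:
        # Training: normalize with the statistics of this mini-batch
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
        # ...and fold them into the moving averages (the "update ops")
        moving_mean = momentum * moving_mean + (1.0 - momentum) * mean
        moving_var = momentum * moving_var + (1.0 - momentum) * var
    else:
        # Inference: normalize with the accumulated moving averages
        mean, var = moving_mean, moving_var
    out = [(x - mean) / (var + epsilon) ** 0.5 for x in batch]
    return out, moving_mean, moving_var


batch = [1.0, 3.0]
train_out, new_mean, new_var = batch_norm_1d(batch, 0.0, 1.0, training=True)
test_out, kept_mean, kept_var = batch_norm_1d(batch, 0.0, 1.0, training=False)
print(train_out, new_mean, new_var)
print(test_out, kept_mean, kept_var)
```

If the moving averages are never updated during training, inference keeps using their initial values (mean 0, variance 1), so the same input gives very different outputs in the two modes.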

## 2. A simple MNIST classification example
Reference: https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network.py

In [14]:
""" Convolutional Neural Network.
Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
"""
from __future__ import division, print_function, absolute_import

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("../data/MNIST_data/", one_hot=False)

import tensorflow as tf

# Training Parameters
learning_rate = 0.001
num_steps = 2000
batch_size = 128

# Network Parameters
num_input = 784 # MNIST data input (img shape: 28*28)
num_classes = 10 # MNIST total classes (0-9 digits)
dropout = 0.25 # Dropout, probability to drop a unit


# Create the neural network
def conv_net(x_dict, n_classes, dropout, reuse, is_training):
    # Define a scope for reusing the variables
    with tf.variable_scope('ConvNet', reuse=reuse):
        # TF Estimator input is a dict, in case of multiple inputs
        x = x_dict['images']

        # MNIST data input is a 1-D vector of 784 features (28*28 pixels)
        # Reshape to match picture format [Height x Width x Channel]
        # Tensor input become 4-D: [Batch Size, Height, Width, Channel]
        x = tf.reshape(x, shape=[-1, 28, 28, 1])

        # Convolution Layer with 32 filters and a kernel size of 5
        conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
        # Max Pooling (down-sampling) with strides of 2 and kernel size of 2
        conv1 = tf.layers.max_pooling2d(conv1, 2, 2)

        # Convolution Layer with 64 filters and a kernel size of 3
        conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
        # Max Pooling (down-sampling) with strides of 2 and kernel size of 2
        conv2 = tf.layers.max_pooling2d(conv2, 2, 2)

        # Flatten the data to a 1-D vector for the fully connected layer
        fc1 = tf.contrib.layers.flatten(conv2)

        # Fully connected layer (in tf contrib folder for now)
        fc1 = tf.layers.dense(fc1, 1024)
        # Apply Dropout (if is_training is False, dropout is not applied)
        fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)

        # Output layer, class prediction
        out = tf.layers.dense(fc1, n_classes)

    return out


# Define the model function (following TF Estimator Template)
def model_fn(features, labels, mode):
    # Build the neural network
    # Because Dropout have different behavior at training and prediction time, we
    # need to create 2 distinct computation graphs that still share the same weights.
    logits_train = conv_net(features, num_classes, dropout, reuse=False,
                            is_training=True)
    logits_test = conv_net(features, num_classes, dropout, reuse=True,
                           is_training=False)

    # Predictions
    pred_classes = tf.argmax(logits_test, axis=1)
    pred_probas = tf.nn.softmax(logits_test)

    # If prediction mode, early return
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions=pred_classes)

    # Define loss and optimizer
    loss_op = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits_train, labels=tf.cast(labels, dtype=tf.int32)))
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    train_op = optimizer.minimize(loss_op,
                                  global_step=tf.train.get_global_step())

    # Evaluate the accuracy of the model
    acc_op = tf.metrics.accuracy(labels=labels, predictions=pred_classes)

    # TF Estimators requires to return a EstimatorSpec, that specify
    # the different ops for training, evaluating, ...
    estim_specs = tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=pred_classes,
        loss=loss_op,
        train_op=train_op,
        eval_metric_ops={'accuracy': acc_op})

    return estim_specs


# Build the Estimator
model = tf.estimator.Estimator(model_fn)

# Define the input function for training
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': mnist.train.images}, y=mnist.train.labels,
    batch_size=batch_size, num_epochs=None, shuffle=True)
# Train the Model
model.train(input_fn, steps=num_steps)

# Evaluate the Model
# Define the input function for evaluating
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': mnist.test.images}, y=mnist.test.labels,
    batch_size=batch_size, shuffle=False)
# Use the Estimator 'evaluate' method
e = model.evaluate(input_fn)

print("Testing Accuracy:", e['accuracy'])
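
As a sanity check on `conv_net` above, the feature-map sizes can be traced by hand. Both `tf.layers.conv2d` and `tf.layers.max_pooling2d` default to 'valid' padding, and the conv layers there use the default stride of 1. The helper below is a hypothetical sketch of that arithmetic, not code from the referenced repository.

```python
def valid_output_size(size, window, stride=1):
    """Output size of a 'valid' convolution or pooling along one dimension."""
    return (size - window) // stride + 1


size = 28                                # MNIST images are 28 x 28
size = valid_output_size(size, 5)        # conv1, 5x5 kernel   -> 24
size = valid_output_size(size, 2, 2)     # max pool 2x2, s=2   -> 12
size = valid_output_size(size, 3)        # conv2, 3x3 kernel   -> 10
size = valid_output_size(size, 2, 2)     # max pool 2x2, s=2   -> 5

# conv2 has 64 filters, so flatten() yields 5 * 5 * 64 features per image
flat_features = size * size * 64
print(size, flat_features)  # 5 1600
```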

Instructions for updating:
Please use urllib or similar directly.
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_global_id_in_cluster': 0, '_task_id': 0, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_save_summary_steps': 100, '_session_config': None, '_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_is_chief': True, '_num_worker_replicas': 1, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f9ae7a24668>, '_master': '', '_evaluation_master': '', '_tf_random_seed':