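These notes draw on the following books, papers, blogs, and official documentation:
* TensorFlow 2 official documentation: https://tensorflow.google.cn/
* 简单粗暴 TensorFlow 2 (A Concise Handbook of TensorFlow 2): https://github.com/snowkylin/tensorflow-handbook
* TensorFlow 2.0 study notes: https://zhuanlan.zhihu.com/p/74441082

Code examples without a cited source were `probably` written by me; `probably` means there is a very small chance that I forgot to cite the source.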

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_datasets  as tfds
from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras import layers
from tensorflow.keras import preprocessing as prep
from matplotlib import pyplot as plt

# tf.TensorArray

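In some network architectures, especially those involving time series, we may need to store a sequence of tensors in an array for later processing. In eager execution mode you can simply keep the tensors in a Python list. That approach does not work, however, if you need graph features (for example, accelerating the model with @tf.function or exporting it as a SavedModel). For this reason TensorFlow provides tf.TensorArray, a dynamic array that supports graph execution.

* `arr = tf.TensorArray(dtype, size, dynamic_size=False)`: declares a TensorArray `arr` of size `size` and element type `dtype`. If `dynamic_size` is set to True, the array grows automatically.
* `write(index, value)`: writes `value` to position `index`;
* `read(index)`: reads the value at position `index`;
* Note that, because graph execution must be supported, the return value of `write()` must not be discarded. In graph execution mode you must write to the array in the following form:
`arr = arr.write(index, value)`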
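The write pattern above can be sketched inside a `@tf.function`. This is a minimal illustration; the helper name `cumulative_sums` is hypothetical and not part of the original notes.

```python
import tensorflow as tf

@tf.function
def cumulative_sums(n):
    # Hypothetical helper: stores the running sums 1, 1+2, 1+2+3, ... in a TensorArray.
    arr = tf.TensorArray(dtype=tf.float32, size=n)
    total = tf.constant(0.0)
    for i in tf.range(n):
        total += tf.cast(i + 1, tf.float32)
        arr = arr.write(i, total)  # keep the TensorArray returned by write()
    return arr.stack()

result = cumulative_sums(tf.constant(4))
print(result.numpy())
```

Because the loop runs over `tf.range`, AutoGraph turns it into a graph loop, and `arr` has to be reassigned on every iteration for the writes to be carried through.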

In [2]:
x = tf.TensorArray(dtype=tf.float32, size=3, infer_shape=False, clear_after_read=False)
a = tf.random.normal([3,2,2])
x.unstack(a)

<tensorflow.python.ops.tensor_array_ops.TensorArray at 0x14be1ebe0>

In [3]:
x.read(0)

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[ 1.866212  , -2.7629073 ],
       [ 0.07647694,  0.12429674]], dtype=float32)>

In [4]:
x.stack().shape

TensorShape([3, 2, 2])

In [5]:
x.gather([1,2])

<tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
array([[[ 0.20986849,  1.0732858 ],
        [ 0.16812925, -0.8615574 ]],

       [[ 0.5049179 ,  0.75962025],
        [ 0.34235328,  2.0449383 ]]], dtype=float32)>

In [6]:
y = x.scatter([2,1,0], a)  # cannot be used when infer_shape=True
y.read(0)

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[0.5049179 , 0.75962025],
       [0.34235328, 2.0449383 ]], dtype=float32)>

In [7]:
a = tf.random.normal([5,6])
x.split(a, [1,2,2]) # the pieces have 1, 2, and 2 rows respectively

<tensorflow.python.ops.tensor_array_ops.TensorArray at 0x14be1ebe0>

In [8]:
x.read(0)

<tf.Tensor: shape=(1, 6), dtype=float32, numpy=
array([[ 1.1342567 ,  1.2032253 , -0.5454625 ,  0.6039174 , -0.53120303,
        -0.81809044]], dtype=float32)>

In [9]:
x.read(0)

<tf.Tensor: shape=(1, 6), dtype=float32, numpy=
array([[ 1.1342567 ,  1.2032253 , -0.5454625 ,  0.6039174 , -0.53120303,
        -0.81809044]], dtype=float32)>

In [10]:
x.read(1)

<tf.Tensor: shape=(2, 6), dtype=float32, numpy=
array([[-0.37452495,  1.5670388 ,  0.14918514, -0.13703519, -0.81656194,
         0.03038387],
       [-1.3384736 ,  0.6575544 , -0.9972656 ,  0.11442024,  0.08054624,
         0.03166075]], dtype=float32)>

In [11]:
y = tf.TensorArray(tf.float32, 3, dynamic_size=True, clear_after_read=True)

In [12]:
y.unstack(tf.random.normal([4, 2,3]))

<tensorflow.python.ops.tensor_array_ops.TensorArray at 0x14be477b8>

In [13]:
y.read(0)

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 1.7052294 ,  1.0569074 , -0.20574473],
       [ 1.0249538 , -1.5954558 , -0.6838625 ]], dtype=float32)>

In [14]:
y.read(1)

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 1.8342725 , -0.18422723, -0.35455483],
       [ 0.19807559, -1.262542  ,  0.38357162]], dtype=float32)>

In [15]:
# y.read(1)  # because clear_after_read=True, a second read(1) raises an error

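# tf.saved_model

Note that SavedModel is based on computation graphs. For Keras models built by subclassing tf.keras.Model, every method that needs to be exported to the SavedModel format (such as call) must therefore be decorated with @tf.function.

This is demonstrated below with the instance `m` of the `MyModule` class from the `tf.function` chapter.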

In [16]:
class MyModule(tf.Module):
    def __init__(self, name, units=10):
        super(MyModule, self).__init__(name=name)
        self.w = None
        self.b = None
        self.units = units
        self.built = False  # tf.keras.layers.Layer sets this attribute and subclasses inherit it; it indicates whether the weights have been built
    @tf.Module.with_name_scope
    def build(self, input_shape):
        if self.w is None:
            self.w = tf.Variable(tf.random.normal([input_shape[-1], self.units]))
        if self.b is None:
            self.b = tf.Variable(tf.random.normal([self.units, ]))
        self.built = True  # mark the weights as built
    def call(self, input):
        return tf.matmul(input, self.w) + self.b
    @tf.function
    def __call__(self, input):
        if not self.built:  # built is False on the first call, so build() is called to create the weights
            self.build(input.shape)
        return self.call(input)

In [17]:
m = MyModule('testModule')
input = tf.random.normal([5,3])
m(input).shape

TensorShape([5, 10])

In [18]:
tf.saved_model.save(m, "data/modelDir")
tf.saved_model.load("data/modelDir")

INFO:tensorflow:Assets written to: data/modelDir/assets


<tensorflow.python.saved_model.load.Loader._recreate_base_user_object.<locals>._UserObject at 0x14bfae860>
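The save and load round trip can also be checked end to end. The sketch below assumes a tiny hypothetical module `Scale` and a temporary directory rather than the `data/modelDir` path used above.

```python
import tempfile
import tensorflow as tf

class Scale(tf.Module):
    # Hypothetical module: multiplies its input by a single variable.
    def __init__(self):
        super().__init__()
        self.w = tf.Variable(2.0)

    # The exported method is a tf.function with a fixed input signature.
    @tf.function(input_signature=[tf.TensorSpec([None], tf.float32)])
    def __call__(self, x):
        return x * self.w

export_dir = tempfile.mkdtemp()
tf.saved_model.save(Scale(), export_dir)

restored = tf.saved_model.load(export_dir)
out = restored(tf.constant([1.0, 2.0]))
print(out.numpy())
```

Giving the function an `input_signature` means a concrete graph is traced at save time, so the reloaded object can be called without the original Python class.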

# tf.random

In [36]:
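# The argument can be log-probabilities; this call draws 1000 samples from a categorical (multinomial) distribution with probabilities 0.1, 0.3, 0.6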
x = tf.random.categorical(tf.math.log([[0.1,0.3,0.6]]), 1000)  

In [37]:
tf.reduce_sum(tf.cast(x==0, tf.int64))

<tf.Tensor: shape=(), dtype=int64, numpy=111>

In [44]:
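# The probabilities are [6., 1., 1.]/tf.reduce_sum([6., 1., 1.]) for the first row and [1., 1., 10.]/tf.reduce_sum([1., 1., 10.]) for the second
x = tf.random.categorical(tf.math.log([[6., 1. , 1.], [1,1,10]]), 10) 
# Put differently, the sampling probabilities are proportional to tf.exp of the arguments, which amounts to applying softmax to them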

In [45]:
x

<tf.Tensor: shape=(2, 10), dtype=int64, numpy=
array([[0, 0, 0, 0, 2, 0, 1, 0, 1, 0],
       [2, 2, 0, 2, 2, 1, 2, 2, 2, 2]])>
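The softmax interpretation can be checked empirically. This is a rough sketch: the sample size of 20000 and the seed are arbitrary choices, and the observed frequencies only approximate the probabilities.

```python
import tensorflow as tf

tf.random.set_seed(0)  # arbitrary seed, only for repeatability

logits = tf.math.log([[6., 1., 1.]])
probs = tf.nn.softmax(logits)  # softmax(log(p)) renormalizes p to sum to 1

samples = tf.random.categorical(logits, 20000)  # shape [1, 20000], dtype int64
# Empirical frequency of each class among the samples.
freqs = tf.reduce_mean(tf.one_hot(samples[0], depth=3), axis=0)

print(probs.numpy(), freqs.numpy())
```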

# tf.train

#### tf.train.Checkpoint

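A Checkpoint saves only the model's parameters, not its computation, so it is typically used to restore previously trained parameters when the model's source code is available.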
```python3
checkpoint = tf.train.Checkpoint(model=model)
checkpoint.save(save_path_with_prefix)
```
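* The arguments accepted by tf.train.Checkpoint are somewhat unusual: it takes \*\*kwargs, that is, a series of key-value pairs. The keys can be any names you like, and the values are the objects to save.
* `save_path_with_prefix` is the directory plus filename prefix for the saved files. For example, `checkpoint.save("./save/model.ckpt")` creates three files in the save directory: `checkpoint, model.ckpt-1.index, model.ckpt-1.data-00000-of-00001`, which record the variable information. `checkpoint.save` can be called multiple times; each call produces a new `.index` file and `.data` file, with the index incremented by one each time.

To restore the model and continue training, proceed as follows: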
```
checkpoint = tf.train.Checkpoint(myAwesomeModel=model, myAwesomeOptimizer=optimizer)
checkpoint.save(save_path_with_prefix)
model_to_be_restored = MyModel() 
checkpoint = tf.train.Checkpoint(myAwesomeModel=model_to_be_restored)
checkpoint.restore(save_path_with_prefix_and_index)
```
* `save_path_with_prefix_and_index` is the directory + prefix + index of a previously saved checkpoint. For example, `checkpoint.restore("./save/model.ckpt-1")` restores the model from the files with index 1.

```
tf.train.latest_checkpoint(save_path)
```
* Returns the path of the most recent checkpoint in the directory `save_path`, for example `./save/model.ckpt-10`

In [46]:
tf.train.latest_checkpoint('data')

'data/ckpt.save.test-1'
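The save, restore, and `latest_checkpoint` calls fit together as in the following sketch. It tracks a single variable instead of a model and writes to a temporary directory; both are assumptions made to keep the example self-contained.

```python
import os
import tempfile
import tensorflow as tf

v = tf.Variable(1.0)
checkpoint = tf.train.Checkpoint(v=v)  # the key name "v" is arbitrary

save_dir = tempfile.mkdtemp()
path = checkpoint.save(os.path.join(save_dir, "model.ckpt"))  # returns the path with its index

v.assign(5.0)  # overwrite the value after saving
checkpoint.restore(tf.train.latest_checkpoint(save_dir))  # restores the saved value

print(path, v.numpy())
```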

#### tf.train.Feature

In [50]:
value_1 = tf.constant('aaaa')
# Passing the EagerTensor directly fails, because BytesList won't unpack a string from an EagerTensor:
# x0 = tf.train.Feature(bytes_list=tf.train.BytesList(value=[value_1]))
if isinstance(value_1, type(tf.constant(0))):
    value_1 = value_1.numpy()  # convert the EagerTensor to a bytes object first
a = tf.train.BytesList(value=[value_1, b'cccc'])  # accepts only bytes objects
a

value: "aaaa"
value: "cccc"

In [51]:
x = tf.constant(['aaaa', 'cccc'])
x = x.numpy() 
x0 = tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'aaaa', b'cccc']))
x0

bytes_list {
  value: "aaaa"
  value: "cccc"
}

In [52]:
value_2 = tf.constant([4., 5.])
x1 = tf.train.Feature(float_list=tf.train.FloatList(value=value_2))
value_3 = tf.constant([2, 3])
x2 = tf.train.Feature(int64_list=tf.train.Int64List(value=value_3))
feature = {'x1':x1, 'x2':x2, 'x0':x0}

In [53]:
tf.train.Features(feature=feature)

feature {
  key: "x0"
  value {
    bytes_list {
      value: "aaaa"
      value: "cccc"
    }
  }
}
feature {
  key: "x1"
  value {
    float_list {
      value: 4.0
      value: 5.0
    }
  }
}
feature {
  key: "x2"
  value {
    int64_list {
      value: 2
      value: 3
    }
  }
}

In [54]:
tf.train.Example(features=tf.train.Features(feature=feature))

features {
  feature {
    key: "x0"
    value {
      bytes_list {
        value: "aaaa"
        value: "cccc"
      }
    }
  }
  feature {
    key: "x1"
    value {
      float_list {
        value: 4.0
        value: 5.0
      }
    }
  }
  feature {
    key: "x2"
    value {
      int64_list {
        value: 2
        value: 3
      }
    }
  }
}
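An `Example` like the one above is usually serialized and later parsed back into tensors. The sketch below assumes fixed-length features of length 2, matching the values used in these cells.

```python
import tensorflow as tf

example = tf.train.Example(features=tf.train.Features(feature={
    'x0': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'aaaa', b'cccc'])),
    'x1': tf.train.Feature(float_list=tf.train.FloatList(value=[4., 5.])),
    'x2': tf.train.Feature(int64_list=tf.train.Int64List(value=[2, 3])),
}))
serialized = example.SerializeToString()  # bytes, e.g. for writing to a TFRecord file

# The parsing spec must state the dtype and shape of every feature.
spec = {
    'x0': tf.io.FixedLenFeature([2], tf.string),
    'x1': tf.io.FixedLenFeature([2], tf.float32),
    'x2': tf.io.FixedLenFeature([2], tf.int64),
}
parsed = tf.io.parse_single_example(serialized, spec)
print(parsed['x0'].numpy(), parsed['x1'].numpy(), parsed['x2'].numpy())
```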

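# tf.initializers

If a deep learning model's weights are initialized too small, the signal shrinks as it passes through each layer until it has little effect; if they are initialized too large, the signal grows at each layer and eventually diverges.  
The Xavier (Glorot) initializer draws the initial weights from a uniform or Gaussian distribution with mean 0 and variance $\frac{2}{N_{in}+N_{out}}$.

* `tf.initializers.glorot_normal()(shape=[20,30])`: creates initial weights with $N_{in}=20,N_{out}=30$ drawn from a (truncated) normal distribution;
* `tf.initializers.glorot_uniform()(shape=[20,30])`: same as above, except that the weights are drawn from a uniform distribution.

Weights can also be initialized from an explicitly chosen distribution with the following APIs:  
* `tf.random_normal_initializer(mean=0.0,stddev=0.05)(shape=[])`
* `tf.random_uniform_initializer(minval=-0.05, maxval=0.05)(shape=[])`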
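The variance formula can be reproduced with plain NumPy. This sketch does not call the TensorFlow initializers; it relies on the fact that a uniform distribution on $[-l, l]$ has variance $l^2/3$, so $l=\sqrt{6/(N_{in}+N_{out})}$ gives the Glorot variance. The layer sizes are arbitrary.

```python
import numpy as np

n_in, n_out = 200, 300                   # arbitrary layer sizes for illustration
target_var = 2.0 / (n_in + n_out)        # Glorot variance
limit = np.sqrt(6.0 / (n_in + n_out))    # uniform on [-limit, limit] has variance limit**2 / 3

rng = np.random.default_rng(0)
w = rng.uniform(-limit, limit, size=(n_in, n_out))

print(target_var, w.var(), w.mean())
```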

# tf.linalg

```
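tf.linalg.band_part(input, num_lower, num_upper)
```
* num_lower: the number of subdiagonals to keep in the lower triangle; -1 keeps all of them. num_upper works the same way for the upper triangle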

In [6]:
x = tf.random.normal([4,4])
x

<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[ 0.529435  ,  0.8286487 ,  1.837228  ,  0.09473267],
       [ 0.36858612,  1.0950916 ,  0.62635964,  1.1934665 ],
       [-0.7357971 ,  0.18043841, -0.19846489, -0.9738333 ],
       [-1.5683837 ,  2.8496115 ,  0.01234788, -0.76184636]],
      dtype=float32)>

In [9]:
tf.linalg.band_part(x, 0, -1)  # keeps the diagonal and everything above it, giving an upper-triangular matrix

<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[ 0.529435  ,  0.8286487 ,  1.837228  ,  0.09473267],
       [ 0.        ,  1.0950916 ,  0.62635964,  1.1934665 ],
       [ 0.        ,  0.        , -0.19846489, -0.9738333 ],
       [ 0.        ,  0.        ,  0.        , -0.76184636]],
      dtype=float32)>
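A small deterministic matrix makes the effect of `num_lower` and `num_upper` easier to see. This is only an illustration of the three most common settings.

```python
import tensorflow as tf

x = tf.reshape(tf.range(1., 10.), [3, 3])  # [[1,2,3],[4,5,6],[7,8,9]]

upper = tf.linalg.band_part(x, 0, -1)  # no subdiagonals, all superdiagonals
lower = tf.linalg.band_part(x, -1, 0)  # all subdiagonals, no superdiagonals
diag = tf.linalg.band_part(x, 0, 0)    # main diagonal only

print(upper.numpy(), lower.numpy(), diag.numpy(), sep="\n")
```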

# tf.math

In [55]:
tf.math.reduce_std # standard deviation
tf.math.reduce_variance # variance
tf.math.reduce_all
tf.math.reduce_any
tf.math.reduce_logsumexp # equivalent to tf.math.log(tf.reduce_sum(tf.exp(x)))
tf.math.argmin
tf.math.argmax

<function tensorflow.python.ops.math_ops.argmax_v2(input, axis=None, output_type=tf.int64, name=None)>

`tf.matmul(a, b)`  # Uses the last two dimensions for the matrix multiplication; the leading (batch) dimensions must match or be broadcastable.
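Two of the claims in this section can be checked directly: the batch behaviour of `tf.matmul` and the `reduce_logsumexp` identity. The shapes and values below are arbitrary.

```python
import tensorflow as tf

# Batched matmul: the last two dimensions are multiplied, the leading one is a batch dimension.
a = tf.ones([2, 3, 4])
b = tf.ones([4, 5])      # no batch dimension, so it is broadcast across the batch of a
c = tf.matmul(a, b)      # shape [2, 3, 5]

# reduce_logsumexp agrees with the direct formula when exp does not overflow.
x = tf.constant([[1., 2., 3.]])
lse = tf.math.reduce_logsumexp(x, axis=1)
ref = tf.math.log(tf.reduce_sum(tf.exp(x), axis=1))

print(c.shape, lse.numpy(), ref.numpy())
```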

# tf.GradientTape

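All operations executed inside a tf.GradientTape context are recorded so that gradients can be computed. By default, the resources held by a tf.GradientTape are released as soon as GradientTape.gradient() is called. To compute multiple gradients over the same computation, create a persistent tape, which allows gradient() to be called multiple times.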

In [56]:
x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as t:
  t.watch(x)  # x is a constant, so watch() must be called; this line is not needed for a Variable
  y = x * x
  z = y * y
dz_dx = t.gradient(z, x)  # 108.0 (4*x^3 at x = 3)
dy_dx = t.gradient(y, x)  # 6.0
del t  # Drop the reference to the tape
dz_dx

<tf.Tensor: shape=(), dtype=float32, numpy=108.0>

Gradient computations performed inside a tape context are themselves recorded, which makes higher-order gradients possible.

In [57]:
x = tf.Variable(1.0)
with tf.GradientTape() as t:
    with tf.GradientTape() as t2:
        y = x * x * x
    dy_dx = t2.gradient(y,x)
d2y_dx2 = t.gradient(dy_dx, x)

In [58]:
assert dy_dx.numpy() == 3.0
assert d2y_dx2.numpy() == 6.0

In [65]:
x = tf.Variable(3.)
with tf.GradientTape() as tape:
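    # Note: this rebinds x to the intermediate tensor 8 * 3 * 3 = 72, so the
    # gradient below is taken with respect to that tensor: dy/dx = 2 * 72 = 144.
    x = x * x * 8.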
    y = x * x
dydx = tape.gradient(y,x)

In [66]:
dydx

<tf.Tensor: shape=(), dtype=float32, numpy=144.0>
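The result 144 is the gradient with respect to the intermediate tensor, because the name `x` was rebound inside the tape. The sketch below uses separate names (`u` is a hypothetical name for the intermediate value) to contrast it with the gradient with respect to the original variable.

```python
import tensorflow as tf

x = tf.Variable(3.)
with tf.GradientTape(persistent=True) as tape:
    u = x * x * 8.   # intermediate tensor, u = 8 * x^2 = 72
    y = u * u        # y = u^2 = 64 * x^4

dy_du = tape.gradient(y, u)  # 2 * u
dy_dx = tape.gradient(y, x)  # chain rule: 2 * u * 16 * x = 256 * x^3
del tape

print(dy_du.numpy(), dy_dx.numpy())
```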

# tf.losses

```
tf.losses.sparse_categorical_crossentropy(y_true, y_pred, from_logits=False)
tf.losses.SparseCategoricalCrossentropy(from_logits=False, reduction='auto')
```
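* y_true holds integer class labels (the sparse form), not one-hot vectors
* With from_logits=False, y_pred is the output of tf.nn.softmax: every element is a probability and each row sums to 1
* With from_logits=True, y_pred is the raw output of the previous layer (the logits); softmax(y_pred) is applied inside this function
* reduction: one of 'auto', 'sum_over_batch_size', 'sum', 'none'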
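The two `from_logits` settings can be compared on the same data. This is a small sketch with arbitrary logits; the `manual` line is an illustrative reference computation, not part of the API.

```python
import tensorflow as tf

labels = tf.constant([1, 2])                       # integer class labels
logits = tf.constant([[3., 11., 6.], [6., 11., 16.]])
probs = tf.nn.softmax(logits)

loss_from_logits = tf.keras.losses.sparse_categorical_crossentropy(
    labels, logits, from_logits=True)
loss_from_probs = tf.keras.losses.sparse_categorical_crossentropy(
    labels, probs, from_logits=False)

# Reference: minus the log-probability assigned to the true class of each sample.
manual = -tf.math.log(tf.gather(probs, labels, batch_dims=1))

print(loss_from_logits.numpy(), loss_from_probs.numpy(), manual.numpy())
```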


```
tf.losses.categorical_crossentropy
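tf.losses.CategoricalCrossentropy
```
* y_true is a one-hot vector

Cross-entropy can be computed in several equivalent ways. The first three cells below compute categorical cross-entropy and the last three compute binary cross-entropy; within each group the results agree: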

In [85]:
y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.94, 0.01], [0.1, 0.8, 0.1]]

In [86]:
- tf.reduce_sum(tf.multiply(tf.cast(y_true, tf.float32), tf.math.log(y_pred)), axis=1)

<tf.Tensor: shape=(2,), dtype=float32, numpy=array([0.06187541, 2.3025851 ], dtype=float32)>

In [87]:
tf.losses.categorical_crossentropy(y_true, y_pred)

<tf.Tensor: shape=(2,), dtype=float32, numpy=array([0.06187541, 2.3025851 ], dtype=float32)>

In [100]:
losser = tf.losses.CategoricalCrossentropy(from_logits=False, reduction='none')
losser(y_true, y_pred)

<tf.Tensor: shape=(2,), dtype=float32, numpy=array([0.06187541, 2.3025851 ], dtype=float32)>

In [89]:
tf.losses.binary_crossentropy([1,0,1], [0.9,0.3,0.7])

<tf.Tensor: shape=(), dtype=float32, numpy=0.27290332>

In [90]:
tf.reduce_mean(tf.losses.binary_crossentropy([[1],[0],[1]], [[0.9],[0.3],[0.7]]))

<tf.Tensor: shape=(), dtype=float32, numpy=0.27290332>

In [91]:
a = tf.multiply(tf.subtract(1.,[1,0,1.]), tf.math.log(tf.subtract(1.,[0.9,0.3,0.7])))
b = tf.multiply([1,0,1.], tf.math.log([0.9,0.3,0.7]))
-tf.reduce_mean(a+b)

<tf.Tensor: shape=(), dtype=float32, numpy=0.27290347>

# tf.metrics

tf.metrics.categorical_accuracy(y_true, y_pred)
* y_true is a one-hot vector; y_pred is the softmax output

tf.metrics.sparse_categorical_accuracy(y_true, y_pred)
* y_true holds integer class labels; y_pred is the softmax probabilities, and logits work as well
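Logits and probabilities are interchangeable here because accuracy only depends on the argmax, which softmax does not change. The following NumPy sketch illustrates the idea without calling the TensorFlow metric; the data and the `softmax` helper are made up for the example.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[3., 11., 6.], [6., 11., 16.], [2., 1., 0.]])
labels = np.array([1, 2, 1])  # integer class labels

acc_from_logits = np.mean(np.argmax(logits, axis=-1) == labels)
acc_from_probs = np.mean(np.argmax(softmax(logits), axis=-1) == labels)

print(acc_from_logits, acc_from_probs)
```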

In [52]:
a = tf.constant([[1.], [1], [0], [0]])
b = tf.constant([[0.98], [1], [0], [0.55]])

In [54]:
tf.metrics.BinaryAccuracy(threshold=0.5)(a, b)

<tf.Tensor: shape=(), dtype=float32, numpy=0.75>

In [57]:
m = tf.metrics.BinaryAccuracy(threshold=0.5)
m(a, b)

<tf.Tensor: shape=(), dtype=float32, numpy=0.75>

In [58]:
m.result(), m.total

(<tf.Tensor: shape=(), dtype=float32, numpy=0.75>,
 <tf.Variable 'total:0' shape=() dtype=float32, numpy=3.0>)

In [59]:
m.update_state([[1], [1], [0], [0]], [[0.98], [1], [0], [0.6]])

<tf.Variable 'UnreadVariable' shape=() dtype=float32, numpy=8.0>

In [60]:
m.result(), m.total

(<tf.Tensor: shape=(), dtype=float32, numpy=0.75>,
 <tf.Variable 'total:0' shape=() dtype=float32, numpy=6.0>)

In [69]:
a = tf.constant([1., 1, 0, 0])
b = tf.constant([0.98, 1, 0, 0.55])

In [70]:
tf.metrics.binary_accuracy(a, tf.where(b>0.5, 1., 0))

<tf.Tensor: shape=(), dtype=float32, numpy=0.75>

In [71]:
tf.metrics.binary_accuracy(a, b, 0.5)

<tf.Tensor: shape=(), dtype=float32, numpy=0.75>

# tf.optimizers

```python
optimizer = tf.keras.optimizers.Adam()
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
# .apply_gradients(grads_and_vars)
# grads_and_vars: List of (gradient, variable) pairs
```

## tf.nn

#### Activation functions

In [105]:
tf.nn.relu

<function tensorflow.python.ops.gen_nn_ops.relu(features, name=None)>

In [106]:
tf.nn.tanh

<function tensorflow.python.ops.gen_math_ops.tanh(x, name=None)>

In [107]:
tf.nn.sigmoid

<function tensorflow.python.ops.math_ops.sigmoid(x, name=None)>

#### tf.nn.top_k

In [108]:
a = tf.random.normal([6,3])
b = tf.constant([2,1,1,0,0,1])

In [111]:
a

<tf.Tensor: shape=(6, 3), dtype=float32, numpy=
array([[-1.7133043 ,  0.3444689 ,  0.5035624 ],
       [ 0.28734338, -0.6957891 , -0.7670209 ],
       [-1.2277938 , -1.5605494 , -0.09386743],
       [ 0.6773767 , -0.6124227 , -1.237662  ],
       [ 0.07477915,  0.94955546, -0.97859645],
       [ 1.6891416 ,  1.896936  , -0.43835676]], dtype=float32)>

In [113]:
tf.nn.top_k(a,  2) # the two largest values along the last axis, together with their indices

TopKV2(values=<tf.Tensor: shape=(6, 2), dtype=float32, numpy=
array([[ 0.5035624 ,  0.3444689 ],
       [ 0.28734338, -0.6957891 ],
       [-0.09386743, -1.2277938 ],
       [ 0.6773767 , -0.6124227 ],
       [ 0.94955546,  0.07477915],
       [ 1.896936  ,  1.6891416 ]], dtype=float32)>, indices=<tf.Tensor: shape=(6, 2), dtype=int32, numpy=
array([[2, 1],
       [0, 1],
       [2, 0],
       [0, 1],
       [1, 0],
       [1, 0]], dtype=int32)>)

In [114]:
b

<tf.Tensor: shape=(6,), dtype=int32, numpy=array([2, 1, 1, 0, 0, 1], dtype=int32)>

In [115]:
tf.nn.in_top_k(b, a,  2)

<tf.Tensor: shape=(6,), dtype=bool, numpy=array([ True,  True, False,  True,  True,  True])>
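A common use of `in_top_k` is top-k accuracy. The sketch below uses small hand-written scores so that the result can be verified by eye.

```python
import tensorflow as tf

scores = tf.constant([[0.1, 0.7, 0.2],
                      [0.5, 0.3, 0.2],
                      [0.2, 0.3, 0.5]])
labels = tf.constant([0, 0, 1])

# True where the label is among the 2 highest-scoring classes of its row.
hits = tf.nn.in_top_k(labels, scores, k=2)
top2_accuracy = tf.reduce_mean(tf.cast(hits, tf.float32))

print(hits.numpy(), top2_accuracy.numpy())
```

In the first row the label 0 has the lowest score, so it falls outside the top 2; the other two rows are hits.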

#### tf.nn.moments

```python
tf.nn.moments(x, axes, keepdims=False)
# If axes=[0,1,2], the mean and variance are computed over axes [0,1,2]
# keepdims: whether the result keeps the reduced dimensions (with size 1)
```

In [146]:
x = tf.random.normal([128, 32, 32, 64])
m, v = tf.nn.moments(x, [0,1,2], keepdims=True)
assert m.shape == [1,1,1,64]

In [147]:
# Equivalent to
m2 = tf.reduce_mean(x, axis=[0,1,2], keepdims=True)

In [148]:
tf.math.reduce_all(m == m2).numpy()

True

#### tf.nn.batch_normalization

```python
tf.nn.batch_normalization(x, mean, variance, offset, scale, variance_epsilon)
```
* mean and variance can be the outputs of `tf.nn.moments`
* variance_epsilon is a small value close to 0 that prevents division by zero
* Formula:
```
tmp = (x-mean)/tf.sqrt(variance + variance_epsilon)
return tmp * scale + offset
```
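The formula can be compared against the op itself. This is a minimal check on random data, with `offset=0`, `scale=1`, and an epsilon of 1e-3 chosen arbitrarily.

```python
import tensorflow as tf

x = tf.random.normal([8, 4])
mean, variance = tf.nn.moments(x, axes=[0])
offset = tf.zeros([4])
scale = tf.ones([4])
eps = 1e-3

y = tf.nn.batch_normalization(x, mean, variance, offset, scale, eps)
manual = (x - mean) / tf.sqrt(variance + eps) * scale + offset

max_diff = tf.reduce_max(tf.abs(y - manual))
print(max_diff.numpy())
```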

In [153]:
#tf.nn.batch_normalization(x, m, v, 0, 1, 1e-10)

#### tf.keras.layers.BatchNormalization

```python
tf.keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', moving_mean_initializer='zeros', moving_variance_initializer='ones')
```
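* axis: the feature axis to normalize
* momentum: the moving-average coefficient; the $\alpha$ in the usual formula equals 1-momentum
* epsilon: same as variance_epsilon above
* scale: whether to multiply by scale
* center: whether to add offset
* gamma: same as scale above
* beta: same as offset above

Let the state of the current layer `norm` be the moving mean $\mu$ and the moving variance $\sigma^2$, let the mean and variance of the current mini-batch be $\mu_i,\sigma_i^2$, and let the coefficient be $\alpha$, i.e. 1 - norm.momentum. Then
$$
\mu = (1-\alpha)\mu + \alpha \mu_i \\
\sigma^2 = (1-\alpha)\sigma^2 + \alpha\sigma_i^2
$$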
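To get a feel for how slowly the moving statistics adapt with momentum 0.99, the update rule can be iterated in plain Python. The sketch assumes, for simplicity, that every mini-batch has the same mean; starting from 0, the moving mean after $k$ updates is then $(1-(1-\alpha)^k)\mu_i$.

```python
momentum = 0.99
alpha = 1 - momentum
batch_mean = -0.4      # assumed constant across mini-batches for this illustration
steps = 500

moving_mean = 0.0      # the layer initializes its moving mean to 0
for _ in range(steps):
    moving_mean = (1 - alpha) * moving_mean + alpha * batch_mean

closed_form = (1 - momentum ** steps) * batch_mean
print(moving_mean, closed_form)
```

After a single update the moving mean has moved only 1% of the way toward the batch mean, which matches the small `moving_mean` values printed right after the first training call.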

In [126]:
tmp = tf.random.normal([3, 4, 5])
norm = layers.BatchNormalization()
norm(tmp, training=True)
norm.moving_mean

<tf.Variable 'batch_normalization_1/moving_mean:0' shape=(5,) dtype=float32, numpy=
array([-0.00110975,  0.00036719,  0.0031628 , -0.00068342, -0.00579923],
      dtype=float32)>

In [127]:
norm.trainable_variables

[<tf.Variable 'batch_normalization_1/gamma:0' shape=(5,) dtype=float32, numpy=array([1., 1., 1., 1., 1.], dtype=float32)>,
 <tf.Variable 'batch_normalization_1/beta:0' shape=(5,) dtype=float32, numpy=array([0., 0., 0., 0., 0.], dtype=float32)>]

In [129]:
norm.variables

[<tf.Variable 'batch_normalization_1/gamma:0' shape=(5,) dtype=float32, numpy=array([1., 1., 1., 1., 1.], dtype=float32)>,
 <tf.Variable 'batch_normalization_1/beta:0' shape=(5,) dtype=float32, numpy=array([0., 0., 0., 0., 0.], dtype=float32)>,
 <tf.Variable 'batch_normalization_1/moving_mean:0' shape=(5,) dtype=float32, numpy=
 array([-0.00110975,  0.00036719,  0.0031628 , -0.00068342, -0.00579923],
       dtype=float32)>,
 <tf.Variable 'batch_normalization_1/moving_variance:0' shape=(5,) dtype=float32, numpy=
 array([1.0060341 , 0.99633485, 1.0017756 , 0.9977652 , 0.99740314],
       dtype=float32)>]

##### Computation
Following the paper [Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models](https://arxiv.org/pdf/1702.03275.pdf), the computation is roughly as follows:

In [130]:
a = tf.random.normal([6,3])

In [131]:
a

<tf.Tensor: shape=(6, 3), dtype=float32, numpy=
array([[-0.91982466,  0.22108793,  1.372667  ],
       [-0.41213787,  1.5481071 , -0.61506   ],
       [-0.16976114, -1.3530649 , -0.65074354],
       [-2.095482  , -0.21522683,  0.11636626],
       [ 0.8393733 , -1.371746  , -0.4672847 ],
       [ 0.2859292 ,  0.6168625 , -0.854694  ]], dtype=float32)>

In [132]:
tf.nn.moments(a, axes=0) # compute the mean and variance

(<tf.Tensor: shape=(3,), dtype=float32, numpy=array([-0.41198388, -0.09233003, -0.18312484], dtype=float32)>,
 <tf.Tensor: shape=(3,), dtype=float32, numpy=array([0.86728626, 1.0889467 , 0.57452846], dtype=float32)>)

In [133]:
norm = layers.BatchNormalization()
norm(a, training=True)  # builds the weights and updates the moving statistics

<tf.Tensor: shape=(6, 3), dtype=float32, numpy=
array([[-5.44999540e-01,  3.00207287e-01,  2.05077505e+00],
       [-1.65253878e-04,  1.57129216e+00, -5.69357634e-01],
       [ 2.59946227e-01, -1.20759451e+00, -6.16394043e-01],
       [-1.80667984e+00, -1.17716655e-01,  3.94775778e-01],
       [ 1.34291911e+00, -1.22548819e+00, -3.74566823e-01],
       [ 7.48979449e-01,  6.79299772e-01, -8.85232449e-01]], dtype=float32)>

##### Parameters

In [134]:
norm.momentum

0.99

In [135]:
norm.epsilon

0.001

In [136]:
norm.moving_mean # initialized to 0

<tf.Variable 'batch_normalization_2/moving_mean:0' shape=(3,) dtype=float32, numpy=array([-0.00411984, -0.0009233 , -0.00183125], dtype=float32)>

In [137]:
norm.moving_variance # initialized to 1

<tf.Variable 'batch_normalization_2/moving_variance:0' shape=(3,) dtype=float32, numpy=array([0.99867284, 1.0008894 , 0.9957453 ], dtype=float32)>

In [138]:
norm.beta

<tf.Variable 'batch_normalization_2/beta:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>

In [139]:
norm.gamma

<tf.Variable 'batch_normalization_2/gamma:0' shape=(3,) dtype=float32, numpy=array([1., 1., 1.], dtype=float32)>

The moving statistics are updated in the same way as the following computations:

In [140]:
norm.momentum * 0 + tf.nn.moments(a, axes=0)[0] * (1-norm.momentum)

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([-0.00411984, -0.0009233 , -0.00183125], dtype=float32)>

In [141]:
norm.momentum * 1. + tf.nn.moments(a, axes=0)[1] * (1-norm.momentum)

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([0.99867284, 1.0008894 , 0.9957453 ], dtype=float32)>

##### Computation when `training=False`

In [142]:
norm(a, False)

<tf.Tensor: shape=(6, 3), dtype=float32, numpy=
array([[-0.9158547 ,  0.22180179,  1.3767406 ],
       [-0.4080848 ,  1.5475692 , -0.61422914],
       [-0.16566841, -1.3508661 , -0.64997095],
       [-2.0917046 , -0.21410136,  0.11839034],
       [ 0.8436312 , -1.3695295 , -0.46621278],
       [ 0.2900965 ,  0.617203  , -0.8542541 ]], dtype=float32)>

In [143]:
# Equivalent to the following computation:
(a-norm.moving_mean)/(norm.moving_variance+norm.epsilon)**0.5 * norm.gamma+norm.beta

<tf.Tensor: shape=(6, 3), dtype=float32, numpy=
array([[-0.9158547 ,  0.2218018 ,  1.3767406 ],
       [-0.40808478,  1.5475692 , -0.61422914],
       [-0.1656684 , -1.3508661 , -0.6499709 ],
       [-2.0917044 , -0.21410136,  0.11839033],
       [ 0.84363115, -1.3695295 , -0.46621278],
       [ 0.29009652,  0.61720294, -0.85425407]], dtype=float32)>

##### Computation when `training=True`

My guess is that it is computed as follows:

In [144]:
(a-tf.nn.moments(a, axes=0)[0])/(tf.nn.moments(a, axes=0)[1] + norm.epsilon)**0.5 * norm.gamma + norm.beta

<tf.Tensor: shape=(6, 3), dtype=float32, numpy=
array([[-5.44999480e-01,  3.00207317e-01,  2.05077529e+00],
       [-1.65255959e-04,  1.57129216e+00, -5.69357574e-01],
       [ 2.59946197e-01, -1.20759451e+00, -6.16394103e-01],
       [-1.80667984e+00, -1.17716655e-01,  3.94775808e-01],
       [ 1.34291911e+00, -1.22548819e+00, -3.74566853e-01],
       [ 7.48979390e-01,  6.79299831e-01, -8.85232449e-01]], dtype=float32)>

In [145]:
norm(a, True)

<tf.Tensor: shape=(6, 3), dtype=float32, numpy=
array([[-5.44999540e-01,  3.00207287e-01,  2.05077505e+00],
       [-1.65253878e-04,  1.57129216e+00, -5.69357634e-01],
       [ 2.59946227e-01, -1.20759451e+00, -6.16394043e-01],
       [-1.80667984e+00, -1.17716655e-01,  3.94775778e-01],
       [ 1.34291911e+00, -1.22548819e+00, -3.74566823e-01],
       [ 7.48979449e-01,  6.79299772e-01, -8.85232449e-01]], dtype=float32)>

In [177]:
norm = keras.layers.BatchNormalization(axis=[-2, -1])
x = tf.random.normal([6,5,4,3])
_ = norm(x, True)

In [178]:
norm.weights

[<tf.Variable 'batch_normalization_9/gamma:0' shape=(1, 1, 4, 3) dtype=float32, numpy=
 array([[[[1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 1.]]]], dtype=float32)>,
 <tf.Variable 'batch_normalization_9/beta:0' shape=(1, 1, 4, 3) dtype=float32, numpy=
 array([[[[0., 0., 0.],
          [0., 0., 0.],
          [0., 0., 0.],
          [0., 0., 0.]]]], dtype=float32)>,
 <tf.Variable 'batch_normalization_9/moving_mean:0' shape=(1, 1, 4, 3) dtype=float32, numpy=
 array([[[[-0.00221661, -0.00137228,  0.00279443],
          [ 0.00341048, -0.00194845, -0.00075563],
          [-0.00315018, -0.00273164, -0.00047328],
          [-0.00027766, -0.00072617,  0.00263126]]]], dtype=float32)>,
 <tf.Variable 'batch_normalization_9/moving_variance:0' shape=(1, 1, 4, 3) dtype=float32, numpy=
 array([[[[1.0037653 , 1.0002662 , 0.9988896 ],
          [1.0039077 , 1.0024092 , 0.99825966],
          [0.9961058 , 0.99810433, 1.0003287 ],
          [0.999886  , 0.99815106, 0.

#### tf.nn.softmax

In [179]:
x = tf.Variable([[ 3., 11.,  6.],[ 6., 11., 16.]])
tf.nn.softmax(x)
# Equivalent to: tf.exp(x)/tf.expand_dims(tf.reduce_sum(tf.exp(x), axis=1), 1)

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[3.3310644e-04, 9.9297631e-01, 6.6906218e-03],
       [4.5094042e-05, 6.6925492e-03, 9.9326235e-01]], dtype=float32)>

#### tf.nn.softmax_cross_entropy_with_logits

```
tf.nn.softmax_cross_entropy_with_logits(labels, logits)
```
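* Computes the cross-entropy. The input is what would be fed to softmax (the logits); the softmax is computed inside this function;
* Note that the return value is a vector with one entry per sample in the batch; to get the total cross-entropy of the batch, apply tf.reduce_sum as well;
* logits: the output of the network's last layer, with shape `[batch_size, num_classes]`; for a single sample the shape is just num_classes;
* labels: the true labels of the samples, with the same shape as above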

In [184]:
x = tf.Variable([[ 3., 11.,  6.],[ 6., 11., 16.]])
tf.nn.softmax(x)

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[3.3310644e-04, 9.9297631e-01, 6.6906218e-03],
       [4.5094042e-05, 6.6925492e-03, 9.9326235e-01]], dtype=float32)>

In [185]:
tf.nn.sparse_softmax_cross_entropy_with_logits([1,2], x)

<tf.Tensor: shape=(2,), dtype=float32, numpy=array([0.0070485 , 0.00676046], dtype=float32)>

In [186]:
tf.nn.softmax_cross_entropy_with_logits(tf.one_hot([1,2], depth=3), x)

<tf.Tensor: shape=(2,), dtype=float32, numpy=array([0.0070485 , 0.00676046], dtype=float32)>

In [187]:
- tf.reduce_sum(tf.math.log(tf.nn.softmax(x)) * tf.one_hot([1,2], depth=3), axis=1)

<tf.Tensor: shape=(2,), dtype=float32, numpy=array([0.00704847, 0.00676045], dtype=float32)>
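One practical reason to let the function apply softmax internally is numerical stability. The sketch below uses a deliberately extreme logit (1000) to show where the two-step computation breaks down; the values are chosen only for illustration.

```python
import tensorflow as tf

labels = tf.constant([[0., 1.]])
logits = tf.constant([[1000., 0.]])   # extreme gap between the two classes

# Fused version: works on the logits directly.
fused = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

# Two-step version: the probability of class 1 underflows to 0, so its log is -inf.
naive = -tf.reduce_sum(labels * tf.math.log(tf.nn.softmax(logits)), axis=1)

print(fused.numpy(), naive.numpy())
```

For moderate logits, such as the ones in the cells above, the two computations agree up to rounding.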

#### tf.nn.conv2d

```
tf.nn.conv2d(input, filters, strides, padding, data_format='NHWC', dilations=None, name=None)
```
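* The first argument, input, must have shape `[batch, in_height, in_width, in_channels]`, meaning `[number of images in a training batch, image height, image width, number of image channels]`
* The second argument, filters, is the convolution kernel and must have shape `[filter_height,filter_width,in_channels,out_channels]`, meaning `[kernel height, kernel width, number of image channels, number of kernels]`
* The third argument, strides, is the stride along each dimension of the input. It is a one-dimensional tensor; for images, `strides=[1, x, y, 1], strides[0]==strides[3]==1`;
* The fourth argument, padding, is a string that must be either "SAME" or "VALID" and selects the convolution mode. padding="SAME" pads with zeros on both sides, so that the output height and width equal the input's when the stride is 1; padding="VALID" adds no padding;
* The fifth argument, data_format, is a string: 'NHWC' (the default, matching the shapes above) or 'NCHW'. The TF1 argument use_cudnn_on_gpu (a bool controlling cuDNN acceleration) is not part of the signature shown above.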

In [203]:
t = layers.Conv2D(filters=8, kernel_size=[4,4])
x = tf.random.normal([6, 10, 10, 3])
t(x).shape

TensorShape([6, 7, 7, 8])

In [200]:
t.variables[0].shape # shape of the kernel weights: each kernel has size [height,width,in_channels], and there are out_channels kernels

TensorShape([4, 4, 3, 8])

In [201]:
t.variables[1].shape # number of biases in the convolution: one bias per kernel

TensorShape([8])
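The same shapes can be reproduced with the low-level `tf.nn.conv2d`. In this sketch the filter tensor is random, and the stride-2 "SAME" case is added to show that "SAME" yields ceil(input / stride) rather than the input size when the stride is larger than 1.

```python
import tensorflow as tf

x = tf.random.normal([6, 10, 10, 3])       # [batch, in_height, in_width, in_channels]
filters = tf.random.normal([4, 4, 3, 8])   # [filter_height, filter_width, in_channels, out_channels]

valid = tf.nn.conv2d(x, filters, strides=[1, 1, 1, 1], padding="VALID")  # 10 - 4 + 1 = 7
same_s1 = tf.nn.conv2d(x, filters, strides=[1, 1, 1, 1], padding="SAME")  # stays 10
same_s2 = tf.nn.conv2d(x, filters, strides=[1, 2, 2, 1], padding="SAME")  # ceil(10 / 2) = 5

print(valid.shape, same_s1.shape, same_s2.shape)
```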

## tf.image

In [205]:
img = tf.io.read_file('data/thelight.jpg')
arr_img = tf.image.decode_jpeg(img)

```
tf.image.adjust_brightness(arr_img, 0.2)  # adjusts brightness; for a uint8 image this is roughly arr_img + 0.2 * 255 (saturated); a negative delta darkens the image
tf.image.adjust_contrast(arr_img, 0.2) # adjusts contrast; equivalent to a=np.mean(arr_img.numpy(), axis=(0,1));tf.cast((0.2 * (arr_img.numpy() - a) + a), 'uint8')
tf.image.adjust_gamma(arr_img, gamma=0.2, gain=1) # roughly equivalent to tf.cast(255 * (arr_img/255)**0.2, 'uint8')
tf.image.random_crop(star, [1000,1000, 3])  # randomly crops a patch of size [1000,1000]; batches are supported
tf.image.random_crop(tf.stack([star, star], 0), [2, 1000, 1000, 3])
```

```
tf.image.flip_left_right(arr_img)  # mirror the image left to right
tf.image.rgb_to_grayscale  # convert to grayscale
tf.image.rgb_to_hsv(image)  # image must be in the range [0,1]
tf.image.adjust_saturation(image, 3)  # converts image to HSV, multiplies the saturation channel by 3, then converts back to RGB
tf.image.rot90  # rotate by 90 degrees
tf.image.central_crop(image, central_fraction=0.5) # crop, keeping only the central 50%
tf.image.convert_image_dtype(image, tf.float32) # Cast and normalize the image to [0,1]
```
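Several of these ops can be tried without an image file. The sketch below uses a tiny synthetic 4x4 RGB image built with NumPy, which is an assumption made purely for illustration.

```python
import numpy as np
import tensorflow as tf

# Synthetic uint8 image of shape [height=4, width=4, channels=3] with values 0..47.
img = tf.constant(np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3))

flipped = tf.image.flip_left_right(img)                      # reverses the width axis
as_float = tf.image.convert_image_dtype(img, tf.float32)     # uint8 [0,255] -> float32 [0,1]
cropped = tf.image.central_crop(img, central_fraction=0.5)   # keeps the central 2x2 region
brighter = tf.image.adjust_brightness(img, 0.2)              # adds about 0.2 * 255 to each pixel

print(flipped[0, 0].numpy(), cropped.shape, float(tf.reduce_max(as_float)), brighter[0, 0].numpy())
```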

## tf.feature_column

In [206]:
URL = 'https://storage.googleapis.com/applied-dl/heart.csv'
dataframe = pd.read_csv(URL)
dataframe.head(2)

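| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 1 | 145 | 233 | 1 | 2 | 150 | 0 | 2.3 | 3 | 0 | fixed | 0 |
| 1 | 67 | 1 | 4 | 160 | 286 | 0 | 2 | 108 | 1 | 1.5 | 2 | 3 | normal | 1 |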


In [347]:
labels = dataframe.pop('target')
dataset = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
dataset = dataset.batch(8)
one_batch = next(dataset.as_numpy_iterator())[0]
one_batch

{'age': array([63, 67, 67, 37, 41, 56, 62, 57], dtype=int32),
 'sex': array([1, 1, 1, 1, 0, 1, 0, 0], dtype=int32),
 'cp': array([1, 4, 4, 3, 2, 2, 4, 4], dtype=int32),
 'trestbps': array([145, 160, 120, 130, 130, 120, 140, 120], dtype=int32),
 'chol': array([233, 286, 229, 250, 204, 236, 268, 354], dtype=int32),
 'fbs': array([1, 0, 0, 0, 0, 0, 0, 0], dtype=int32),
 'restecg': array([2, 2, 2, 0, 2, 0, 2, 0], dtype=int32),
 'thalach': array([150, 108, 129, 187, 172, 178, 160, 163], dtype=int32),
 'exang': array([0, 1, 1, 0, 0, 0, 0, 1], dtype=int32),
 'oldpeak': array([2.3, 1.5, 2.6, 3.5, 1.4, 0.8, 3.6, 0.6]),
 'slope': array([3, 2, 2, 3, 1, 1, 3, 1], dtype=int32),
 'ca': array([0, 3, 2, 0, 0, 0, 2, 0], dtype=int32),
 'thal': array([b'fixed', b'normal', b'reversible', b'normal', b'normal',
        b'normal', b'normal', b'normal'], dtype=object)}

#### tf.feature_column.numeric_column

In [354]:
one_age = tf.feature_column.numeric_column('age')
feature_layer = layers.DenseFeatures([one_age])

In [355]:
feature_layer(dict(age=[1,2], ppp=[3,4]))

<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[1.],
       [2.]], dtype=float32)>

In [357]:
feature_layer(one_batch)

<tf.Tensor: shape=(8, 1), dtype=float32, numpy=
array([[63.],
       [67.],
       [67.],
       [37.],
       [41.],
       [56.],
       [62.],
       [57.]], dtype=float32)>

#### tf.feature_column.bucketized_column

In [358]:
buck_age = tf.feature_column.bucketized_column(one_age, boundaries=[37, 40, 65, 67, 70]) # the input must be a numeric_column
feature_layer = layers.DenseFeatures(buck_age)

In [360]:
feature_layer(one_batch) # the first bucket is the open interval (-inf, 37); the remaining buckets are left-closed, right-open

<tf.Tensor: shape=(8, 6), dtype=float32, numpy=
array([[0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.]], dtype=float32)>
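The bucketing rule can be reproduced with NumPy, which makes the interval convention explicit. This sketch uses `np.digitize` as a stand-in for the feature column; it is not how TensorFlow implements it internally.

```python
import numpy as np

ages = np.array([63, 67, 67, 37, 41, 56, 62, 57])
boundaries = [37, 40, 65, 67, 70]

# np.digitize returns i such that boundaries[i-1] <= age < boundaries[i],
# with 0 for ages below the first boundary: left-closed, right-open buckets.
bucket_ids = np.digitize(ages, boundaries)

# Five boundaries define six buckets; one-hot encode the bucket index.
one_hot = np.eye(len(boundaries) + 1)[bucket_ids]

print(bucket_ids)
print(one_hot)
```

Age 37 lands in bucket 1 and age 67 in bucket 4, matching the rows of the output above.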

#### tf.feature_column.categorical_column_with_vocabulary_list

#### tf.feature_column.indicator_column

#### tf.feature_column.embedding_column

DenseFeatures only accepts dense tensors. To inspect a categorical column, you need to transform it to an indicator column first:

In [361]:
thal = tf.feature_column.categorical_column_with_vocabulary_list('thal',  ['fixed', 'normal', 'reversible'])
thal_one_hot = tf.feature_column.indicator_column(thal)
feature_layer = layers.DenseFeatures(thal_one_hot)

In [363]:
feature_layer(one_batch)

<tf.Tensor: shape=(8, 3), dtype=float32, numpy=
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.]], dtype=float32)>

In [364]:
thal_embedding = tf.feature_column.embedding_column(thal, 3)
feature_layer = layers.DenseFeatures(thal_embedding)

In [366]:
feature_layer(one_batch)

<tf.Tensor: shape=(8, 3), dtype=float32, numpy=
array([[-0.5390322 , -0.44834587, -0.07046258],
       [ 0.2010104 ,  0.18482254, -0.7510743 ],
       [-0.5288149 ,  0.34261864,  0.02457438],
       [ 0.2010104 ,  0.18482254, -0.7510743 ],
       [ 0.2010104 ,  0.18482254, -0.7510743 ],
       [ 0.2010104 ,  0.18482254, -0.7510743 ],
       [ 0.2010104 ,  0.18482254, -0.7510743 ],
       [ 0.2010104 ,  0.18482254, -0.7510743 ]], dtype=float32)>
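Conceptually, the indicator column is a vocabulary lookup followed by one-hot encoding, and the embedding column is a lookup into a trainable table. The NumPy sketch below illustrates this; the random `table` is a hypothetical stand-in for the learned embedding weights.

```python
import numpy as np

vocab = ['fixed', 'normal', 'reversible']
thal = ['fixed', 'normal', 'reversible', 'normal']

# Vocabulary lookup: map each string to its index in the vocabulary list.
ids = np.array([vocab.index(v) for v in thal])

# Indicator column: one-hot encoding of the ids.
one_hot = np.eye(len(vocab))[ids]

# Embedding column: each id selects one row of a (vocab_size, dimension) table.
rng = np.random.default_rng(0)
table = rng.normal(size=(len(vocab), 3))   # hypothetical embedding table
embedded = table[ids]

print(one_hot)
print(embedded)
```

Rows with the same category share the same embedding vector, which is why the `normal` rows in the output above are identical.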

In [377]:
features = layers.DenseFeatures([one_age, buck_age, thal_one_hot])

In [379]:
features(one_batch)

<tf.Tensor: shape=(8, 10), dtype=float32, numpy=
array([[63.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.,  0.],
       [67.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  1.,  0.],
       [67.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  1.],
       [37.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [41.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.],
       [56.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.],
       [62.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.],
       [57.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.]], dtype=float32)>

## tensorflow_hub Reusable Model

In [32]:
import tensorflow_hub as hub
import copy

In [33]:
class MyModule(tf.Module):
    def __init__(self, name, units=5):
        super(MyModule, self).__init__(name=name)
        self.w = None
        self.b = None
        self.units = units
        self.built = False  # tf.keras.layers.Layer sets this attribute and subclasses inherit it; it indicates whether the weights have been built
    @tf.Module.with_name_scope
    def build(self, input_shape):
        if self.w is None:
            self.w = tf.Variable(tf.random.normal([input_shape[-1], self.units]))
        if self.b is None:
            self.b = tf.Variable(tf.random.normal([self.units, ]))
        self.built = True  # mark the weights as built
    def call(self, input):
        return tf.matmul(input, self.w) + self.b
    @tf.function
    def __call__(self, input):
        if not self.built:  # built is False on the first call, so build() is called to create the weights
            self.build(input.shape)
        return self.call(input)

In [34]:
class HubModule(tf.train.Checkpoint):
    def __init__(self, model):
        super(HubModule, self).__init__()
        self.model = model
        self.variables = model.variables
        self.trainable_variables = model.trainable_variables
    @tf.function(input_signature=[tf.TensorSpec([2, 3], dtype=tf.float32)])
    def __call__(self, input):
        return self.model(input)

In [35]:
x = tf.random.normal([2,3])

In [36]:
m = MyModule('nnn')
m(x).shape

TensorShape([2, 5])

In [37]:
tf.saved_model.save(HubModule(m), 'model/hub')

INFO:tensorflow:Assets written to: model/hub/assets


In [38]:
# All three of these approaches work
#m2 = tf.saved_model.load('model/hub')
# m2 = hub.load('model/hub') 

m2 = hub.KerasLayer('model/hub', trainable=True)

In [39]:
m2(x).shape

TensorShape([2, 5])

In [40]:
a = copy.deepcopy(m2.trainable_variables)

In [41]:
m2.trainable_variables

[<tf.Variable 'nnn/Variable:0' shape=(5,) dtype=float32, numpy=
 array([ 0.0128301 , -0.38711149, -1.0720037 ,  0.5841067 , -1.4495848 ],
       dtype=float32)>,
 <tf.Variable 'nnn/Variable:0' shape=(3, 5) dtype=float32, numpy=
 array([[-0.9564233 ,  0.51733613, -1.8141801 ,  0.15252951, -0.9327845 ],
        [-1.669865  ,  0.8874694 , -0.14137633,  0.68015844,  0.8777286 ],
        [-0.6864597 ,  0.61776245, -1.0815104 ,  0.9587199 ,  0.7806263 ]],
       dtype=float32)>]

In [42]:
optimizer = tf.optimizers.SGD(0.5)
with tf.GradientTape() as tape:
  y = m2(x)
  loss = tf.math.reduce_euclidean_norm(y-1)
grad = tape.gradient(loss, m2.trainable_variables)
optimizer.apply_gradients(grads_and_vars= zip(grad, m2.trainable_variables))

<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=1>

In [43]:
m2.trainable_variables

[<tf.Variable 'nnn/Variable:0' shape=(5,) dtype=float32, numpy=
 array([ 0.11147295, -0.09057796, -0.81769305,  0.7207276 , -0.9929245 ],
       dtype=float32)>,
 <tf.Variable 'nnn/Variable:0' shape=(3, 5) dtype=float32, numpy=
 array([[-0.9823717 ,  0.47344562, -1.8047373 ,  0.11407161, -0.9993112 ],
        [-1.5960433 ,  0.8954189 , -0.42987692,  0.79820263,  0.8832942 ],
        [-0.83529687,  0.49052802, -0.7487032 ,  0.72893465,  0.59467673]],
       dtype=float32)>]

In [44]:
[a[0] - grad[0]*0.5, a[1] - grad[1]*0.5]

[<tf.Tensor: shape=(5,), dtype=float32, numpy=
 array([ 0.11147295, -0.09057796, -0.81769305,  0.7207276 , -0.9929245 ],
       dtype=float32)>,
 <tf.Tensor: shape=(3, 5), dtype=float32, numpy=
 array([[-0.9823717 ,  0.47344562, -1.8047373 ,  0.11407161, -0.9993112 ],
        [-1.5960433 ,  0.8954189 , -0.42987692,  0.79820263,  0.8832942 ],
        [-0.83529687,  0.49052802, -0.7487032 ,  0.72893465,  0.59467673]],
       dtype=float32)>]

In [51]:
m2.trainable = False

In [52]:
m2.trainable_variables

[]