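These notes draw on the following books, papers, blogs, and official documentation:
* TensorFlow 2 official documentation: https://tensorflow.google.cn/
* 简单粗暴TensorFlow 2 (A Concise Handbook of TensorFlow 2): https://github.com/snowkylin/tensorflow-handbook
* TensorFlow 2.0 学习笔记 (TensorFlow 2.0 study notes): https://zhuanlan.zhihu.com/p/74441082

Code examples without a cited source were `probably` written by me. `Probably`, because there is a very small chance that I simply forgot to cite the source...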

In [2]:
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_datasets  as tfds
from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras import layers
from tensorflow.keras import preprocessing as prep
from matplotlib import pyplot as plt

# tf.TensorArray

In [72]:
x = tf.TensorArray(dtype=tf.float32, size=3, infer_shape=False, clear_after_read=False)
a = tf.random.normal([3,2,2])
x.unstack(a)

<tensorflow.python.ops.tensor_array_ops.TensorArray at 0x141f0af98>

In [73]:
x.read(0)

<tf.Tensor: id=602, shape=(2, 2), dtype=float32, numpy=
array([[-0.10629629, -0.19466183],
       [-0.4460331 , -0.47419053]], dtype=float32)>

In [74]:
x.stack().shape

TensorShape([3, 2, 2])

In [75]:
x.gather([1,2])

<tf.Tensor: id=606, shape=(2, 2, 2), dtype=float32, numpy=
array([[[-1.2129303 ,  0.77940047],
        [-0.21802388,  0.98266596]],

       [[ 1.740146  , -0.34278297],
        [-1.3144659 ,  1.0175093 ]]], dtype=float32)>

In [76]:
y = x.scatter([2,1,0], a)
y.read(0)

<tf.Tensor: id=609, shape=(2, 2), dtype=float32, numpy=
array([[ 1.740146  , -0.34278297],
       [-1.3144659 ,  1.0175093 ]], dtype=float32)>

In [77]:
a = tf.random.normal([5,6])
x.split(a, [1,2,2]) # split a along axis 0 into pieces of length 1, 2 and 2

<tensorflow.python.ops.tensor_array_ops.TensorArray at 0x141f0a438>

In [78]:
x.read(0)

<tf.Tensor: id=620, shape=(1, 6), dtype=float32, numpy=
array([[-3.5588963 , -1.5443984 , -0.6560149 ,  0.4804773 , -0.9900909 ,
         0.86643726]], dtype=float32)>

In [79]:
x.read(1)

<tf.Tensor: id=621, shape=(2, 6), dtype=float32, numpy=
array([[-0.48379177,  1.3902569 ,  0.03354906, -0.9551902 ,  1.7645974 ,
        -0.33056656],
       [ 0.79169416, -0.74314564,  1.0771104 ,  0.33629403, -1.1552415 ,
         0.78788143]], dtype=float32)>
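A common use of `tf.TensorArray` is collecting per-iteration results inside a `tf.function` loop, where appending to a Python list does not work. The sketch below assumes TF 2.x with AutoGraph; the function name `squares` is made up for illustration. Note that `write()` returns a new TensorArray, which must be reassigned.

```python
import tensorflow as tf

@tf.function
def squares(n):
    # dynamic_size=True lets the array grow past its initial size of 0
    ta = tf.TensorArray(dtype=tf.int32, size=0, dynamic_size=True)
    for i in tf.range(n):        # AutoGraph turns this into a tf.while_loop
        ta = ta.write(i, i * i)  # reassign: write() returns the updated TensorArray
    return ta.stack()            # pack all elements into one tensor

result = squares(tf.constant(5))
print(result.numpy())  # [ 0  1  4  9 16]
```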

# tf.saved_model

The example below uses `m`, an instance of the `MyModule` class from the `tf.function` chapter:

In [80]:
tf.saved_model.save(m, "data/modelDir")
tf.saved_model.load("data/modelDir")

<tensorflow.python.saved_model.load.Loader._recreate_base_user_object.<locals>._UserObject at 0x1422696d8>
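Because `m` is defined in another chapter, here is a self-contained sketch of the same round trip. The class `Scaler` is a hypothetical stand-in for `MyModule`; giving the `tf.function` an `input_signature` makes sure a concrete function is traced and saved with the model, so the loaded object can be called without the original Python class.

```python
import tempfile
import tensorflow as tf

class Scaler(tf.Module):  # hypothetical stand-in for MyModule
    def __init__(self):
        super().__init__()
        self.w = tf.Variable(2.0)

    @tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32)])
    def __call__(self, x):
        return self.w * x

m = Scaler()
export_dir = tempfile.mkdtemp()
tf.saved_model.save(m, export_dir)

# The loaded object is a generic restored object, not a Scaler instance,
# but its traced function and tracked variables survive the round trip
restored = tf.saved_model.load(export_dir)
out = restored(tf.constant([1.0, 2.0]))
print(out.numpy(), restored.w.numpy())  # [2. 4.] 2.0
```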

# tf.random

In [3]:
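x = tf.random.categorical(tf.math.log([[0.1,0.3,0.6]]), 1000)  # the argument is log-probabilities (logits); this draws 1000 samples from a categorical distribution with probabilities 0.1, 0.3, 0.6
x = tf.random.categorical(tf.math.log([[1., 2. , 6.]]), 1000) # the probabilities are [1., 2., 6.]/tf.reduce_sum([1., 2., 6.])
# In other words, each class is sampled with probability proportional to tf.exp of its argument, i.e. a softmax is applied to the arguments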

In [4]:
tf.reduce_sum(tf.cast(x==0, tf.int64))

<tf.Tensor: shape=(), dtype=int64, numpy=103>
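The count of 103 above is close to the expected 1000 × 1/9 ≈ 111. With many more samples the empirical frequencies approach `softmax(logits)`; a minimal check, assuming TF 2.x (the sample count of 100000 is arbitrary):

```python
import numpy as np
import tensorflow as tf

logits = tf.math.log([[1., 2., 6.]])                      # unnormalized log-probabilities
samples = tf.random.categorical(logits, 100000, seed=0)   # shape [1, 100000], dtype int64
counts = tf.math.bincount(tf.cast(samples[0], tf.int32), minlength=3)
freq = counts.numpy() / samples.shape[1]                  # empirical frequency of each class
probs = tf.nn.softmax(logits).numpy()[0]                  # [1, 2, 6] / 9
print(freq, probs)
```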

# tf.train

#### tf.train.Checkpoint

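A Checkpoint saves only the model's parameters (variables), not its computation, so it is generally used to restore previously trained parameters when the model's source code is available.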
```python3
checkpoint = tf.train.Checkpoint(model=model)
checkpoint.save(save_path_with_prefix)
```
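* The argument accepted by `tf.train.Checkpoint` is somewhat unusual: `**kwargs`, i.e. a series of key-value pairs. The keys can be any names you like; the values are the objects to save.
* `save_path_with_prefix` is the save directory plus a file-name prefix. For example, `checkpoint.save("./save/model.ckpt")` creates three files in the `save` directory: `checkpoint`, `model.ckpt-1.index` and `model.ckpt-1.data-00000-of-00001`, which record the variables. `checkpoint.save` can be called multiple times; each call produces a new `.index` file and `.data` file, with the number incremented each time.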

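To restore a model (for example, to continue training), proceed as follows:
```python
# Save the model together with the optimizer state
checkpoint = tf.train.Checkpoint(myAwesomeModel=model, myAwesomeOptimizer=optimizer)
checkpoint.save(save_path_with_prefix)

# Later: rebuild the model from its source code and restore the parameters;
# the keyword (myAwesomeModel) must match the one used when saving
model_to_be_restored = MyModel()
checkpoint = tf.train.Checkpoint(myAwesomeModel=model_to_be_restored)
checkpoint.restore(save_path_with_prefix_and_index)
```
* `save_path_with_prefix_and_index` is the directory + prefix + number of a previous save. For example, `checkpoint.restore("./save/model.ckpt-1")` restores the model from the files numbered 1.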

```
tf.train.latest_checkpoint(save_path)
```
* Returns the path of the most recent checkpoint in `save_path`, e.g. `./save/model.ckpt-10`

In [81]:
tf.train.latest_checkpoint('data')

'data/ckpt.save.test-1'
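The whole save/restore cycle can be sketched end to end as follows, assuming TF 2.x. `Net` is a hypothetical model holding a single variable, and a temporary directory replaces `./save`:

```python
import os
import tempfile
import tensorflow as tf

class Net(tf.Module):  # hypothetical model holding a single weight vector
    def __init__(self):
        super().__init__()
        self.w = tf.Variable([1.0, 2.0])

save_dir = tempfile.mkdtemp()
net = Net()
ckpt = tf.train.Checkpoint(model=net)
path1 = ckpt.save(os.path.join(save_dir, "model.ckpt"))  # ".../model.ckpt-1"
net.w.assign([3.0, 4.0])
path2 = ckpt.save(os.path.join(save_dir, "model.ckpt"))  # ".../model.ckpt-2"

# Restore into a freshly built object; the keyword ("model") must match the one used at save time
fresh = Net()
tf.train.Checkpoint(model=fresh).restore(tf.train.latest_checkpoint(save_dir))  # newest: -2
latest_w = fresh.w.numpy()
tf.train.Checkpoint(model=fresh).restore(path1)  # an older save, chosen by its number
first_w = fresh.w.numpy()
print(latest_w, first_w)  # [3. 4.] [1. 2.]
```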

#### tf.train.Feature

In [38]:
value_1 = tf.constant('aaaa')
# tf.train.BytesList(value=[value_1]) would fail here:
# BytesList won't unpack a string from an EagerTensor, so convert it to bytes first.
if isinstance(value_1, type(tf.constant(0))):
    value_1 = value_1.numpy()
a = tf.train.BytesList(value=[value_1, b'cccc'])
a

value: "aaaa"
value: "cccc"

In [39]:
x = tf.constant(['aaaa', 'cccc'])
x = x.numpy()  # array of Python bytes objects: [b'aaaa', b'cccc']
x0 = tf.train.Feature(bytes_list=tf.train.BytesList(value=list(x)))
x0

bytes_list {
  value: "aaaa"
  value: "cccc"
}

In [50]:
value_2 = tf.constant([4., 5.])
x1 = tf.train.Feature(float_list=tf.train.FloatList(value=value_2))
value_3 = tf.constant([2, 3])
x2 = tf.train.Feature(int64_list=tf.train.Int64List(value=value_3))
feature = {'x1':x1, 'x2':x2, 'x0':x0}

In [51]:
tf.train.Features(feature=feature)

feature {
  key: "x0"
  value {
    bytes_list {
      value: "aaaa"
      value: "cccc"
    }
  }
}
feature {
  key: "x1"
  value {
    float_list {
      value: 4.0
      value: 5.0
    }
  }
}
feature {
  key: "x2"
  value {
    int64_list {
      value: 2
      value: 3
    }
  }
}

In [52]:
tf.train.Example(features=tf.train.Features(feature=feature))

features {
  feature {
    key: "x0"
    value {
      bytes_list {
        value: "aaaa"
        value: "cccc"
      }
    }
  }
  feature {
    key: "x1"
    value {
      float_list {
        value: 4.0
        value: 5.0
      }
    }
  }
  feature {
    key: "x2"
    value {
      int64_list {
        value: 2
        value: 3
      }
    }
  }
}
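An `Example` is usually serialized (for instance, as one record of a TFRecord file) and parsed back with a feature spec. A minimal sketch of that round trip with the same three features, assuming TF 2.x; the spec must declare each feature's dtype and, for `FixedLenFeature`, its length:

```python
import tensorflow as tf

example = tf.train.Example(features=tf.train.Features(feature={
    "x0": tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"aaaa", b"cccc"])),
    "x1": tf.train.Feature(float_list=tf.train.FloatList(value=[4.0, 5.0])),
    "x2": tf.train.Feature(int64_list=tf.train.Int64List(value=[2, 3])),
}))
serialized = example.SerializeToString()  # a plain bytes string

spec = {
    "x0": tf.io.FixedLenFeature([2], tf.string),
    "x1": tf.io.FixedLenFeature([2], tf.float32),
    "x2": tf.io.FixedLenFeature([2], tf.int64),
}
parsed = tf.io.parse_single_example(serialized, spec)  # dict of dense tensors
print(parsed["x0"].numpy(), parsed["x1"].numpy(), parsed["x2"].numpy())
```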

# tf.initializer

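If a deep network's weights are initialized too small, the signal shrinks as it passes through each layer until it is too weak to have any effect; if they are initialized too large, the signal grows from layer to layer until it diverges and becomes useless.  
The Xavier (Glorot) initializer draws the initial weights from a uniform or normal distribution with mean 0 and variance $\frac{2}{N_{in}+N_{out}}$, where $N_{in}$ and $N_{out}$ are the numbers of input and output units:

* `tf.initializers.glorot_normal()(shape=[20,30])`: creates initial weights with $N_{in}=20, N_{out}=30$ drawn from a (truncated) normal distribution;
* `tf.initializers.glorot_uniform()(shape=[20,30])`: the same as above, but drawn from a uniform distribution.

The same kind of initialization can also be done with the generic initializers below, choosing the standard deviation or range by hand:  
* `tf.random_normal_initializer(mean=0.0,stddev=0.05)(shape=[])`
* `tf.random_uniform_initializer(minval=-0.05, maxval=0.05)(shape=[])`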
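To see that both Glorot variants hit the target variance $\frac{2}{N_{in}+N_{out}}$, one can sample a large weight matrix and measure it. This sketch assumes TF 2.x, where the initializers are also available as `tf.keras.initializers.GlorotUniform` / `GlorotNormal`; the shape `(200, 300)` is arbitrary and only large enough for a stable estimate:

```python
import numpy as np
import tensorflow as tf

n_in, n_out = 200, 300
target_var = 2.0 / (n_in + n_out)   # 2 / (N_in + N_out) = 0.004

w_uniform = tf.keras.initializers.GlorotUniform(seed=0)(shape=(n_in, n_out)).numpy()
w_normal = tf.keras.initializers.GlorotNormal(seed=0)(shape=(n_in, n_out)).numpy()

# GlorotUniform samples U(-limit, limit) with limit = sqrt(6 / (N_in + N_out));
# the variance of that distribution, limit**2 / 3, equals target_var
limit = np.sqrt(6.0 / (n_in + n_out))
print(np.var(w_uniform), np.var(w_normal), target_var)
print(np.abs(w_uniform).max() <= limit)
```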

# tf.linalg

```
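tf.linalg.band_part(input, num_lower, num_upper)
```
* num_lower: the number of subdiagonals (below the main diagonal) to keep; -1 keeps the whole lower triangle. num_upper works the same way for the superdiagonals above the main diagonal.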

In [6]:
x = tf.random.normal([4,4])
x

<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[ 0.529435  ,  0.8286487 ,  1.837228  ,  0.09473267],
       [ 0.36858612,  1.0950916 ,  0.62635964,  1.1934665 ],
       [-0.7357971 ,  0.18043841, -0.19846489, -0.9738333 ],
       [-1.5683837 ,  2.8496115 ,  0.01234788, -0.76184636]],
      dtype=float32)>

In [9]:
tf.linalg.band_part(x, 0, -1)  # keep 0 subdiagonals and all superdiagonals: upper triangular matrix

<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[ 0.529435  ,  0.8286487 ,  1.837228  ,  0.09473267],
       [ 0.        ,  1.0950916 ,  0.62635964,  1.1934665 ],
       [ 0.        ,  0.        , -0.19846489, -0.9738333 ],
       [ 0.        ,  0.        ,  0.        , -0.76184636]],
      dtype=float32)>
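The other common settings of `num_lower` / `num_upper` can be compared against NumPy on a deterministic matrix. A minimal sketch, assuming TF 2.x; the "causal mask" line is just one typical use of the lower-triangular case:

```python
import numpy as np
import tensorflow as tf

x = tf.reshape(tf.range(1., 17.), [4, 4])   # deterministic 4x4 matrix
upper = tf.linalg.band_part(x, 0, -1)       # upper triangle (including the diagonal)
lower = tf.linalg.band_part(x, -1, 0)       # lower triangle (including the diagonal)
diag = tf.linalg.band_part(x, 0, 0)         # main diagonal only
tri = tf.linalg.band_part(x, 1, 1)          # one band on each side: tridiagonal
causal_mask = tf.linalg.band_part(tf.ones([4, 4]), -1, 0)  # lower-triangular ones

xn = x.numpy()
print(np.array_equal(upper.numpy(), np.triu(xn)), np.array_equal(lower.numpy(), np.tril(xn)))
```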

# tf.math

In [82]:
tf.math.reduce_std # standard deviation
tf.math.reduce_variance # variance
tf.math.reduce_all
tf.math.reduce_any
tf.math.reduce_logsumexp # equivalent to tf.math.log(tf.reduce_sum(tf.exp(x))), but numerically more stable
tf.math.argmin
tf.math.argmax

<function tensorflow.python.ops.math_ops.argmax_v2(input, axis=None, output_type=tf.int64, name=None)>

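`tf.matmul(a, b)` performs matrix multiplication on the last two dimensions; the leading (batch) dimensions must match (recent TF 2.x versions also accept broadcastable batch dimensions).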
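For example, with matching batch dimensions each slice along the first axis is an ordinary matrix product. A minimal sketch, assuming TF 2.x, with arbitrary shapes:

```python
import numpy as np
import tensorflow as tf

a = tf.random.normal([5, 2, 3])
b = tf.random.normal([5, 3, 4])
c = tf.matmul(a, b)   # one (2x3) @ (3x4) product per batch element
print(c.shape)        # (5, 2, 4)
print(np.allclose(c[0].numpy(), a[0].numpy() @ b[0].numpy(), atol=1e-4))
```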

# tf.GradientTape

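All operations executed inside a `tf.GradientTape` context are recorded so that gradients can be computed. By default, the resources held by a `tf.GradientTape` are released as soon as `GradientTape.gradient()` is called. To compute several gradients over the same computation, create a persistent tape (`persistent=True`), which allows `gradient()` to be called multiple times.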

In [83]:
x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as t:
  t.watch(x)  # x is a constant, so it must be watched explicitly; this line is unnecessary for a Variable
  y = x * x
  z = y * y
dz_dx = t.gradient(z, x)  # 108.0 (4*x^3 at x = 3)
dy_dx = t.gradient(y, x)  # 6.0
del t  # Drop the reference to the tape
dz_dx

<tf.Tensor: id=770, shape=(), dtype=float32, numpy=108.0>
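For contrast, the same computation with the default non-persistent tape allows only one `gradient()` call; the second call raises a `RuntimeError`. A minimal sketch, assuming TF 2.x:

```python
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as t:   # persistent=False (the default)
    t.watch(x)
    y = x * x
    z = y * y
dz_dx = t.gradient(z, x)       # the first call works: 4 * x**3 = 108.0

try:
    t.gradient(y, x)           # a second call on a non-persistent tape is rejected
    second_call_failed = False
except RuntimeError as err:
    second_call_failed = True
    print("RuntimeError:", err)
print(float(dz_dx), second_call_failed)
```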

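Gradient computations performed inside a tape's context are themselves recorded, so nested tapes can compute higher-order gradients.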

In [84]:
x = tf.Variable(1.0)
with tf.GradientTape() as t:
    with tf.GradientTape() as t2:
        y = x * x * x
    dy_dx = t2.gradient(y,x)
d2y_dx2 = t.gradient(dy_dx, x)

In [85]:
assert dy_dx.numpy() == 3.0
assert d2y_dx2.numpy() == 6.0

In [86]:
x = tf.Variable(1.)
with tf.GradientTape() as tape:
    y = x * 8.
    y = x * x  # y is reassigned; gradient() differentiates only this final y
dydx = tape.gradient(y,x)

In [87]:
dydx

<tf.Tensor: id=824, shape=(), dtype=float32, numpy=2.0>

# tf.losses

```
tf.losses.sparse_categorical_crossentropy(y_true, y_pred, from_logits=False)
```
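* y_true holds sparse labels, i.e. the class indices themselves, not one-hot vectors
* With from_logits=False, y_pred is the output of tf.nn.softmax, i.e. every element is a probability and each row sums to 1
* With from_logits=True, y_pred is the raw output of the previous layer (logits), and softmax(y_pred) is computed inside this function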
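The two modes, and the one-hot variant `categorical_crossentropy`, should agree when they describe the same prediction. A minimal sketch, assuming TF 2.x and the `tf.keras.losses` functions (which `tf.losses` aliases); the logits are the same ones used in the softmax examples further down:

```python
import numpy as np
import tensorflow as tf

logits = tf.constant([[3., 11., 6.], [6., 11., 16.]])
labels = tf.constant([1, 2])                 # sparse labels: class indices
probs = tf.nn.softmax(logits)                # rows sum to 1

loss_logits = tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
loss_probs = tf.keras.losses.sparse_categorical_crossentropy(labels, probs)   # from_logits=False
loss_onehot = tf.keras.losses.categorical_crossentropy(tf.one_hot(labels, depth=3), probs)
print(loss_logits.numpy(), loss_probs.numpy(), loss_onehot.numpy())
```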

For binary cross-entropy, the three computations below give the same result (up to floating-point rounding):

In [88]:
tf.losses.binary_crossentropy([1,0,1], [0.9,0.3,0.7])

<tf.Tensor: id=849, shape=(), dtype=float32, numpy=0.27290332>

In [89]:
tf.reduce_mean(tf.losses.binary_crossentropy([[1],[0],[1]], [[0.9],[0.3],[0.7]]))

<tf.Tensor: id=876, shape=(), dtype=float32, numpy=0.27290332>

In [90]:
a = tf.multiply(tf.subtract(1.,[1,0,1.]), tf.math.log(tf.subtract(1.,[0.9,0.3,0.7])))
b = tf.multiply([1,0,1.], tf.math.log([0.9,0.3,0.7]))
-tf.reduce_mean(a+b)

<tf.Tensor: id=892, shape=(), dtype=float32, numpy=0.27290347>

# tf.metrics

`tf.metrics.categorical_accuracy(y_true, y_pred)`
* y_true is one-hot; y_pred is the softmax output

`tf.metrics.sparse_categorical_accuracy(y_true, y_pred)`
* y_true holds sparse labels (class indices)
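The two metrics measure the same thing and differ only in the label format. A minimal sketch using the stateful metric classes, assuming TF 2.x; the predictions are made up so that the row-wise argmax is `[1, 0, 2]`, i.e. 2 of 3 rows are correct:

```python
import tensorflow as tf

y_pred = tf.constant([[0.1, 0.7, 0.2],
                      [0.5, 0.3, 0.2],
                      [0.2, 0.2, 0.6]])
sparse_true = tf.constant([1, 2, 2])              # class indices
onehot_true = tf.one_hot(sparse_true, depth=3)    # the same labels as one-hot vectors

cat = tf.keras.metrics.CategoricalAccuracy()
cat.update_state(onehot_true, y_pred)
sparse = tf.keras.metrics.SparseCategoricalAccuracy()
sparse.update_state(sparse_true, y_pred)
print(cat.result().numpy(), sparse.result().numpy())   # both 2/3
```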

In [91]:
a = tf.constant([1., 1, 0, 0])
b = tf.constant([0.98, 1, 0, 0.55])

In [92]:
tf.metrics.BinaryAccuracy(threshold=0.55)(a, b)

<tf.Tensor: id=922, shape=(), dtype=float32, numpy=1.0>

In [93]:
tf.metrics.binary_accuracy(a, tf.where(b>0.55, 1., 0))

<tf.Tensor: id=934, shape=(), dtype=float32, numpy=1.0>

In [94]:
tf.metrics.binary_accuracy(a, b, 0.55)

<tf.Tensor: id=941, shape=(), dtype=float32, numpy=1.0>

# tf.optimizer

```python
optimizer = tf.keras.optimizers.Adam()
# inside a training step, after computing `loss` under `with tf.GradientTape() as tape:`
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
# .apply_gradients(grads_and_vars)
# grads_and_vars: list of (gradient, variable) pairs
```
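Put together, a complete training step looks roughly like the sketch below, assuming TF 2.x. The toy problem (fitting y = 3x + 1 with a single `Dense` unit), the learning rate and the step count are all illustrative choices:

```python
import tensorflow as tf

# Hypothetical toy data: y = 3x + 1
x = tf.random.normal([64, 1], seed=0)
y = 3.0 * x + 1.0
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)

def train_step():
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x) - y))   # mean squared error
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

first_loss = float(train_step())
for _ in range(300):
    last_loss = float(train_step())
print(first_loss, last_loss)
```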

## tf.nn

#### Activation functions

In [3]:
tf.nn.relu

<function tensorflow.python.ops.gen_nn_ops.relu(features, name=None)>

In [4]:
tf.nn.tanh

<function tensorflow.python.ops.gen_math_ops.tanh(x, name=None)>

In [5]:
tf.nn.sigmoid

<function tensorflow.python.ops.math_ops.sigmoid(x, name=None)>
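Applied element-wise to a small tensor, the three activations behave as their formulas say. A minimal sketch, assuming TF 2.x:

```python
import numpy as np
import tensorflow as tf

x = tf.constant([-2.0, 0.0, 2.0])
relu = tf.nn.relu(x)      # max(x, 0)
sig = tf.nn.sigmoid(x)    # 1 / (1 + exp(-x)), range (0, 1)
tanh = tf.nn.tanh(x)      # (exp(x) - exp(-x)) / (exp(x) + exp(-x)), range (-1, 1)
print(relu.numpy(), sig.numpy(), tanh.numpy())
```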

#### tf.nn.top_k

In [95]:
a = tf.random.normal([6,3])
b = tf.constant([2,1,1,0,0,1])

In [96]:
tf.nn.top_k(a,  2)

TopKV2(values=<tf.Tensor: id=950, shape=(6, 2), dtype=float32, numpy=
array([[-0.16724938, -0.41634324],
       [ 0.053495  , -0.25219882],
       [-0.28541866, -0.3039618 ],
       [ 0.5407113 , -0.07260029],
       [ 0.83685905, -0.5499403 ],
       [ 0.864015  ,  0.46615827]], dtype=float32)>, indices=<tf.Tensor: id=951, shape=(6, 2), dtype=int32, numpy=
array([[1, 0],
       [1, 0],
       [2, 0],
       [0, 1],
       [2, 0],
       [0, 1]], dtype=int32)>)

In [97]:
tf.nn.in_top_k(b, a,  2)

<tf.Tensor: id=953, shape=(6,), dtype=bool, numpy=array([False,  True, False,  True,  True,  True])>
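With random inputs the output above is hard to check by eye, so here is a small deterministic case, assuming TF 2.x: `top_k` returns the k largest values per row (sorted descending) with their indices, and `in_top_k(targets, predictions, k)` asks whether each row's target index is among them.

```python
import tensorflow as tf

preds = tf.constant([[0.1, 0.6, 0.3],
                     [0.5, 0.2, 0.3]])
targets = tf.constant([2, 1])   # true class index of each row

values, indices = tf.nn.top_k(preds, k=2)    # top-2 per row, along the last axis
hits = tf.nn.in_top_k(targets, preds, k=2)   # is targets[i] among the top-2 of row i?
print(values.numpy(), indices.numpy(), hits.numpy())
```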

#### tf.nn.moments

```python
tf.nn.moments(x, axes, keepdims=False)
# with axes=[0,1,2], the mean and variance are computed over axes 0, 1 and 2
# keepdims: whether the results keep the reduced dimensions (as size 1)
```

In [98]:
x = tf.random.normal([128, 32, 32, 64])
m, v = tf.nn.moments(x, [0,1,2], keepdims=True)
assert m.shape == [1,1,1,64]

In [99]:
# equivalent to (for the mean):
m2 = tf.reduce_mean(x, axis=[0,1,2], keepdims=True)

In [100]:
tf.math.reduce_all(m == m2).numpy()

True
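The variance returned by `tf.nn.moments` is the population (biased) variance, i.e. the mean of squared deviations, which also matches `np.var`. A minimal check, assuming TF 2.x, with a smaller input than above:

```python
import numpy as np
import tensorflow as tf

x = tf.random.normal([128, 8, 8, 16], seed=1)
m, v = tf.nn.moments(x, axes=[0, 1, 2], keepdims=True)   # per-channel statistics
v2 = tf.reduce_mean(tf.square(x - m), axis=[0, 1, 2], keepdims=True)
print(v.shape)   # (1, 1, 1, 16)
print(np.allclose(v, v2, atol=1e-5))
```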

#### tf.nn.batch_normalization

```python
tf.nn.batch_normalization(x, mean, variance, offset, scale, variance_epsilon)
```
* `mean` and `variance` can be the outputs of `tf.nn.moments`
* Computation:
```
tmp = (x-mean)/tf.sqrt(variance + variance_epsilon)
return tmp * scale + offset
```
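The formula can be checked directly against `tf.nn.batch_normalization`. A minimal sketch, assuming TF 2.x; the `offset` and `scale` values are arbitrary:

```python
import numpy as np
import tensorflow as tf

x = tf.random.normal([32, 4], seed=2)
mean, variance = tf.nn.moments(x, axes=[0])
offset = tf.constant([0.0, 1.0, 2.0, 3.0])   # beta (arbitrary illustrative values)
scale = tf.constant([1.0, 0.5, 2.0, 1.0])    # gamma (arbitrary illustrative values)
eps = 1e-3

y = tf.nn.batch_normalization(x, mean, variance, offset, scale, variance_epsilon=eps)
manual = (x - mean) / tf.sqrt(variance + eps) * scale + offset
print(np.allclose(y, manual, atol=1e-5))
```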

#### tf.keras.layers.BatchNormalization

```python
tf.keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', moving_mean_initializer='zeros', moving_variance_initializer='ones')
```
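* axis: the feature axis to normalize
* momentum: the moving-average coefficient
* epsilon: same as variance_epsilon above
* scale: whether to multiply by gamma
* center: whether to add beta
* gamma: same as scale above
* beta: same as offset above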

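Let the layer's state be the moving mean $\mu$ and the moving variance $\sigma^2$, let the mean and variance of the current mini-batch be $\mu_i$ and $\sigma_i^2$, and let the coefficient be $\alpha = 1 -$ `norm.momentum`. Each call with `training=True` then updates
$$
\mu \leftarrow (1-\alpha)\mu + \alpha \mu_i \\
\sigma^2 \leftarrow (1-\alpha)\sigma^2 + \alpha\sigma_i^2
$$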

In [123]:
tmp = tf.random.normal([3, 4, 5])
norm = layers.BatchNormalization()
norm(tmp, training=True)
norm.moving_mean

<tf.Variable 'batch_normalization_1/moving_mean:0' shape=(5,) dtype=float32, numpy=
array([-5.5041215e-05, -2.2435044e-03,  4.9383932e-04,  3.6231871e-04,
        7.5543998e-04], dtype=float32)>

##### [Following this paper](https://arxiv.org/pdf/1702.03275.pdf), the computation is roughly as follows:

In [10]:
a = tf.random.normal([6,3])

In [11]:
a

<tf.Tensor: shape=(6, 3), dtype=float32, numpy=
array([[ 0.9958078 , -1.1161667 ,  0.5356532 ],
       [-1.4961139 , -0.51845664,  0.5639002 ],
       [ 1.5387233 , -0.23743649,  0.7263154 ],
       [ 2.2434304 , -1.1048131 , -0.6543047 ],
       [-1.310277  , -0.43914226,  0.15493158],
       [ 0.05232201,  0.90955174,  0.10727721]], dtype=float32)>

In [12]:
tf.nn.moments(a, axes=0) # compute the mean and variance

(<tf.Tensor: shape=(3,), dtype=float32, numpy=array([ 0.33731544, -0.41774392,  0.23896213], dtype=float32)>,
 <tf.Tensor: shape=(3,), dtype=float32, numpy=array([1.9445854 , 0.46078062, 0.20890851], dtype=float32)>)

In [13]:
norm = layers.BatchNormalization()
norm(a, training=True)  # builds the weights and updates the moving statistics

<tf.Tensor: shape=(6, 3), dtype=float32, numpy=
array([[ 0.47209084, -1.027781  ,  0.64757395],
       [-1.3144346 , -0.14820623,  0.70922744],
       [ 0.86132157,  0.26533574,  1.0637237 ],
       [ 1.366545  , -1.0110734 , -1.9496927 ],
       [-1.1812032 , -0.03148925, -0.18340966],
       [-0.20431943,  1.9532139 , -0.2874227 ]], dtype=float32)>

##### Parameters

In [14]:
norm.momentum

0.99

In [15]:
norm.epsilon

0.001

In [16]:
norm.moving_mean # initialized to 0; shown after one training-mode call

<tf.Variable 'batch_normalization/moving_mean:0' shape=(3,) dtype=float32, numpy=array([ 0.00337315, -0.00417744,  0.00238962], dtype=float32)>

In [17]:
norm.moving_variance # initialized to 1; shown after one training-mode call

<tf.Variable 'batch_normalization/moving_variance:0' shape=(3,) dtype=float32, numpy=array([1.0094459, 0.9946078, 0.9920891], dtype=float32)>

In [18]:
norm.beta

<tf.Variable 'batch_normalization/beta:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>

In [19]:
norm.gamma

<tf.Variable 'batch_normalization/gamma:0' shape=(3,) dtype=float32, numpy=array([1., 1., 1.], dtype=float32)>

The moving statistics are updated exactly as computed below:

In [20]:
norm.momentum * 0 + tf.nn.moments(a, axes=0)[0] * (1-norm.momentum)

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([ 0.00337315, -0.00417744,  0.00238962], dtype=float32)>

In [21]:
norm.momentum * 1. + tf.nn.moments(a, axes=0)[1] * (1-norm.momentum)

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([1.0094459, 0.9946078, 0.9920891], dtype=float32)>

##### Computation with `training=False`

In [23]:
norm(a, False)

<tf.Tensor: shape=(6, 3), dtype=float32, numpy=
array([[ 0.9872914 , -1.1144394 ,  0.5351159 ],
       [-1.491716  , -0.51541233,  0.563461  ],
       [ 1.5273933 , -0.233773  ,  0.72644037],
       [ 2.2284482 , -1.1030607 , -0.6589753 ],
       [-1.3068422 , -0.43592322,  0.1530718 ],
       [ 0.04869518,  0.9157424 ,  0.10525191]], dtype=float32)>

In [24]:
# equivalent to the following computation:
(a-norm.moving_mean)/(norm.moving_variance+norm.epsilon)**0.5 * norm.gamma+norm.beta

<tf.Tensor: shape=(6, 3), dtype=float32, numpy=
array([[ 0.98729146, -1.1144394 ,  0.53511584],
       [-1.491716  , -0.51541233,  0.56346095],
       [ 1.5273933 , -0.23377301,  0.7264403 ],
       [ 2.2284482 , -1.1030607 , -0.6589753 ],
       [-1.3068422 , -0.4359232 ,  0.1530718 ],
       [ 0.04869518,  0.91574246,  0.10525191]], dtype=float32)>

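##### Computation with `training=True`

It appears to be computed as follows, using the statistics of the current batch instead of the moving ones; the two outputs below agree up to float32 rounding in the last digit.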

In [115]:
(a-tf.nn.moments(a, axes=0)[0])/(tf.nn.moments(a, axes=0)[1] + norm.epsilon)**0.5 * norm.gamma + norm.beta

<tf.Tensor: id=1130, shape=(6, 3), dtype=float32, numpy=
array([[ 0.384481  , -0.63107145,  0.46387953],
       [ 0.4694243 , -0.13331562, -0.14479882],
       [-1.5926844 , -0.25282156,  1.5538405 ],
       [-0.37832326, -1.2666231 ,  0.35448807],
       [ 1.6368306 ,  0.33443055, -0.5074017 ],
       [-0.5197283 ,  1.9494011 , -1.7200075 ]], dtype=float32)>

In [116]:
norm(a, True)

<tf.Tensor: id=1159, shape=(6, 3), dtype=float32, numpy=
array([[ 0.384481  , -0.6310714 ,  0.46387953],
       [ 0.4694243 , -0.13331562, -0.14479882],
       [-1.5926844 , -0.25282156,  1.5538404 ],
       [-0.37832326, -1.2666233 ,  0.35448807],
       [ 1.6368306 ,  0.33443055, -0.5074017 ],
       [-0.5197283 ,  1.9494009 , -1.7200077 ]], dtype=float32)>
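The observations in this section can be collected into one self-contained check, assuming TF 2.x and a freshly created layer with default arguments (momentum 0.99, epsilon 0.001, initial moving mean 0 and moving variance 1). The weights are copied out with `.numpy()` so the arithmetic only involves plain tensors and arrays:

```python
import numpy as np
import tensorflow as tf

a = tf.random.normal([6, 3], seed=3)
norm = tf.keras.layers.BatchNormalization()
batch_mean, batch_var = tf.nn.moments(a, axes=[0])

# training=True: normalize with the statistics of the current batch
out_train = norm(a, training=True)
gamma, beta = norm.gamma.numpy(), norm.beta.numpy()
manual_train = (a - batch_mean) / tf.sqrt(batch_var + norm.epsilon) * gamma + beta

# That call also updated the moving statistics:
# moving = momentum * moving + (1 - momentum) * batch statistic
expected_mean = norm.momentum * 0.0 + (1 - norm.momentum) * batch_mean
expected_var = norm.momentum * 1.0 + (1 - norm.momentum) * batch_var
moving_mean, moving_var = norm.moving_mean.numpy(), norm.moving_variance.numpy()

# training=False: normalize with the moving statistics instead
out_infer = norm(a, training=False)
manual_infer = (a - moving_mean) / tf.sqrt(moving_var + norm.epsilon) * gamma + beta

print(np.allclose(out_train, manual_train, atol=1e-4), np.allclose(out_infer, manual_infer, atol=1e-4))
```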

#### tf.nn.softmax

In [117]:
x = tf.Variable([[ 3., 11.,  6.],[ 6., 11., 16.]])
tf.nn.softmax(x)
# equivalent to: tf.exp(x)/tf.expand_dims(tf.reduce_sum(tf.exp(x), axis=1), 1)

<tf.Tensor: id=1168, shape=(2, 3), dtype=float32, numpy=
array([[3.3310644e-04, 9.9297631e-01, 6.6906218e-03],
       [4.5094042e-05, 6.6925492e-03, 9.9326235e-01]], dtype=float32)>
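Softmax is unchanged when the same constant is added to every element of a row, which is why it can be evaluated stably by subtracting each row's maximum before exponentiating. A minimal sketch, assuming TF 2.x; the `+ 1000.0` case would overflow if `tf.exp` were applied to it directly:

```python
import numpy as np
import tensorflow as tf

x = tf.constant([[3., 11., 6.], [6., 11., 16.]])
p = tf.nn.softmax(x)   # applied along the last axis by default
manual = tf.exp(x) / tf.reduce_sum(tf.exp(x), axis=-1, keepdims=True)
shifted = tf.nn.softmax(x - tf.reduce_max(x, axis=-1, keepdims=True))  # subtract each row's max
big = tf.nn.softmax(x + 1000.0)   # tf.exp(1016.) alone would be inf in float32
print(np.allclose(p, manual), np.allclose(p, shifted), np.allclose(p, big))
print(tf.reduce_sum(p, axis=-1).numpy())   # each row sums to 1
```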

#### tf.nn.softmax_cross_entropy_with_logits

```
tf.nn.softmax_cross_entropy_with_logits(labels, logits)
```
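* Computes the cross-entropy; the input is what would be fed into softmax (the logits), i.e. the softmax is computed inside this function;
* Note that it returns a vector with one value per sample in the batch; to get a single cross-entropy value you still need tf.reduce_sum (or tf.reduce_mean for the average);
* logits: the output of the network's last layer, with shape `[batch_size, num_classes]`; for a single sample the shape is `[num_classes]`;
* labels: the true labels of the samples (e.g. one-hot vectors), with the same shape as above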

In [118]:
x = tf.Variable([[ 3., 11.,  6.],[ 6., 11., 16.]])
tf.nn.softmax(x)

<tf.Tensor: id=1177, shape=(2, 3), dtype=float32, numpy=
array([[3.3310644e-04, 9.9297631e-01, 6.6906218e-03],
       [4.5094042e-05, 6.6925492e-03, 9.9326235e-01]], dtype=float32)>

In [119]:
tf.nn.sparse_softmax_cross_entropy_with_logits([1,2], x)

<tf.Tensor: id=1181, shape=(2,), dtype=float32, numpy=array([0.0070485 , 0.00676046], dtype=float32)>

In [120]:
tf.nn.softmax_cross_entropy_with_logits(tf.one_hot([1,2], depth=3), x)

<tf.Tensor: id=1220, shape=(2,), dtype=float32, numpy=array([0.0070485 , 0.00676046], dtype=float32)>

In [121]:
- tf.reduce_sum(tf.math.log(tf.nn.softmax(x)) * tf.one_hot([1,2], depth=3), axis=1)

<tf.Tensor: id=1232, shape=(2,), dtype=float32, numpy=array([0.00704847, 0.00676045], dtype=float32)>

#### tf.nn.conv2d

```
tf.nn.conv2d(input, filters, strides, padding, data_format='NHWC', dilations=None, name=None)
```
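* The first argument, input, must have shape `[batch, in_height, in_width, in_channels]`, i.e. `[number of images in a training batch, image height, image width, number of image channels]`;
* The second argument, filters, is the convolution kernel, with shape `[filter_height, filter_width, in_channels, out_channels]`, i.e. `[kernel height, kernel width, number of input channels, number of kernels]`;
* The third argument, strides, is the step size along each dimension of the input, a 1-D tensor; for images, `strides=[1, x, y, 1]` with `strides[0] == strides[3] == 1`;
* The fourth argument, padding, is a string, either "SAME" or "VALID", which selects the convolution mode: padding="SAME" zero-pads the borders so that, with stride 1, the output keeps the input's height and width; padding="VALID" adds no padding;
* The fifth argument, data_format, is "NHWC" (the default, matching the input layout above) or "NCHW". (The TF1 version also had a bool argument `use_cudnn_on_gpu` to enable cuDNN acceleration; it is not part of the TF2 signature above.)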
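The effect of `padding` and `strides` on the output size can be seen with the low-level op on the same input shape as the `Conv2D` example below. A minimal sketch, assuming TF 2.x, where `strides` may also be a single int: "VALID" gives (10 - 4) / 1 + 1 = 7, while "SAME" gives ceil(10 / stride).

```python
import tensorflow as tf

x = tf.random.normal([6, 10, 10, 3])       # NHWC
filters = tf.random.normal([4, 4, 3, 2])   # [height, width, in_channels, out_channels]

valid = tf.nn.conv2d(x, filters, strides=1, padding="VALID")
same = tf.nn.conv2d(x, filters, strides=1, padding="SAME")
same_s2 = tf.nn.conv2d(x, filters, strides=[1, 2, 2, 1], padding="SAME")
print(valid.shape, same.shape, same_s2.shape)   # (6, 7, 7, 2) (6, 10, 10, 2) (6, 5, 5, 2)
```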

In [122]:
t = layers.Conv2D(filters=2, kernel_size=[4,4])
x = tf.random.normal([6, 10, 10, 3])
t(x).shape

TensorShape([6, 7, 7, 2])

In [123]:
t.variables[0].shape # kernel weights: each kernel has shape [height, width, in_channels], and there are out_channels kernels

TensorShape([4, 4, 3, 2])

In [124]:
t.variables[1].shape # biases of the convolution: one bias per kernel

TensorShape([2])

## tf.image

In [2]:
img = tf.io.read_file('data/TheStarryNight.jpg')
arr_img = tf.image.decode_jpeg(img)

```
tf.image.adjust_brightness(arr_img, 0.2)  # roughly arr_img + 0.2 * 255, clipped to [0, 255]; a negative delta darkens the image
tf.image.adjust_contrast(arr_img, 0.2)  # adjust contrast; roughly a = np.mean(arr_img.numpy(), axis=(0,1)); tf.cast(0.2 * (arr_img.numpy() - a) + a, 'uint8')
tf.image.adjust_gamma(arr_img, gamma=0.2, gain=1)  # roughly tf.cast(255 * (arr_img/255)**0.2, 'uint8')
tf.image.random_crop(arr_img, [1000, 1000, 3])  # randomly crop a 1000x1000 patch; batches are also supported:
tf.image.random_crop(tf.stack([arr_img, arr_img], 0), [2, 1000, 1000, 3])
```

```
image = tf.image.convert_image_dtype(arr_img, tf.float32)  # cast to float32 and scale to [0, 1]
tf.image.flip_left_right(arr_img)  # mirror horizontally
tf.image.rgb_to_grayscale(arr_img)  # convert to grayscale
tf.image.rgb_to_hsv(image)  # image must be float in [0, 1]
tf.image.adjust_saturation(image, 3)  # convert to HSV, multiply the saturation channel by 3, convert back to RGB
tf.image.rot90(arr_img)  # rotate 90 degrees counter-clockwise
tf.image.central_crop(image, central_fraction=0.5)  # keep only the central 50% along each dimension
```
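Several of these operations can be verified without an image file on a tiny synthetic float image. A minimal sketch, assuming TF 2.x; for float32 input `adjust_brightness` simply adds the delta (no `uint8` rounding involved), and `adjust_contrast` uses the per-channel mean over height and width:

```python
import numpy as np
import tensorflow as tf

# A tiny synthetic 2x3 RGB image, float32 in [0, 1] (a stand-in for a decoded photo)
img = tf.constant(np.arange(18, dtype=np.float32).reshape(2, 3, 3) / 17.0)

bright = tf.image.adjust_brightness(img, 0.2)   # adds 0.2 to every pixel
contrast = tf.image.adjust_contrast(img, 0.5)   # (x - per-channel mean) * 0.5 + mean
flipped = tf.image.flip_left_right(img)         # reverses the width axis
rotated = tf.image.rot90(img)                   # counter-clockwise: shape (3, 2, 3)

mean = tf.reduce_mean(img, axis=[0, 1], keepdims=True)   # per-channel mean
print(np.allclose(bright, img + 0.2), np.allclose(contrast, (img - mean) * 0.5 + mean))
print(np.allclose(flipped, img[:, ::-1, :]), rotated.shape)
```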

## tf.feature_column

In [346]:
URL = 'https://storage.googleapis.com/applied-dl/heart.csv'
dataframe = pd.read_csv(URL)
dataframe.head(2)

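|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 1 | 145 | 233 | 1 | 2 | 150 | 0 | 2.3 | 3 | 0 | fixed | 0 |
| 1 | 67 | 1 | 4 | 160 | 286 | 0 | 2 | 108 | 1 | 1.5 | 2 | 3 | normal | 1 |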


In [347]:
labels = dataframe.pop('target')
dataset = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
dataset = dataset.batch(8)
one_batch = next(dataset.as_numpy_iterator())[0]
one_batch

{'age': array([63, 67, 67, 37, 41, 56, 62, 57], dtype=int32),
 'sex': array([1, 1, 1, 1, 0, 1, 0, 0], dtype=int32),
 'cp': array([1, 4, 4, 3, 2, 2, 4, 4], dtype=int32),
 'trestbps': array([145, 160, 120, 130, 130, 120, 140, 120], dtype=int32),
 'chol': array([233, 286, 229, 250, 204, 236, 268, 354], dtype=int32),
 'fbs': array([1, 0, 0, 0, 0, 0, 0, 0], dtype=int32),
 'restecg': array([2, 2, 2, 0, 2, 0, 2, 0], dtype=int32),
 'thalach': array([150, 108, 129, 187, 172, 178, 160, 163], dtype=int32),
 'exang': array([0, 1, 1, 0, 0, 0, 0, 1], dtype=int32),
 'oldpeak': array([2.3, 1.5, 2.6, 3.5, 1.4, 0.8, 3.6, 0.6]),
 'slope': array([3, 2, 2, 3, 1, 1, 3, 1], dtype=int32),
 'ca': array([0, 3, 2, 0, 0, 0, 2, 0], dtype=int32),
 'thal': array([b'fixed', b'normal', b'reversible', b'normal', b'normal',
        b'normal', b'normal', b'normal'], dtype=object)}

#### tf.feature_column.numeric_column

In [354]:
one_age = tf.feature_column.numeric_column('age')
feature_layer = layers.DenseFeatures([one_age])

In [355]:
feature_layer(dict(age=[1,2], ppp=[3,4]))

<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[1.],
       [2.]], dtype=float32)>

In [357]:
feature_layer(one_batch)

<tf.Tensor: shape=(8, 1), dtype=float32, numpy=
array([[63.],
       [67.],
       [67.],
       [37.],
       [41.],
       [56.],
       [62.],
       [57.]], dtype=float32)>

#### tf.feature_column.bucketized_column

In [358]:
buck_age = tf.feature_column.bucketized_column(one_age, boundaries=[37, 40, 65, 67, 70]) # the input is a numeric_column
feature_layer = layers.DenseFeatures(buck_age)

In [360]:
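feature_layer(one_batch) # buckets: (-inf, 37), [37, 40), [40, 65), [65, 67), [67, 70), [70, +inf); all but the first are left-closed, right-open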

<tf.Tensor: shape=(8, 6), dtype=float32, numpy=
array([[0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.]], dtype=float32)>

#### tf.feature_column.categorical_column_with_vocabulary_list

#### tf.feature_column.indicator_column

#### tf.feature_column.embedding_column

`DenseFeatures` only accepts dense tensors, so to inspect a categorical column you first need to transform it into an indicator column:

In [361]:
thal = tf.feature_column.categorical_column_with_vocabulary_list('thal',  ['fixed', 'normal', 'reversible'])
thal_one_hot = tf.feature_column.indicator_column(thal)
feature_layer = layers.DenseFeatures(thal_one_hot)

In [363]:
feature_layer(one_batch)

<tf.Tensor: shape=(8, 3), dtype=float32, numpy=
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.]], dtype=float32)>

In [364]:
thal_embedding = tf.feature_column.embedding_column(thal, 3)
feature_layer = layers.DenseFeatures(thal_embedding)

In [366]:
feature_layer(one_batch)

<tf.Tensor: shape=(8, 3), dtype=float32, numpy=
array([[-0.5390322 , -0.44834587, -0.07046258],
       [ 0.2010104 ,  0.18482254, -0.7510743 ],
       [-0.5288149 ,  0.34261864,  0.02457438],
       [ 0.2010104 ,  0.18482254, -0.7510743 ],
       [ 0.2010104 ,  0.18482254, -0.7510743 ],
       [ 0.2010104 ,  0.18482254, -0.7510743 ],
       [ 0.2010104 ,  0.18482254, -0.7510743 ],
       [ 0.2010104 ,  0.18482254, -0.7510743 ]], dtype=float32)>

In [377]:
features = layers.DenseFeatures([one_age, buck_age, thal_one_hot])

In [379]:
features(one_batch)

<tf.Tensor: shape=(8, 10), dtype=float32, numpy=
array([[63.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.,  0.],
       [67.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  1.,  0.],
       [67.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  1.],
       [37.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [41.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.],
       [56.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.],
       [62.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.],
       [57.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.]], dtype=float32)>