# TensorFlow 2 Study Notes

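These notes draw on the following books, articles, blogs, and official documentation:
* Official TensorFlow 2 documentation: https://tensorflow.google.cn/
* 简单粗暴 TensorFlow 2 (A Concise Handbook of TensorFlow 2): https://github.com/snowkylin/tensorflow-handbook
* TensorFlow 2.0 study notes (Zhihu): https://zhuanlan.zhihu.com/p/74441082

Code examples without a cited source were `probably` written by me. I say `probably` because there is a small chance I simply forgot to cite the source...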

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras import layers
from tensorflow.keras import preprocessing as prep
from matplotlib import pyplot as plt

# Stateful Container

### Trackable

In [2]:
from tensorflow.python.training.tracking.base import Trackable

In [3]:
x = Trackable()
y = Trackable()
x._track_trackable(y, 'ccc')  # x references y under the name 'ccc'; in other words, x depends on y

<tensorflow.python.training.tracking.base.Trackable at 0x1493a9588>

In [4]:
x._lookup_dependency('ccc') is y  # look up the dependency named 'ccc'

True

In [5]:
y

<tensorflow.python.training.tracking.base.Trackable at 0x1493a9588>

In [6]:
x._lookup_dependency('ccc')

<tensorflow.python.training.tracking.base.Trackable at 0x1493a9588>

In [7]:
del y

In [8]:
x._lookup_dependency('ccc')

<tensorflow.python.training.tracking.base.Trackable at 0x1493a9588>

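Deleting `y` only removes the name binding. It does not affect x's reference to the object. As long as the root node x is not garbage-collected, the objects x depends on will not be collected either.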
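Ordinary Python reference counting behaves the same way. Below is a minimal pure-Python sketch. The `Node` class is a hypothetical stand-in for Trackable, not TensorFlow code, and the sketch assumes CPython's reference-counting garbage collection:

```python
import gc
import weakref

class Node:
    """Hypothetical stand-in for Trackable: keeps named references to children."""
    def __init__(self):
        self.deps = {}

    def track(self, child, name):
        self.deps[name] = child
        return child

    def lookup(self, name):
        return self.deps.get(name)

root_node = Node()
child_node = Node()
root_node.track(child_node, 'ccc')
child_ref = weakref.ref(child_node)   # a weak reference does not keep the object alive

del child_node                        # removes only the name, not the object
gc.collect()
alive_after_del_child = child_ref() is not None and child_ref() is root_node.lookup('ccc')
print(alive_after_del_child)          # True: root_node still references the object

del root_node                         # once the root itself is gone...
gc.collect()
collected_after_del_root = child_ref() is None
print(collected_after_del_root)       # True: ...nothing keeps the child alive any more
```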

### AutoTrackable
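The AutoTrackable class inherits from Trackable and overrides `__setattr__` to intercept attribute assignment. Assigning a trackable object to an attribute (e.g. `x.ccc = y`) automatically creates a dependency under that attribute name, so you don't need to call `_track_trackable` yourself.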
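Here is a minimal pure-Python sketch of the idea. It assumes only the behaviour described above. `MiniTrackable` and `MiniAutoTrackable` are hypothetical stand-ins, not the TensorFlow classes; they borrow the method names from the cells above only for readability:

```python
class MiniTrackable:
    """Hypothetical stand-in for Trackable: an explicit table of named dependencies."""
    def __init__(self):
        # bypass any __setattr__ hook while creating internal state
        object.__setattr__(self, "_deps", {})

    def _track_trackable(self, child, name):
        self._deps[name] = child
        return child

    def _lookup_dependency(self, name):
        return self._deps.get(name)


class MiniAutoTrackable(MiniTrackable):
    """Hypothetical stand-in for AutoTrackable: attribute assignment records a dependency."""
    def __setattr__(self, name, value):
        if isinstance(value, MiniTrackable):
            self._track_trackable(value, name)   # parent.ccc = child  ==>  depends on child as 'ccc'
        object.__setattr__(self, name, value)


parent = MiniAutoTrackable()
child = MiniAutoTrackable()
parent.ccc = child
parent.note = "plain strings are not tracked"
print(parent._lookup_dependency('ccc') is child)   # True
print(sorted(parent._deps))                        # ['ccc']
```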

In [9]:
from tensorflow.python.training.tracking.tracking import AutoTrackable

In [10]:
x = AutoTrackable()
y = AutoTrackable()
x.ccc = y

In [11]:
x._lookup_dependency('ccc') is y

True

In [12]:
v = tf.Variable([1,2,3])

In [13]:
x.vvv = v

In [14]:
x._unconditional_checkpoint_dependencies

[TrackableReference(name='ccc', ref=<tensorflow.python.training.tracking.tracking.AutoTrackable object at 0x1493c15c0>),
 TrackableReference(name='vvv', ref=<tf.Variable 'Variable:0' shape=(3,) dtype=int32, numpy=array([1, 2, 3], dtype=int32)>)]

In [15]:
v

<tf.Variable 'Variable:0' shape=(3,) dtype=int32, numpy=array([1, 2, 3], dtype=int32)>

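### Objects that can be saved
**tf.Variable and MutableHashTable**  
tf.Variable and MutableHashTable are objects whose state tf.train.Checkpoint can actually write. Both classes inherit from Trackable and override the `_gather_saveables_for_checkpoint` method. That method tells tf.train.Checkpoint which values to save.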

In [16]:
from tensorflow.python.ops.lookup_ops import MutableHashTable

In [17]:
# x (an AutoTrackable instance) has nothing of its own to save: its _gather_saveables_for_checkpoint does not collect the variables it references
x._gather_saveables_for_checkpoint()

{}

In [18]:
x.vvv._gather_saveables_for_checkpoint()

{'VARIABLE_VALUE': <tf.Variable 'Variable:0' shape=(3,) dtype=int32, numpy=array([1, 2, 3], dtype=int32)>}

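Under the hood, Checkpoint uses the ObjectGraphView class to traverse every node of the dependency graph (a DAG). On each node it calls `_gather_saveables_for_checkpoint` to collect the values that can be saved, then stores them together with the dependency relationships.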
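The traversal can be pictured with the following pure-Python sketch. `Obj`, `gather_saveables` and `collect` are hypothetical names. The real ObjectGraphView and checkpoint key format are more involved, so the `path/key` strings here are a simplification:

```python
from collections import deque

class Obj:
    """Hypothetical trackable node: named children plus optional saveable values."""
    def __init__(self, saveables=None, **children):
        self.children = children
        self.saveables = saveables or {}

    def gather_saveables(self):
        # analogue of _gather_saveables_for_checkpoint: only nodes with state return values
        return self.saveables

def collect(root):
    """Breadth-first walk from root; returns {path: value} for everything saveable."""
    out, seen = {}, {id(root)}
    queue = deque([("", root)])
    while queue:
        path, node = queue.popleft()
        for key, value in node.gather_saveables().items():
            out[f"{path}{key}"] = value
        for name, child in node.children.items():
            if id(child) not in seen:          # in a DAG, shared children are visited once
                seen.add(id(child))
                queue.append((f"{path}{name}/", child))
    return out

var_node = Obj(saveables={"VARIABLE_VALUE": [1, 2, 3]})
root_obj = Obj(ccc=Obj(), vvv=var_node)
print(collect(root_obj))   # {'vvv/VARIABLE_VALUE': [1, 2, 3]}
```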

### Restore-on-Creation

In [22]:
class MyModule(tf.Module):
    def assign(self, init=tf.constant([1., 2., 3.]), name=None):
        with self.name_scope:
            self.w = tf.Variable(init)
    def operate(self, value):
        self.w.assign_add(value)

m = MyModule(name='test')
m.assign()
m.operate([1., 1., 1.])
m.w

<tf.Variable 'test/Variable:0' shape=(3,) dtype=float32, numpy=array([2., 3., 4.], dtype=float32)>

In [23]:
ckpt = tf.train.Checkpoint(module=m)
ckpt.save('data/ckpt.save.test')

'data/ckpt.save.test-1'

In [24]:
module = MyModule(name='test')
try:
    module.w
except AttributeError as e:
    print("w doesn't exist.")
else:
    print("w already exists.")

w doesn't exist.


Because assign has not been called, the attribute w does not exist.

In [25]:
ckpt = tf.train.Checkpoint(module=module)
ckpt.restore(tf.train.latest_checkpoint('data'))

<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x1493ff080>

In [26]:
try:
    module.w
except AttributeError as e:
    print("w doesn't exist.")
else:
    print("w already exists.")

w doesn't exist.


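Because the attribute w has not been created yet, w still does not exist after the restore. Once assign creates w, however, the restore takes effect: w holds the value loaded from the checkpoint, not the `tf.constant([1., 1., 1.])` passed to assign.  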

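**Restore-on-creation: while a weight has not been created yet, the value saved in the checkpoint is held back. As soon as the weight is created, the saved value is loaded into it immediately.**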
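Here is a pure-Python sketch of the same deferred-restore idea. `DeferredRestoreModule` is hypothetical. It only mimics the behaviour seen in the cells above: a restore into a missing attribute is postponed, then applied once when the attribute is created. It is not how TensorFlow implements the mechanism:

```python
class DeferredRestoreModule:
    """Hypothetical sketch of restore-on-creation (not TensorFlow's implementation)."""
    def __init__(self):
        object.__setattr__(self, "_pending", {})

    def restore(self, saved):
        for name, value in saved.items():
            if name in self.__dict__:
                object.__setattr__(self, name, value)   # attribute exists: restore now
            else:
                self._pending[name] = value             # not created yet: defer

    def __setattr__(self, name, value):
        # on creation, a pending checkpoint value replaces the given one (used only once)
        if name in self._pending:
            value = self._pending.pop(name)
        object.__setattr__(self, name, value)

    def assign(self, init):
        self.w = list(init)

mod = DeferredRestoreModule()
mod.restore({"w": [2.0, 3.0, 4.0]})
had_w = hasattr(mod, "w")
mod.assign([1.0, 1.0, 1.0])
first = list(mod.w)
mod.assign([2.0, 2.0, 2.0])
second = list(mod.w)
print(had_w, first, second)   # False [2.0, 3.0, 4.0] [2.0, 2.0, 2.0]
```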

In [27]:
module.assign(tf.constant([1., 1., 1.]))
module.w  # the value comes from the checkpoint, not from the argument of assign

<tf.Variable 'test/Variable:0' shape=(3,) dtype=float32, numpy=array([2., 3., 4.], dtype=float32)>

In [28]:
module.assign(tf.constant([2.,2.,2.]))  # the saved value was already used up when w was first created
module.w

<tf.Variable 'test/Variable:0' shape=(3,) dtype=float32, numpy=array([2., 2., 2.], dtype=float32)>

### tf.Module

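`tf.Module.variables`: collects all variables (including those of submodules);  
`tf.Module.trainable_variables`: collects all trainable variables;  
`tf.Module.submodules`: collects all submodules, i.e. the tf.Module instances this module depends on (references).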

> You can enter the name scope explicitly using `with self.name_scope:` or you can annotate methods(apart from `__init__`) with `@tf.Module.with_name_scope`.

Note: to use `@tf.Module.with_name_scope` or `with self.name_scope`, the subclass must call `super().__init__` in its `__init__`, so that the `tf.Module` constructor runs.

In [29]:
class Dense(tf.Module):
  def __init__(self, input_features, output_features, name=None):
    super(Dense, self).__init__(name=name)
    with self.name_scope:
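      # note: name='w' is passed to tf.random.normal, not to tf.Variable,
      # so this variable is named 'Jason/Variable:0' (see the _flatten output below)
      self.w = tf.Variable(tf.random.normal([input_features, output_features], name='w'))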
      self.b = tf.Variable(tf.zeros([output_features,]), name='b')
  @tf.Module.with_name_scope
  def __call__(self, x):
    self.test = tf.Variable([2.,3.], name='ahaha')  # creates a new variable on every call
    y = tf.matmul(x, self.w) + self.b
    return tf.nn.relu(y)

d = Dense(input_features=5, output_features=3, name='Jason')
d(tf.ones([6, 5]))

<tf.Tensor: shape=(6, 3), dtype=float32, numpy=
array([[1.38661  , 1.6494907, 2.1893508],
       [1.38661  , 1.6494907, 2.1893508],
       [1.38661  , 1.6494907, 2.1893508],
       [1.38661  , 1.6494907, 2.1893508],
       [1.38661  , 1.6494907, 2.1893508],
       [1.38661  , 1.6494907, 2.1893508]], dtype=float32)>

In [30]:
d.variables[0].name

'Jason/b:0'

In [31]:
d.name_scope.name

'Jason/'

In [32]:
d.name

'Jason'

In [33]:
d.test

<tf.Variable 'Jason/ahaha:0' shape=(2,) dtype=float32, numpy=array([2., 3.], dtype=float32)>

In [34]:
d.test = tf.Variable([1.,1.])

In [35]:
d.test

<tf.Variable 'Variable:0' shape=(2,) dtype=float32, numpy=array([1., 1.], dtype=float32)>

In [36]:
d(tf.ones([6, 5]))
d.test

<tf.Variable 'Jason/ahaha:0' shape=(2,) dtype=float32, numpy=array([2., 3.], dtype=float32)>

In [37]:
list(d._flatten())

['Jason',
 <tensorflow.python.framework.ops.name_scope_v2 at 0x149419208>,
 set(),
 True,
 -1,
 <tf.Variable 'Jason/b:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>,
 <tf.Variable 'Jason/ahaha:0' shape=(2,) dtype=float32, numpy=array([2., 3.], dtype=float32)>,
 <tf.Variable 'Jason/Variable:0' shape=(5, 3) dtype=float32, numpy=
 array([[ 2.2590709 ,  1.0981804 ,  0.7432608 ],
        [-1.3432873 ,  0.5018356 , -0.0742505 ],
        [ 0.1284043 , -0.6313724 , -0.18389162],
        [ 0.7372214 ,  0.5132439 ,  0.25380427],
        [-0.3947993 ,  0.16760309,  1.4504279 ]], dtype=float32)>]

# tf.function

### Basic properties

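* The tf.function decorator returns a `def_function.Function` object;
* A Function object is made up of ConcreteFunction objects. Each ConcreteFunction contains a FunctionGraph and a structured_input_signature;
* FunctionGraph is a subclass of tf.Graph, and structured_input_signature is the function's input signature;
* If an argument is a Python value, a separate ConcreteFunction is created for every distinct Python value encountered, and that value becomes a constant fixed in the graph. If the argument is a reference to a Python object, the value it points to at tracing time is frozen into the graph;
* Consequently, a mutable Python object passed as an argument must not be mutated in place inside the function, because its value is already fixed in the graph.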

### Execution process

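1. Run every line of the function body. The code falls into two kinds:
   * pure Python code;
   * TensorFlow code such as `tf.add`, plus Python code that can be converted into graph nodes.

   Pure Python code runs exactly like ordinary Python. TensorFlow code, along with any convertible Python code, is recorded into a computation graph.
2. Run the computation graph once.
3. Compute a hash key from the function name and the input argument types (for Python values, the values themselves), then cache the graph in a hash table under that key. On later calls with a matching key, the cached graph runs directly and the pure Python code does not run again.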
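The three steps can be mimicked in plain Python with a toy tracer. Everything here (`Tensor`, `Sym`, `toy_function`) is hypothetical and much simpler than tf.function. "Tensors" are keyed by dtype only, Python values by their value, and the Python body runs only when a new key is traced:

```python
class Tensor:
    """Hypothetical eager 'tensor': just wraps a number."""
    def __init__(self, value):
        self.value = value

class Sym:
    """Hypothetical symbolic tensor: records a computation instead of running it."""
    def __init__(self, compute):
        self.compute = compute          # maps the list of fed values to a number

    def __add__(self, other):
        rhs = other.compute if isinstance(other, Sym) else (lambda feeds: other)
        return Sym(lambda feeds: self.compute(feeds) + rhs(feeds))

    __radd__ = __add__

def toy_function(fn):
    """Hypothetical sketch of tf.function's trace-and-cache behaviour."""
    cache = {}

    def call(*args):
        # Python values are part of the key; tensors contribute only their dtype
        key = tuple(("tensor", type(a.value).__name__) if isinstance(a, Tensor)
                    else ("py", a) for a in args)
        if key not in cache:
            # step 1: run the Python body once; tensors become placeholders,
            # Python values are baked in as constants
            inputs = [Sym(lambda feeds, i=i: feeds[i]) if isinstance(a, Tensor) else a
                      for i, a in enumerate(args)]
            graph = fn(*inputs)
            if not isinstance(graph, Sym):      # result built only from Python values
                graph = Sym(lambda feeds, v=graph: v)
            cache[key] = graph                  # step 3: cache the graph under the key
        feeds = [a.value if isinstance(a, Tensor) else None for a in args]
        return cache[key].compute(feeds)        # step 2 (and every cache hit): run the graph

    call.cache = cache
    return call

trace_log = []

@toy_function
def toy_add(x, y):
    trace_log.append("traced")   # pure Python: runs only while tracing
    return x + y                 # "TensorFlow" code: recorded into the graph

r1 = toy_add(Tensor(1.0), Tensor(2.0))   # new key: traces
r2 = toy_add(Tensor(5.0), Tensor(6.0))   # same key: reuses the cached graph
r3 = toy_add(6, Tensor(1.0))             # Python value 6 is part of the key: traces again
r4 = toy_add(6, 9)                       # all Python values: yet another trace
print(r1, r2, r3, r4, len(trace_log))    # 3.0 11.0 7.0 15 3
```

This mirrors the notebook cells below: `add(6, 9)` produced its own ConcreteFunction, while calls with same-typed tensors did not add new traces.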

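**AutoGraph and control flow (`if`, `for`, `while`):**  
* if: converted (to `tf.cond`) if the condition is a tensor;
* for: converted if the iterable is a tensor;
* while: converted if the loop condition is a tensor.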
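It matters whether the condition is a tensor. With a Python bool only one branch runs; with a tensor condition, both branches have to be traced into the graph. Here is a hypothetical sketch of that dispatch. `SymbolicBool` and `if_stmt` are illustrative names, not AutoGraph's API:

```python
class SymbolicBool:
    """Hypothetical placeholder for a boolean tensor whose value is unknown while tracing."""
    def __init__(self, name):
        self.name = name

def if_stmt(cond, true_fn, false_fn):
    """Sketch of how a converted `if` could be dispatched (hypothetical helper)."""
    if isinstance(cond, SymbolicBool):
        # tensor condition: both branches are traced into the graph (like tf.cond)
        return ("cond", cond.name, true_fn(), false_fn())
    # plain Python bool: ordinary Python control flow, only one branch runs
    return true_fn() if cond else false_fn()

traced = []

def true_branch():
    traced.append("true")
    return "x + 1"

def false_branch():
    traced.append("false")
    return "x - 1"

python_result = if_stmt(True, true_branch, false_branch)                 # only "true" runs
graph_result = if_stmt(SymbolicBool("x > 0"), true_branch, false_branch)  # both run
print(python_result, graph_result, traced)
```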

### Examples

In [38]:
@tf.function
def add(x, y):
    return tf.add(x, y)

In [39]:
add(tf.random.normal((2, 3)), tf.random.normal((3,)))

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[-0.29873416, -2.1888025 , -1.5106347 ],
       [ 0.49492085, -0.5007396 , -0.94732714]], dtype=float32)>

In [40]:
add(tf.random.normal((2, 6)), tf.random.normal((6,)))

<tf.Tensor: shape=(2, 6), dtype=float32, numpy=
array([[ 0.7217116 ,  0.2724216 , -0.14417058, -0.10751617,  0.96084595,
        -0.54465836],
       [ 0.20024541, -0.9428696 , -0.64654046, -0.47454974,  1.4712204 ,
        -0.38784763]], dtype=float32)>

In [41]:
add._list_all_concrete_functions_for_serialization()

[<ConcreteFunction add(x, y) at 0x1499B3B38>,
 <ConcreteFunction add(x, y) at 0x1499B3860>]

In [42]:
add(6,9)

<tf.Tensor: shape=(), dtype=int32, numpy=15>

In [43]:
add._list_all_concrete_functions_for_serialization()

[<ConcreteFunction add(x, y) at 0x1499B3B38>,
 <ConcreteFunction add(x, y) at 0x1499B3860>,
 <ConcreteFunction add(x=6, y=9) at 0x1499EE9E8>]

In [44]:
add._list_all_concrete_functions()  # how does this differ from _list_all_concrete_functions_for_serialization?

[<ConcreteFunction add(x, y) at 0x1499B3B38>,
 <ConcreteFunction add(x, y) at 0x1499B3860>,
 <ConcreteFunction add(x, y) at 0x10EB87860>,
 <ConcreteFunction add(x=6, y=9) at 0x1499EE9E8>,
 <ConcreteFunction add(x, y) at 0x10EB8C2E8>]

In [45]:
add._list_all_concrete_functions_for_serialization()[0].structured_input_signature

((TensorSpec(shape=(2, 6), dtype=tf.float32, name='x'),
  TensorSpec(shape=(6,), dtype=tf.float32, name='y')),
 {})

In [46]:
add._list_all_concrete_functions_for_serialization()[1].structured_input_signature

((TensorSpec(shape=(2, 3), dtype=tf.float32, name='x'),
  TensorSpec(shape=(3,), dtype=tf.float32, name='y')),
 {})

In [47]:
add._list_all_concrete_functions_for_serialization()[2].structured_input_signature

((6, 9), {})

In [50]:
# A ConcreteFunction traced with Python values takes no arguments, because those values are already fixed inside it
# Note: index 2 is the ConcreteFunction traced with the Python values 6 and 9
add._list_all_concrete_functions_for_serialization()[2]()

<tf.Tensor: shape=(), dtype=int32, numpy=15>

In [51]:
add._list_all_concrete_functions_for_serialization()

[<ConcreteFunction add(x, y) at 0x1499B3B38>,
 <ConcreteFunction add(x, y) at 0x1499B3860>,
 <ConcreteFunction add(x=6, y=9) at 0x1499EE9E8>]

In [52]:
sig = add._list_all_concrete_functions_for_serialization()[0].structured_input_signature
sig

((TensorSpec(shape=(2, 6), dtype=tf.float32, name='x'),
  TensorSpec(shape=(6,), dtype=tf.float32, name='y')),
 {})

`.get_concrete_function` also returns a ConcreteFunction. Oddly, the ConcreteFunctions obtained in the two ways are not the same object (compare their addresses).

In [53]:
a = add.get_concrete_function(tf.TensorSpec(shape=[2,6], dtype=tf.float32), tf.TensorSpec(shape=[6,], dtype=tf.float32))
a

<ConcreteFunction add(x, y) at 0x1499FE6D8>

In [58]:
add._list_all_concrete_functions_for_serialization()[0].structured_input_signature

((TensorSpec(shape=(2, 6), dtype=tf.float32, name='x'),
  TensorSpec(shape=(6,), dtype=tf.float32, name='y')),
 {})

In [60]:
add._list_all_concrete_functions_for_serialization()[0]

<ConcreteFunction add(x, y) at 0x1499B3B38>

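tf.function only allows tf.Variable objects to be created on the first call of the function. The typical pattern is therefore to set the weights to `None` in `__init__`. A `build` method then checks each weight and initializes it only if it is still `None`.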

In [61]:
v = None

def f(x):
    global v
    if v is None:
        v = tf.Variable(x)
    return v
f = tf.function(f)

In [62]:
f._list_all_concrete_functions_for_serialization()

[]

In [63]:
f(tf.constant([2., 3., 4.]))

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([2., 3., 4.], dtype=float32)>

In [64]:
f(tf.constant([2., 3.]))

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([2., 3., 4.], dtype=float32)>

Resetting v to None makes the next call to f try to create a variable again, so an exception is raised.

In [65]:
try:
    v = None
    f(tf.constant([1.,2, 3.]))
except ValueError:
    print("ValueError when create variable non-first call")
else:
    print("isn't ok?")

ValueError when create variable non-first call


The correct pattern is:

In [66]:
class MyModule(tf.Module):
    def __init__(self, name, units=10):
        super(MyModule, self).__init__(name=name)
        self.w = None
        self.b = None
        self.units = units
    @tf.Module.with_name_scope
    def build(self, input_shape):
        if self.w is None:
            self.w = tf.Variable(tf.random.normal([input_shape[-1], self.units]))
        if self.b is None:
            self.b = tf.Variable(tf.random.normal([self.units, ]))
    def call(self, input):
        return tf.matmul(input, self.w) + self.b
    @tf.function
    def __call__(self, input):
        self.build(input.shape)
        return self.call(input)

In [67]:
m = MyModule('testModule')
input = tf.random.normal([5,3])
m(input).shape

TensorShape([5, 10])

In [68]:
m.__call__._list_all_concrete_functions_for_serialization()[0].structured_input_signature

((TensorSpec(shape=(5, 3), dtype=tf.float32, name='input'),), {})

If the two `if` checks in `build` are commented out, the function tries to create variables on a non-first call and raises a `ValueError`, just like the example above.

### Mutable types as function arguments

In [69]:
@tf.function
def f(x):
    print(x)
    # uncommenting the next line causes an error: a mutable argument must not be mutated in place
    # x.append(100) 
    return x[-1] + 100

In [70]:
x = [1.,2.]

In [71]:
f(x)

[1.0, 2.0]


<tf.Tensor: shape=(), dtype=float32, numpy=102.0>

In [72]:
f.get_concrete_function(x)()

<tf.Tensor: shape=(), dtype=float32, numpy=102.0>

In [73]:
f._list_all_concrete_functions_for_serialization()[0].structured_input_signature

(([1.0, 2.0],), {})

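The example above shows that a mutable Python object passed as an argument behaves like any other Python value argument: it is fixed into the graph. The one difference is that it must not be mutated in place.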

The following example is from the official TensorFlow 2 documentation:

In [136]:
l = [] 
@tf.function 
def f(x): 
  for i in x: 
    print(i)
    l.append(i)    # Caution! Will only happen once when tracing 
f([1, 2, 3])
l

1
2
3


[1, 2, 3]

In [137]:
f([1,2,3])
l  # the second call does not change l

[1, 2, 3]

In [138]:
l = []
f(tf.constant([1,2,3]))

Tensor("while/TensorArrayV2Read/TensorListGetItem:0", shape=(), dtype=int32)


In [139]:
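l  # with a tensor argument f is retraced and the loop becomes a tf.while_loop traced once: print ran once (it printed a symbolic Tensor) and append stored that symbolic Tensor in l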

[<tf.Tensor 'while/TensorArrayV2Read/TensorListGetItem:0' shape=() dtype=int32>]

In [115]:
f._list_all_concrete_functions_for_serialization()[0]

<ConcreteFunction f(x) at 0x14AB559B0>

In [116]:
l

[<tf.Tensor 'while/TensorArrayV2Read/TensorListGetItem:0' shape=() dtype=int32>,
 <tf.Tensor 'while/TensorArrayV2Read/TensorListGetItem:0' shape=() dtype=int32>]

In [117]:
l = []
@tf.function
def f(a):
    for i in range(a):
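        l.append(0)  # runs only while the graph is traced (range(a) with a Python int is unrolled, so 3 times)
        tf.print(a)  # becomes a node in the graph, so it runs on every call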

In [93]:
f(3)
l

3
3
3


[0, 0, 0]

In [94]:
f(3)  # the second call does not change the list, because it only runs the cached graph
l

3
3
3


[0, 0, 0]

### Serializing custom classes

In [118]:
class Person:
    def __init__(self, age):
        self.age = age

@tf.function
def f(year, p):
    print(year)
    return p.age + year

p = Person(100)

In [119]:
f(1, p)

1


<tf.Tensor: shape=(), dtype=int32, numpy=101>

In [120]:
f(2, p)

2


<tf.Tensor: shape=(), dtype=int32, numpy=102>

In [121]:
f(2,p)

<tf.Tensor: shape=(), dtype=int32, numpy=102>

In [122]:
f.get_concrete_function(2,p).structured_input_signature

((2, <tensorflow.python.framework.func_graph.UnknownArgument at 0x14adadfd0>),
 {})

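This is probably because the Person object cannot be serialized (it shows up in the signature as `UnknownArgument`). As a result, `_list_all_concrete_functions_for_serialization` does not return these `ConcreteFunction`s.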

In [124]:
f._list_all_concrete_functions_for_serialization()

INFO:tensorflow:Unsupported signature for serialization: ((1, <tensorflow.python.framework.func_graph.UnknownArgument object at 0x14ab55ac8>), {}).
INFO:tensorflow:Unsupported signature for serialization: ((2, <tensorflow.python.framework.func_graph.UnknownArgument object at 0x14adadfd0>), {}).


[]

In [67]:
@tf.function
def concat_with_padding():
    x = tf.zeros([5, 10])
    tf.print(x.shape)
    x = x[:4]
    tf.print(x.shape)
    for i in tf.range(4):
        x = tf.concat([x[:i], tf.ones([1, 10]), tf.zeros([3 - i, 10])], axis=0)  # pad back to 4 rows: a loop variable's shape must not change between iterations
        tf.print(x.shape)
        x.set_shape([4, 10])
        tf.print(x.shape)
    return x
concat_with_padding()

TensorShape([5, 10])
TensorShape([4, 10])
TensorShape([None, 10])
TensorShape([4, 10])
TensorShape([None, 10])
TensorShape([4, 10])
TensorShape([None, 10])
TensorShape([4, 10])
TensorShape([None, 10])
TensorShape([4, 10])


<tf.Tensor: id=535, shape=(4, 10), dtype=float32, numpy=
array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], dtype=float32)>

# tf.io

In [133]:
x = tf.constant([3.,6.])
a = tf.io.serialize_tensor(x)
a

<tf.Tensor: shape=(), dtype=string, numpy=b'\x08\x01\x12\x04\x12\x02\x08\x02"\x08\x00\x00@@\x00\x00\xc0@'>

In [134]:
tf.io.parse_tensor(a, tf.float32)

<tf.Tensor: shape=(2,), dtype=float32, numpy=array([3., 6.], dtype=float32)>
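The serialized string above is a TensorProto in protobuf wire format. The following pure-Python sketch decodes it. It assumes the TensorProto field numbers dtype = 1, tensor_shape = 2 and tensor_content = 4 (and dim = 2, size = 1 inside TensorShapeProto). It handles only the varint and length-delimited wire types needed here:

```python
import struct

def read_varint(buf, pos):
    """Decode a protobuf base-128 varint starting at buf[pos]; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def parse_fields(buf):
    """Split a message into (field_number, value) pairs (varint and length-delimited only)."""
    fields, pos = [], 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field, wire_type = tag >> 3, tag & 7
        if wire_type == 0:                      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:                    # length-delimited: bytes or sub-message
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        fields.append((field, value))
    return fields

serialized = b'\x08\x01\x12\x04\x12\x02\x08\x02"\x08\x00\x00@@\x00\x00\xc0@'
top = dict(parse_fields(serialized))
dtype_code = top[1]                                   # 1 is DT_FLOAT
shape = [dict(parse_fields(dim))[1] for fnum, dim in parse_fields(top[2]) if fnum == 2]
values = struct.unpack('<2f', top[4])                 # raw little-endian float32 data
print(dtype_code, shape, values)                      # 1 [2] (3.0, 6.0)
```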

# tf.data

### tf.data.Dataset

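* `drop_remainder=True` (argument of `batch`): if the last batch has fewer elements than the batch size, it is dropped
* A common ordering: `train.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()`
* `train.prefetch(2)`: prepares 2 elements ahead of time and keeps them in memory
* `train.batch(20).prefetch(2)`: prepares 2 batches ahead of time, since after `batch` each element is a batch; `prefetch` usually goes at the end of the pipeline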
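The effect of `drop_remainder` can be sketched with a plain-Python generator. `batch` here is a hypothetical helper, not the tf.data implementation:

```python
from itertools import islice

def batch(iterable, batch_size, drop_remainder=False):
    """Pure-Python analogue of Dataset.batch (hypothetical helper)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        if len(chunk) < batch_size and drop_remainder:
            return                      # the short final batch is discarded
        yield chunk

print(list(batch(range(7), 3)))                       # [[0, 1, 2], [3, 4, 5], [6]]
print(list(batch(range(7), 3, drop_remainder=True)))  # [[0, 1, 2], [3, 4, 5]]
```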

#### tf.data.Dataset.range

In [68]:
a = tf.data.Dataset.range(1, 4)  # ==> [ 1, 2, 3]
b = tf.data.Dataset.range(4, 5)  # ==> [ 4,]
c = a.concatenate(b)
list(iter(c))

[<tf.Tensor: id=550, shape=(), dtype=int64, numpy=1>,
 <tf.Tensor: id=551, shape=(), dtype=int64, numpy=2>,
 <tf.Tensor: id=552, shape=(), dtype=int64, numpy=3>,
 <tf.Tensor: id=553, shape=(), dtype=int64, numpy=4>]

#### tf.data.Dataset.from_tensor_slices

In [131]:
a = tf.data.Dataset.from_tensor_slices((tf.random.normal([4, 3]), [99., 0, 1, 0]))
next(iter(a.enumerate()))

(<tf.Tensor: shape=(), dtype=int64, numpy=0>,
 (<tf.Tensor: shape=(3,), dtype=float32, numpy=array([-0.12994805,  0.32083076, -0.3990724 ], dtype=float32)>,
  <tf.Tensor: shape=(), dtype=float32, numpy=99.0>))

In [345]:
dataset = tf.data.Dataset.from_tensor_slices(({"a": [1, 2, 20], "b": [3, 4, 20], "c":[10,11, 20]}, [100,200,300]))
next(dataset.as_numpy_iterator())

({'a': 1, 'b': 3, 'c': 10}, 100)

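##### Lists vs. tuples: a subtle difference

A nested list is converted into a single tensor and sliced along its first axis, so each element is a row. A tuple is treated as a structure whose components are sliced in parallel, so each element is a tuple of scalars.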

In [40]:
a = tf.data.Dataset.from_tensor_slices([ [1, 2, 3], [4, 5, 6], [7, 8, 9] ])
next(iter(a))

<tf.Tensor: shape=(3,), dtype=int32, numpy=array([1, 2, 3], dtype=int32)>

In [71]:
a = tf.data.Dataset.from_tensor_slices(( [1, 2, 3], [4, 5, 6], [7, 8, 9] ))
next(iter(a))

(<tf.Tensor: id=592, shape=(), dtype=int32, numpy=1>,
 <tf.Tensor: id=593, shape=(), dtype=int32, numpy=4>,
 <tf.Tensor: id=594, shape=(), dtype=int32, numpy=7>)
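Here is a plain-Python sketch of the slicing rule behind the two cells above. `slices` is a hypothetical helper that covers only lists and tuples, not dicts or tensors:

```python
def slices(data):
    """Sketch of the from_tensor_slices rule for lists vs. tuples (hypothetical helper)."""
    if isinstance(data, tuple):
        # a tuple is a structure: every component is sliced in parallel
        return [tuple(parts) for parts in zip(*data)]
    # a (nested) list is one tensor: slice along its first axis
    return list(data)

print(slices([[1, 2, 3], [4, 5, 6], [7, 8, 9]])[0])   # [1, 2, 3]
print(slices(([1, 2, 3], [4, 5, 6], [7, 8, 9]))[0])   # (1, 4, 7)
```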

#### tf.data.Dataset.from_generator

The constructor takes a callable as input, not an iterator. This allows it to restart the generator when it reaches the end. It takes an optional args argument, which is passed as the callable's arguments.
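The reason for a callable rather than a generator object: a generator can be iterated only once, while a callable can create a fresh one whenever the dataset needs to start over (for example, on the next epoch). A plain-Python illustration:

```python
def count(stop):
    i = 0
    while i < stop:
        yield i
        i += 1

gen = count(3)                           # a generator object can be consumed only once
first_pass, second_pass = list(gen), list(gen)
print(first_pass, second_pass)           # [0, 1, 2] []

def make_gen():                          # a callable starts a fresh generator on every call
    return count(3)

print(list(make_gen()), list(make_gen()))   # [0, 1, 2] [0, 1, 2]
```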

In [12]:
def count(stop):
    i = 0
    while i < stop:
        yield i
        i += 1
tf.data.Dataset.from_generator(count, args=[25], output_types=tf.int32, output_shapes=())

<FlatMapDataset shapes: (), types: tf.int32>

In [231]:
x = [[1, 2, 3, 4, 5, 6], [1, 2], [1, 2], [1, 2, 3, 4], [1, 2], [1, 2, 3]]

d = tf.data.Dataset.from_generator(lambda: x, tf.int32)

In [233]:
try:
    next(iter(d.batch(2)))
except Exception:
    print("Error: elements in one batch have different shapes")

Error: elements in one batch have different shapes


In [240]:
dd = d.padded_batch(2, [-1])
next(iter(dd))

<tf.Tensor: id=33129, shape=(2, 6), dtype=int32, numpy=
array([[1, 2, 3, 4, 5, 6],
       [1, 2, 0, 0, 0, 0]], dtype=int32)>
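`padded_batch(2, [-1])` pads every element of a batch to the length of the longest element in that batch. Below is a plain-Python sketch of that rule, applied to the same data as above. `padded_batch` here is a hypothetical helper that pads with 0:

```python
def padded_batch(rows, batch_size, pad_value=0):
    """Sketch: pad each batch to its own longest row (hypothetical helper)."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        width = max(len(r) for r in chunk)
        yield [list(r) + [pad_value] * (width - len(r)) for r in chunk]

rows_data = [[1, 2, 3, 4, 5, 6], [1, 2], [1, 2], [1, 2, 3, 4], [1, 2], [1, 2, 3]]
batches = list(padded_batch(rows_data, 2))
print(batches[0])   # [[1, 2, 3, 4, 5, 6], [1, 2, 0, 0, 0, 0]], same as the first batch above
```

Each batch gets its own width, so the second batch here is only 4 wide and the third only 3.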

In [34]:
tf.data.Dataset.from_generator(generator=lambda: [3,4,5], output_types=tf.int32)  # generator must be a callable that returns an iterable object

<FlatMapDataset shapes: <unknown>, types: tf.int32>

#### tf.data.Dataset.map

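Dataset.map runs the mapped function in graph mode. The function receives symbolic Tensors rather than EagerTensors, so `EagerTensor.numpy()` cannot be called on them directly. To use `.numpy()`, wrap the function with `tf.py_function`.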

In [17]:
elements = [(1, "foo"), (2, "bar"), (3, "baz")]
dataset = tf.data.Dataset.from_generator(lambda: elements, (tf.int32, tf.string))
result = dataset.map(lambda x_int, y_str: y_str)
list(result.as_numpy_iterator())

[b'foo', b'bar', b'baz']

In [3]:
elements =  ([{"a": 1, "b": "foo"}, {"a": 2, "b": "bar"}, {"a": 3, "b": "baz"}])
dataset = tf.data.Dataset.from_generator(lambda: elements, {"a": tf.int32, "b": tf.string})
result = dataset.map(lambda d: tf.strings.as_string(d["a"]) +'-' + d["b"])
tmp = list(result.as_numpy_iterator())
list(tmp)

[b'1-foo', b'2-bar', b'3-baz']

In [5]:
next(dataset.batch(2).as_numpy_iterator())

{'a': array([1, 2], dtype=int32), 'b': array([b'foo', b'bar'], dtype=object)}

In [121]:
def test(x): return x.numpy()

In [122]:
elements =  ([{"a": 1, "b": "foo"}, {"a": 2, "b": "bar"}, {"a": 3, "b": "baz"}])
dataset = tf.data.Dataset.from_generator(lambda: elements, {"a": tf.int32, "b": tf.string})
#result = dataset.map(lambda x: x['a'])
result = dataset.map(lambda x: tf.py_function(func=test, inp=[x['a']], Tout=tf.int32))
list(result.as_numpy_iterator())

[1, 2, 3]

#### tf.data.TFRecordDataset

In [44]:
n_observations = int(1e4)
feature0 = np.random.choice([False, True], n_observations)
feature1 = np.random.randint(0, 5, n_observations)
strings = np.array([b'cat', b'dog', b'chicken', b'horse', b'goat'])
feature2 = strings[feature1]
feature3 = np.random.randn(n_observations)

In [105]:
f0,f1,f2,f3 = feature0[0], feature1[0], feature2[0], feature3[0]
def serialize_example(f0,f1,f2,f3):
    # BytesList won't unpack a string from an EagerTensor, so convert it to bytes first
    f2 = f2.numpy() if isinstance(f2, type(tf.constant(0))) else f2
    feature = {
        'feature0': tf.train.Feature(int64_list=tf.train.Int64List(value=[f0])),
        'feature1': tf.train.Feature(int64_list=tf.train.Int64List(value=[f1])),
        'feature2': tf.train.Feature(bytes_list=tf.train.BytesList(value=[f2])),
        'feature3': tf.train.Feature(float_list=tf.train.FloatList(value=[f3])),
    }
    example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
    return example_proto.SerializeToString()

def tf_serialize_example(f0,f1,f2,f3):
    example_string = tf.py_function(serialize_example, (f0,f1,f2,f3), tf.string)
    #return tf.reshape(example_string, ())  # a scalar tf.string Tensor with static shape ()
    return example_string  # a tf.string Tensor; tf.py_function leaves its static shape unknown

In [93]:
tf.train.Feature(int64_list=tf.train.Int64List(value=[f0]))

int64_list {
  value: 0
}

##### Serializing and reconstructing an Example object

In [94]:
feature = {
    'feature0': tf.train.Feature(int64_list=tf.train.Int64List(value=[f0])),
    'feature1': tf.train.Feature(int64_list=tf.train.Int64List(value=[f1])),
    'feature2': tf.train.Feature(bytes_list=tf.train.BytesList(value=[f2])),
    'feature3': tf.train.Feature(float_list=tf.train.FloatList(value=[f3])),
}
example_proto = tf.train.Example(features=tf.train.Features(feature=feature))

In [95]:
a = example_proto.SerializeToString()
a

b'\nQ\n\x13\n\x08feature2\x12\x07\n\x05\n\x03cat\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\xfci\x94?\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x00'

In [96]:
tf.train.Example.FromString(a) == example_proto

True

In [97]:
example = tf.train.Example()
example.ParseFromString(a)
example == example_proto

True

##### Writing a TFRecord Dataset to a file

In [107]:
feature_description = {
    'feature0': tf.io.FixedLenFeature([], tf.int64, default_value=0),
    'feature1': tf.io.FixedLenFeature([], tf.int64, default_value=0),
    'feature2': tf.io.FixedLenFeature([], tf.string, default_value=''),
    'feature3': tf.io.FixedLenFeature([], tf.float32, default_value=0.0),
}

In [110]:
dataset = tf.data.Dataset.from_tensor_slices((feature0[:3], feature1[:3], feature2[:3], feature3[:3]))
dataset = dataset.map(tf_serialize_example)
writer1 = tf.data.experimental.TFRecordWriter("data/test_1.tfrecord")
writer1.write(dataset)
tf.io.parse_example(list(dataset), feature_description)

{'feature0': <tf.Tensor: shape=(3,), dtype=int64, numpy=array([0, 1, 0])>,
 'feature1': <tf.Tensor: shape=(3,), dtype=int64, numpy=array([0, 0, 3])>,
 'feature2': <tf.Tensor: shape=(3,), dtype=string, numpy=array([b'cat', b'cat', b'horse'], dtype=object)>,
 'feature3': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([ 1.1594844 , -0.17166306, -1.4441673 ], dtype=float32)>}

In [111]:
list(dataset)

[<tf.Tensor: shape=(), dtype=string, numpy=b'\nQ\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x00\n\x13\n\x08feature2\x12\x07\n\x05\n\x03cat\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\xfci\x94?'>,
 <tf.Tensor: shape=(), dtype=string, numpy=b'\nQ\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x00\n\x13\n\x08feature2\x12\x07\n\x05\n\x03cat\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04q\xc8/\xbe'>,
 <tf.Tensor: shape=(), dtype=string, numpy=b'\nS\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x03\n\x15\n\x08feature2\x12\t\n\x07\n\x05horse\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04y\xda\xb8\xbf'>]

In [112]:
dataset = tf.data.Dataset.from_tensor_slices((feature0[3:6], feature1[3:6], feature2[3:6], feature3[3:6]))
dataset = dataset.map(tf_serialize_example)
writer2 = tf.data.experimental.TFRecordWriter("data/test_2.tfrecord")
writer2.write(dataset)
tf.io.parse_example(list(dataset), feature_description)

{'feature0': <tf.Tensor: shape=(3,), dtype=int64, numpy=array([0, 1, 0])>,
 'feature1': <tf.Tensor: shape=(3,), dtype=int64, numpy=array([3, 0, 2])>,
 'feature2': <tf.Tensor: shape=(3,), dtype=string, numpy=array([b'horse', b'cat', b'chicken'], dtype=object)>,
 'feature3': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([ 0.71525556, -0.7446053 ,  1.4935725 ], dtype=float32)>}
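The files written above use TFRecord framing. According to the TensorFlow TFRecord guide, each record consists of four parts:

* a `uint64` length;
* a `uint32` masked CRC32C of the length;
* the data bytes;
* a `uint32` masked CRC32C of the data.

The mask is `masked_crc = ((crc >> 15) | (crc << 17)) + 0xa282ead8`. The pure-Python sketch below follows that description. It has not been checked byte-for-byte against `tf.data.TFRecordDataset` here:

```python
import io
import struct

def crc32c(data):
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def masked_crc(data):
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF   # 32-bit rotate right by 15
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF

def write_record(stream, data):
    length = struct.pack('<Q', len(data))
    stream.write(length)
    stream.write(struct.pack('<I', masked_crc(length)))
    stream.write(data)
    stream.write(struct.pack('<I', masked_crc(data)))

def read_records(stream):
    while True:
        header = stream.read(8)
        if not header:
            return
        (length_crc,) = struct.unpack('<I', stream.read(4))
        if length_crc != masked_crc(header):
            raise ValueError("corrupted length field")
        (length,) = struct.unpack('<Q', header)
        data = stream.read(length)
        (data_crc,) = struct.unpack('<I', stream.read(4))
        if data_crc != masked_crc(data):
            raise ValueError("corrupted record data")
        yield data

buf = io.BytesIO()
for payload in [b'first example', b'second example']:
    write_record(buf, payload)
buf.seek(0)
records = list(read_records(buf))
print(records)   # [b'first example', b'second example']
```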

##### Reading TFRecord files into a Dataset

In [137]:
files = ["data/test_2.tfrecord", "data/test_1.tfrecord"]
dataset = tf.data.TFRecordDataset(files)

tf.io.parse_example(list(dataset), feature_description)

{'feature0': <tf.Tensor: shape=(6,), dtype=int64, numpy=array([0, 1, 0, 0, 1, 0])>,
 'feature1': <tf.Tensor: shape=(6,), dtype=int64, numpy=array([3, 0, 2, 0, 0, 3])>,
 'feature2': <tf.Tensor: shape=(6,), dtype=string, numpy=
 array([b'horse', b'cat', b'chicken', b'cat', b'cat', b'horse'],
       dtype=object)>,
 'feature3': <tf.Tensor: shape=(6,), dtype=float32, numpy=
 array([ 0.71525556, -0.7446053 ,  1.4935725 ,  1.1594844 , -0.17166306,
        -1.4441673 ], dtype=float32)>}

In [138]:
def _parse_single_example(example):
    return tf.io.parse_single_example(example, feature_description)

In [139]:
list(dataset.map(_parse_single_example).take(2))

[{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=0>,
  'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=3>,
  'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'horse'>,
  'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=0.71525556>},
 {'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>,
  'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=0>,
  'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'cat'>,
  'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=-0.7446053>}]

In [152]:
d = dataset.map(_parse_single_example)

In [171]:
a = [tf.feature_column.numeric_column('feature1'), tf.feature_column.numeric_column('feature0')]

In [172]:
df = layers.DenseFeatures(a)

In [173]:
b = list(d.batch(2).take(1))
b

[{'feature0': <tf.Tensor: shape=(2,), dtype=int64, numpy=array([0, 1])>,
  'feature1': <tf.Tensor: shape=(2,), dtype=int64, numpy=array([3, 0])>,
  'feature2': <tf.Tensor: shape=(2,), dtype=string, numpy=array([b'horse', b'cat'], dtype=object)>,
  'feature3': <tf.Tensor: shape=(2,), dtype=float32, numpy=array([ 0.71525556, -0.7446053 ], dtype=float32)>}]

In [174]:
df(b[0])

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[0., 3.],
       [1., 0.]], dtype=float32)>

##### Another way to write the file

In [337]:
with tf.io.TFRecordWriter('data/test.tfrecord') as writer:
    for i in dataset:
        writer.write(i.numpy())

In [343]:
tf.io.parse_example(list(tf.data.TFRecordDataset(['data/test.tfrecord']).take(1)), feature_description)

{'feature0': <tf.Tensor: shape=(1,), dtype=int64, numpy=array([1])>,
 'feature1': <tf.Tensor: shape=(1,), dtype=int64, numpy=array([2])>,
 'feature2': <tf.Tensor: shape=(1,), dtype=string, numpy=array([b'chicken'], dtype=object)>,
 'feature3': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.7652261], dtype=float32)>}

# tf.ragged

Official guide: https://tensorflow.google.cn/guide/ragged_tensor

In [117]:
digits = tf.ragged.constant([[3, 1, 4, 1], [], [5, 9, 2], [6], []])
words = tf.ragged.constant([["So", "long"], ["thanks", "for", "all", "the", "fish"]])
print(tf.add(digits, 3))
print(tf.reduce_mean(digits, axis=1))
print(tf.concat([digits, [[5, 3]]], axis=0))
print(tf.tile(digits, [1, 2]))
print(tf.strings.substr(words, 0, 2))

<tf.RaggedTensor [[6, 4, 7, 4], [], [8, 12, 5], [9], []]>
tf.Tensor([2.25              nan 5.33333333 6.                nan], shape=(5,), dtype=float64)
<tf.RaggedTensor [[3, 1, 4, 1], [], [5, 9, 2], [6], [], [5, 3]]>
<tf.RaggedTensor [[3, 1, 4, 1, 3, 1, 4, 1], [], [5, 9, 2, 5, 9, 2], [6, 6], []]>
<tf.RaggedTensor [[b'So', b'lo'], [b'th', b'fo', b'al', b'th', b'fi']]>


In [118]:
tf.ragged.map_flat_values(tf.math.square, digits)

<tf.RaggedTensor [[9, 1, 16, 1], [], [25, 81, 4], [36], []]>

In [120]:
digits.to_list()

[[3, 1, 4, 1], [], [5, 9, 2], [6], []]

In [130]:
tf.RaggedTensor.from_value_rowids(
    values=[3, 1, 4, 1, 5, 9, 2],
    value_rowids=[0, 0, 0, 0, 2, 2, 3])

<tf.RaggedTensor [[3, 1, 4, 1], [], [5, 9], [2]]>

In [131]:
tf.RaggedTensor.from_row_lengths(
    values=[3, 1, 4, 1, 5, 9, 2],
    row_lengths=[4, 0, 2, 1])

<tf.RaggedTensor [[3, 1, 4, 1], [], [5, 9], [2]]>

In [132]:
tf.RaggedTensor.from_row_splits(
    values=[3, 1, 4, 1, 5, 9, 2],
    row_splits=[0, 4, 4, 6, 7])

<tf.RaggedTensor [[3, 1, 4, 1], [], [5, 9], [2]]>
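All three constructors above describe the same row partition. Here is a plain-Python sketch of converting between the encodings (the helper names are hypothetical; `nrows` is passed explicitly):

```python
def lengths_from_rowids(value_rowids, nrows):
    lengths = [0] * nrows
    for r in value_rowids:
        lengths[r] += 1                 # each value adds one to the length of its row
    return lengths

def splits_from_lengths(row_lengths):
    splits = [0]
    for n in row_lengths:
        splits.append(splits[-1] + n)   # running sum: row i spans splits[i]:splits[i+1]
    return splits

def rows_from_splits(values, row_splits):
    return [values[start:end] for start, end in zip(row_splits, row_splits[1:])]

values = [3, 1, 4, 1, 5, 9, 2]
lengths = lengths_from_rowids([0, 0, 0, 0, 2, 2, 3], nrows=4)
splits = splits_from_lengths(lengths)
rows = rows_from_splits(values, splits)
print(lengths, splits, rows)   # [4, 0, 2, 1] [0, 4, 4, 6, 7] [[3, 1, 4, 1], [], [5, 9], [2]]
```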

# tf.TensorArray

In [72]:
x = tf.TensorArray(dtype=tf.float32, size=3, infer_shape=False, clear_after_read=False)
a = tf.random.normal([3,2,2])
x.unstack(a)

<tensorflow.python.ops.tensor_array_ops.TensorArray at 0x141f0af98>

In [73]:
x.read(0)

<tf.Tensor: id=602, shape=(2, 2), dtype=float32, numpy=
array([[-0.10629629, -0.19466183],
       [-0.4460331 , -0.47419053]], dtype=float32)>

In [74]:
x.stack().shape

TensorShape([3, 2, 2])

In [75]:
x.gather([1,2])

<tf.Tensor: id=606, shape=(2, 2, 2), dtype=float32, numpy=
array([[[-1.2129303 ,  0.77940047],
        [-0.21802388,  0.98266596]],

       [[ 1.740146  , -0.34278297],
        [-1.3144659 ,  1.0175093 ]]], dtype=float32)>

In [76]:
y = x.scatter([2,1,0], a)
y.read(0)

<tf.Tensor: id=609, shape=(2, 2), dtype=float32, numpy=
array([[ 1.740146  , -0.34278297],
       [-1.3144659 ,  1.0175093 ]], dtype=float32)>

In [77]:
a = tf.random.normal([5,6])
x.split(a, [1,2,2]) # split along axis 0 into pieces of lengths 1, 2 and 2

<tensorflow.python.ops.tensor_array_ops.TensorArray at 0x141f0a438>

In [78]:
x.read(0)

<tf.Tensor: id=620, shape=(1, 6), dtype=float32, numpy=
array([[-3.5588963 , -1.5443984 , -0.6560149 ,  0.4804773 , -0.9900909 ,
         0.86643726]], dtype=float32)>

In [79]:
x.read(1)

<tf.Tensor: id=621, shape=(2, 6), dtype=float32, numpy=
array([[-0.48379177,  1.3902569 ,  0.03354906, -0.9551902 ,  1.7645974 ,
        -0.33056656],
       [ 0.79169416, -0.74314564,  1.0771104 ,  0.33629403, -1.1552415 ,
         0.78788143]], dtype=float32)>
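Here are the index semantics of `scatter`, `gather` and `split`, sketched on plain Python lists. These are hypothetical helpers that ignore dtypes and shapes:

```python
def scatter(indices, values, size):
    out = [None] * size
    for i, v in zip(indices, values):
        out[i] = v                      # element k of `values` goes to slot indices[k]
    return out

def gather(items, indices):
    return [items[i] for i in indices]

def split(rows, lengths):
    pieces, start = [], 0
    for n in lengths:
        pieces.append(rows[start:start + n])   # consecutive pieces along axis 0
        start += n
    return pieces

items = ["a0", "a1", "a2"]
scattered = scatter([2, 1, 0], items, size=3)
print(scattered[0])                                          # a2, as with y.read(0) == a[2] above
print(gather(items, [1, 2]))                                 # ['a1', 'a2']
print([len(p) for p in split(list(range(5)), [1, 2, 2])])    # [1, 2, 2]
```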

# tf.saved_model

This section uses `m`, the `MyModule` instance from the `tf.function` section.

In [80]:
tf.saved_model.save(m, "data/modelDir")
tf.saved_model.load("data/modelDir")

<tensorflow.python.saved_model.load.Loader._recreate_base_user_object.<locals>._UserObject at 0x1422696d8>

# tf.random

In [3]:
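x = tf.random.categorical(tf.math.log([[0.1,0.3,0.6]]), 1000)  # logits may be log-probabilities: this draws 1000 samples from a categorical distribution with probabilities 0.1, 0.3, 0.6
x = tf.random.categorical(tf.math.log([[1., 2. , 6.]]), 1000) # probabilities are [1., 2., 6.] / tf.reduce_sum([1., 2., 6.]), i.e. 1/9, 2/9, 6/9
# In other words, the sampling probabilities are proportional to tf.exp(logits), which is the same as applying softmax to the logits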

In [4]:
tf.reduce_sum(tf.cast(x==0, tf.int64))

<tf.Tensor: shape=(), dtype=int64, numpy=103>
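The claim that `categorical` samples with probabilities softmax(logits) can be checked numerically in plain Python. This sketch uses the standard library's `random.choices` instead of TensorFlow, so individual counts will differ from the cell above:

```python
import math
import random

def softmax(logits):
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([math.log(w) for w in [1.0, 2.0, 6.0]])
print(probs)               # approximately [0.111, 0.222, 0.667], i.e. [1, 2, 6] / 9

rng = random.Random(0)
samples = rng.choices(range(3), weights=probs, k=1000)
count0 = samples.count(0)
print(count0)              # should be near 1000 / 9, about 111
```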

# tf.train

#### tf.train.Checkpoint

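Checkpoint saves only the model's parameters, not its computation. It is therefore generally used to restore previously trained parameters when you have the model's source code.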
```python3
checkpoint = tf.train.Checkpoint(model=model)
checkpoint.save(save_path_with_prefix)
```
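* tf.train.Checkpoint takes its arguments in a special form, `**kwargs`: a series of key-value pairs. The keys can be any names, and the values are the objects to save. Use the same key when restoring, because saved values are matched to objects by these names.
* `save_path_with_prefix` is the save directory plus a file-name prefix. For example, `checkpoint.save("./save/model.ckpt")` creates three files in the save directory: `checkpoint`, `model.ckpt-1.index` and `model.ckpt-1.data-00000-of-00001`. These files record the variables. `checkpoint.save` can be called repeatedly; each call produces a new `.index` file and `.data` file, and the number goes up by one each time.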

To restore a model and continue training it, proceed as follows:
```
checkpoint = tf.train.Checkpoint(myAwesomeModel=model, myAwesomeOptimizer=optimizer)
checkpoint.save(save_path_with_prefix)
model_to_be_restored = MyModel() 
checkpoint = tf.train.Checkpoint(myAwesomeModel=model_to_be_restored)
checkpoint.restore(save_path_with_prefix_and_index)
```
* `save_path_with_prefix_and_index` is the directory + prefix + number of a previously saved checkpoint. For example, `checkpoint.restore("./save/model.ckpt-1")` restores the model from the checkpoint numbered 1.

```
tf.train.latest_checkpoint(save_path)
```
* Returns the filename of the most recent checkpoint in `save_path`, e.g. `./save/model.ckpt-10`

In [81]:
tf.train.latest_checkpoint('data')

'data/ckpt.save.test-1'
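The following is a minimal, self-contained sketch of the save / `latest_checkpoint` / restore cycle described above. It makes some illustration choices that are not from the original: a temporary directory from `tempfile` and a single variable stored under the arbitrary key `my_var`.

```python
import os
import tempfile

import tensorflow as tf

ckpt_dir = tempfile.mkdtemp()                  # throwaway directory for the demo
prefix = os.path.join(ckpt_dir, "model.ckpt")  # directory + prefix

v = tf.Variable([1., 2., 3.])
ckpt = tf.train.Checkpoint(my_var=v)           # the key name is arbitrary
path1 = ckpt.save(prefix)                      # writes .../model.ckpt-1.*
v.assign([0., 0., 0.])
path2 = ckpt.save(prefix)                      # writes .../model.ckpt-2.*

latest = tf.train.latest_checkpoint(ckpt_dir)  # path of the most recent save

# Restore the *first* save into a fresh variable; only the key has to match.
v_restored = tf.Variable(tf.zeros([3]))
tf.train.Checkpoint(my_var=v_restored).restore(path1)
print(latest, v_restored.numpy())
```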

#### tf.train.Feature

In [38]:
value_1 = tf.constant('aaaa')
# BytesList won't unpack a string from an EagerTensor, so convert it to Python bytes first
if isinstance(value_1, type(tf.constant(0))):
    value_1 = value_1.numpy()
a = tf.train.BytesList(value=[value_1, b'cccc'])
a

value: "aaaa"
value: "cccc"

In [39]:
x = tf.constant(['aaaa', 'cccc'])
x = x.numpy() 
x0 = tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'aaaa', b'cccc']))
x0

bytes_list {
  value: "aaaa"
  value: "cccc"
}

In [50]:
value_2 = tf.constant([4., 5.])
x1 = tf.train.Feature(float_list=tf.train.FloatList(value=value_2))
value_3 = tf.constant([2, 3])
x2 = tf.train.Feature(int64_list=tf.train.Int64List(value=value_3))
feature = {'x1':x1, 'x2':x2, 'x0':x0}

In [51]:
tf.train.Features(feature=feature)

feature {
  key: "x0"
  value {
    bytes_list {
      value: "aaaa"
      value: "cccc"
    }
  }
}
feature {
  key: "x1"
  value {
    float_list {
      value: 4.0
      value: 5.0
    }
  }
}
feature {
  key: "x2"
  value {
    int64_list {
      value: 2
      value: 3
    }
  }
}

In [52]:
tf.train.Example(features=tf.train.Features(feature=feature))

features {
  feature {
    key: "x0"
    value {
      bytes_list {
        value: "aaaa"
        value: "cccc"
      }
    }
  }
  feature {
    key: "x1"
    value {
      float_list {
        value: 4.0
        value: 5.0
      }
    }
  }
  feature {
    key: "x2"
    value {
      int64_list {
        value: 2
        value: 3
      }
    }
  }
}
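The natural next step is to serialize an `Example` like the one above and parse it back. Below is a small sketch using `SerializeToString` and `tf.io.parse_single_example`. The `feature_description` dict is an illustrative assumption: it declares each feature as fixed length 2.

```python
import tensorflow as tf

# Same three features as above, built from plain Python values.
example = tf.train.Example(features=tf.train.Features(feature={
    'x0': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'aaaa', b'cccc'])),
    'x1': tf.train.Feature(float_list=tf.train.FloatList(value=[4., 5.])),
    'x2': tf.train.Feature(int64_list=tf.train.Int64List(value=[2, 3])),
}))
serialized = example.SerializeToString()   # bytes, e.g. one record of a TFRecord file

# Describe the expected type and fixed length of each feature, then parse.
feature_description = {
    'x0': tf.io.FixedLenFeature([2], tf.string),
    'x1': tf.io.FixedLenFeature([2], tf.float32),
    'x2': tf.io.FixedLenFeature([2], tf.int64),
}
parsed = tf.io.parse_single_example(serialized, feature_description)
print({k: t.numpy() for k, t in parsed.items()})
```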

# tf.initializer

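If a deep learning model's weights are initialized too small, the signal shrinks as it passes through each layer until it has no effect. If they are initialized too large, the signal grows from layer to layer and leads to divergence.  
The Xavier (Glorot) initializer draws the initial weights from a uniform or normal distribution with mean 0 and variance $\frac{2}{N_{in}+N_{out}}$:

* `tf.initializers.glorot_normal()(shape=[20,30])`: creates initial weights with $N_{in}=20, N_{out}=30$ drawn from a (truncated) normal distribution;
* `tf.initializers.glorot_uniform()(shape=[20,30])`: same as above, but drawn from a uniform distribution.

The same can be done indirectly with the APIs below, choosing the parameters yourself:  
* `tf.random_normal_initializer(mean=0.0,stddev=0.05)(shape=[])`
* `tf.random_uniform_initializer(minval=-0.05, maxval=0.05)(shape=[])`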
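The Glorot variance formula can be checked empirically. The sketch below uses the class forms `tf.keras.initializers.GlorotUniform` / `GlorotNormal`, which correspond to the lowercase functions above. It also assumes a larger shape than `[20, 30]` so the sample statistics are stable.

```python
import numpy as np
import tensorflow as tf

n_in, n_out = 200, 300                       # larger than [20, 30] for stable statistics
target_var = 2.0 / (n_in + n_out)            # Glorot variance
limit = np.sqrt(6.0 / (n_in + n_out))        # uniform version samples from U(-limit, limit)

w_uniform = tf.keras.initializers.GlorotUniform()(shape=[n_in, n_out]).numpy()
w_normal = tf.keras.initializers.GlorotNormal()(shape=[n_in, n_out]).numpy()
print(target_var, w_uniform.var(), w_normal.var())
```

The bound `limit` follows because a uniform distribution on $(-a, a)$ has variance $a^2/3$, and $a^2/3 = 2/(N_{in}+N_{out})$ gives $a = \sqrt{6/(N_{in}+N_{out})}$.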

# tf.linalg

```
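tf.linalg.band_part(input, num_lower, num_upper)
```
* num_lower: number of subdiagonals to keep in the lower triangle; -1 keeps all of them. num_upper works the same way for the superdiagonals in the upper triangle.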

In [6]:
x = tf.random.normal([4,4])
x

<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[ 0.529435  ,  0.8286487 ,  1.837228  ,  0.09473267],
       [ 0.36858612,  1.0950916 ,  0.62635964,  1.1934665 ],
       [-0.7357971 ,  0.18043841, -0.19846489, -0.9738333 ],
       [-1.5683837 ,  2.8496115 ,  0.01234788, -0.76184636]],
      dtype=float32)>

In [9]:
tf.linalg.band_part(x, 0, -1)  # keep the upper triangle (no subdiagonals, all superdiagonals)

<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[ 0.529435  ,  0.8286487 ,  1.837228  ,  0.09473267],
       [ 0.        ,  1.0950916 ,  0.62635964,  1.1934665 ],
       [ 0.        ,  0.        , -0.19846489, -0.9738333 ],
       [ 0.        ,  0.        ,  0.        , -0.76184636]],
      dtype=float32)>
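Here is a small sketch comparing several `num_lower`/`num_upper` combinations against NumPy's `np.triu`/`np.tril` on a deterministic matrix. The causal mask at the end is shown only as a common use case. It is not from the original notes.

```python
import numpy as np
import tensorflow as tf

x = tf.reshape(tf.range(1., 17.), [4, 4])
upper = tf.linalg.band_part(x, 0, -1)   # no subdiagonals, all superdiagonals -> upper triangle
lower = tf.linalg.band_part(x, -1, 0)   # all subdiagonals, no superdiagonals -> lower triangle
diag = tf.linalg.band_part(x, 0, 0)     # main diagonal only
tri = tf.linalg.band_part(x, 1, 1)      # tridiagonal band

# Common use: a causal ("look-ahead") mask where position i may only see positions <= i.
causal_mask = tf.linalg.band_part(tf.ones([4, 4]), -1, 0)
print(lower.numpy())
```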

# tf.math

In [82]:
tf.math.reduce_std # standard deviation
tf.math.reduce_variance # variance
tf.math.reduce_all
tf.math.reduce_any
tf.math.reduce_logsumexp # equivalent to tf.math.log(tf.reduce_sum(tf.exp(x))), but numerically more stable
tf.math.argmin
tf.math.argmax

<function tensorflow.python.ops.math_ops.argmax_v2(input, axis=None, output_type=tf.int64, name=None)>
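Here is a quick sketch of why `reduce_logsumexp` is preferred over the naive formula, plus a check that `reduce_std`/`reduce_variance` compute the population statistics (dividing by N, like `np.std`). The input values are arbitrary.

```python
import numpy as np
import tensorflow as tf

x = tf.constant([[1., 2., 3.], [1000., 1000., 1000.]])
naive = tf.math.log(tf.reduce_sum(tf.exp(x), axis=1))  # exp(1000.) overflows float32 -> inf
stable = tf.math.reduce_logsumexp(x, axis=1)           # stays finite: 1000 + log(3)
std = tf.math.reduce_std(x[0])                         # population std, like np.std
var = tf.math.reduce_variance(x[0])                    # population variance, 2/3 here
print(naive.numpy(), stable.numpy(), std.numpy(), var.numpy())
```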

`tf.matmul(a, b)` performs matrix multiplication over the last two dimensions; the leading (batch) dimensions must match.
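Below is a batched example with matching leading dimensions, cross-checked against the equivalent `tf.einsum` expression. The shapes are arbitrary.

```python
import numpy as np
import tensorflow as tf

a = tf.random.normal([5, 2, 3])   # batch of 5 matrices, each 2x3
b = tf.random.normal([5, 3, 4])   # batch of 5 matrices, each 3x4
c = tf.matmul(a, b)               # one 2x4 product per batch element -> (5, 2, 4)
c_ref = tf.einsum('bij,bjk->bik', a, b)
print(c.shape)
```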

# tf.GradientTape

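All operations executed inside a tf.GradientTape context are recorded for computing gradients. By default, the resources held by a tf.GradientTape are released as soon as GradientTape.gradient() is called. To compute several gradients over the same computation, create a persistent tape (`persistent=True`), which allows gradient() to be called multiple times.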

In [83]:
x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as t:
  t.watch(x)  # x is a constant, so it must be watched explicitly; a trainable Variable is watched automatically and would not need this line
  y = x * x
  z = y * y
dz_dx = t.gradient(z, x)  # 108.0 (4*x^3 at x = 3)
dy_dx = t.gradient(y, x)  # 6.0
del t  # Drop the reference to the tape
dz_dx

<tf.Tensor: id=770, shape=(), dtype=float32, numpy=108.0>
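The following sketch shows the default (non-persistent) behaviour described above: a second `gradient()` call raises an error. It also shows that a non-trainable variable is not watched automatically, so its gradient is `None`. The variable values are arbitrary.

```python
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:          # persistent=False (the default)
    y = x * x
first = tape.gradient(y, x)              # 6.0; the tape's resources are released here
try:
    tape.gradient(y, x)                  # a second call is not allowed
    second_call_failed = False
except RuntimeError:
    second_call_failed = True

c = tf.Variable(2.0, trainable=False)    # non-trainable variables are not watched automatically
with tf.GradientTape() as tape2:
    z = c * c
grad_c = tape2.gradient(z, c)            # None, unless tape2.watch(c) is called
print(first.numpy(), second_call_failed, grad_c)
```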

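Gradient computations performed inside a tape context are themselves recorded, so higher-order gradients can be computed by nesting tapes.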

In [84]:
x = tf.Variable(1.0)
with tf.GradientTape() as t:
    with tf.GradientTape() as t2:
        y = x * x * x
    dy_dx = t2.gradient(y,x)
d2y_dx2 = t.gradient(dy_dx, x)

In [85]:
assert dy_dx.numpy() == 3.0
assert d2y_dx2.numpy() == 6.0

In [86]:
x = tf.Variable(1.)
with tf.GradientTape() as tape:
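    y = x * 8.   # recorded, but y is rebound on the next line
    y = x * x    # gradient() differentiates only the tensor it is given: dy/dx = 2x = 2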
dydx = tape.gradient(y,x)

In [87]:
dydx

<tf.Tensor: id=824, shape=(), dtype=float32, numpy=2.0>

# tf.losses

```
tf.losses.sparse_categorical_crossentropy(y_true, y_pred, from_logits=False)
```
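* y_true holds sparse labels (the integer class indices directly), not one-hot vectors
* from_logits=False: y_pred is the output of tf.nn.softmax, i.e. every element is a probability and each row sums to 1
* from_logits=True: y_pred is the raw output of the previous layer (logits), and the softmax is computed inside this function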
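This sketch checks that the two `from_logits` settings agree. It also checks them against the one-hot `categorical_crossentropy` and the hand-computed $-\log p_{label}$. The logits are the same values used later in the `tf.nn.softmax` section.

```python
import numpy as np
import tensorflow as tf

logits = tf.constant([[3., 11., 6.], [6., 11., 16.]])  # raw scores of the last layer
labels = tf.constant([1, 2])                           # sparse labels, not one-hot

from_logits = tf.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
from_probs = tf.losses.sparse_categorical_crossentropy(labels, tf.nn.softmax(logits))
dense_labels = tf.losses.categorical_crossentropy(tf.one_hot(labels, 3), logits, from_logits=True)
print(from_logits.numpy())   # one loss value per sample
```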

Binary cross-entropy: the three computations below give the same result:

In [88]:
tf.losses.binary_crossentropy([1,0,1], [0.9,0.3,0.7])

<tf.Tensor: id=849, shape=(), dtype=float32, numpy=0.27290332>

In [89]:
tf.reduce_mean(tf.losses.binary_crossentropy([[1],[0],[1]], [[0.9],[0.3],[0.7]]))

<tf.Tensor: id=876, shape=(), dtype=float32, numpy=0.27290332>

In [90]:
a = tf.multiply(tf.subtract(1.,[1,0,1.]), tf.math.log(tf.subtract(1.,[0.9,0.3,0.7])))
b = tf.multiply([1,0,1.], tf.math.log([0.9,0.3,0.7]))
-tf.reduce_mean(a+b)

<tf.Tensor: id=892, shape=(), dtype=float32, numpy=0.27290347>

# tf.metrics

tf.metrics.categorical_accuracy(y_true, y_pred)
* y_true is a one-hot vector; y_pred is the softmax output

tf.metrics.sparse_categorical_accuracy(y_true, y_pred)
* y_true is a sparse tensor of integer class labels

In [91]:
a = tf.constant([1., 1, 0, 0])
b = tf.constant([0.98, 1, 0, 0.55])

In [92]:
tf.metrics.BinaryAccuracy(threshold=0.55)(a, b)

<tf.Tensor: id=922, shape=(), dtype=float32, numpy=1.0>

In [93]:
tf.metrics.binary_accuracy(a, tf.where(b>0.55, 1., 0))

<tf.Tensor: id=934, shape=(), dtype=float32, numpy=1.0>

In [94]:
tf.metrics.binary_accuracy(a, b, 0.55)

<tf.Tensor: id=941, shape=(), dtype=float32, numpy=1.0>

# tf.optimizer

```python
# `tape`, `loss` and `model` come from the surrounding training step
optimizer = tf.keras.optimizers.Adam()
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
# .apply_gradients(grads_and_vars)
# grads_and_vars: List of (gradient, variable) pairs
```
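Here is a runnable version of the fragment above inside a complete custom training loop. It rests on several assumptions: a hypothetical toy problem (recover `y = 2x + 1` with one `Dense` unit), and plain SGD with `learning_rate=0.1` instead of Adam so that convergence is predictable.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy problem: recover y = 2x + 1 with a single Dense unit.
x = tf.constant(np.linspace(-1, 1, 64, dtype=np.float32)[:, None])
y = 2.0 * x + 1.0
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

for step in range(300):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x) - y))   # mean squared error
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

kernel, bias = model.get_weights()
print(float(loss), kernel.ravel(), bias)
```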

## tf.nn

#### tf.nn.top_k

In [95]:
a = tf.random.normal([6,3])
b = tf.constant([2,1,1,0,0,1])

In [96]:
tf.nn.top_k(a,  2)

TopKV2(values=<tf.Tensor: id=950, shape=(6, 2), dtype=float32, numpy=
array([[-0.16724938, -0.41634324],
       [ 0.053495  , -0.25219882],
       [-0.28541866, -0.3039618 ],
       [ 0.5407113 , -0.07260029],
       [ 0.83685905, -0.5499403 ],
       [ 0.864015  ,  0.46615827]], dtype=float32)>, indices=<tf.Tensor: id=951, shape=(6, 2), dtype=int32, numpy=
array([[1, 0],
       [1, 0],
       [2, 0],
       [0, 1],
       [2, 0],
       [0, 1]], dtype=int32)>)

In [97]:
tf.nn.in_top_k(b, a,  2)

<tf.Tensor: id=953, shape=(6,), dtype=bool, numpy=array([False,  True, False,  True,  True,  True])>
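`in_top_k` answers "is the target class among the top-k indices?", which can also be derived from `top_k` directly. Below is a sketch with hand-picked scores. Ties are deliberately avoided, because `in_top_k` treats classes tied at the k-th boundary specially.

```python
import numpy as np
import tensorflow as tf

scores = tf.constant([[0.1, 0.5, 0.4],
                      [0.9, 0.06, 0.04],
                      [0.2, 0.3, 0.5]])
targets = tf.constant([2, 1, 0])

values, indices = tf.nn.top_k(scores, k=2)                           # top-2 per row
manual = tf.reduce_any(tf.equal(indices, targets[:, None]), axis=1)  # target among top-2 indices?
builtin = tf.nn.in_top_k(targets, scores, 2)                         # TF2 order: targets, predictions, k
print(indices.numpy(), builtin.numpy())
```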

#### tf.nn.moments

```python
tf.nn.moments(x, axes, keepdims=False)
# with axes=[0,1,2], the mean and variance are computed over axes 0, 1 and 2
# keepdims: whether the result keeps the reduced axes (with length 1)
```

In [98]:
x = tf.random.normal([128, 32, 32, 64])
m, v = tf.nn.moments(x, [0,1,2], keepdims=True)
assert m.shape == [1,1,1,64]

In [99]:
# equivalent to
m2 = tf.reduce_mean(x, axis=[0,1,2], keepdims=True)

In [100]:
tf.math.reduce_all(m == m2).numpy()

True
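The variance returned by `tf.nn.moments` is the population variance, i.e. the mean squared deviation from the mean, with no $N-1$ correction. A sketch with a smaller, arbitrary shape:

```python
import numpy as np
import tensorflow as tf

x = tf.random.normal([16, 8, 8, 4])
m, v = tf.nn.moments(x, axes=[0, 1, 2], keepdims=True)
v_manual = tf.reduce_mean(tf.square(x - m), axis=[0, 1, 2], keepdims=True)  # population variance
print(v.shape)
```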

#### tf.nn.batch_normalization

```python
tf.nn.batch_normalization(x, mean, variance, offset, scale, variance_epsilon)
```
* mean and variance can be the outputs of `tf.nn.moments`
* Formula:
```
tmp = (x-mean)/tf.sqrt(variance + variance_epsilon)
return tmp * scale + offset
```
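A sketch that checks the formula above against `tf.nn.batch_normalization`. The `offset`/`scale` values are arbitrary illustration choices.

```python
import numpy as np
import tensorflow as tf

x = tf.random.normal([6, 3])
mean, variance = tf.nn.moments(x, axes=[0])
offset = tf.constant([0.5, 0.0, -0.5])   # beta (illustrative values)
scale = tf.constant([1.0, 2.0, 3.0])     # gamma (illustrative values)
eps = 1e-3

out = tf.nn.batch_normalization(x, mean, variance, offset, scale, eps)
manual = (x - mean) / tf.sqrt(variance + eps) * scale + offset
```

After normalization, each column's mean equals its `offset`.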

#### tf.keras.layers.BatchNormalization

```python
tf.keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', moving_mean_initializer='zeros', moving_variance_initializer='ones')
```
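* axis: the feature axis to normalize
* momentum: coefficient of the moving average
* epsilon: same as variance_epsilon above
* scale: whether to multiply by scale (gamma)
* center: whether to add offset (beta)
* gamma: same as scale above
* beta: same as offset above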

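Let the layer's state be the moving mean $\mu$ and the moving variance $\sigma^2$. Let the mean and variance of the current mini-batch be $\mu_i$ and $\sigma_i^2$, and let the coefficient $\alpha$ be 1 - norm.momentum. Each training-mode call then updates
$$
\begin{aligned}
\mu &\leftarrow (1-\alpha)\mu + \alpha \mu_i \\
\sigma^2 &\leftarrow (1-\alpha)\sigma^2 + \alpha\sigma_i^2
\end{aligned}
$$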
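To see the recursive update over several steps, the sketch below replays it in NumPy next to a real layer. Some choices are for illustration only: `momentum=0.9` (so updates are visible), three random batches of shape `[32, 4]`, and 2-D inputs.

```python
import numpy as np
import tensorflow as tf

momentum = 0.9                    # smaller than the default 0.99 so the updates are visible
alpha = 1 - momentum
norm = tf.keras.layers.BatchNormalization(momentum=momentum)

mu = np.zeros(4, np.float32)      # moving_mean is initialized to 0
var = np.ones(4, np.float32)      # moving_variance is initialized to 1
for _ in range(3):
    batch = tf.random.normal([32, 4])
    norm(batch, training=True)    # training mode updates the moving statistics
    mu_i, var_i = tf.nn.moments(batch, axes=[0])
    mu = (1 - alpha) * mu + alpha * mu_i.numpy()
    var = (1 - alpha) * var + alpha * var_i.numpy()
```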

In [123]:
tmp = tf.random.normal([3, 4, 5])
norm = layers.BatchNormalization()
norm(tmp, training=True)
norm.moving_mean

<tf.Variable 'batch_normalization_1/moving_mean:0' shape=(5,) dtype=float32, numpy=
array([-5.5041215e-05, -2.2435044e-03,  4.9383932e-04,  3.6231871e-04,
        7.5543998e-04], dtype=float32)>

##### Following [this paper](https://arxiv.org/pdf/1702.03275.pdf), the computation goes roughly like this:

In [10]:
a = tf.random.normal([6,3])

In [11]:
a

<tf.Tensor: shape=(6, 3), dtype=float32, numpy=
array([[ 0.9958078 , -1.1161667 ,  0.5356532 ],
       [-1.4961139 , -0.51845664,  0.5639002 ],
       [ 1.5387233 , -0.23743649,  0.7263154 ],
       [ 2.2434304 , -1.1048131 , -0.6543047 ],
       [-1.310277  , -0.43914226,  0.15493158],
       [ 0.05232201,  0.90955174,  0.10727721]], dtype=float32)>

In [12]:
tf.nn.moments(a, axes=0) # compute the mean and variance

(<tf.Tensor: shape=(3,), dtype=float32, numpy=array([ 0.33731544, -0.41774392,  0.23896213], dtype=float32)>,
 <tf.Tensor: shape=(3,), dtype=float32, numpy=array([1.9445854 , 0.46078062, 0.20890851], dtype=float32)>)

In [13]:
norm = layers.BatchNormalization()
norm(a, training=True)  # builds the weights and updates the moving statistics

<tf.Tensor: shape=(6, 3), dtype=float32, numpy=
array([[ 0.47209084, -1.027781  ,  0.64757395],
       [-1.3144346 , -0.14820623,  0.70922744],
       [ 0.86132157,  0.26533574,  1.0637237 ],
       [ 1.366545  , -1.0110734 , -1.9496927 ],
       [-1.1812032 , -0.03148925, -0.18340966],
       [-0.20431943,  1.9532139 , -0.2874227 ]], dtype=float32)>

##### Parameters

In [14]:
norm.momentum

0.99

In [15]:
norm.epsilon

0.001

In [16]:
norm.moving_mean # initialized to 0; the value shown is after one training-mode call

<tf.Variable 'batch_normalization/moving_mean:0' shape=(3,) dtype=float32, numpy=array([ 0.00337315, -0.00417744,  0.00238962], dtype=float32)>

In [17]:
norm.moving_variance # initialized to 1; the value shown is after one training-mode call

<tf.Variable 'batch_normalization/moving_variance:0' shape=(3,) dtype=float32, numpy=array([1.0094459, 0.9946078, 0.9920891], dtype=float32)>

In [18]:
norm.beta

<tf.Variable 'batch_normalization/beta:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>

In [19]:
norm.gamma

<tf.Variable 'batch_normalization/gamma:0' shape=(3,) dtype=float32, numpy=array([1., 1., 1.], dtype=float32)>

The moving statistics are updated exactly as in the following computation:

In [20]:
norm.momentum * 0 + tf.nn.moments(a, axes=0)[0] * (1-norm.momentum)

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([ 0.00337315, -0.00417744,  0.00238962], dtype=float32)>

In [21]:
norm.momentum * 1. + tf.nn.moments(a, axes=0)[1] * (1-norm.momentum)

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([1.0094459, 0.9946078, 0.9920891], dtype=float32)>

##### Computation with `training=False`

In [23]:
norm(a, False)

<tf.Tensor: shape=(6, 3), dtype=float32, numpy=
array([[ 0.9872914 , -1.1144394 ,  0.5351159 ],
       [-1.491716  , -0.51541233,  0.563461  ],
       [ 1.5273933 , -0.233773  ,  0.72644037],
       [ 2.2284482 , -1.1030607 , -0.6589753 ],
       [-1.3068422 , -0.43592322,  0.1530718 ],
       [ 0.04869518,  0.9157424 ,  0.10525191]], dtype=float32)>

In [24]:
# equivalent to the following computation:
(a-norm.moving_mean)/(norm.moving_variance+norm.epsilon)**0.5 * norm.gamma+norm.beta

<tf.Tensor: shape=(6, 3), dtype=float32, numpy=
array([[ 0.98729146, -1.1144394 ,  0.53511584],
       [-1.491716  , -0.51541233,  0.56346095],
       [ 1.5273933 , -0.23377301,  0.7264403 ],
       [ 2.2284482 , -1.1030607 , -0.6589753 ],
       [-1.3068422 , -0.4359232 ,  0.1530718 ],
       [ 0.04869518,  0.91574246,  0.10525191]], dtype=float32)>

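##### Computation with `training=True`

Presumably it is computed as below, normalizing with the mini-batch statistics. The two outputs that follow differ only in the last digit, i.e. by float32 rounding.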

In [115]:
(a-tf.nn.moments(a, axes=0)[0])/(tf.nn.moments(a, axes=0)[1] + norm.epsilon)**0.5 * norm.gamma + norm.beta

<tf.Tensor: id=1130, shape=(6, 3), dtype=float32, numpy=
array([[ 0.384481  , -0.63107145,  0.46387953],
       [ 0.4694243 , -0.13331562, -0.14479882],
       [-1.5926844 , -0.25282156,  1.5538405 ],
       [-0.37832326, -1.2666231 ,  0.35448807],
       [ 1.6368306 ,  0.33443055, -0.5074017 ],
       [-0.5197283 ,  1.9494011 , -1.7200075 ]], dtype=float32)>

In [116]:
norm(a, True)

<tf.Tensor: id=1159, shape=(6, 3), dtype=float32, numpy=
array([[ 0.384481  , -0.6310714 ,  0.46387953],
       [ 0.4694243 , -0.13331562, -0.14479882],
       [-1.5926844 , -0.25282156,  1.5538404 ],
       [-0.37832326, -1.2666233 ,  0.35448807],
       [ 1.6368306 ,  0.33443055, -0.5074017 ],
       [-0.5197283 ,  1.9494009 , -1.7200077 ]], dtype=float32)>

#### tf.nn.softmax

In [117]:
x = tf.Variable([[ 3., 11.,  6.],[ 6., 11., 16.]])
tf.nn.softmax(x)
# equivalent to: tf.exp(x)/tf.expand_dims(tf.reduce_sum(tf.exp(x), axis=1), 1)

<tf.Tensor: id=1168, shape=(2, 3), dtype=float32, numpy=
array([[3.3310644e-04, 9.9297631e-01, 6.6906218e-03],
       [4.5094042e-05, 6.6925492e-03, 9.9326235e-01]], dtype=float32)>

#### tf.nn.softmax_cross_entropy_with_logits

```
tf.nn.softmax_cross_entropy_with_logits(labels, logits)
```
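* Computes the cross-entropy. The input is what would be fed *into* softmax (the logits), i.e. the softmax is computed inside this function;
* Note that it returns a vector with one value per sample in the batch; to get a single loss you still need tf.reduce_sum (or tf.reduce_mean);
* logits: the output of the network's last layer, with shape `[batch_size, num_classes]`, or just `[num_classes]` for a single sample;
* labels: the true labels, with the same shape as logits (e.g. one-hot vectors)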

In [118]:
x = tf.Variable([[ 3., 11.,  6.],[ 6., 11., 16.]])
tf.nn.softmax(x)

<tf.Tensor: id=1177, shape=(2, 3), dtype=float32, numpy=
array([[3.3310644e-04, 9.9297631e-01, 6.6906218e-03],
       [4.5094042e-05, 6.6925492e-03, 9.9326235e-01]], dtype=float32)>

In [119]:
tf.nn.sparse_softmax_cross_entropy_with_logits([1,2], x)

<tf.Tensor: id=1181, shape=(2,), dtype=float32, numpy=array([0.0070485 , 0.00676046], dtype=float32)>

In [120]:
tf.nn.softmax_cross_entropy_with_logits(tf.one_hot([1,2], depth=3), x)

<tf.Tensor: id=1220, shape=(2,), dtype=float32, numpy=array([0.0070485 , 0.00676046], dtype=float32)>

In [121]:
- tf.reduce_sum(tf.math.log(tf.nn.softmax(x)) * tf.one_hot([1,2], depth=3), axis=1)

<tf.Tensor: id=1232, shape=(2,), dtype=float32, numpy=array([0.00704847, 0.00676045], dtype=float32)>

#### tf.nn.conv2d

```
tf.nn.conv2d(input, filters, strides, padding, data_format='NHWC', dilations=None, name=None)
```
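* The first argument, input, must have shape `[batch, in_height, in_width, in_channels]`, i.e. `[number of images in a training batch, image height, image width, number of image channels]`
* The second argument, filters, is the convolution kernel and must have shape `[filter_height, filter_width, in_channels, out_channels]`, i.e. `[kernel height, kernel width, number of image channels, number of kernels]`
* The third argument, strides, is the step size of the convolution along each dimension of the input. It is a 1-D tensor; for images `strides=[1, x, y, 1]`, with `strides[0]==strides[3]==1`;
* The fourth argument, padding, is a string that must be either "SAME" or "VALID", and it determines the convolution mode. padding="SAME" pads with zeros on both sides so that, with stride 1, the output keeps the input's height and width. padding="VALID" adds no padding;
* The fifth argument, data_format, is "NHWC" (the default) or "NCHW". The TF1 argument `use_cudnn_on_gpu` no longer exists in the TF2 `tf.nn.conv2d`.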
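Below is a sketch of the raw `tf.nn.conv2d` call with the argument layouts listed above, showing how `padding` and `strides` change the output size. The input and kernel shapes are arbitrary and mirror the `layers.Conv2D` example that follows.

```python
import tensorflow as tf

images = tf.random.normal([6, 10, 10, 3])   # [batch, in_height, in_width, in_channels]
kernels = tf.random.normal([4, 4, 3, 2])    # [filter_height, filter_width, in_channels, out_channels]

valid = tf.nn.conv2d(images, kernels, strides=[1, 1, 1, 1], padding='VALID')    # 10 - 4 + 1 = 7
same = tf.nn.conv2d(images, kernels, strides=[1, 1, 1, 1], padding='SAME')      # size preserved
same_s2 = tf.nn.conv2d(images, kernels, strides=[1, 2, 2, 1], padding='SAME')   # ceil(10 / 2) = 5
print(valid.shape, same.shape, same_s2.shape)
```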

In [122]:
t = layers.Conv2D(filters=2, kernel_size=[4,4])
x = tf.random.normal([6, 10, 10, 3])
t(x).shape

TensorShape([6, 7, 7, 2])

In [123]:
t.variables[0].shape # the kernel weights: each kernel has shape [height, width, in_channels], and there are out_channels kernels

TensorShape([4, 4, 3, 2])

In [124]:
t.variables[1].shape # the biases of the convolution: one bias per kernel

TensorShape([2])

## tf.image

In [2]:
img = tf.io.read_file('data/TheStarryNight.jpg')
arr_img = tf.image.decode_jpeg(img)

```
tf.image.adjust_brightness(arr_img, 0.2)  # adds delta: roughly arr_img + 0.2 * 255 (clipped to the valid range); a negative delta darkens the image
tf.image.adjust_contrast(arr_img, 0.2) # adjusts contrast, roughly: a=np.mean(arr_img.numpy(), axis=(0,1)); tf.cast((0.2 * (arr_img.numpy() - a) + a), 'uint8')
tf.image.adjust_gamma(arr_img, gamma=0.2, gain=1)  # roughly tf.cast(255 * (arr_img/255)**0.2, 'uint8')
tf.image.random_crop(arr_img, [1000, 1000, 3])  # randomly crops a 1000x1000 patch; batches are supported too
tf.image.random_crop(tf.stack([arr_img, arr_img], 0), [2, 1000, 1000, 3])
```

```
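tf.image.flip_left_right(img)  # mirror horizontally (left-right)
tf.image.rgb_to_grayscale  # convert to grayscale
tf.image.rgb_to_hsv(image)  # image must be float with values in [0,1]
tf.image.adjust_saturation(image, 3)  # converts image to HSV, multiplies the saturation channel by 3, converts back to RGB
tf.image.rot90  # rotate by 90 degrees (counter-clockwise by default)
tf.image.central_crop(image, central_fraction=0.5) # crop, keeping only the central 50% along height and width
tf.image.convert_image_dtype(image, tf.float32) # Cast and normalize the image to [0,1]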
```
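Some of the operations above can be checked on a small synthetic image instead of `data/TheStarryNight.jpg`. This sketch assumes a constant uint8 image with value 100 as the stand-in for `arr_img`. The brightness checks allow an off-by-one difference from the float/uint8 round trip.

```python
import numpy as np
import tensorflow as tf

img = tf.constant(np.full([4, 6, 3], 100, dtype=np.uint8))  # synthetic stand-in for arr_img
brighter = tf.image.adjust_brightness(img, 0.2)    # about 100 + 0.2 * 255 = 151
darker = tf.image.adjust_brightness(img, -0.2)     # about 100 - 0.2 * 255 = 49

ramp = tf.reshape(tf.range(6, dtype=tf.float32), [1, 6, 1])  # one row: 0, 1, ..., 5
flipped = tf.image.flip_left_right(ramp)                     # 5, 4, ..., 0

gray = tf.image.rgb_to_grayscale(tf.image.convert_image_dtype(img, tf.float32))
cropped = tf.image.central_crop(tf.zeros([100, 200, 3]), central_fraction=0.5)
print(brighter[0, 0].numpy(), darker[0, 0].numpy(), gray.shape, cropped.shape)
```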

## tf.feature_column

In [346]:
URL = 'https://storage.googleapis.com/applied-dl/heart.csv'
dataframe = pd.read_csv(URL)
dataframe.head(2)

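|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 1 | 145 | 233 | 1 | 2 | 150 | 0 | 2.3 | 3 | 0 | fixed | 0 |
| 1 | 67 | 1 | 4 | 160 | 286 | 0 | 2 | 108 | 1 | 1.5 | 2 | 3 | normal | 1 |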


In [347]:
labels = dataframe.pop('target')
dataset = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
dataset = dataset.batch(8)
one_batch = next(dataset.as_numpy_iterator())[0]
one_batch

{'age': array([63, 67, 67, 37, 41, 56, 62, 57], dtype=int32),
 'sex': array([1, 1, 1, 1, 0, 1, 0, 0], dtype=int32),
 'cp': array([1, 4, 4, 3, 2, 2, 4, 4], dtype=int32),
 'trestbps': array([145, 160, 120, 130, 130, 120, 140, 120], dtype=int32),
 'chol': array([233, 286, 229, 250, 204, 236, 268, 354], dtype=int32),
 'fbs': array([1, 0, 0, 0, 0, 0, 0, 0], dtype=int32),
 'restecg': array([2, 2, 2, 0, 2, 0, 2, 0], dtype=int32),
 'thalach': array([150, 108, 129, 187, 172, 178, 160, 163], dtype=int32),
 'exang': array([0, 1, 1, 0, 0, 0, 0, 1], dtype=int32),
 'oldpeak': array([2.3, 1.5, 2.6, 3.5, 1.4, 0.8, 3.6, 0.6]),
 'slope': array([3, 2, 2, 3, 1, 1, 3, 1], dtype=int32),
 'ca': array([0, 3, 2, 0, 0, 0, 2, 0], dtype=int32),
 'thal': array([b'fixed', b'normal', b'reversible', b'normal', b'normal',
        b'normal', b'normal', b'normal'], dtype=object)}

#### tf.feature_column.numeric_column

In [354]:
one_age = tf.feature_column.numeric_column('age')
feature_layer = layers.DenseFeatures([one_age])

In [355]:
feature_layer(dict(age=[1,2], ppp=[3,4]))

<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[1.],
       [2.]], dtype=float32)>

In [357]:
feature_layer(one_batch)

<tf.Tensor: shape=(8, 1), dtype=float32, numpy=
array([[63.],
       [67.],
       [67.],
       [37.],
       [41.],
       [56.],
       [62.],
       [57.]], dtype=float32)>

#### tf.feature_column.bucketized_column

In [358]:
buck_age = tf.feature_column.bucketized_column(one_age, boundaries=[37, 40, 65, 67, 70]) # the source column must be a numeric_column
feature_layer = layers.DenseFeatures(buck_age)

In [360]:
feature_layer(one_batch) # buckets: (-inf, 37), [37, 40), [40, 65), [65, 67), [67, 70), [70, +inf), i.e. left-closed, right-open

<tf.Tensor: shape=(8, 6), dtype=float32, numpy=
array([[0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.]], dtype=float32)>
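The bucket assignment above can be reproduced with NumPy alone. This sketch uses `np.digitize`, whose default `right=False` gives the same left-closed, right-open buckets, and copies the `age` values of `one_batch` by hand.

```python
import numpy as np

ages = np.array([63, 67, 67, 37, 41, 56, 62, 57])    # the 'age' values of one_batch above
boundaries = [37, 40, 65, 67, 70]
bucket_ids = np.digitize(ages, boundaries)           # i such that boundaries[i-1] <= age < boundaries[i]
one_hot = np.eye(len(boundaries) + 1)[bucket_ids]    # 5 boundaries -> 6 buckets
print(bucket_ids)
```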

#### tf.feature_column.categorical_column_with_vocabulary_list

#### tf.feature_column.indicator_column

#### tf.feature_column.embedding_column

DenseFeatures only accepts dense tensors; to inspect a categorical column, you need to transform it into an indicator column first:

In [361]:
thal = tf.feature_column.categorical_column_with_vocabulary_list('thal',  ['fixed', 'normal', 'reversible'])
thal_one_hot = tf.feature_column.indicator_column(thal)
feature_layer = layers.DenseFeatures(thal_one_hot)

In [363]:
feature_layer(one_batch)

<tf.Tensor: shape=(8, 3), dtype=float32, numpy=
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.]], dtype=float32)>

In [364]:
thal_embedding = tf.feature_column.embedding_column(thal, 3)
feature_layer = layers.DenseFeatures(thal_embedding)

In [366]:
feature_layer(one_batch)

<tf.Tensor: shape=(8, 3), dtype=float32, numpy=
array([[-0.5390322 , -0.44834587, -0.07046258],
       [ 0.2010104 ,  0.18482254, -0.7510743 ],
       [-0.5288149 ,  0.34261864,  0.02457438],
       [ 0.2010104 ,  0.18482254, -0.7510743 ],
       [ 0.2010104 ,  0.18482254, -0.7510743 ],
       [ 0.2010104 ,  0.18482254, -0.7510743 ],
       [ 0.2010104 ,  0.18482254, -0.7510743 ],
       [ 0.2010104 ,  0.18482254, -0.7510743 ]], dtype=float32)>

In [377]:
features = layers.DenseFeatures([one_age, buck_age, thal_one_hot])

In [379]:
features(one_batch)

<tf.Tensor: shape=(8, 10), dtype=float32, numpy=
array([[63.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.,  0.],
       [67.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  1.,  0.],
       [67.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  1.],
       [37.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [41.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.],
       [56.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.],
       [62.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.],
       [57.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.]], dtype=float32)>

# Keras

In [3]:
from tensorflow import keras

### toy dataset

In [1]:
def process_toy(x, y):
    x = tf.cast(x, tf.float32)/255.0
    y = tf.cast(y, tf.int64)
    return x,y
def toy_dataset(n):
    (x,y), _ = keras.datasets.mnist.load_data()
    idx = np.random.choice(np.arange(x.shape[0]), n, replace=False)
    x,y = x[idx], y[idx]
    x = tf.expand_dims(x, 3)
    train_data = tf.data.Dataset.from_tensor_slices((x, y))
    train_data = train_data.map(process_toy).repeat()
    return train_data.shuffle(64).batch(32)

## Functional API

In [2]:
inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(filters=8, kernel_size=[4,4], activation='relu')(inputs)
x = layers.Flatten()(x)
x = layers.Dense(32, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)
model = keras.Model(inputs=inputs, outputs=outputs, name='KerasFunctionAPIModel')

## Sequential

In [127]:
model = keras.Sequential()
model.add(layers.Conv2D(filters=8, kernel_size=[4,4], activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

In [128]:
train_data = toy_dataset(100)
test_data = toy_dataset(100)
model.compile(optimizer=tf.optimizers.Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_data, epochs=2, steps_per_epoch=3, validation_data=test_data, validation_steps=3)

Train for 3 steps, validate for 3 steps
Epoch 1/2
Epoch 2/2


<tensorflow.python.keras.callbacks.History at 0x14b1dc048>

## subclass

To cover custom Layers here as well, first define a custom linear (fully connected) layer:

In [129]:
class MyDense(keras.layers.Layer):
    def __init__(self, units, name="MyDense"):
        super(MyDense, self).__init__(name=name)
        self.units = units
    def build(self, input_shape):
        self.w = self.add_weight("w", shape=[int(input_shape[-1]), self.units],
                                 initializer=tf.initializers.glorot_normal(),
                                 trainable=True, regularizer=keras.regularizers.l1(0.001),
                                 )
        self.b = self.add_weight("b", shape=[self.units, ],
                                 initializer=tf.initializers.glorot_uniform(),
                                 trainable=True,
                                 )
    @tf.function
    def call(self, inputs):
        y = tf.add(tf.matmul(inputs, self.w) , self.b)
        return y

In [130]:
class MyModel(tf.keras.Model):
    def __init__(self, name="MyModel", **kwargs):
        super(MyModel, self).__init__(name=name, **kwargs)
        self.conv = layers.Conv2D(filters=8, kernel_size=3, activation='relu', name='conv')
        self.flatten = layers.Flatten(name='flatten')
        self.mydense = MyDense(64, "mydense")
        self.dense = layers.Dense(32, activation='relu',
                                  use_bias=True,
                                  bias_initializer=tf.initializers.glorot_uniform(),
                                  kernel_regularizer=keras.regularizers.l2(0.01),
                                  bias_regularizer=keras.regularizers.l2(0.01),
                                  name="dense")
        self.dropout = tf.keras.layers.Dropout(0.5, name='dropout')
        self.y = layers.Dense(10, 'softmax', name='y')
    @tf.function
    def call(self, inputs, training=False):
        conv = self.conv(inputs)
        flatten = self.flatten(conv)
        mydense = self.mydense(flatten)
        dense = self.dense(mydense)
        dropout = self.dropout(dense, training=training)
        y = self.y(dropout)
        return y

In [131]:
model = MyModel("HHH")
train_data = toy_dataset(100)
a = next(iter(train_data))[0]
model(a).shape

TensorShape([32, 10])

In [132]:
tf.saved_model.save(model, "test/minimodel")

W0221 22:17:23.362029 4642928064 save_impl.py:77] Skipping full serialization of Keras model <__main__.MyModel object at 0x141f03cc0>, because its inputs are not defined.


In [133]:
model.summary()

Model: "HHH"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv (Conv2D)                multiple                  80        
_________________________________________________________________
flatten (Flatten)            multiple                  0         
_________________________________________________________________
mydense (MyDense)            multiple                  346176    
_________________________________________________________________
dense (Dense)                multiple                  2080      
_________________________________________________________________
dropout (Dropout)            multiple                  0         
_________________________________________________________________
y (Dense)                    multiple                  330       
Total params: 348,666
Trainable params: 348,666
Non-trainable params: 0
_________________________________________________________

In [134]:
model.losses

[<tf.Tensor: id=2761, shape=(), dtype=float32, numpy=5.4401956>,
 <tf.Tensor: id=2769, shape=(), dtype=float32, numpy=0.4249903>,
 <tf.Tensor: id=2777, shape=(), dtype=float32, numpy=0.010632831>]

In [135]:
model.get_layer('mydense').losses

[<tf.Tensor: id=2785, shape=(), dtype=float32, numpy=5.4401956>]

In [136]:
model.get_layer('mydense').get_weights()[0].shape

(5408, 64)

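**About custom layers**  
* `__init__`: performs initialization that does not depend on the input;
* `build`: initializes the weights from the shape of the input tensor; other initialization can also go here;
* `call`: performs the forward computation.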


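* The first call to `__call__` first calls `build` to create the weights, then calls `call` to do the computation;  
* `call` does not call `build` automatically, so before calling `call` by hand you must make sure the weight tensors already exist;
* The advantage of creating weights in `build` rather than in `__init__` is that the input dimensions need not be fixed in advance. The weight shapes are derived from the input data when the computation actually runs. In other words, the weights can be initialized as late as the call to `model.fit`, based on the shape of the training data.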
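Here is a minimal sketch of the lazy `build` behaviour just described. `Linear` is a hypothetical name for illustration; the layer creates no weights until its first call, and their shape then follows the input.

```python
import tensorflow as tf

class Linear(tf.keras.layers.Layer):
    """Hypothetical minimal layer whose weights are created lazily in build()."""
    def __init__(self, units):
        super().__init__()
        self.units = units                       # input-independent configuration only
    def build(self, input_shape):
        self.w = self.add_weight(name="w", shape=[int(input_shape[-1]), self.units],
                                 initializer="glorot_uniform", trainable=True)
        self.b = self.add_weight(name="b", shape=[self.units],
                                 initializer="zeros", trainable=True)
    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

layer = Linear(4)
built_before = layer.built                       # False: no weights yet
n_weights_before = len(layer.weights)            # 0
out = layer(tf.ones([2, 7]))                     # __call__ -> build(input_shape) -> call(inputs)
print(built_before, layer.built, layer.w.shape, out.shape)
```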

**The help text for Layer, from `help(tf.keras.layers.Layer)`:**
* `__init__()`: Save configuration in member variables
* `build()`: Called once from `__call__`, when we know the shapes of inputs and `dtype`. Should have the calls to `add_weight()`, and then call the super's `build()` (which sets `self.built = True`, which is nice in case the user wants to call `build()` manually before the first `__call__`).
* `call()`: Called in `__call__` after making sure `build()` has been called once. Should actually perform the logic of applying the layer to the input tensors (which should be passed in as the first argument).

## layers

#### layers.Dense

```python
tf.keras.layers.Dense(units, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None)
```
* units: number of output neurons
* activation: activation function
* use_bias: Boolean, whether the layer uses a bias vector
* kernel_initializer: Initializer for the `kernel` weights matrix.
* bias_initializer: Initializer for the bias vector.
* kernel_regularizer: Regularizer function applied to the `kernel` weights matrix.
* bias_regularizer: Regularizer function applied to the bias vector.
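* `Dense` implements the operation: `output = activation(dot(input, kernel) + bias)`. So if the input was not flattened (i.e. it has more than two dimensions), this layer acts like a $1\times1$ convolution over the last axis. The kernel is always a 2-D matrix: its first dimension is set by the input and its second by `units`. Input and output differ only in the last dimension: $input.shape[-1] \to units$
* If each row of the input is one sample, the number of rows of kernel is the number of input neurons and the number of columns is the number of output neurons.
* Input shape: (batch_size, input_dim)
* Output shape: (batch_size, units)
* `__call__(self, inputs, *args, **kwargs)`: calls `call` and returns the output tensor
* If this is the first layer, the input length can be specified with the argument `input_dim=512`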
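To illustrate the point about unflattened inputs, the sketch below applies `Dense` to a 3-D tensor (arbitrary shape) and reproduces the result as a matrix product over the last axis only.

```python
import numpy as np
import tensorflow as tf

dense = tf.keras.layers.Dense(5)
x = tf.random.normal([2, 3, 4])       # not flattened: (batch, steps, features)
y = dense(x)                          # only the last axis changes: 4 -> 5
kernel, bias = dense.get_weights()    # kernel (4, 5), bias (5,)
manual = np.einsum('bsf,fu->bsu', x.numpy(), kernel) + bias
```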

#### layers.Embedding

```python
tf.keras.layers.Embedding(input_dim, output_dim, embeddings_initializer='uniform',input_length=None)
```
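* `input_dim` is the vocabulary length + 1 (more precisely, the largest word index + 1), and `output_dim` is the length of the embedding vectors. `input_length` is the length of the input sequences when it is constant; it does not truncate the input. In the example below, sequences of length 10 still yield 10 embeddings with `input_length=8`
* The input is a batch of data in which each word of each sample is given by its index in the vocabulary. The output has one extra dimension, i.e. each word is replaced by its embedding vector
* `.get_weights()` returns the layer's parameters with `shape=[input_dim, output_dim]`. Each row is the embedding vector of one word (the center-word vector in the skip-gram model)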
* input_length: Length of input sequences, when it is constant. This argument is required if you are going to connect `Flatten`, then `Dense` layers upstream (without it, the shape of the dense outputs cannot be computed).

In [137]:
embed = layers.Embedding(10, 3, input_length=8)
x = np.random.randint(8, size=[20, 10])
embed(x).shape

TensorShape([20, 10, 3])
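An embedding lookup is just row indexing into the weight matrix. Here is a sketch with hand-picked word indices:

```python
import numpy as np
import tensorflow as tf

embed = tf.keras.layers.Embedding(10, 3)
ids = np.array([[1, 4, 4, 0], [9, 2, 3, 1]])   # word indices, all < input_dim=10
vectors = embed(ids)                            # (2, 4, 3)
table = embed.get_weights()[0]                  # (10, 3): one row per word index
manual = table[ids]                             # plain row lookup
```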

#### layers.SimpleRNN

```python
layers.SimpleRNN(units, return_sequences)
```
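* The initial state at time step zero is an all-zero vector; after that, the output of each step is the state for the next time step;
* The RNN's input has shape `[batch_size, timesteps, input_features]`
* The output has shape `[batch_size, output_features]`, where `output_features` is the `units` argument
* With `return_sequences=True`, the output has shape `[batch_size, timesteps, output_features]`, i.e. there is an output at every time step. An RNN used as an intermediate layer should set `return_sequences=True`
* `self.build(input_shape=[])` initializes the weights: the first weight has shape `[input_shape[-1], units]`, i.e. its second dimension is the output dimension `units`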

In [138]:
rnn = layers.SimpleRNN(5, return_sequences=True)
x = tf.random.normal([6, 8, 3])
rnn(x).shape

TensorShape([6, 8, 5])

In [139]:
for i in rnn.trainable_variables:
    print(i.shape)

(3, 5)
(5, 5)
(5,)


In [140]:
# computation for the first sample at the second time step; it matches rnn(x)[0,1] below up to float32 rounding
a = np.dot(x[0,1], rnn.get_weights()[0])
b = np.dot(rnn(x)[0,0], rnn.get_weights()[1])
r = tf.nn.tanh(a+b+rnn.get_weights()[2])
r

<tf.Tensor: id=3142, shape=(5,), dtype=float32, numpy=
array([-0.8712967 , -0.09798537, -0.9270312 , -0.39548007,  0.7941837 ],
      dtype=float32)>

In [141]:
rnn(x)[0,1]

<tf.Tensor: id=3282, shape=(5,), dtype=float32, numpy=
array([-0.8712967 , -0.09798539, -0.9270312 , -0.39548007,  0.7941837 ],
      dtype=float32)>
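The single-step check above extends to the whole sequence. The sketch below unrolls $h_t = \tanh(x_t W + h_{t-1} U + b)$ from an all-zero state. It assumes the default `tanh` activation and the weight order (kernel, recurrent kernel, bias) printed earlier.

```python
import numpy as np
import tensorflow as tf

rnn = tf.keras.layers.SimpleRNN(5, return_sequences=True)
x = tf.random.normal([6, 8, 3])
outputs = rnn(x).numpy()                 # (6, 8, 5)
W, U, b = rnn.get_weights()              # kernel (3, 5), recurrent_kernel (5, 5), bias (5,)

h = np.zeros((6, 5), dtype=np.float32)   # all-zero initial state
steps = []
for t in range(x.shape[1]):
    h = np.tanh(x[:, t].numpy() @ W + h @ U + b)   # output of step t = state for step t + 1
    steps.append(h)
manual = np.stack(steps, axis=1)
```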

#### layers.GRU

#### layers.LSTM

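stateful=True: the final state of the i-th sample in one batch (both the output and the "conveyor belt", i.e. the cell state) is used as the initial state of the i-th sample in the next batch (both the hidden state and the cell state). As a consequence, the batch size must be fixed when stateful=True. To change the batch size, one option is to save the weights with a checkpoint, rebuild the model, and load the weights back.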


If a RNN is stateful, it needs to know its batch size. Specify the batch size of your input tensors:
- If using a Sequential model, specify the batch size by passing a `batch_input_shape` argument to your first layer.
- If using the functional API, specify the batch size by passing a `batch_shape` argument to your Input layer.
- The test below shows that a subclassed model needs no such change

model.reset_states() or lstm.reset_states() resets the states to all zeros.

In [51]:
class TestStateful(keras.Model):
    def __init__(self, name='TestStateful', **kwargs):
        super().__init__(name=name, **kwargs)
        self.embed = layers.Embedding(10, 3)
        self.lstm = layers.LSTM(4, return_sequences=True, stateful=True)
        #self.lstm = layers.LSTM(4, return_sequences=True, stateful=False)
    def call(self, inputs, training=False):
        embed = self.embed(inputs)
        lstm = self.lstm(embed)
        return lstm

In [52]:
model = TestStateful()
x = np.random.randint(8, size=[20, 10])
model(x).shape

TensorShape([20, 10, 4])

In [53]:
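tf.concat(model.lstm.weights[:2], axis=0).shape  # really 4 weight matrices side by side (4 gates x 4 units = 16 columns); stacking kernel (3 rows) and recurrent kernel (4 rows) along axis=0 amounts to a dot product with the concatenation of input and hidden state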

TensorShape([7, 16])

In [54]:
len(model.lstm.states)

2

In [55]:
np.all(model(x)[:,-1,:] == model.lstm.states[0])  # states[0] is the hidden state; states[1] is presumably the "conveyor belt" (cell state)

True

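So there are two states: one is the last output (the hidden state), and the other is presumably the "conveyor belt" (the cell state).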

In [56]:
lstm = layers.LSTM(4, return_state=True, return_sequences=True, stateful=True)
x = tf.random.normal([20, 10, 3])

In [57]:
lstm(x)[1].shape

TensorShape([20, 4])

In [61]:
tf.reduce_all(lstm(x)[1] == lstm.states[0])

<tf.Tensor: shape=(), dtype=bool, numpy=True>
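The `stateful=True` behaviour described above can be checked directly: feeding a sequence in two halves to a stateful LSTM should match feeding it in one call. This is a sketch; it assumes `reset_states()` as used above and arbitrary shapes.

```python
import numpy as np
import tensorflow as tf

x = tf.random.normal([2, 10, 3])
lstm = tf.keras.layers.LSTM(4, return_sequences=True, stateful=True)

full = lstm(x).numpy()             # whole sequence in one call (also builds the layer)
lstm.reset_states()                # states back to all zeros

first = lstm(x[:, :5]).numpy()     # first 5 time steps
second = lstm(x[:, 5:]).numpy()    # continues from the states left by the first call
split = np.concatenate([first, second], axis=1)
```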

#### layers.Bidirectional

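A bidirectional RNN exploits the order sensitivity of RNNs. It contains two ordinary RNNs, each processing the input sequence in one direction (chronological and reverse-chronological order), and then merges their representations (by concatenation). By processing the sequence in both directions, a bidirectional RNN can capture patterns that a unidirectional RNN might miss.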
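The effect of merging the two directions shows up in the output shapes. Here is a sketch with `layers.Bidirectional` wrapping an LSTM. It shows the default `merge_mode='concat'` and, for comparison, `'sum'`.

```python
import tensorflow as tf

x = tf.random.normal([6, 8, 3])
bi_concat = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(4, return_sequences=True))
bi_sum = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(4), merge_mode='sum')
y = bi_concat(x)    # (6, 8, 8): 4 forward + 4 backward units at every time step
z = bi_sum(x)       # (6, 4): the two directions' final outputs are added
```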

#### layers.Conv1D

```python
layers.Conv1D(filters, kernel_size, strides=1, padding='valid', data_format='channels_last', dilation_rate=1, activation=None)
```
* 1D convolutional networks are used for text and sequences
* The input shape is (batch_size, timesteps, features), and the convolution slides along the time axis;
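A quick shape check of convolving along the time axis. The sizes are arbitrary.

```python
import tensorflow as tf

x = tf.random.normal([4, 20, 8])                                  # (batch_size, timesteps, features)
conv = tf.keras.layers.Conv1D(filters=16, kernel_size=5)          # padding='valid'
y = conv(x)                                                       # 20 - 5 + 1 = 16 time steps
y2 = tf.keras.layers.Conv1D(16, 5, strides=2, padding='same')(x)  # ceil(20 / 2) = 10 time steps
print(y.shape, conv.kernel.shape, y2.shape)                       # kernel: (kernel_size, features, filters)
```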

#### layers.SeparableConv2D

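* Depthwise separable convolution layer: each input channel is convolved separately (depthwise), and the results are concatenated into a multi-channel output. A pointwise ($1\times 1$) convolution then mixes the channels.
* This separates learning spatial features from learning cross-channel features. It makes sense if you assume that spatial locations in the input are highly correlated but different channels are fairly independent.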

In [125]:
tmp = tf.random.normal([8, 10, 10, 3])
test = layers.SeparableConv2D(filters=16, kernel_size=4, strides=2, padding='same')
test(tmp).shape

TensorShape([8, 5, 5, 16])
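One practical consequence of this factorization is a much smaller parameter count. A sketch assuming TF 2.x `tf.keras`: for 3 input channels, 16 filters and a 4x4 kernel, the separable layer has a `4*4*3` depthwise kernel, a `1*1*3*16` pointwise kernel and 16 biases (112 parameters), while a regular `Conv2D` needs `4*4*3*16 + 16 = 784`.

```python
import tensorflow as tf
from tensorflow.keras import layers

imgs_in = tf.random.normal([8, 10, 10, 3])
sep = layers.SeparableConv2D(filters=16, kernel_size=4, padding='same')
conv = layers.Conv2D(filters=16, kernel_size=4, padding='same')
sep(imgs_in)   # calling the layers builds their weights
conv(imgs_in)

print(sep.depthwise_kernel.shape, sep.pointwise_kernel.shape)  # (4, 4, 3, 1) (1, 1, 3, 16)
print(sep.count_params(), conv.count_params())                 # 112 784
```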

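#### layers.MaxPooling1D / layers.MaxPooling2D

```python
layers.MaxPooling1D(pool_size=2, strides=None, padding='valid', data_format='channels_last')
layers.MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid', data_format=None)
```
* `strides=None` means the stride defaults to `pool_size`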
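A worked example (a sketch assuming TF 2.x `tf.keras`) on a 4x4 input holding the numbers 0..15: with `pool_size=2` and the default stride, each non-overlapping 2x2 block is reduced to its maximum.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

grid = tf.reshape(tf.range(16, dtype=tf.float32), [1, 4, 4, 1])  # rows: 0-3, 4-7, 8-11, 12-15
pool = layers.MaxPooling2D(pool_size=2)   # strides=None, so the stride is also 2
pooled = pool(grid)

print(pooled.shape)                # (1, 2, 2, 1)
print(pooled[0, :, :, 0].numpy())  # [[ 5.  7.]
                                   #  [13. 15.]]
```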

#### layers.GlobalAveragePooling1D

```python
layers.GlobalAveragePooling1D(data_format='channels_last')
```
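* By default the input is `[batch, timesteps, features]` and pooling is done over the `timesteps` axis; intuitively, for each sequence, every dimension of the word vectors is averaged over all the words, so the output shape is `[batch, features]`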
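A sketch (assuming TF 2.x `tf.keras`; `seq` is an illustrative name) confirming that, without a mask, the layer is just a mean over axis 1:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

seq = tf.random.normal([2, 10, 8])        # (batch, timesteps, features)
gap = layers.GlobalAveragePooling1D()
avg = gap(seq)

print(avg.shape)                                                  # (2, 8)
print(np.allclose(avg.numpy(), tf.reduce_mean(seq, axis=1).numpy()))  # True
```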

#### layers.Conv2DTranspose

For the underlying arithmetic, see [this paper](https://arxiv.org/pdf/1603.07285.pdf), section 4.1 on page 20.

In [7]:
tmp = tf.random.normal([1, 4, 4, 1])
test = layers.Conv2DTranspose(2, 3, strides=1, padding='same')
test(tmp).shape

TensorShape([1, 4, 4, 2])

In [9]:
test.variables[0].shape  # the parameter is still an ordinary kernel, with shape (kernel_h, kernel_w, filters, input_channels)

TensorShape([3, 3, 2, 1])
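The usual reason to use this layer is upsampling. A sketch assuming TF 2.x `tf.keras`: with `padding='same'` the output size is `input_size * strides`, and with `padding='valid'` it is `(input_size - 1) * strides + kernel_size`.

```python
import tensorflow as tf
from tensorflow.keras import layers

small = tf.random.normal([1, 4, 4, 1])
up_same = layers.Conv2DTranspose(2, 3, strides=2, padding='same')
up_valid = layers.Conv2DTranspose(2, 3, strides=1, padding='valid')

print(up_same(small).shape)    # (1, 8, 8, 2): 4 * 2
print(up_valid(small).shape)   # (1, 6, 6, 2): (4 - 1) * 1 + 3
```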

# keras.preprocessing

## image

### PIL

In [390]:
from PIL import Image
from PIL import ImageDraw

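PIL and Keras both put the origin (0, 0) at the top-left corner of an image, but the array returned by Keras (`img_to_array`, just like `np.array`) is indexed as (y_height, x_width), whereas PIL pixel coordinates are (x_width, y_height). Note that `load_img` itself returns a PIL image, so its `.size` is (width, height) as well.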
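A file-free sketch of the same point, using a tiny synthetic image: the pixel set at PIL coordinate `(x=3, y=1)` shows up at array index `[1, 3]`.

```python
import numpy as np
from PIL import Image

img = Image.new("RGB", (4, 2), (0, 0, 0))   # size is (width=4, height=2)
img.putpixel((3, 1), (255, 0, 0))          # PIL coordinates: (x, y)
arr = np.array(img)                         # array shape: (height, width, channels)

print(img.size, arr.shape)                  # (4, 2) (2, 4, 3)
print(img.getpixel((3, 1)), arr[1, 3])      # (255, 0, 0) [255   0   0]
```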

In [381]:
kimg = tf.keras.preprocessing.image.load_img('data/TheStarryNight.jpg')
kimg.size

(3359, 2304)

In [382]:
kimg.width, kimg.height

(3359, 2304)

In [383]:
tf.keras.preprocessing.image.img_to_array(kimg)[100, 3000]

array([236., 179.,  46.], dtype=float32)

In [386]:
pimg = Image.open("data/TheStarryNight.jpg")
pimg.size

(3359, 2304)

In [387]:
pimg.width,pimg.height

(3359, 2304)

In [389]:
pimg.getpixel((3000, 100))

(236, 179, 46)

In [468]:
np.array(pimg)[100, 3000]

array([236, 179,  46], dtype=uint8)

In [463]:
newimg = Image.new("RGB", (300, 300), (255, 0, 0))  # 300x300 solid red image
draw = ImageDraw.Draw(newimg)

In [464]:
draw.chord((10, 50, 40, 100), 0, 360, fill='green')
draw.chord((150,150, 200,200), 0, 360)
draw.rectangle((150, 150, 200, 200))
draw.text((150, 150), "HelloWorld", fill='blue')
#newimg

In [476]:
#help(imgs.flow_from_directory)

### ImageDataGenerator

In [17]:
fnames = ['/Users/user/.keras/datasets/flower_photos/roses', '/Users/user/.keras/datasets/flower_photos/sunflowers']
imgs = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255,  # scale pixel values to [0, 1]
                                          rotation_range=40,  # range (in degrees) of random rotations
                                          width_shift_range=0.2, height_shift_range=0.2,  # range of random horizontal/vertical shifts, as a fraction of total width/height
                                          shear_range=0.2,  # shear intensity (shear angle in degrees, counter-clockwise)
                                          zoom_range=0.2,  # range of random zoom
                                          horizontal_flip=True,  # randomly flip images horizontally (about half of them)
                                          fill_mode='nearest')  # how to fill pixels created by the transformations
imgs_generator = imgs.flow_from_directory('/Users/user/.keras/datasets/flower_photos', target_size=(150, 150), 
                                          batch_size=20, class_mode='categorical')


Found 3670 images belonging to 5 classes.


In [18]:
imgs_generator.class_indices

{'daisy': 0, 'dandelion': 1, 'roses': 2, 'sunflowers': 3, 'tulips': 4}

In [19]:
a = imgs_generator.next()
#plt.imshow(a[0][0])

In [7]:
d = keras.preprocessing.image.ImageDataGenerator(validation_split=0.25)
d_train = d.flow_from_directory('data/flowers', subset='training', shuffle=True)
d_train.filenames

Found 6 images belonging to 2 classes.


['daisy/2019064575_7656b9340f_m.jpg',
 'daisy/3415180846_d7b5cced14_m.jpg',
 'daisy/4144275653_7c02d47d9b.jpg',
 'sunflowers/8481979626_98c9f88848_n.jpg',
 'sunflowers/9555824387_32b151e9b0_m.jpg',
 'sunflowers/9555827829_74e6f60f1d_m.jpg']

## text

```python
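tokenizer = keras.preprocessing.text.Tokenizer(num_words=None, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n', split=' ')
```
* `num_words`: the maximum number of words to keep, by frequency; only the `num_words-1` most frequent words (indices 1 to `num_words-1`, since index 0 is reserved) are kept when converting texts to sequences or matrices
* `filters`: characters to filter out; in effect they are replaced by the `split` string
* `split`: the separator used to split the text, a space by default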

In [46]:
tokenizer = prep.text.Tokenizer(5)

In [47]:
x = "if you want to sound like a native speaker , you must be willing to practice saying the want to to sound native"

In [48]:
tokenizer.fit_on_texts([x, ])  # takes a list of strings; builds a vocabulary ranked by word frequency

In [49]:
tokenizer.fit_on_texts(['you you you you you']) # keep fitting: the word counts accumulate

In [53]:
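tokenizer.index_word  # dict of index -> word covering every word (not limited by num_words); more frequent words get smaller indices; .word_index is the inverse mapping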

{1: 'you',
 2: 'to',
 3: 'want',
 4: 'sound',
 5: 'native',
 6: 'if',
 7: 'like',
 8: 'a',
 9: 'speaker',
 10: 'must',
 11: 'be',
 12: 'willing',
 13: 'practice',
 14: 'saying',
 15: 'the'}

In [51]:
# tokenizer.word_counts # an OrderedDict of word -> count, in order of first appearance in the training texts

In [52]:
tokenizer.texts_to_sequences(["you are want to a", "a to want are you"])

[[1, 3, 2], [2, 3, 1]]
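Why do `are` and `a` disappear above? `are` was never seen, and `a` has index 8, which is not below `num_words=5`. The following is a minimal pure-Python re-implementation of that ranking and filtering rule, written only to illustrate it (it is not the Keras source; the names `clean`, `fit`, `to_sequence` and `sample_text` are hypothetical). Ties in frequency keep the order of first appearance, because Python's sort is stable.

```python
from collections import Counter

# Default `filters` of keras.preprocessing.text.Tokenizer
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def clean(text):
    text = text.lower()
    for ch in FILTERS:
        text = text.replace(ch, ' ')   # filtered characters act like separators
    return text.split()

def fit(texts):
    counts = Counter()                 # keeps first-seen order of words
    for t in texts:
        counts.update(clean(t))
    ranked = sorted(counts, key=counts.get, reverse=True)  # stable: ties keep first-seen order
    return {w: i for i, w in enumerate(ranked, start=1)}   # index 0 is reserved

def to_sequence(text, word_index, num_words=None):
    # unknown words are dropped, and so are words whose index is >= num_words
    return [word_index[w] for w in clean(text)
            if w in word_index and (num_words is None or word_index[w] < num_words)]

sample_text = ("if you want to sound like a native speaker , you must be willing "
               "to practice saying the want to to sound native")
word_index = fit([sample_text, 'you you you you you'])
print(list(word_index)[:5])                                         # ['you', 'to', 'want', 'sound', 'native']
print(to_sequence("you are want to a", word_index, num_words=5))    # [1, 3, 2]
```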

In [86]:
tokenizer.sequences_to_texts([[1,3,2, 0, 0, 0]])

['you want to']

In [154]:
tokenizer.texts_to_matrix(['you are wang to'])

array([[0., 1., 1., 0., 0.]])

In [155]:
prep.text.text_to_word_sequence('you are my best friend you')

['you', 'are', 'my', 'best', 'friend', 'you']

## sequence

```python
keras.preprocessing.sequence.TimeseriesGenerator(data, targets, length, sampling_rate=1, stride=1, start_index=0, end_index=None, shuffle=False, reverse=False, batch_size=128)
```
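* `data`: an indexable generator (such as a tuple, list or numpy array) of consecutive data points; axis 0 is the time dimension
* `targets`: target values corresponding to the timesteps in `data`; axis 0 must have the same length as the time dimension of `data`
* `length`: how many timesteps each sample spans (before sampling); equivalently, with `sampling_rate=1`, how many `data` values correspond to one target value in the output
* `sampling_rate`: period between successive timesteps within a sample; e.g. with `length=10, sampling_rate=2`, every second timestep is taken, so each target corresponds to only 5 timesteps, which nevertheless span 10 timesteps
* `stride`: period between successive samples, i.e. between successive target values
* `start_index, end_index`: only timesteps whose indices in `data` and `targets` lie in `[start_index, end_index]` are used
* `shuffle`: whether to shuffle the samples
* `reverse`: whether the timesteps within each sample are returned in reverse order
* `batch_size`: number of samples per batch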

In [156]:
x = np.array([chr(i) for i in range(97, 117)])

In [157]:
data = np.array([[i] for i in range(50)])
targets = np.random.normal(size=[50, 3])

In [158]:
data_gen = prep.sequence.TimeseriesGenerator(data, targets, length=10, sampling_rate=2, batch_size=2)

In [159]:
data_gen[0]

(array([[[0.],
         [2.],
         [4.],
         [6.],
         [8.]],
 
        [[1.],
         [3.],
         [5.],
         [7.],
         [9.]]]), array([[-0.9120223 , -0.44903765, -0.45272137],
        [-1.5934117 ,  0.01738436,  0.91008626]]))

In [160]:
targets[10]

array([-0.9120223 , -0.44903765, -0.45272137])
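The windows above follow a simple indexing rule: the sample whose target is `targets[r]` is `data[r - length : r : sampling_rate]`, and `r` runs from `start_index + length` to `end_index` in steps of `stride`. Below is a minimal numpy re-implementation of that rule, for illustration only (not the Keras source; `timeseries_windows`, `ts_data` and `ts_targets` are hypothetical names). It returns all samples at once; with `shuffle=False`, batch `i` of the generator should correspond to `samples[i*batch_size:(i+1)*batch_size]`.

```python
import numpy as np

def timeseries_windows(data, targets, length, sampling_rate=1, stride=1,
                       start_index=0, end_index=None):
    """Return all (samples, targets) pairs, un-batched."""
    if end_index is None:
        end_index = len(data) - 1
    rows = np.arange(start_index + length, end_index + 1, stride)   # index of each target
    samples = np.array([data[r - length:r:sampling_rate] for r in rows])
    return samples, targets[rows]

ts_data = np.arange(50).reshape(50, 1)
ts_targets = np.arange(50) * 10          # easy-to-read targets: target of step r is 10*r
samples, y = timeseries_windows(ts_data, ts_targets, length=10, sampling_rate=2)

print(samples.shape)                     # (40, 5, 1)
print(samples[0, :, 0], y[0])            # [0 2 4 6 8] 100
print(samples[1, :, 0], y[1])            # [1 3 5 7 9] 110
```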

## pad_sequences

```python
prep.sequence.pad_sequences(sequences, maxlen=None, dtype='int32', padding='pre', truncating='pre', value=0.0)
```
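* `sequences`: a list of lists, each element being one sequence
* `maxlen`: defaults to the length of the longest sequence
* `padding` / `truncating`: `'pre'` (the default) pads or truncates at the start of each sequence, `'post'` at the end
* `value`: the value used for padding (a float, 0.0 by default)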

In [161]:
x = [[1,2,3], [3,], [4,5]]
prep.sequence.pad_sequences(x, maxlen=2)

array([[2, 3],
       [0, 3],
       [4, 5]], dtype=int32)
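In the output above, `[1, 2, 3]` lost its *first* element and `[3]` was padded at the *front*, because both `truncating` and `padding` default to `'pre'`. A minimal numpy re-implementation of these two options, for illustration only (not the Keras source; `pad` and `seqs` are hypothetical names):

```python
import numpy as np

def pad(sequences, maxlen=None, padding='pre', truncating='pre', value=0):
    if maxlen is None:
        maxlen = max(len(s) for s in sequences)
    out = np.full((len(sequences), maxlen), value, dtype='int32')
    for i, s in enumerate(sequences):
        s = s[-maxlen:] if truncating == 'pre' else s[:maxlen]   # drop from the front or the back
        if not s:
            continue
        if padding == 'pre':
            out[i, maxlen - len(s):] = s    # right-align, padding in front
        else:
            out[i, :len(s)] = s             # left-align, padding at the end
    return out

seqs = [[1, 2, 3], [3], [4, 5]]
print(pad(seqs, maxlen=2))                                     # [[2 3] [0 3] [4 5]]
print(pad(seqs, maxlen=2, padding='post', truncating='post'))  # [[1 2] [3 0] [4 5]]
```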

In [90]:
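x = tf.random.categorical(tf.math.log([tf.nn.softmax(tf.random.normal([5]))]), 20)  # 20 draws from 5 classes; the first argument must be logits (log-probabilities), not probabilities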
x

<tf.Tensor: id=32801, shape=(1, 20), dtype=int64, numpy=array([[0, 0, 0, 0, 2, 1, 0, 2, 2, 3, 3, 0, 1, 4, 2, 2, 3, 3, 3, 0]])>

# tensorflow_datasets

In [3]:
# tfds.list_builders() # 可用数据集

In [24]:
(raw_train, raw_validation, raw_test), metadata = tfds.load(
    'cats_vs_dogs',
    split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
    with_info=True,
    as_supervised=True
)
# as_supervised=True returns (data, label) tuples instead of a feature dict

```python
tfds.features.text.Tokenizer(alphanum_only=True, reserved_tokens=None)
```

In [4]:
x = "one one two two th,ree one"

In [5]:
tokenizer = tfds.features.text.Tokenizer(reserved_tokens=['th,ree',])
tokenizer.tokenize(x)

['one', 'one', 'two', 'two', 'th,ree', 'one']

In [6]:
tokenizer = tfds.features.text.Tokenizer(alphanum_only=False)
tokenizer.tokenize(x)

['one', ' ', 'one', ' ', 'two', ' ', 'two', ' ', 'th', ',', 'ree', ' ', 'one']

In [7]:
tokenizer = tfds.features.text.Tokenizer()
tokenizer.tokenize(x)

['one', 'one', 'two', 'two', 'th', 'ree', 'one']

In [9]:
r = tfds.features.text.TokenTextEncoder(tokenizer.tokenize(x))

In [10]:
r.tokens

['one', 'one', 'two', 'two', 'th', 'ree', 'one']

In [18]:
r.tokenizer.tokenize(x)

['one', 'one', 'two', 'two', 'th', 'ree', 'one']

In [31]:
encoder = tfds.features.text.SubwordTextEncoder.build_from_corpus(['one two aa aaa ', 'you are one two aa aa bbb,bb'], target_vocab_size=2**15)

In [32]:
encoder.subwords

['aa_', 'two_', 'one_', 'you_', 'bbb', 'bb', 'are_', 'aaa_']

In [33]:
encoder.encode('aa aaa aa T t')

[1, 8, 1, 93, 41, 125]

In [34]:
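encoder.encode('aaa aaa %% two ') # why is the second aaa split into single characters? Presumably because it is followed by ' %% ' rather than a single space, so it gets no trailing '_', and 'aaa' without '_' is not in the vocabulary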

[8, 106, 106, 106, 41, 46, 46, 41, 2]

In [35]:
#encoder.decode([3, 1, 0, 3]) # ValueError: 0 (the padding id) may only appear at the end
encoder.decode([3, 1, 3, 44, 0])

'one aa one #'

In [43]:
encoder.encode('bb bbbbb')

[6, 41, 5, 6]