## Content

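1. Introduction to several gradient-descent optimizers, with code.
2. Building a neural network quickly with Keras.
3. Building a neural network with Keras and a custom class.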

## Common structure
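The optimizer snippets in the following sections plug into this training loop, which provides the parameter gradients as `grads`.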

In [None]:
# Parameter settings:
############################

############################
for epoch in range(epoch):  # dataset-level loop: each epoch passes over the whole dataset once
    for step, (x_train, y_train) in enumerate(train_db):  # batch-level loop: each step processes one batch
        with tf.GradientTape() as tape:  # the with block records operations for gradient computation
            y = tf.matmul(x_train, w1) + b1  # forward pass: multiply-add of the network
            y = tf.nn.softmax(y)  # turn the output into a probability distribution (same scale as one-hot labels, so they can be subtracted)
            y_ = tf.one_hot(y_train, depth=3)  # convert labels to one-hot format for computing loss and accuracy
            loss = tf.reduce_mean(tf.square(y_ - y))  # mean squared error loss: mse = mean((y_ - y)^2)
            loss_all += loss.numpy()  # accumulate the loss of every step so the epoch average can be computed later
        # Compute the gradient of the loss with respect to each parameter
        grads = tape.gradient(loss, [w1, b1])

        # Gradient update:
        ############################

        ############################


    # Print the loss once per epoch
    print("Epoch {}, loss: {}".format(epoch, loss_all / 4))
    train_loss_results.append(loss_all / 4)  # record the average loss over the 4 steps of this epoch
    loss_all = 0  # reset loss_all before the next epoch
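
The division by 4 in the loop above assumes that each epoch runs exactly four steps. As a hedged sketch, the step count can be derived from the dataset size and batch size instead of being hard-coded. The values 120 and 32 below are assumptions chosen to give four steps, and the per-step losses are made up for illustration.

```python
import math

n_train = 120      # assumed number of training samples (not stated in the notebook)
batch_size = 32    # assumed batch size (not stated in the notebook)

# The last batch may be smaller than the others, so round up.
steps_per_epoch = math.ceil(n_train / batch_size)

# Average with the computed step count instead of a hard-coded 4.
step_losses = [0.30, 0.28, 0.27, 0.25]   # made-up per-step losses
mean_loss = sum(step_losses) / steps_per_epoch

print(steps_per_epoch, mean_loss)
```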

## SGD

In [None]:
# Gradient update
w1.assign_sub(lr * grads[0])  # update w1 in place: w1 = w1 - lr * gradient
b1.assign_sub(lr * grads[1])  # update b1 in place: b1 = b1 - lr * gradient
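
As a minimal sketch of the same update rule outside TensorFlow, the loop below applies SGD to the toy loss f(w) = (w - 3)^2 using plain Python floats. The loss function, starting point, and learning rate are illustrative assumptions, not part of the notebook.

```python
def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)

lr = 0.1   # learning rate (assumed value)
w = 0.0    # initial parameter (assumed value)

for step in range(100):
    w -= lr * grad(w)   # same rule as w1.assign_sub(lr * grads[0])

print(w)   # ends up very close to the minimum at w = 3
```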

## SGDM
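Adds momentum, i.e. the influence of past gradients, on top of SGD. Several formulations are in common use: the code below keeps an exponential moving average of the gradients, while the classic form accumulates `m = beta * m + grads` without the `(1 - beta)` factor. With a constant `beta` the two differ only by a rescaling of the learning rate.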

In [None]:
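# Extra parameter settings (initialize once, before the training loop)
m_w, m_b = 0, 0
beta = 0.9

# Gradient update
m_w = beta * m_w + (1 - beta) * grads[0]  # moving average of the gradients of w1
m_b = beta * m_b + (1 - beta) * grads[1]  # moving average of the gradients of b1
w1.assign_sub(lr * m_w)
b1.assign_sub(lr * m_b)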
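
To make the relation between two common momentum formulations concrete, the sketch below runs both on the toy loss f(w) = (w - 3)^2. Form A is the moving-average update used above; form B is the classic accumulating velocity. The toy loss and the names `w_a`, `w_b`, `vel_b`, and `lr_b` are assumptions for illustration only.

```python
def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)

beta, lr = 0.9, 0.1

# Form A: exponential moving average of the gradients (as in the cell above).
w_a, m_a = 0.0, 0.0
# Form B: classic momentum, the velocity accumulates raw gradients.
w_b, vel_b = 0.0, 0.0
lr_b = lr * (1 - beta)   # rescaled learning rate that makes B follow A

for step in range(50):
    m_a = beta * m_a + (1 - beta) * grad(w_a)
    w_a -= lr * m_a

    vel_b = beta * vel_b + grad(w_b)
    w_b -= lr_b * vel_b

print(w_a, w_b)   # the two trajectories coincide up to rounding
```

Because `m_a` equals `(1 - beta) * vel_b` at every step, choosing between the two forms mainly changes how the learning rate should be tuned.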

## Adagrad
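Accumulates the squared gradients (a second-moment estimate, not a second derivative) and divides each step by the square root of the running sum. Parameters with large gradients have their steps damped, while parameters with small gradients keep relatively large steps, which can help the optimizer keep moving through flat regions such as saddle points.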

In [None]:
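# Extra parameter settings (initialize once, before the training loop)
v_w, v_b = 0, 0
eps = 1e-7  # small constant that avoids division by zero when a gradient is exactly 0

# Gradient update
v_w += tf.square(grads[0])  # running sum of squared gradients of w1
v_b += tf.square(grads[1])  # running sum of squared gradients of b1
w1.assign_sub(lr * grads[0] / (tf.sqrt(v_w) + eps))
b1.assign_sub(lr * grads[1] / (tf.sqrt(v_b) + eps))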
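
The following sketch illustrates the main consequence of the running sum: the effective step size `lr / sqrt(v)` can only shrink over time. It uses plain Python floats on the toy loss f(w) = (w - 3)^2; the loss, the learning rate, and the `eps` constant are assumptions for illustration.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)

lr, eps = 0.5, 1e-7
w, v = 0.0, 0.0
effective_lrs = []

for step in range(5):
    g = grad(w)
    v += g * g                                # running sum of squared gradients
    effective_lr = lr / (math.sqrt(v) + eps)  # per-step scaling applied to g
    effective_lrs.append(effective_lr)
    w -= effective_lr * g

print(effective_lrs)   # a decreasing sequence
```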

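## RMSprop
Replaces Adagrad's running sum of squared gradients with an exponential moving average, so old gradients are gradually forgotten and the step size does not shrink indefinitely.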

In [None]:
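# Extra parameter settings (initialize once, before the training loop)
v_w, v_b = 0, 0
beta = 0.9
eps = 1e-7  # small constant that avoids division by zero

# Gradient update
v_w = beta * v_w + (1 - beta) * tf.square(grads[0])  # moving average of squared gradients of w1
v_b = beta * v_b + (1 - beta) * tf.square(grads[1])  # moving average of squared gradients of b1
w1.assign_sub(lr * grads[0] / (tf.sqrt(v_w) + eps))
b1.assign_sub(lr * grads[1] / (tf.sqrt(v_b) + eps))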
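
A small sketch of the difference from Adagrad: after a burst of large gradients followed by many small ones, the moving average lets the step scaling recover, while the running sum does not. The gradient sequence is made up for illustration and the variable names are assumptions.

```python
import math

# Assumed gradient sequence: a few large gradients, then many small ones.
grad_sequence = [10.0] * 5 + [0.1] * 50
beta, eps = 0.9, 1e-7

v_adagrad, v_rmsprop = 0.0, 0.0
for g in grad_sequence:
    v_adagrad += g * g                                  # Adagrad: running sum
    v_rmsprop = beta * v_rmsprop + (1 - beta) * g * g   # RMSprop: moving average

# Factor that multiplies lr * g in each method after the whole sequence.
scale_adagrad = 1.0 / (math.sqrt(v_adagrad) + eps)
scale_rmsprop = 1.0 / (math.sqrt(v_rmsprop) + eps)

print(scale_adagrad, scale_rmsprop)   # RMSprop's scale is much larger here
```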

## Adam
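Combines SGDM and RMSprop: it keeps a moving average of the gradients (first moment, the momentum term) and of the squared gradients (second moment), applies a bias correction to both, and is controlled by the hyperparameters `beta1` and `beta2`.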

In [None]:
# Extra parameter settings (initialize once, before the training loop)
m_w, m_b = 0, 0
v_w, v_b = 0, 0
beta1, beta2 = 0.9, 0.999
eps = 1e-7  # small constant that avoids division by zero
global_step = 0

# Gradient update
global_step += 1  # counts update steps; used by the bias correction

# First moment: moving average of the gradients
m_w = beta1 * m_w + (1 - beta1) * grads[0]
m_b = beta1 * m_b + (1 - beta1) * grads[1]
# Second moment: moving average of the squared gradients
v_w = beta2 * v_w + (1 - beta2) * tf.square(grads[0])
v_b = beta2 * v_b + (1 - beta2) * tf.square(grads[1])

# Bias correction compensates for initializing m and v at zero
m_w_correction = m_w / (1 - tf.pow(beta1, int(global_step)))
m_b_correction = m_b / (1 - tf.pow(beta1, int(global_step)))
v_w_correction = v_w / (1 - tf.pow(beta2, int(global_step)))
v_b_correction = v_b / (1 - tf.pow(beta2, int(global_step)))

w1.assign_sub(lr * m_w_correction / (tf.sqrt(v_w_correction) + eps))
b1.assign_sub(lr * m_b_correction / (tf.sqrt(v_b_correction) + eps))
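
To show what the bias correction buys, the sketch below performs a single Adam step on a scalar parameter. At step 1 the corrected moments equal the gradient and its square, so the update has magnitude close to `lr` regardless of the gradient's size. The helper `adam_step` and its default values are hypothetical, written only for this illustration.

```python
import math

def adam_step(w, g, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a scalar parameter (illustrative helper)."""
    m = beta1 * m + (1 - beta1) * g        # first moment
    v = beta2 * v + (1 - beta2) * g * g    # second moment
    m_hat = m / (1 - beta1 ** t)           # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)           # bias-corrected second moment
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# First step from w = 0 with gradient -6: the parameter moves by about +lr.
w_new, m, v = adam_step(w=0.0, g=-6.0, m=0.0, v=0.0, t=1)
print(w_new)
```

Without the correction, the first update would be scaled by `(1 - beta1) / sqrt(1 - beta2)`, which differs from 1 and depends on the choice of betas.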

## Use Keras to build a neural network
6 steps:
1. Import libraries <br>
2. Build training set and testing set <br>
3. model = tf.keras.models.Sequential <br>
4. model.compile <br>
5. model.fit <br>
6. model.summary <br>

### Construct neural network structure
```model = tf.keras.models.Sequential([<network structure>])```

* Flatten layer: <br>
```tf.keras.layers.Flatten()```


* Fully connected layer: <br>
```tf.keras.layers.Dense(number of units, activation = "activation function", kernel_regularizer = regularizer)```  <br>
activation (given as a string) options: relu, softmax, sigmoid, tanh <br>
kernel_regularizer options: ```tf.keras.regularizers.l1()```, ```tf.keras.regularizers.l2()``` <br>


* Convolutional layer: <br>
```tf.keras.layers.Conv2D(filters = number of filters, kernel_size = kernel size, strides = stride, padding = "valid" or "same")```


* LSTM layer: <br>
```tf.keras.layers.LSTM()```
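
For the fully connected layer listed above, the number of trainable parameters is the size of the weight matrix plus one bias per unit. The helper below is a hypothetical illustration of that arithmetic, not a Keras function. With 4 input features and 3 units it gives the same count that `model.summary()` reports for the Iris model in the fast build example later in this notebook.

```python
def dense_param_count(n_inputs, n_units, use_bias=True):
    """Trainable parameters of a fully connected layer (illustrative helper)."""
    weights = n_inputs * n_units          # one weight per input-unit pair
    biases = n_units if use_bias else 0   # one bias per unit
    return weights + biases

iris_params = dense_param_count(n_inputs=4, n_units=3)
print(iris_params)   # 15
```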

### Select optimizer and loss function
```model.compile(optimizer = optimizer, loss = loss function, metrics = ["metric name"])```

**Optimizer options (pass an optimizer object to set its hyperparameters):** <br>
* 'sgd' or ```tf.keras.optimizers.SGD(learning_rate=learning rate, momentum=momentum)``` <br>
* 'adagrad' or ```tf.keras.optimizers.Adagrad(learning_rate=learning rate)``` <br>
* 'adadelta' or ```tf.keras.optimizers.Adadelta(learning_rate=learning rate)``` <br>
* 'adam' or ```tf.keras.optimizers.Adam(learning_rate=learning rate, beta_1=0.9, beta_2=0.999)``` <br>


**Loss options:** <br>
* Mean square error <br>
```tf.keras.losses.MeanSquaredError()```
  
  
* Binary cross entropy <br>
Use this cross-entropy loss when there are only two label classes (assumed to be 0 and 1) <br>
```tf.keras.losses.BinaryCrossentropy(from_logits = )```
  
  
* Categorical cross entropy <br>
Use this cross-entropy loss when the labels were provided in a one-hot representation. <br>
```tf.keras.losses.CategoricalCrossentropy(from_logits = )```
  
  
* Sparse Categorical cross entropy <br>
Use this cross-entropy loss when the labels were provided in integer representation. <br>
```tf.keras.losses.SparseCategoricalCrossentropy(from_logits = )```

  
```from_logits = True```: the model outputs raw scores (logits) that have not gone through sigmoid or softmax. <br>
```from_logits = False```: the model output has already gone through sigmoid or softmax and is a probability distribution. <br>
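
The sketch below illustrates in plain Python what the `from_logits` flag means for a single sample: computing the loss from logits directly gives the same value as applying softmax first and then taking the negative log probability, while applying softmax twice gives a different, wrong value. The helper functions are hypothetical stand-ins written for this illustration and are not the Keras implementation.

```python
import math

def softmax(logits):
    m = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_ce_from_probs(label, probs):
    """Cross entropy when the model output is already a probability distribution."""
    return -math.log(probs[label])

def sparse_ce_from_logits(label, logits):
    """Cross entropy computed directly from raw scores (log-sum-exp form)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

logits = [2.0, 1.0, 0.1]   # made-up raw model outputs
label = 0                  # integer class label

loss_probs = sparse_ce_from_probs(label, softmax(logits))    # like from_logits=False
loss_logits = sparse_ce_from_logits(label, logits)           # like from_logits=True
loss_wrong = sparse_ce_from_logits(label, softmax(logits))   # softmax applied twice

print(loss_probs, loss_logits, loss_wrong)
```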


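**Metrics options:** <br>
* 'accuracy': y_ and y are both scalar values, e.g. y_=[1] y=[1] <br>
* 'categorical_accuracy': y_ and y are both one-hot or probability vectors, e.g. y_=[0,1,0] y=[0.256,0.695,0.048] <br>
* 'sparse_categorical_accuracy': y_ is an integer label and y is a probability vector, e.g. y_=[1] y=[0.256,0.695,0.048] <br>

Choosing 'accuracy' is usually enough: Keras converts it to one of 'BinaryAccuracy', 'CategoricalAccuracy' or 'SparseCategoricalAccuracy', depending on the loss function and the output shape.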
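
As a rough sketch of how the two categorical metrics differ, the functions below compute both on the same predictions: one compares against one-hot labels, the other against integer labels. They are hypothetical plain-Python stand-ins, and the second sample is made up for illustration.

```python
def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])

def categorical_accuracy(y_true_onehot, y_pred_probs):
    """Labels are one-hot vectors; compare argmax of label and prediction."""
    hits = [argmax(t) == argmax(p) for t, p in zip(y_true_onehot, y_pred_probs)]
    return sum(hits) / len(hits)

def sparse_categorical_accuracy(y_true_ids, y_pred_probs):
    """Labels are integer class ids; compare them with argmax of the prediction."""
    hits = [t == argmax(p) for t, p in zip(y_true_ids, y_pred_probs)]
    return sum(hits) / len(hits)

y_pred = [[0.256, 0.695, 0.048], [0.7, 0.2, 0.1]]
y_onehot = [[0, 1, 0], [0, 0, 1]]   # same labels in one-hot form
y_ids = [1, 2]                      # same labels as integers

acc_onehot = categorical_accuracy(y_onehot, y_pred)
acc_sparse = sparse_categorical_accuracy(y_ids, y_pred)
print(acc_onehot, acc_sparse)   # both metrics agree on the same data
```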

### Start fitting
```
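model.fit(training features, training labels, batch_size= , epochs= ,
validation_data = (validation features, validation labels),
validation_split = fraction of the training set held out for validation,
validation_freq = run validation every this many epochs)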
```
Use either ```validation_data``` or ```validation_split```; if both are given, ```validation_data``` takes precedence.

For full details, see: https://keras.io/zh/models/sequential/
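
With `validation_split`, Keras holds out the last fraction of the provided samples, taken before any shuffling, which is why the data should be shuffled before calling `fit`. The helper below is a hypothetical plain-Python sketch of that slicing, not the Keras implementation.

```python
def split_for_validation(samples, validation_split):
    """Hold out the trailing fraction of the samples (illustrative helper)."""
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]

sample_ids = list(range(150))   # stand-in for 150 Iris samples
train_part, val_part = split_for_validation(sample_ids, 0.2)

print(len(train_part), len(val_part))
print(val_part[0], val_part[-1])   # the held-out block is the tail of the data
```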

### Make prediction
* ```model()```<br>
Calls the model directly on one batch of inputs; the input must include the batch dimension (e.g. 4-D for image data). Returns a ```tf.Tensor```.


* ```model.predict()```<br>
```
predict(x, batch_size=None, verbose=0, steps=None, callbacks=None, max_queue_size=10,workers=1, use_multiprocessing=False)
```
Runs inference in batches and returns a NumPy array. This method is designed for performance on large inputs. For a small number of inputs that fit in one batch, calling the model directly is recommended for faster execution, e.g. ```model(x)``` or ```model(x, training=False)```.


* ```model.predict_on_batch()```<br>
Returns predictions for a single batch of samples.

## Fast build example

In [2]:
import tensorflow as tf
from sklearn import datasets
import numpy as np

x_train = datasets.load_iris().data
y_train = datasets.load_iris().target

np.random.seed(116)
np.random.shuffle(x_train)
np.random.seed(116)
np.random.shuffle(y_train)
tf.random.set_seed(116)

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(3, activation = 'softmax', kernel_regularizer = tf.keras.regularizers.l2())
])

model.compile(optimizer = tf.keras.optimizers.SGD(learning_rate=0.1),
              loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics = ['sparse_categorical_accuracy'])

# By default, fit() prints progress for every epoch; set verbose=0 to disable the output.
history = model.fit(x_train, y_train, batch_size=32, epochs=500, validation_split=0.2, validation_freq=20, verbose = 0)
# Training ran for 500 epochs, so history holds 500 training entries (validation entries are recorded every 20 epochs); print the last of each.
print(history.history.keys())
print('loss: ', history.history['loss'][-1])
print('sparse_categorical_accuracy: ', history.history['sparse_categorical_accuracy'][-1])
print('val_loss: ', history.history['val_loss'][-1])
print('val_sparse_categorical_accuracy: ', history.history['val_sparse_categorical_accuracy'][-1])
model.summary()

dict_keys(['loss', 'sparse_categorical_accuracy', 'val_loss', 'val_sparse_categorical_accuracy'])
loss:  0.3332972486813863
sparse_categorical_accuracy:  0.96666664
val_loss:  0.40018776059150696
val_sparse_categorical_accuracy:  1.0
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              multiple                  15        
Total params: 15
Trainable params: 15
Non-trainable params: 0
_________________________________________________________________
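
The fast build example keeps features and labels aligned by re-seeding NumPy before each shuffle. An alternative sketch, not taken from the notebook, is to shuffle a single list of indices and apply it to both sequences, which makes the alignment explicit. The sample values below are made up, and the standard `random` module stands in for NumPy.

```python
import random

features = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0]]   # made-up samples
labels = [0, 0, 1, 2]

# Shuffle one index list, then reorder both sequences with it.
order = list(range(len(features)))
random.Random(116).shuffle(order)

shuffled_features = [features[i] for i in order]
shuffled_labels = [labels[i] for i in order]

for f, l in zip(shuffled_features, shuffled_labels):
    print(f, l)   # every feature row still carries its original label
```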


## Plot graph (Complete procedure)

In [1]:
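import matplotlib.pyplot as plt

# Run this cell after the training cell above, which defines `history`.
# print(history.history.keys())
# The model was compiled with metrics=['sparse_categorical_accuracy'], so the history keys use that name.
acc = history.history['sparse_categorical_accuracy']
val_acc = history.history['val_sparse_categorical_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

# Validation ran every 20 epochs (validation_freq=20), so the validation curves need their own x positions.
val_freq = 20
val_epochs = [val_freq * (i + 1) - 1 for i in range(len(val_acc))]

plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_epochs, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.ylim([min(plt.ylim()),1])
plt.title('Training and Validation Accuracy')

plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_epochs, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.ylim([0,1.0])
plt.title('Training and Validation Loss')
plt.show()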


## Fast build + customized class example
### Class definition

In [None]:
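import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense

class IrisModel(Model):
    def __init__(self):
        super(IrisModel, self).__init__()
        # Define all the layers used by the model.
        # softmax makes the three outputs a probability distribution, as from_logits=False expects.
        self.d1 = Dense(3, activation='softmax', kernel_regularizer=tf.keras.regularizers.l2())
    
    def call(self, x):
        # Define the forward propagation structure
        y = self.d1(x)
        return y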

In [None]:
# Continue: compile and train the custom model (x_train and y_train come from the fast build example)
model = IrisModel()

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

h = model.fit(x_train, y_train, batch_size=32, epochs=500, validation_split=0.2, validation_freq=20, verbose=0)
print('loss: ', h.history['loss'][-1])
print('sparse_categorical_accuracy: ', h.history['sparse_categorical_accuracy'][-1])
print('val_loss: ', h.history['val_loss'][-1])
print('val_sparse_categorical_accuracy: ', h.history['val_sparse_categorical_accuracy'][-1])
model.summary()