# Chapter 5: Backpropagation

## 5.3 Backpropagation

### 5.3.1 Backpropagation through an addition node

$$
z = x + y \Longrightarrow \frac{\partial z}{\partial x} = 1, \hspace{2mm} \frac{\partial z}{\partial y} = 1
$$
Therefore, backpropagation through an addition node passes the upstream value $\displaystyle \frac{\partial L}{\partial z}$ on unchanged to both inputs.

<img src="./figures/Fig05-9.jpg" width=700 alt="Fig05-9">

### 5.3.2 Backpropagation through a multiplication node

$$
z = x y \Longrightarrow \frac{\partial z}{\partial x} = y, \hspace{2mm} \frac{\partial z}{\partial y} = x
$$
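The local gradients of the addition and multiplication nodes can be checked numerically. The following is a minimal sketch, not part of the original notes: `central_diff` is a hypothetical helper and the values of `x` and `y` are arbitrary.

```python
def central_diff(f, x, h=1e-4):
    """Central-difference approximation of df/dx at a scalar x (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x, y = 3.0, 5.0  # arbitrary illustrative values

# Addition node z = x + y: both local gradients should be 1.
add_dx = central_diff(lambda v: v + y, x)
add_dy = central_diff(lambda v: x + v, y)

# Multiplication node z = x * y: each local gradient is the *other* input.
mul_dx = central_diff(lambda v: v * y, x)
mul_dy = central_diff(lambda v: x * v, y)

print(add_dx, add_dy)
print(mul_dx, mul_dy)
```

Up to floating-point rounding, the first line should print values close to 1 and the second values close to `y` and `x` respectively, which is why the multiplication layer has to store its inputs during the forward pass.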

<img src="./figures/Fig05-12.jpg" width=700 alt="Fig05-12">

## 5.4 Implementing simple layers

### 5.4.1 Implementing the multiplication layer

In [1]:
class MulLayer:
    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        self.x = x
        self.y = y
        out = x * y

        return out

    def backward(self, dout):
        dx = dout * self.y
        dy = dout * self.x

        return dx, dy

Apple example

<img src="./figures/Fig05-16.jpg" width=700 alt="Fig05-16">

In [2]:
apple = 100
apple_num = 2
tax = 1.1

mul_apple_layer = MulLayer()
mul_tax_layer = MulLayer()

# forward
apple_price = mul_apple_layer.forward(apple, apple_num)
price = mul_tax_layer.forward(apple_price, tax)

# backward
dprice = 1
dapple_price, dtax = mul_tax_layer.backward(dprice)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)

print("price:", int(price))
print("dApple:", dapple)
print("dApple_num:", int(dapple_num))
print("dTax:", dtax)

price: 220
dApple: 2.2
dApple_num: 110
dTax: 200


### 5.4.2 Implementing the addition layer

In [3]:
class AddLayer:
    def __init__(self):
        pass

    def forward(self, x, y):
        out = x + y

        return out

    def backward(self, dout):
        dx = dout * 1
        dy = dout * 1

        return dx, dy

Apple and orange example

<img src="./figures/Fig05-17.jpg" width=700 alt="Fig05-17">

In [4]:
apple = 100
apple_num = 2
orange = 150
orange_num = 3
tax = 1.1

# layer
mul_apple_layer = MulLayer()
mul_orange_layer = MulLayer()
add_apple_orange_layer = AddLayer()
mul_tax_layer = MulLayer()

# forward
apple_price = mul_apple_layer.forward(apple, apple_num)  # (1)
orange_price = mul_orange_layer.forward(orange, orange_num)  # (2)
all_price = add_apple_orange_layer.forward(apple_price, orange_price)  # (3)
price = mul_tax_layer.forward(all_price, tax)  # (4)

# backward
dprice = 1
dall_price, dtax = mul_tax_layer.backward(dprice)  # (4)
dapple_price, dorange_price = add_apple_orange_layer.backward(dall_price)  # (3)
dorange, dorange_num = mul_orange_layer.backward(dorange_price)  # (2)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)  # (1)

print("price:", int(price))
print("dApple:", dapple)
print("dApple_num:", int(dapple_num))
print("dOrange:", dorange)
print("dOrange_num:", int(dorange_num))
print("dTax:", dtax)

price: 715
dApple: 2.2
dApple_num: 110
dOrange: 3.3000000000000003
dOrange_num: 165
dTax: 650


## 5.5 Implementing activation function layers

### 5.5.1 ReLU (Rectified Linear Unit) layer

$$
y = \begin{cases} x & (x > 0) \\ 0 & (x \le 0) \end{cases} \Longrightarrow \frac{\partial y}{\partial x} = \begin{cases} 1 & (x > 0) \\ 0 & (x \le 0) \end{cases}
$$
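The piecewise definition above maps directly onto a boolean mask. The following sketch (with an arbitrary input and an upstream gradient of all ones, both chosen only for illustration) shows the forward and backward values side by side without going through a class.

```python
import numpy as np

x = np.array([[1.0, -0.5], [-2.0, 3.0]])  # illustrative input
dout = np.ones_like(x)                     # assumed upstream gradient of all ones

mask = (x <= 0)                  # True where the unit is inactive
y = np.where(mask, 0.0, x)       # forward: keep positives, zero the rest
dx = np.where(mask, 0.0, dout)   # backward: block the gradient where x <= 0

print(y)
print(dx)
```

Using `np.where` here returns new arrays, so neither `x` nor `dout` is modified in place.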

In [5]:
import numpy as np

class Relu:
    def __init__(self):
        self.mask = None

    def forward(self, x):
        self.mask = (x <= 0)
        out = x.copy()
        out[self.mask] = 0

        return out

    def backward(self, dout):
        dout[self.mask] = 0
        dx = dout

        return dx

### 5.5.2 Sigmoid layer

$$
y = \frac{1}{1 + \exp (-x)}
$$

Differentiating directly:
$$
\frac{\partial y}{\partial x} = - \frac{1}{(1 + \exp (-x) )^2} (- \exp (-x)) = y^2 \exp (-x)
$$

Using the computational graph:

<img src="./figures/Fig05-20.jpg" width=700 alt="Fig05-20">

"$/$" node $\hspace{2mm} \displaystyle y = \frac{1}{x} \Longrightarrow \frac{\partial y}{\partial x} = - \frac{1}{x^2} = - y^2$

"$+$" node $\hspace{2mm}$ the value passes through unchanged

"$\exp$" node $\hspace{2mm} \displaystyle y = \exp (x) \Longrightarrow \frac{\partial y}{\partial x} = \exp (x)$

"$\times$" node $\hspace{2mm} \displaystyle y = -1 \cdot x \Longrightarrow \frac{\partial y}{\partial x} = -1$

$$
\frac{\partial y}{\partial x} = y^2 \exp (-x) = \frac{1}{1 + \exp (-x)} \frac{\exp (-x)}{1 + \exp (-x)}= y (1-y)
$$
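The identity $y^2 \exp(-x) = y(1-y)$ can be sanity-checked numerically. This is a small sketch under the assumption that a plain NumPy `sigmoid` is acceptable for illustration; the grid of `x` values and the step `h` are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 11)  # arbitrary sample points
y = sigmoid(x)

analytic = y * (1.0 - y)        # form used in the backward pass
direct = y**2 * np.exp(-x)      # form obtained by direct differentiation

h = 1e-5                        # step for the central difference
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

max_err_direct = np.max(np.abs(analytic - direct))
max_err_numeric = np.max(np.abs(analytic - numeric))
print(max_err_direct, max_err_numeric)
```

Both errors should be tiny, which illustrates why the backward pass needs only the stored output $y$ and not the input $x$.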

In [6]:
class Sigmoid:
    def __init__(self):
        self.out = None

    def forward(self, x):
        out = 1.0 / (1.0 + np.exp(-x))  # sigmoid(x), written out so this cell needs only NumPy
        self.out = out
        return out

    def backward(self, dout):
        dx = dout * (1.0 - self.out) * self.out

        return dx

## 5.6 Implementing the Affine / Softmax layers

### 5.6.1 Affine layer

#### Single input (row vector)

<img src="./figures/Fig05-25.jpg" width=700 alt="Fig05-25">

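##### Derivation

Proof that $\displaystyle \frac{\partial L}{\partial \boldsymbol{x}} = \frac{\partial L}{\partial \boldsymbol{y}} \cdot W^T$

Let $\boldsymbol{x} = (x_i)$, $\boldsymbol{y} = (y_j)$, and $W = (W_{ij})$. Writing $y_j = x_i W_{ij}$, with summation over repeated indices implied (the bias is omitted because it does not depend on $\boldsymbol{x}$ or $W$), we get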
$$
\left( \frac{\partial L}{\partial \boldsymbol{x}} \right)_k = \frac{\partial L}{\partial x_k} = \frac{\partial L}{\partial y_i} \frac{\partial y_i}{\partial x_k} = \frac{\partial L}{\partial y_i} \frac{\partial}{\partial x_k} (x_j W_{ji}) = \frac{\partial L}{\partial y_i} \delta_{kj} W_{ji} = \frac{\partial L}{\partial y_i} W_{ki} = \left( \frac{\partial L}{\partial \boldsymbol{y}} \cdot W^T \right)_k
$$

Proof that $\displaystyle \frac{\partial L}{\partial W} = \boldsymbol{x}^T \cdot \frac{\partial L}{\partial \boldsymbol{y}}$

$$
\left( \frac{\partial L}{\partial W} \right)_{kl} = \frac{\partial L}{\partial W_{kl}} = \frac{\partial L}{\partial y_i} \frac{\partial y_i}{\partial W_{kl}} =\frac{\partial L}{\partial y_i} \frac{\partial}{\partial W_{kl}} (x_j W_{ji}) = \frac{\partial L}{\partial y_i} x_j \delta_{kj} \delta_{li} = \frac{\partial L}{\partial y_l} x_k = x_k \frac{\partial L}{\partial y_l} = \left( \boldsymbol{x}^T \cdot \frac{\partial L}{\partial \boldsymbol{y}} \right)_{kl}
$$
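Both formulas can be checked against finite differences. The sketch below is an illustration only: it assumes a toy loss $L = \sum_j y_j g_j$ with a fixed random `g`, so that $\partial L / \partial \boldsymbol{y} = g$ exactly, and `numerical_grad` is a hypothetical helper, not the `numerical_gradient` imported later in these notes.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 3))   # one input as a row vector
W = rng.normal(size=(3, 2))
g = rng.normal(size=(1, 2))   # stands in for dL/dy (assumption of this sketch)

def loss(x, W):
    # Toy scalar loss chosen so that dL/dy equals g exactly.
    return np.sum(np.dot(x, W) * g)

dx = np.dot(g, W.T)   # dL/dx = dL/dy . W^T, shape (1, 3)
dW = np.dot(x.T, g)   # dL/dW = x^T . dL/dy, shape (3, 2)

def numerical_grad(f, a, h=1e-5):
    """Central differences over every element of array a (perturbed in place)."""
    grad = np.zeros_like(a)
    for idx in np.ndindex(a.shape):
        old = a[idx]
        a[idx] = old + h
        fp = f()
        a[idx] = old - h
        fm = f()
        a[idx] = old  # restore the original value
        grad[idx] = (fp - fm) / (2 * h)
    return grad

dx_num = numerical_grad(lambda: loss(x, W), x)
dW_num = numerical_grad(lambda: loss(x, W), W)
print(np.allclose(dx, dx_num), np.allclose(dW, dW_num))
```

Note how the shapes work out: `dx` has the shape of `x` and `dW` has the shape of `W`, which is a quick way to remember where the transposes go.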

#### Batched input

<img src="./figures/Fig05-27.jpg" width=700 alt="Fig05-27">

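##### Derivation

[2] (the weight gradient) and [3] (the bias gradient) follow from the fact that each is the sum of the per-sample results.

Since the inputs and outputs are both row vectors, for a batch (here of two samples) we set
$$
X = \begin{pmatrix} \boldsymbol{x}_1 \\ \boldsymbol{x}_2 \end{pmatrix}, \hspace{2mm} \frac{\partial L}{\partial Y} = \begin{pmatrix} d\boldsymbol{y}_1 \\ d\boldsymbol{y}_2 \end{pmatrix}
$$
where $d\boldsymbol{y}_n$ denotes the upstream gradient for sample $n$.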

First, for [2]:
$$
\boldsymbol{x}_1^T d\boldsymbol{y}_1 + \boldsymbol{x}_2^T d\boldsymbol{y}_2 = \begin{pmatrix} \boldsymbol{x}_1^T & \boldsymbol{x}_2^T \end{pmatrix} \begin{pmatrix} d\boldsymbol{y}_1 \\ d\boldsymbol{y}_2 \end{pmatrix} = X^T \frac{\partial L}{\partial Y}
$$

Next, for [3], the sum

$$
d\boldsymbol{y}_1 + d\boldsymbol{y}_2
$$

is the sum of $\displaystyle \frac{\partial L}{\partial Y}$ along its first axis (axis 0).
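
The two identities above can be confirmed on random data. This is a minimal sketch: the shapes and the random `dY` (standing in for $\partial L / \partial Y$) are assumptions made only for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 3))    # two samples x_1, x_2 stacked as rows
dY = rng.normal(size=(2, 4))   # stands in for dL/dY, rows dy_1, dy_2

# [2]: sum over samples of the outer products x_n^T dy_n, versus X^T dY
dW_per_sample = sum(np.outer(X[n], dY[n]) for n in range(X.shape[0]))
dW_batched = np.dot(X.T, dY)

# [3]: sum over samples of dy_n, versus a sum along axis 0
db_per_sample = dY[0] + dY[1]
db_batched = np.sum(dY, axis=0)

print(np.allclose(dW_per_sample, dW_batched),
      np.allclose(db_per_sample, db_batched))
```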

In [7]:
class Affine:
    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None
        self.dW = None
        self.db = None

    def forward(self, x):
        # keep the input for the backward pass (assumes x is 2-D: batch x features)
        self.x = x
        out = np.dot(self.x, self.W) + self.b

        return out

    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout, axis=0)
        
        return dx

### 5.6.3 Softmax-with-Loss layer

Softmax function
$$
S = \sum_{i=1}^n \exp(a_i)
$$

$$
y_k = \frac{\exp (a_k)}{\displaystyle \sum_{i=1}^n \exp(a_i)} = \frac{\exp (a_k)}{S}
$$

Cross Entropy Error layer

$$
L = - \sum_{k} t_k \log y_k
$$

Backpropagation through the Cross Entropy Error layer

<img src="./figures/FigA-4.jpg" width=700 alt="FigA-4">

Backpropagation through a branch node

For $x \rightarrow u, v \rightarrow L$ with $u = x$, $v = x$, and $L = L(u, v)$, the chain rule gives

$$
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial u} \frac{\partial u}{\partial x} + \frac{\partial L}{\partial v} \frac{\partial v}{\partial x} = \frac{\partial L}{\partial u} + \frac{\partial L}{\partial v}
$$

Backpropagation through the Softmax layer

<img src="./figures/FigA-5.jpg" width=700 alt="FigA-5">


"$\times$" node
$$
- \frac{t_1}{y_1} \exp(a_1) = - t_1 \frac{S}{\exp(a_1)} \exp(a_1) = - t_1 S
$$
$$
- \frac{t_1}{y_1} \frac{1}{S} = - t_1 \frac{S}{\exp(a_1)} \frac{1}{S} = - \frac{t_1}{\exp (a_1)}
$$

Reciprocal ("$/$") node and branch node
$$
y = \frac{1}{x} \rightarrow \frac{\partial y}{\partial x} = - \frac{1}{x^2}
$$
$$
- (t_1 S + t_2 S + t_3 S) \frac{-1}{S^2} = \frac{1}{S} (t_1 + t_2 + t_3)
$$
Because $t$ is a one-hot vector, exactly one of $t_1, t_2, t_3$ is 1 and the others are 0, so $t_1 + t_2 + t_3 = 1$. The value passed on is therefore
$$
\frac{1}{S}
$$

"$\exp$" node
$$
\frac{\partial}{\partial x} \exp (x) = \exp (x)
$$
$$
\left( \frac{1}{S} - \frac{t_1}{\exp (a_1)} \right) \exp (a_1) = \frac{\exp (a_1)}{S} - t_1 = y_1 - t_1
$$
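The result $y_k - t_k$ can be verified with finite differences. The sketch below assumes a single sample and defines its own `softmax` and `cross_entropy` for illustration; it is not the `SoftmaxWithLoss` class imported from `common.layers` later in these notes, and a batched implementation would usually also average over the batch.

```python
import numpy as np

def softmax(a):
    a = a - np.max(a)   # shift for numerical stability; the result is unchanged
    e = np.exp(a)
    return e / np.sum(e)

def cross_entropy(y, t):
    return -np.sum(t * np.log(y))

a = np.array([0.3, 2.9, 4.0])   # illustrative scores
t = np.array([0.0, 0.0, 1.0])   # one-hot label

y = softmax(a)
analytic = y - t                # gradient derived above

h = 1e-5
numeric = np.zeros_like(a)
for k in range(a.size):
    step = np.zeros_like(a)
    step[k] = h
    numeric[k] = (cross_entropy(softmax(a + step), t)
                  - cross_entropy(softmax(a - step), t)) / (2 * h)

print(analytic)
print(numeric)
```

The two printed vectors should agree closely, and the entries of `analytic` sum to zero because both `y` and `t` sum to 1.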

## 5.7 Implementing backpropagation

### 5.7.2 Implementing a neural network that supports backpropagation

functions.py

layers.py

two_layer_net.py

In [8]:
import sys, os
sys.path.append(os.pardir)  # allow imports from the parent directory
import numpy as np
from common.layers import *
from common.gradient import numerical_gradient
from collections import OrderedDict


class TwoLayerNet:

    def __init__(self, input_size, hidden_size, output_size, weight_init_std = 0.01):
        # initialize the weights
        self.params = {}
        self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size) 
        self.params['b2'] = np.zeros(output_size)

        # create the layers
        self.layers = OrderedDict()
        self.layers['Affine1'] = Affine(self.params['W1'], self.params['b1'])
        self.layers['Relu1'] = Relu()
        self.layers['Affine2'] = Affine(self.params['W2'], self.params['b2'])

        self.lastLayer = SoftmaxWithLoss()
        
    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)
        
        return x
        
    # x: input data, t: target labels
    def loss(self, x, t):
        y = self.predict(x)
        return self.lastLayer.forward(y, t)
    
    def accuracy(self, x, t):
        y = self.predict(x)
        y = np.argmax(y, axis=1)
        if t.ndim != 1 : t = np.argmax(t, axis=1)
        
        accuracy = np.sum(y == t) / float(x.shape[0])
        return accuracy
        
    # x: input data, t: target labels
    def numerical_gradient(self, x, t):
        loss_W = lambda W: self.loss(x, t)
        
        grads = {}
        grads['W1'] = numerical_gradient(loss_W, self.params['W1'])
        grads['b1'] = numerical_gradient(loss_W, self.params['b1'])
        grads['W2'] = numerical_gradient(loss_W, self.params['W2'])
        grads['b2'] = numerical_gradient(loss_W, self.params['b2'])
        
        return grads
        
    def gradient(self, x, t):
        # forward
        self.loss(x, t)

        # backward
        dout = 1
        dout = self.lastLayer.backward(dout)
        
        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)

        # collect the gradients stored by each Affine layer
        grads = {}
        grads['W1'], grads['b1'] = self.layers['Affine1'].dW, self.layers['Affine1'].db
        grads['W2'], grads['b2'] = self.layers['Affine2'].dW, self.layers['Affine2'].db

        return grads


train_neuralnet.py

In [9]:
# coding: utf-8
import sys, os
sys.path.append(os.pardir)

import numpy as np
from dataset.mnist import load_mnist
from ch05.two_layer_net import TwoLayerNet

# load the data
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

iters_num = 10000
train_size = x_train.shape[0]
batch_size = 100
learning_rate = 0.1

train_loss_list = []
train_acc_list = []
test_acc_list = []

iter_per_epoch = max(train_size / batch_size, 1)

for i in range(iters_num):
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]
    
    # gradient (backpropagation; the numerical version below is much slower)
    #grad = network.numerical_gradient(x_batch, t_batch)
    grad = network.gradient(x_batch, t_batch)
    
    # update the parameters
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]
    
    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)
    
    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        test_acc = network.accuracy(x_test, t_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print(train_acc, test_acc)

0.11661666666666666 0.1169
0.90535 0.9116
0.922 0.9258
0.9344333333333333 0.9357
0.94375 0.9421
0.9505 0.9471
0.9562666666666667 0.9518
0.9610666666666666 0.9588
0.9641833333333333 0.9602
0.9675166666666667 0.9611
0.9695166666666667 0.9637
0.9723666666666667 0.9667
0.9713833333333334 0.9665
0.9749666666666666 0.9691
0.9768 0.9693
0.9786333333333334 0.9717
0.97865 0.9716
