# Classifying Images with Deep Convolutional Neural Networks

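- Understand convolution operations in one and two dimensions
- Learn the building blocks of CNN architectures
- Implement deep convolutional neural networks in TensorFlow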

## Building blocks of convolutional neural networks

### Understanding CNNs and learning feature hierarchies

- Convolutional layers
- Pooling layers
- Fully Connected layers

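### Performing discrete convolutions

In what follows, the symbol $*$ denotes the convolution operation, not multiplication.

#### Performing a discrete convolution in one dimension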

\begin{equation}
\boldsymbol{y}=\boldsymbol{x} * \boldsymbol{w} \rightarrow \boldsymbol{y}[i]=\sum_{k=-\infty}^{+\infty} \boldsymbol{x}[i-k] \boldsymbol{w}[k]
\end{equation}
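
The definition above can be tried out directly on finite signals. The following is a minimal sketch, assuming that every entry of x outside its stored range is zero, so the infinite sum collapses to a finite one. The function name `conv1d_definition` and the demo arrays are illustrative and not part of the original notes.

```python
import numpy as np


def conv1d_definition(x, w):
    """Full 1D convolution computed from y[i] = sum_k x[i-k] * w[k].

    Entries of x outside 0..len(x)-1 are treated as zero (an assumption
    that turns the infinite sum into a finite one).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n, m = len(x), len(w)
    y = np.zeros(n + m - 1)
    for i in range(n + m - 1):
        for k in range(m):
            if 0 <= i - k < n:  # skip terms where x[i-k] is implicitly zero
                y[i] += x[i - k] * w[k]
    return y


x_demo = [1, 2, 3]      # hypothetical input
w_demo = [0, 1, 0.5]    # hypothetical kernel
y_demo = conv1d_definition(x_demo, w_demo)
print(y_demo)
print(np.convolve(x_demo, w_demo, mode='full'))
```

Both print statements should show the same values, since `np.convolve` with `mode='full'` evaluates the same sum.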

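Assume that x has n elements and w has m elements. With p zeros padded on each side, the padded vector $x^p$ has n + 2p elements, and with zero-based indices the convolution becomes:
\begin{equation}
y = x * w \rightarrow y[i] = \sum_{k=0}^{m-1} x^{p}[i+m-1-k]\, w[k]
\end{equation}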

![1.png](1.png)

#### The effect of zero-padding in a convolution

The following figure illustrates the three padding modes (full, same, and valid) for a simple 5 x 5 pixel input with a kernel size of 3 x 3 and a stride of 1:
![2](2.png)
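
A one-dimensional analogue of the figure can be checked with `np.convolve`, whose `mode` argument accepts `'full'`, `'same'`, and `'valid'`. This is only a sketch: the input of 5 elements and the kernel of 3 elements mirror the sizes in the figure, and the values themselves are made up.

```python
import numpy as np

x_demo = np.arange(1, 6)        # 5 input elements (hypothetical values)
w_demo = np.array([1, 0, -1])   # kernel of size 3 (hypothetical values)

# Record the output length produced by each padding mode.
sizes = {}
for mode in ("full", "same", "valid"):
    y = np.convolve(x_demo, w_demo, mode=mode)
    sizes[mode] = len(y)
    print(mode, len(y))
```

The lengths come out as 7, 5, and 3: full padding grows the output, same padding preserves the input size, and valid (no padding) shrinks it.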

#### Determining the size of the convolution output

\begin{equation}
o=\left\lfloor\frac{n+2 p-m}{s}\right\rfloor+ 1
\end{equation}

Here n is the size of the input x, m is the size of the kernel, p is the padding, and s is the stride.

- Same mode (the output has the same size as the input):
\begin{equation}
n=10, m=5, p=2, s=1 \rightarrow o=\left\lfloor\frac{10+2 \times 2-5}{1}\right\rfloor+ 1=10
\end{equation}
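
The output-size formula is easy to wrap in a small helper. The sketch below assumes non-negative integer arguments; the function name `conv_output_size` is illustrative.

```python
def conv_output_size(n, m, p=0, s=1):
    """Output length floor((n + 2p - m) / s) + 1 of a 1D convolution.

    n: input size, m: kernel size, p: padding per side, s: stride.
    """
    return (n + 2 * p - m) // s + 1


print(conv_output_size(10, 5, p=2, s=1))  # 10, the same-mode example above
print(conv_output_size(10, 3, p=2, s=2))  # 6
```

For a 2D convolution the same helper can be applied to each dimension separately.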

In [2]:
import numpy as np


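def conv1d(x, w, p=0, s=1):
    # Flip the kernel: convolution slides the reversed filter over the input.
    w_rot = np.array(w[::-1])
    x_padded = np.array(x)
    if p > 0:
        # Pad both ends of the input with p zeros.
        zero_pad = np.zeros(shape=p)
        x_padded = np.concatenate([zero_pad, x_padded, zero_pad])

    res = []
    # Visit every start index (in steps of s) where the kernel still fits,
    # which yields floor((n + 2p - m) / s) + 1 outputs.
    for i in range(0, len(x_padded) - len(w_rot) + 1, s):
        res.append(np.sum(x_padded[i:i+w_rot.shape[0]] * w_rot))

    return np.array(res)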

In [3]:
# Testing
x = [1, 3, 2, 4, 5, 6, 1, 3]
w = [1, 0, 3, 1, 2]
print('Conv1d Implementation:', conv1d(x, w, p=2, s=1))

Conv1d Implementation: [ 5. 14. 16. 26. 24. 34. 19. 22.]


In [4]:
print('Numpy Results:', np.convolve(x, w, mode='same'))

Numpy Results: [ 5 14 16 26 24 34 19 22]


#### Performing a discrete convolution in two dimensions

\begin{equation}
\boldsymbol{Y}=\boldsymbol{X} * \boldsymbol{W} \rightarrow \boldsymbol{Y}[i, j]=\sum_{k_{1}=-\infty}^{+\infty} \sum_{k_{2}=-\infty}^{+\infty} \boldsymbol{X}\left[i-k_{1}, j-k_{2}\right] \boldsymbol{W}\left[k_{1}, k_{2}\right]
\end{equation}

![3](3.png)

\begin{equation}
\boldsymbol{W}^{r}=\left[\begin{array}{lll}{0.5} & {1} & {0.5} \\ {0.1} & {0.4} & {0.3} \\ {0.4} & {0.7} & {0.5}\end{array}\right]
\end{equation}

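Here $\boldsymbol{W}^{r}$ is the kernel rotated by 180 degrees; the convolution uses padding $p=(1,1)$ and stride $s=(2,2)$.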

![4](4.png)

In [1]:
import numpy as np
import scipy.signal


def conv2d(X, W, p=(0, 0), s=(1, 1)):
    # Rotate the kernel by 180 degrees (flip both axes).
    W_rot = np.array(W)[::-1, ::-1]
    X_orig = np.array(X)
    # Zero-pad p[0] rows and p[1] columns on each side.
    n1 = X_orig.shape[0] + 2*p[0]
    n2 = X_orig.shape[1] + 2*p[1]
    X_padded = np.zeros(shape=(n1, n2))
    X_padded[p[0]:p[0]+X_orig.shape[0], p[1]:p[1]+X_orig.shape[1]] = X_orig

    res = []
    # Step through every window position (stride s) that fits in the padded input.
    for i in range(0, X_padded.shape[0] - W_rot.shape[0] + 1, s[0]):
        res.append([])
        for j in range(0, X_padded.shape[1] - W_rot.shape[1] + 1, s[1]):
            X_sub = X_padded[i:i+W_rot.shape[0], j:j+W_rot.shape[1]]
            res[-1].append(np.sum(X_sub*W_rot))
    return np.array(res)



In [2]:
X = [[1, 3, 2, 4], [5, 6, 1, 3], [1, 2, 0, 2], [3, 4, 3, 2]]
W = [[1, 0, 3], [1, 2, 1], [0, 1, 1]]

print('Conv2d Implementation:\n', conv2d(X, W, p=(1, 1), s=(1, 1)))

Conv2d Implementation:
 [[11. 25. 32. 13.]
 [19. 25. 24. 13.]
 [13. 28. 25. 17.]
 [11. 17. 14.  9.]]


In [3]:
print('Scipy Results:\n', scipy.signal.convolve2d(X, W, mode='same'))

Scipy Results:
 [[11 25 32 13]
 [19 25 24 13]
 [13 28 25 17]
 [11 17 14  9]]
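
A stride larger than 1 can be cross-checked against SciPy as well. `scipy.signal.convolve2d` has no stride argument, so the sketch below computes the stride-1 result with `mode='same'` (equivalent to padding (1, 1) for a 3 x 3 kernel) and keeps every second row and column. The variable names `Y_stride1` and `Y_stride2` are illustrative.

```python
import numpy as np
import scipy.signal

X = np.array([[1, 3, 2, 4], [5, 6, 1, 3], [1, 2, 0, 2], [3, 4, 3, 2]])
W = np.array([[1, 0, 3], [1, 2, 1], [0, 1, 1]])

# Stride (1, 1) with padding (1, 1): same result as printed above.
Y_stride1 = scipy.signal.convolve2d(X, W, mode='same')

# A stride of (2, 2) visits window positions 0, 2, ... along each axis,
# so it equals the stride-1 output subsampled by 2 in both directions.
Y_stride2 = Y_stride1[::2, ::2]
print(Y_stride2)
# [[11 32]
#  [13 25]]
```

The 2 x 2 shape agrees with the output-size formula: floor((4 + 2 - 3) / 2) + 1 = 2 in each dimension.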


### Subsampling

![5](5.png)

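Pooling has two advantages:
1. Pooling (max-pooling in particular) introduces local invariance: small changes within a local neighborhood do not change the pooled output, which makes the features more robust to noise in the input.
2. Pooling reduces the number of features, which improves computational efficiency and can also reduce overfitting.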
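
Both points can be seen in a small NumPy sketch of non-overlapping max-pooling. This is an illustrative implementation, not a library API: the function name `max_pool2d` and the demo matrix are assumptions, and trailing rows or columns that do not fill a complete window are simply dropped.

```python
import numpy as np


def max_pool2d(X, pool=(2, 2)):
    """Non-overlapping max-pooling over windows of size pool."""
    X = np.asarray(X)
    ph, pw = pool
    h, w = X.shape[0] // ph, X.shape[1] // pw
    # Drop trailing rows/columns that do not fill a complete window.
    X = X[:h * ph, :w * pw]
    # Give each window its own pair of axes, then take the max over them.
    return X.reshape(h, ph, w, pw).max(axis=(1, 3))


X_demo = np.array([[1, 3, 2, 4],
                   [5, 6, 1, 3],
                   [1, 2, 0, 2],
                   [3, 4, 3, 2]])
pooled = max_pool2d(X_demo)
print(pooled)            # 16 features reduced to 4

# Perturb a non-maximal entry: the pooled output does not change.
X_noisy = X_demo.copy()
X_noisy[0, 0] = 2
pooled_noisy = max_pool2d(X_noisy)
print(np.array_equal(pooled, pooled_noisy))
```

The 4 x 4 input shrinks to 2 x 2, and the perturbed input yields the same pooled values because the maximum of each window is unchanged.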

## Putting everything together to build a CNN

### Working with multiple input or color channels

![6](6.png)

With multiple feature maps, the computation changes as follows:
![7](7.png)
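
The multi-channel computation can be sketched in plain NumPy: each input channel is convolved with its own kernel slice, the results are summed over the channels, and a bias is added per output feature map. The names `conv2d_valid` and `conv_layer`, the array shapes, and the random data are assumptions made for illustration; padding, stride, and the activation function are left out to keep the sketch short.

```python
import numpy as np


def conv2d_valid(X, W):
    """2D convolution (kernel rotated by 180 degrees), no padding, stride 1."""
    W_rot = np.asarray(W)[::-1, ::-1]
    kh, kw = W_rot.shape
    out_h, out_w = X.shape[0] - kh + 1, X.shape[1] - kw + 1
    Y = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            Y[i, j] = np.sum(X[i:i + kh, j:j + kw] * W_rot)
    return Y


def conv_layer(X, W, b):
    """X: (H, W, C_in), W: (kh, kw, C_in, C_out), b: (C_out,)."""
    c_in, c_out = W.shape[2], W.shape[3]
    maps = []
    for k in range(c_out):
        # Convolve every input channel with its kernel slice and sum the results.
        pre_act = sum(conv2d_valid(X[:, :, c], W[:, :, c, k]) for c in range(c_in))
        maps.append(pre_act + b[k])
    return np.stack(maps, axis=-1)


rng = np.random.default_rng(0)
X_rgb = rng.random((8, 8, 3))        # hypothetical 8x8 input with 3 color channels
W_demo = rng.random((3, 3, 3, 5))    # 3x3 kernels, 3 input channels, 5 feature maps
b_demo = np.zeros(5)
A = conv_layer(X_rgb, W_demo, b_demo)
print(A.shape)  # (6, 6, 5)
```

Each of the 5 output feature maps has its own set of 3 kernels (one per input channel) and its own bias term.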

Example of a convolutional layer followed by a pooling layer:
![8](8.png)