# Personal Notes: Softmax Regression

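Like linear regression, softmax regression is a single-layer neural network. The difference is that it has multiple outputs, one per class, so it is commonly used for classification problems.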
![Image Name](https://cdn.kesci.com/upload/image/q5mmhujmua.png?imageView2/0/w/960/h/960)
Based on the figure above, the final outputs can be written as:
![Image Name](https://cdn.kesci.com/upload/image/q5mmlt8rkj.PNG?imageView2/0/w/960/h/960)
Going one step further, this becomes:
![Image Name](https://cdn.kesci.com/upload/image/q5mmmib52x.PNG?imageView2/0/w/960/h/960)
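
As a concrete illustration of the computation described above, here is a minimal sketch of the forward pass for a tiny batch. The sizes (4 input features, 3 classes) and the names `X`, `W`, `b`, and `softmax` are assumptions chosen for illustration; they are not taken from the figures.

```python
import torch

def softmax(o):
    """Row-wise softmax: exponentiate, then normalize each row to sum to 1."""
    # Subtracting the row maximum leaves the result unchanged but avoids overflow.
    o_exp = torch.exp(o - o.max(dim=1, keepdim=True).values)
    return o_exp / o_exp.sum(dim=1, keepdim=True)

torch.manual_seed(0)
X = torch.randn(2, 4)          # batch of 2 samples, 4 features each (illustrative sizes)
W = torch.randn(4, 3) * 0.01   # weights: 4 inputs -> 3 classes
b = torch.zeros(3)             # one bias per class

O = X @ W + b                  # linear outputs, shape (2, 3)
Y_hat = softmax(O)             # class probabilities, shape (2, 3)

print(Y_hat)
print(Y_hat.sum(dim=1))        # each row sums to 1
print(Y_hat.argmax(dim=1))     # predicted class = index of the largest probability
```

The linear part is the same as in linear regression; the only additions are the extra output columns and the normalization step.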


## Getting the image dataset
We use an image classification problem to learn how to implement softmax regression in PyTorch.

In [1]:
# Import the required packages
%matplotlib inline
from IPython import display
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.nn import init
import torchvision
import torchvision.transforms as transforms
import time
import sys
import numpy as np
sys.path.append("/home/kesci/input")
import d2lzh1981 as d2l

print(torch.__version__)
print(torchvision.__version__)

1.3.0
0.4.1a0+d94043a


In [2]:
# Download the Fashion-MNIST training and test sets
mnist_train = torchvision.datasets.FashionMNIST(root='/home/kesci/Datasets/FashionMNIST', train=True, download=True, transform=transforms.ToTensor())
mnist_test = torchvision.datasets.FashionMNIST(root='/home/kesci/Datasets/FashionMNIST', train=False, download=True, transform=transforms.ToTensor())


In [3]:
# Inspect the dataset type and size
print(type(mnist_train))
print(len(mnist_train), len(mnist_test))
# Working with the dataset
feature, label = mnist_train[0]  # access a sample by index
print(feature.shape, label)
# Helper that converts numeric labels to the corresponding text labels
def get_fashion_mnist_labels(labels):
    text_labels = ['t-shirt','trouser','pullover','dress',
                   'coat','sandal','shirt','sneaker','bag','ankle boot']
    return [text_labels[int(i)] for i in labels]
# Helper that draws several images and their labels in a single row
def show_fashion_mnist(images, labels):
    d2l.use_svg_display()
    _, figs = plt.subplots(1, len(images), figsize=(12, 12))
    for f, img, lbl in zip(figs, images, labels):
        f.imshow(img.view((28, 28)).numpy())
        f.set_title(lbl)
        f.axes.get_xaxis().set_visible(False)
        f.axes.get_yaxis().set_visible(False)
    plt.show()

# Show the images and text labels of the first 10 training samples
X, y = [], []
for i in range(10):
    X.append(mnist_train[i][0])
    y.append(mnist_train[i][1])
show_fashion_mnist(X, get_fashion_mnist_labels(y))

<class 'torchvision.datasets.mnist.FashionMNIST'>
60000 10000
torch.Size([1, 28, 28]) 9


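Note: in practice, data loading can be sped up by using multiple worker processes. PyTorch supports this through the `num_workers` argument of `DataLoader`.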

In [4]:
# Read the data in mini-batches, using multiple worker processes where possible
batch_size = 256
if sys.platform.startswith('win'):
    num_workers = 0  # on Windows, do not use extra worker processes
else:
    num_workers = 4
# num_workers = 4
train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)
# Time one full pass over the training data
start = time.time()
for X, y in train_iter:
    continue
print('{} sec'.format(time.time() - start))

4.745744466781616 sec
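
To see what the iterator yields without downloading anything, here is a minimal sketch that wraps synthetic tensors in a `DataLoader`. The random data and the names `features`, `labels`, and `batch_shapes` are assumptions standing in for Fashion-MNIST.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for an image dataset: 1000 "images" of shape (1, 28, 28).
features = torch.rand(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
dataset = TensorDataset(features, labels)

# num_workers=0 loads data in the main process; larger values start worker processes.
loader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=0)

# Each iteration yields one (features, labels) mini-batch.
batch_shapes = [(X.shape, y.shape) for X, y in loader]
for X_shape, y_shape in batch_shapes:
    print(X_shape, y_shape)
```

Since 1000 is not a multiple of 256, the last batch holds the remaining 232 samples; passing `drop_last=True` would discard it instead.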


## Implementing softmax regression in PyTorch

In [5]:
# Load the data
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
# Model dimensions: 28 * 28 = 784 input features, 10 output classes
num_inputs = 784
num_outputs = 10

class LinearNet(nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super(LinearNet, self).__init__()
        self.linear = nn.Linear(num_inputs, num_outputs)
    def forward(self, x):  # shape of x: (batch, 1, 28, 28)
        y = self.linear(x.view(x.shape[0], -1))
        return y
net = LinearNet(num_inputs, num_outputs)
# Define a layer that flattens each sample into a vector
class FlattenLayer(nn.Module):
    def __init__(self):
        super(FlattenLayer, self).__init__()
    def forward(self, x):  # shape of x: (batch, *, *, ...)
        return x.view(x.shape[0], -1)
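
To check what the flattening step does to a batch of images, here is a minimal standalone sketch. It redefines `FlattenLayer` so that it runs on its own, and the all-zero input `x` is an assumption used only to inspect shapes.

```python
import torch
import torch.nn as nn

class FlattenLayer(nn.Module):
    """Keep the batch dimension and merge all remaining dimensions into one."""
    def forward(self, x):
        return x.view(x.shape[0], -1)

x = torch.zeros(256, 1, 28, 28)   # a batch shaped like Fashion-MNIST images
flat = FlattenLayer()(x)

print(flat.shape)                 # torch.Size([256, 784])
# The built-in torch.flatten gives the same shape when started at dimension 1.
print(torch.flatten(x, start_dim=1).shape)
```

The 784 here is the `num_inputs` value used above: 1 channel times 28 by 28 pixels.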

In [6]:
# Define the model and initialize its parameters
from collections import OrderedDict
net = nn.Sequential(
        # FlattenLayer(),
        # LinearNet(num_inputs, num_outputs) 
        OrderedDict([
           ('flatten', FlattenLayer()),
           ('linear', nn.Linear(num_inputs, num_outputs))]))
init.normal_(net.linear.weight, mean=0, std=0.01)
init.constant_(net.linear.bias, val=0)

Parameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)

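Using the cross-entropy loss function
Note: this is a classification problem, so the prediction is simply the class with the largest predicted value. Squared error, as used in linear regression, would require every predicted probability to match the one-hot label exactly, which is stricter than classification needs. Cross-entropy only measures the probability assigned to the correct class, so it is the better fit here.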
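
The difference between the two losses can be seen on a small worked example. This is a minimal sketch in plain Python; the two prediction vectors `pred_a` and `pred_b` and the helper names are made up for illustration. Both predictions give the true class (index 0) a probability of 0.6 and therefore classify the sample equally well.

```python
import math

def squared_error(y_hat, true_class):
    """Sum of squared differences between y_hat and the one-hot label."""
    return sum((p - (1.0 if i == true_class else 0.0)) ** 2
               for i, p in enumerate(y_hat))

def cross_entropy(y_hat, true_class):
    """Negative log of the probability assigned to the true class."""
    return -math.log(y_hat[true_class])

true_class = 0
pred_a = [0.6, 0.2, 0.2]   # remaining mass spread evenly over the wrong classes
pred_b = [0.6, 0.4, 0.0]   # remaining mass concentrated on one wrong class

for name, pred in [("a", pred_a), ("b", pred_b)]:
    print(name,
          round(squared_error(pred, true_class), 4),
          round(cross_entropy(pred, true_class), 4))
# a 0.24 0.5108
# b 0.32 0.5108
```

Squared error penalizes `pred_b` more even though both predictions are equally confident in the correct class, while cross-entropy treats them the same.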

In [7]:
# Loss function.
# Computing softmax and cross-entropy as two separate steps can be
# numerically unstable, so we use the loss that combines both.
loss = nn.CrossEntropyLoss()

# Optimization algorithm
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
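
The numerical-stability point in the comment above can be demonstrated with a small sketch. The extreme logits below are an assumption, chosen deliberately to force an overflow; real model outputs are rarely this large, but the failure mode is the same.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[1000.0, 0.0, -1000.0]])  # extreme values chosen to overflow
target = torch.tensor([0])                        # the true class is index 0

# Naive version: softmax first, then the log of the true-class probability.
exp = torch.exp(logits)                           # exp(1000) overflows to inf in float32
naive_probs = exp / exp.sum(dim=1, keepdim=True)  # inf / inf gives nan
naive_loss = -torch.log(naive_probs[0, 0])

# Combined version: takes raw logits and computes log-softmax in a stable way.
stable_loss = F.cross_entropy(logits, target)

print(naive_loss)    # nan
print(stable_loss)   # a finite value, essentially zero
```

This is also why the model defined above outputs raw linear values without a softmax layer: `nn.CrossEntropyLoss` expects unnormalized scores and applies the normalization itself.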

In [8]:
# Train the model
num_epochs = 5 
d2l.train_ch3(net, train_iter, test_iter, loss, 
    num_epochs, batch_size, None, None, optimizer)

epoch 1, loss 0.0031, train acc 0.750, test acc 0.760
epoch 2, loss 0.0022, train acc 0.813, test acc 0.808
epoch 3, loss 0.0021, train acc 0.825, test acc 0.818
epoch 4, loss 0.0020, train acc 0.832, test acc 0.805
epoch 5, loss 0.0019, train acc 0.836, test acc 0.822
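
`d2l.train_ch3` comes from the course package and its source is not shown in these notes. The following is a minimal sketch of what a training loop of this kind typically does, run on synthetic data so that it is self-contained. It is not the d2l implementation, and the synthetic dataset, its sizes, and the `history` list are assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic 3-class problem (illustrative only): labels follow a random linear rule.
num_inputs, num_outputs, num_samples = 20, 3, 512
X = torch.randn(num_samples, num_inputs)
y = (X @ torch.randn(num_inputs, num_outputs)).argmax(dim=1)
train_iter = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

net = nn.Linear(num_inputs, num_outputs)
loss = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

history = []  # (mean loss per sample, training accuracy) for each epoch
for epoch in range(10):
    loss_sum, correct, n = 0.0, 0, 0
    for X_batch, y_batch in train_iter:
        y_hat = net(X_batch)              # forward pass: raw class scores
        l = loss(y_hat, y_batch)          # mean cross-entropy over the batch
        optimizer.zero_grad()             # clear gradients from the previous step
        l.backward()                      # backpropagate
        optimizer.step()                  # SGD parameter update
        loss_sum += l.item() * y_batch.shape[0]
        correct += (y_hat.argmax(dim=1) == y_batch).sum().item()
        n += y_batch.shape[0]
    history.append((loss_sum / n, correct / n))
    print('epoch %d, loss %.4f, train acc %.3f' % (epoch + 1, *history[-1]))
```

The loss reported here is the mean per sample, so its scale may differ from the numbers printed by `d2l.train_ch3`.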


In [9]:
# Predict with the trained model
X, y = next(iter(test_iter))

true_labels = d2l.get_fashion_mnist_labels(y.numpy())
pred_labels = d2l.get_fashion_mnist_labels(net(X).argmax(dim=1).numpy())
titles = [true + '\n' + pred for true, pred in zip(true_labels, pred_labels)]

d2l.show_fashion_mnist(X[0:9], titles[0:9])
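
Beyond inspecting a few images by eye, predictions can be scored numerically. Here is a minimal sketch of an accuracy computation on hand-written scores; the `accuracy` helper and the example `logits` and `labels` are hypothetical and not part of d2l.

```python
import torch

def accuracy(logits, labels):
    """Fraction of rows whose largest score sits at the true label's index."""
    return (logits.argmax(dim=1) == labels).float().mean().item()

# Raw scores for 4 samples and 3 classes (made-up numbers).
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 1.5],
                       [0.9, 1.1, 0.4],
                       [1.2, 0.8, 0.3]])
labels = torch.tensor([0, 2, 0, 0])

print(logits.argmax(dim=1))       # tensor([0, 2, 1, 0])
print(accuracy(logits, labels))   # 0.75
```

Softmax preserves the ordering of the scores, so taking `argmax` of the raw outputs gives the same prediction as taking it over the probabilities. That is why `net(X).argmax(dim=1)` works without an explicit softmax.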

Linear regression: https://www.kesci.com/org/boyuai/project/share/4b72c6515eb2ec7f
Multilayer perceptron: https://www.kesci.com/org/boyuai/project/share/fe09890d6a297909