## Logistic Regression

softmax with logits with cross entropy: with two classes this is logistic regression, with more than two classes it is softmax regression.

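Definitions
* logits - the input to softmax: the raw weighted-sum output of the layer, before any sigmoid is applied
* softmax - a simple computation that turns logits into probabilities: exponentiate each logit and divide by the sum of the exponentials
* cross entropy - the cost function used for training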
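
To make the three terms concrete, here is a minimal sketch in plain Python. The function names `softmax` and `cross_entropy` and the example scores are illustrative assumptions, not part of any library; the loss shown is for a single example whose true class is given as an index.

```python
import math

def softmax(logits):
    """Turn a list of raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability that softmax assigns to the true class index."""
    probs = softmax(logits)
    return -math.log(probs[target])

logits = [2.0, 1.0, 0.1]   # hypothetical scores for three classes
probs = softmax(logits)
loss = cross_entropy(logits, target=0)
print(probs, loss)
```

The larger the logit of the true class relative to the others, the closer its probability is to 1 and the closer the loss is to 0.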

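The cost function for binary classification is the negative log-likelihood obtained from maximum likelihood estimation; extended to multiple classes it becomes cross entropy. Concepts such as entropy and mutual information also come up often later in unsupervised learning, for example the KL divergence used in VAE.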

Log-likelihood for binary classification (the quantity maximum likelihood estimation maximizes): $L(\theta) = y \log(h_{\theta}(X)) + (1-y) \log(1 - h_{\theta}(X))$

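Cross entropy for multiple classes: $-\sum_x p(x) \log q(x)$ => $-\sum_i y_i \log(h_{\theta}(X)_i)$, where $y_i$ is the one-hot label and $h_{\theta}(X)_i$ is the predicted probability of class $i$. So $-L(\theta)$ is exactly the cross entropy for the two-class case.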
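
A quick numerical check of that last claim, as a sketch in plain Python. The helper names and the logit value `z = 0.8` are made up for illustration; the one-hot label is written as `[y, 1 - y]` so that it lines up with the prediction `[h, 1 - h]`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(y, h):
    """L(theta) for one example: y is 0 or 1, h is the predicted P(y = 1)."""
    return y * math.log(h) + (1 - y) * math.log(1 - h)

def cross_entropy(p, q):
    """-sum_i p_i * log(q_i) for a target distribution p and a prediction q."""
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0)

z = 0.8                      # hypothetical logit
h = sigmoid(z)
for y in (0, 1):
    one_hot = [y, 1 - y]     # class "1" first, class "0" second
    pred = [h, 1 - h]
    print(y, -log_likelihood(y, h), cross_entropy(one_hot, pred))
```

For both labels the two printed values agree, because only the term of the true class survives in the sum.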

Next, let's implement a simple logistic regression in PyTorch.

In [2]:
from itertools import count
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

from torchvision import datasets, transforms

# Load MNIST. 0.1307 and 0.3081 are the commonly quoted mean and standard
# deviation of the MNIST training pixels after ToTensor scales them to [0, 1].
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])
trainLoader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train = True, download = True, transform = transform),
    batch_size = 32, shuffle = True)
# No batch_size is given here, so the test loader yields one image at a time.
testLoader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train = False, download = True, transform = transform))


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Processing...
Done!


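https://discuss.pytorch.org/t/normalization-in-the-mnist-example/457
This is the source of the sentence "I think those are the mean and std deviation of the MNIST dataset", courtesy of the great avijit_dasgupta.
The question remains, though: why does PyTorch do it this way and leave it to users to compute the dataset's mean and std themselves?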
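
One plausible reading is that `transforms.Normalize` is a generic transform that knows nothing about the dataset, so the statistics have to be supplied from outside. Below is a rough sketch of how such numbers could be computed. It uses random stand-in images so that it runs without downloading anything, and `channel_mean_std` is a hypothetical helper name; on the real MNIST training images (after `ToTensor`) the same computation should give values close to 0.1307 and 0.3081.

```python
import torch

def channel_mean_std(images):
    """Mean and std over all pixels of a (N, 1, H, W) float tensor in [0, 1]."""
    return images.mean().item(), images.std().item()

# Stand-in data: random "images" instead of the real MNIST training set.
torch.manual_seed(0)
fake_images = torch.rand(100, 1, 28, 28)
mean, std = channel_mean_std(fake_images)

# Subtract the mean and divide by the std, as Normalize((mean,), (std,)) does per channel.
normalized = (fake_images - mean) / std
print(mean, std, normalized.mean().item(), normalized.std().item())
```

After normalization the pixels have mean close to 0 and std close to 1, which is the point of passing those two numbers.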

In [21]:
def buildLRModel():
    model = nn.Sequential()
    model.add_module("simple", nn.Linear(784, 10, bias = True))
    return model

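Note that the input here can be a 2-D image, while the fully connected layer expects a flat vector, so we need to add a view. For that name alone, PyTorch gets 100 points.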
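
A small sketch of what the view does to one mini-batch. The zero tensor is a stand-in for real MNIST images, and the variable names are illustrative.

```python
import torch

batch = torch.zeros(32, 1, 28, 28)      # stand-in for one MNIST mini-batch
flat = batch.view(batch.size(0), 784)   # keep the batch dimension, flatten the rest
same = batch.view(-1, 784)              # -1 lets PyTorch infer the batch size
print(flat.shape, same.shape)
```

Both forms give a 32 x 784 tensor, and `view` only reinterprets the shape: the result shares storage with `batch` rather than copying it.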

In [13]:
help(Variable.view)

Help on function view in module torch.autograd.variable:

view(self, *sizes)



In [40]:
it = iter(trainLoader)
model = buildLRModel()
criterion = nn.CrossEntropyLoss()  # expects raw logits, applies log-softmax itself
optimizer = optim.SGD(model.parameters(), lr = 0.001, momentum = 0.9)

# Each iteration consumes one mini-batch, so this is 1000 steps, not 1000 epochs.
for step in range(1000):
    imgs, labels = next(it)
    
    optimizer.zero_grad()
    x = imgs.view(imgs.size(0), 784)  # flatten each 1x28x28 image to 784 values
    y = model(x)
    loss = criterion(y, labels)
    loss.backward()
    
    optimizer.step()
    
    if step % 100 == 0:
        # Accuracy on the current mini-batch only
        _, predicted = torch.max(y, 1)
        total = labels.size(0)
        correct = (predicted == labels).sum().item()
        print("loss: %.2f, Accuracy: %.2f" % (loss.item(), correct / total * 100))
    
    

loss: 2.49, Accuracy: 3.12
loss: 0.58, Accuracy: 81.25
loss: 0.47, Accuracy: 84.38
loss: 0.65, Accuracy: 81.25
loss: 0.41, Accuracy: 78.12
loss: 0.34, Accuracy: 90.62
loss: 0.27, Accuracy: 90.62
loss: 0.37, Accuracy: 90.62
loss: 0.18, Accuracy: 96.88
loss: 0.15, Accuracy: 100.00


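In practice Sequential is rather awkward to work with. PyTorch seems to have made it this clumsy on purpose, to push us toward writing a class that inherits from Module. Fine, even for a model this simple we will do it that way, so here it is again.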

In [42]:
class LRModel(nn.Module):
    def __init__(self):
        super(LRModel, self).__init__()
        self.fc = nn.Linear(784, 10, bias = True)
        
    def forward(self, x):
        x = x.view(-1, 784)
        x = self.fc(x)
        return x
    


In [43]:
it = iter(trainLoader)
model = LRModel()
criterion = nn.CrossEntropyLoss()  # expects raw logits, applies log-softmax itself
optimizer = optim.SGD(model.parameters(), lr = 0.001, momentum = 0.9)

# Each iteration consumes one mini-batch, so this is 1000 steps, not 1000 epochs.
for step in range(1000):
    imgs, labels = next(it)
    
    optimizer.zero_grad()
    y = model(imgs)  # LRModel.forward flattens the input itself
    loss = criterion(y, labels)
    loss.backward()
    
    optimizer.step()
    
    if step % 100 == 0:
        # Accuracy on the current mini-batch only
        _, predicted = torch.max(y, 1)
        total = labels.size(0)
        correct = (predicted == labels).sum().item()
        print("loss: %.2f, Accuracy: %.2f" % (loss.item(), correct / total * 100))

loss: 2.60, Accuracy: 6.25
loss: 0.66, Accuracy: 84.38
loss: 0.48, Accuracy: 84.38
loss: 0.42, Accuracy: 84.38
loss: 0.31, Accuracy: 90.62
loss: 0.41, Accuracy: 87.50
loss: 0.34, Accuracy: 90.62
loss: 0.40, Accuracy: 87.50
loss: 0.48, Accuracy: 84.38
loss: 0.35, Accuracy: 93.75
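
The accuracies printed above are each measured on a single 32-image mini-batch, which is why they jump around. A rough sketch of measuring accuracy over a whole loader instead follows. `evaluate` is a hypothetical helper, and the random stand-in data and the untrained `nn.Linear` model are only there so the block runs without downloading MNIST; with the notebook's own objects the call would be `evaluate(model, testLoader)`.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader):
    """Return the accuracy (between 0 and 1) of `model` over every batch in `loader`."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():                      # no gradients needed for evaluation
        for imgs, labels in loader:
            logits = model(imgs.view(imgs.size(0), -1))
            predicted = logits.argmax(dim=1)   # class with the largest logit
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return correct / total

# Stand-in data so the sketch runs without downloading MNIST.
torch.manual_seed(0)
fake_loader = DataLoader(
    TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,))),
    batch_size=16)
accuracy = evaluate(nn.Linear(784, 10), fake_loader)
print("accuracy: %.2f" % (accuracy * 100))
```

On random data with an untrained model the number is meaningless; the point is only the shape of the loop.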


OK, Done!