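## Background
Computing the gradient over the entire training set in one pass makes training very slow.
For large datasets, a better approach is to split the training set into several small batches.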

### Batch training
Terminology:
* epoch: one forward & backward pass over all training samples
* batch size: the number of training samples used in one forward & backward pass
* number of iterations: how many passes are made, each using batch size samples
* Example: with 10000 samples and a batch size of 200, one epoch takes 10000/200 = 50 iterations
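
The arithmetic in the example above can be checked with a few lines of Python. This is a minimal sketch: `iterations_per_epoch` is a hypothetical helper name, and the second call uses made-up numbers to show what happens when the batch size does not divide the sample count evenly (the last batch is simply smaller).

```python
import math


def iterations_per_epoch(n_samples: int, batch_size: int) -> int:
    """Hypothetical helper: number of batches needed to see every sample once."""
    # ceil accounts for a smaller final batch when the division is not exact
    return math.ceil(n_samples / batch_size)


# The example from the text: 10000 samples, batch size 200
iters_even = iterations_per_epoch(10000, 200)
print(iters_even)  # 50

# Made-up numbers: 10 samples with batch size 4 gives batches of 4, 4 and 2
iters_uneven = iterations_per_epoch(10, 4)
print(iters_uneven)  # 3
```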

In [1]:
import torch
import torchvision
from torch.utils.data import Dataset, DataLoader
import numpy as np
import math

In [3]:
class WineDataset(Dataset):
    def __init__(self):
        # load data
        dataset = np.loadtxt('./Data/wine.csv', delimiter=',', dtype=np.float32, skiprows=1)
        self.x = torch.from_numpy(dataset[:, 1:])
        self.y = torch.from_numpy(dataset[:, [0]])
        self.n_samples = dataset.shape[0]

    def __getitem__(self, index):
        # dataset[0]
        return self.x[index], self.y[index]
    def __len__(self):
        # len(dataset)
        return self.n_samples

dataset = WineDataset()
features_1, label_1 = dataset[0]
print(features_1, label_1)

tensor([1.4230e+01, 1.7100e+00, 2.4300e+00, 1.5600e+01, 1.2700e+02, 2.8000e+00,
        3.0600e+00, 2.8000e-01, 2.2900e+00, 5.6400e+00, 1.0400e+00, 3.9200e+00,
        1.0650e+03]) tensor([1.])
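
The label above prints as `tensor([1.])` rather than a scalar because `WineDataset` indexes the label column with a list (`dataset[:, [0]]`). The sketch below illustrates that slicing on a toy in-memory CSV; the column names and values are made up for illustration and are not taken from `wine.csv`.

```python
import io

import numpy as np

# Toy CSV (hypothetical): one label column followed by three feature columns
csv_text = "label,f1,f2,f3\n1,0.5,1.5,2.5\n2,3.5,4.5,5.5\n"

# skiprows=1 drops the header line, as in WineDataset.__init__
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=np.float32, skiprows=1)

x = data[:, 1:]      # every column except the first: the features
y_2d = data[:, [0]]  # list index keeps a column axis: shape (n_samples, 1)
y_1d = data[:, 0]    # plain integer index drops it: shape (n_samples,)

print(x.shape, y_2d.shape, y_1d.shape)  # (2, 3) (2, 1) (2,)
```

Keeping the labels as an `(n_samples, 1)` column means each sample's label is a one-element tensor, which is what the printed output shows.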


In [8]:
'''
num_workers > 0 can speed up data loading,
but it can cause problems when run on Windows.
For details, see: https://blog.csdn.net/qq_38662733/article/details/108549461
This dataset is small, so num_workers=0 is used for now.
'''
dataset = WineDataset()
dataloader = DataLoader(dataset=dataset, batch_size=4, shuffle=True, num_workers=0) 

dataiter = iter(dataloader)
# use the built-in next(); the iterator's .next() method is not available in newer PyTorch releases
data = next(dataiter)
features, label = data
print(features, label)

tensor([[1.2600e+01, 2.4600e+00, 2.2000e+00, 1.8500e+01, 9.4000e+01, 1.6200e+00,
         6.6000e-01, 6.3000e-01, 9.4000e-01, 7.1000e+00, 7.3000e-01, 1.5800e+00,
         6.9500e+02],
        [1.2930e+01, 3.8000e+00, 2.6500e+00, 1.8600e+01, 1.0200e+02, 2.4100e+00,
         2.4100e+00, 2.5000e-01, 1.9800e+00, 4.5000e+00, 1.0300e+00, 3.5200e+00,
         7.7000e+02],
        [1.3080e+01, 3.9000e+00, 2.3600e+00, 2.1500e+01, 1.1300e+02, 1.4100e+00,
         1.3900e+00, 3.4000e-01, 1.1400e+00, 9.4000e+00, 5.7000e-01, 1.3300e+00,
         5.5000e+02],
        [1.2720e+01, 1.7500e+00, 2.2800e+00, 2.2500e+01, 8.4000e+01, 1.3800e+00,
         1.7600e+00, 4.8000e-01, 1.6300e+00, 3.3000e+00, 8.8000e-01, 2.4200e+00,
         4.8800e+02]]) tensor([[3.],
        [1.],
        [3.],
        [2.]])
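
Putting the terms together, a full training run loops over epochs and, inside each epoch, over the batches produced by the `DataLoader`. The following is only a sketch of that loop structure: it uses a synthetic `TensorDataset` as a stand-in for the wine data (10 random samples, an assumption made so the snippet runs without `wine.csv`), and the actual forward pass, loss and optimizer step are left as a comment.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the wine data: 10 samples, 13 features, labels in {1, 2, 3}
torch.manual_seed(0)
features = torch.randn(10, 13)
labels = torch.randint(1, 4, (10, 1)).float()
toy_dataset = TensorDataset(features, labels)

batch_size = 4
loader = DataLoader(toy_dataset, batch_size=batch_size, shuffle=True, num_workers=0)

num_epochs = 2
n_iterations = math.ceil(len(toy_dataset) / batch_size)  # batches per epoch

batch_sizes_seen = []
for epoch in range(num_epochs):
    for i, (inputs, targets) in enumerate(loader):
        # forward pass, loss, backward pass and optimizer step would go here
        batch_sizes_seen.append(inputs.shape[0])
        print(f"epoch {epoch + 1}/{num_epochs}, "
              f"step {i + 1}/{n_iterations}, inputs {tuple(inputs.shape)}")
```

With 10 samples and a batch size of 4, each epoch has three steps and the last batch holds only 2 samples; passing `drop_last=True` to `DataLoader` would discard that partial batch instead.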


In [1]:
import pandas as pd
aa = pd.read_csv('./Data/wine.csv')
aa.head()

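|   | Wine | Alcohol | Malic.acid | Ash | Acl | Mg | Phenols | Flavanoids | Nonflavanoid.phenols | Proanth | Color.int | Hue | OD | Proline |
|---|------|---------|------------|-----|-----|----|---------|------------|----------------------|---------|-----------|-----|----|---------|
| 0 | 1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 1 | 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 |
| 2 | 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 3 | 1 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 |
| 4 | 1 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |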
