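## Introduction
GRU layer: a simplified version of the LSTM layer with only two gates.
The main difference from LSTM: an LSTM controls the hidden state indirectly by updating a separate memory cell, while a GRU controls the hidden state directly.
### reset gate (R)
Controls how much of the previous state H_pre feeds into the candidate hidden state Hy_1.
Computation: Hy_1=tanh(x*W_xh+(R.*H_pre)*W_hh+b_h)
### update gate (Z)
Controls whether the current state relies more on the previous state or on the candidate hidden state.
Computation: H=Z.*H_pre+(1-Z).*Hy_1
When Z is all 1, the previous state is kept unchanged; when Z is all 0, the new state is the candidate state Hy_1, in which the previous state enters only through the reset gate.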
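
To make the two formulas concrete, here is a minimal sketch of a single GRU time step in plain PyTorch. The helper names `gru_step` and `make_params`, the dictionary keys, and the toy sizes are illustrative assumptions, not part of the notebook's model. The update-gate bias is pushed to a large positive or negative value only to saturate Z and show the two extreme cases.

```python
import torch

def gru_step(x, h_prev, p):
    """One GRU time step; `p` is a dict of weights (names are illustrative)."""
    r = torch.sigmoid(x @ p["w_xr"] + h_prev @ p["w_hr"] + p["b_r"])          # reset gate R
    z = torch.sigmoid(x @ p["w_xz"] + h_prev @ p["w_hz"] + p["b_z"])          # update gate Z
    h_cand = torch.tanh(x @ p["w_xh"] + (r * h_prev) @ p["w_hh"] + p["b_h"])  # candidate Hy_1
    h = z * h_prev + (1 - z) * h_cand                                         # new state H
    return h, h_cand

def make_params(n_in, n_hidden, b_z):
    """Small random weights; b_z is a constant update-gate bias used to saturate Z."""
    g = torch.Generator().manual_seed(0)
    p = {}
    for gate in ("r", "z", "h"):
        p[f"w_x{gate}"] = torch.randn(n_in, n_hidden, generator=g) * 0.01
        p[f"w_h{gate}"] = torch.randn(n_hidden, n_hidden, generator=g) * 0.01
        p[f"b_{gate}"] = torch.zeros(n_hidden)
    p["b_z"] = torch.full((n_hidden,), float(b_z))
    return p

x = torch.ones(2, 4)              # (batch_size, input_size)
h_prev = torch.full((2, 3), 0.5)  # (batch_size, hidden_size)

# Z saturated near 1: the previous state passes through unchanged.
h_keep, _ = gru_step(x, h_prev, make_params(4, 3, b_z=50.0))
# Z saturated near 0: the new state is the candidate state.
h_replace, h_cand = gru_step(x, h_prev, make_params(4, 3, b_z=-50.0))

print(torch.allclose(h_keep, h_prev))
print(torch.allclose(h_replace, h_cand))
```

In a trained model Z usually sits between these extremes, so each hidden unit blends its previous value with the candidate.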


## Model definition
A from-scratch implementation of the GRU layer.

In [1]:
import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.loggers import TensorBoardLogger

class Lit_GRUModel(pl.LightningModule):
    def __init__(self, vocab_size, nums_hidden, nums_layers, lr,sigma=0.01):
        super(Lit_GRUModel, self).__init__()
        self.save_hyperparameters()
        #reset gate
        self.w_xr=torch.nn.Parameter(torch.randn(vocab_size, nums_hidden)*sigma)
        self.w_hr=torch.nn.Parameter(torch.randn(nums_hidden, nums_hidden)*sigma)
        self.b_r=torch.nn.Parameter(torch.zeros(nums_hidden))
        #update gate
        self.w_xz=torch.nn.Parameter(torch.randn(vocab_size, nums_hidden)*sigma)
        self.w_hz=torch.nn.Parameter(torch.randn(nums_hidden, nums_hidden)*sigma)
        self.b_z=torch.nn.Parameter(torch.zeros(nums_hidden))
        #hidden state
        self.w_xh=torch.nn.Parameter(torch.randn(vocab_size, nums_hidden)*sigma) 
        self.w_hh=torch.nn.Parameter(torch.randn(nums_hidden, nums_hidden)*sigma)
        self.b_h=torch.nn.Parameter(torch.zeros(nums_hidden))
        #y
        self.w_hy=torch.nn.Parameter(torch.randn(nums_hidden, vocab_size)*sigma)
        self.b_y=torch.nn.Parameter(torch.zeros(vocab_size))

    def forward(self, x, h=None):
        if h is None:
            h=torch.zeros(x.size(0), self.hparams.nums_hidden, device=x.device) #create h on the same device as x
        x=torch.nn.functional.one_hot(x, num_classes=self.hparams.vocab_size).float()
        output=[]
        for i in range(x.size(1)):
            r=torch.sigmoid(x[:,i]@self.w_xr+self.b_r+h@self.w_hr)
            z=torch.sigmoid(x[:,i]@self.w_xz+self.b_z+h@self.w_hz)
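            #candidate state: the reset gate scales h before the matrix product, matching Hy_1=tanh(x*W_xh+(R.*H_pre)*W_hh+b_h)
            h_hat=torch.tanh(x[:,i]@self.w_xh+self.b_h+(r*h)@self.w_hh)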
            h=z*h+(1-z)*h_hat
            output.append(h@self.w_hy+self.b_y)
        return torch.stack(output, dim=1), h

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_pred,_ = self(x) #y_pred.shape: (batch_size, seq_len, vocab_size)
        #flatten to (batch_size*seq_len, vocab_size) and (batch_size*seq_len,) to get the mean per-character loss
        loss= torch.nn.functional.cross_entropy(y_pred.view(-1, y_pred.size(-1)), y.view(-1))
        self.log('train_loss', loss, prog_bar=True, logger=True, on_epoch=True,on_step=True) 
        #perplexity=exp(loss) is used to judge longer passages of text; the raw per-character loss is harder to interpret for that
        self.log('train_perplexity', torch.exp(loss), prog_bar=True, logger=True, on_epoch=True,on_step=True)
        return loss
    
    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_pred,_ = self(x)
        loss= torch.nn.functional.cross_entropy(y_pred.view(-1, y_pred.size(-1)), y.view(-1))
        self.log('val_loss', loss, prog_bar=True, logger=True,on_epoch=True)
        self.log('val_perplexity', torch.exp(loss), prog_bar=True, logger=True,on_epoch=True)
        return loss
    
    
    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.hparams.lr)
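
The perplexity logged in `training_step` and `validation_step` is just the exponential of the mean cross-entropy. As a rough reference point, a model that predicts every character with equal probability has a perplexity equal to the vocabulary size. The sketch below checks this with all-zero logits, assuming the 28-token vocabulary mentioned in the data-loading comments; the target indices are arbitrary.

```python
import torch
import torch.nn.functional as F

vocab_size = 28  # assumed: 26 letters + space + '<unknown>'

# All-zero logits give a uniform distribution over the vocabulary.
logits = torch.zeros(5, vocab_size)
targets = torch.tensor([0, 3, 7, 12, 27])  # arbitrary character indices

loss = F.cross_entropy(logits, targets)  # equals ln(vocab_size) for a uniform prediction
perplexity = torch.exp(loss)             # equals vocab_size for a uniform prediction

print(loss.item(), perplexity.item())
```

A perplexity well below the vocabulary size therefore means the model does better than uniform guessing.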
    

## Model definition (API)
As with torch.nn.RNN, nn.GRU has a num_layers parameter, so h must be 3D: (num_layers, batch_size, hidden_size).
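
A quick shape check illustrates this. The sizes below are arbitrary toy values. Note that `batch_first=True` only changes the layout of the input and output tensors; the hidden state keeps the `(num_layers, batch_size, hidden_size)` layout.

```python
import torch
import torch.nn as nn

batch_size, seq_len, vocab_size, hidden_size, num_layers = 4, 6, 28, 32, 2

gru = nn.GRU(input_size=vocab_size, hidden_size=hidden_size,
             num_layers=num_layers, batch_first=True)

x = torch.randn(batch_size, seq_len, vocab_size)         # batch-first input
h0 = torch.zeros(num_layers, batch_size, hidden_size)    # hidden state is never batch-first

out, hn = gru(x, h0)
print(out.shape)  # torch.Size([4, 6, 32]): top-layer state at every time step
print(hn.shape)   # torch.Size([2, 4, 32]): final state of every layer

# For a unidirectional GRU, the last time step of `out` is the top layer's final state.
print(torch.allclose(out[:, -1], hn[-1]))
```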


In [5]:
import torch
import pytorch_lightning as pl

import torch.nn as nn

class GRUModel_api(Lit_GRUModel):
    def __init__(self, vocab_size, nums_hidden, nums_layers, lr,sigma=0.01):
        super(GRUModel_api, self).__init__(vocab_size, nums_hidden, nums_layers, lr,sigma)
        self.gru=nn.GRU(input_size=vocab_size, hidden_size=nums_hidden, num_layers=nums_layers, batch_first=True)
        self.fc=nn.Linear(nums_hidden, vocab_size)

    def forward(self, x, h=None):
        if h is None:
            h=torch.zeros(self.hparams.nums_layers, x.size(0), self.hparams.nums_hidden, device=x.device) #create h on the same device as x
        x=torch.nn.functional.one_hot(x, num_classes=self.hparams.vocab_size).float()
        x,h=self.gru(x,h)
        x=self.fc(x)
        return x,h
    
    
   
    



## Dataset loading

In [3]:
import requests
import os
import re
class LitLoadData_timeMachine(pl.LightningDataModule):
    def __init__(self, batch_size=32,seq_length=5,pin_memory=True,nums_train=10000,nums_val=5000):
        super().__init__()
        self.batch_size = batch_size
        self.seq_length = seq_length
        self.pin_memory = pin_memory
        self.nums_train = nums_train
        self.nums_val = nums_val
        self.prepare_data()
        self.corpus_indices, self.char_to_idx, self.idx_to_char, self.vocab_size = self.load_data_time_machine()
        
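    def prepare_data(self):
        url = 'http://d2l-data.s3-accelerate.amazonaws.com/timemachine.txt'
        #skip the download if the file already exists
        if os.path.exists('../data/timemachine.txt'):
            return
        #create the target directory if needed, then download the file
        os.makedirs('../data', exist_ok=True)
        r = requests.get(url, timeout=30)
        r.raise_for_status() #fail early on an HTTP error instead of saving an error page
        with open('../data/timemachine.txt', 'wb') as f:
            f.write(r.content)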

    def load_data_time_machine(self):
        with open('../data/timemachine.txt') as f:
            corpus_chars = f.read()
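        #replace runs of non-letters with a space and convert to lowercase
        corpus_chars = re.sub('[^A-Za-z]+', ' ', corpus_chars).lower()
        #character set of corpus_chars: 26 letters + 1 space
        char_set=set(corpus_chars) 
        #add an '<unknown>' token to cover user input outside the characters above
        char_set.add('<unknown>')
        #index-to-character mapping; sorted so the indices are identical across runs (set iteration order is not)
        idx_to_char = sorted(char_set) 
        #character-to-index mapping
        char_to_idx = dict([(char, i) for i, char in enumerate(idx_to_char)])
        vocab_size = len(char_to_idx)  #28 tokens
        corpus_indices = [char_to_idx[char] for char in corpus_chars] #convert each character to its index
        return corpus_indices, char_to_idx, idx_to_char, vocab_size #index list, char-to-index mapping, index-to-char mapping, vocabulary size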

    def setup(self, stage=None):
        self.corpus_indices, self.char_to_idx, self.idx_to_char, self.vocab_size = self.load_data_time_machine()
        #self.corpus_indices = torch.tensor(self.corpus_indices) 
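        #self.train_indices = self.corpus_indices[0: int(len(self.corpus_indices) * 0.8)] #first 80% as the training set
        #self.valid_indices = self.corpus_indices[int(len(self.corpus_indices) * 0.8):] #last 20% as the validation set
        
        #d2l: extract subsequences with step=1, so count = total characters - subsequence length; the conventional approach splits into equal parts, so count = total characters / subsequence length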
        array=torch.tensor([self.corpus_indices[i:i+self.seq_length+1] for i in range(len(self.corpus_indices)-self.seq_length)])
        self.train_indices = array[0: self.nums_train] 
        self.valid_indices = array[self.nums_train: self.nums_train + self.nums_val]

    def train_dataloader(self):
        train_dataset = self.__dataset_d2l(self.train_indices)
        return torch.utils.data.DataLoader(train_dataset, batch_size=self.batch_size, shuffle=True, num_workers=4, pin_memory=self.pin_memory)
    
    def val_dataloader(self):
        valid_dataset = self.__dataset_d2l(self.valid_indices)
        return torch.utils.data.DataLoader(valid_dataset, batch_size=self.batch_size, shuffle=False, num_workers=4, pin_memory=self.pin_memory)

    #each subsequence has N elements: the input is elements 1..N-1, the target is elements 2..N
    def __dataset_d2l(self, data_indices):
        return torch.utils.data.TensorDataset(data_indices[:, :-1], data_indices[:, 1:])

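    #builds a dataset by splitting the indices into non-overlapping samples of length seq_length and returning the input and target of each sample
    def __dataset(self, data_indices):
        data_indices = torch.as_tensor(data_indices) #accept a plain list as well as a tensor
        num_samples = (len(data_indices) - 1) // self.seq_length #number of samples
        data_indices = data_indices[:num_samples * self.seq_length] #keep only the first num_samples * self.seq_length characters
        data_indices = data_indices.reshape((num_samples, self.seq_length)) 
        return torch.utils.data.TensorDataset(data_indices[:, :-1], data_indices[:, 1:]) #input: first seq_length-1 characters of each sample; target: last seq_length-1 characters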
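
The two ways of cutting subsequences mentioned in the comments of `setup` can be compared on a toy corpus. This is a minimal sketch: the ten-element `corpus` stands in for `corpus_indices`, and the equal-split variant here keeps one extra element per chunk so that inputs and targets both have `seq_length` items, which is one possible choice rather than the notebook's exact `__dataset` logic.

```python
import torch

corpus = list(range(10))  # toy stand-in for corpus_indices
seq_length = 3

# Stride-1 sliding windows of length seq_length + 1 (the d2l-style approach).
windows = torch.tensor([corpus[i:i + seq_length + 1]
                        for i in range(len(corpus) - seq_length)])
x_sliding, y_sliding = windows[:, :-1], windows[:, 1:]

# Equal split: start a new window every seq_length characters.
num_samples = (len(corpus) - 1) // seq_length
chunks = torch.tensor([corpus[i * seq_length:i * seq_length + seq_length + 1]
                       for i in range(num_samples)])
x_chunks, y_chunks = chunks[:, :-1], chunks[:, 1:]

print(len(windows), len(chunks))  # 7 3
print(x_sliding[0].tolist(), y_sliding[0].tolist())  # [0, 1, 2] [1, 2, 3]
print(x_chunks.tolist())  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

Sliding windows yield many more, heavily overlapping samples from the same text; the equal split yields fewer samples with no repeated inputs.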

## Workflow

In [None]:
if __name__ == '__main__':
    data_module = LitLoadData_timeMachine(batch_size=1024,seq_length=32,pin_memory=False)
    data_module.setup()

    ##############GRU model training################
    model=Lit_GRUModel(
        vocab_size=data_module.vocab_size,
        nums_hidden=32,
        nums_layers=1,
        lr=4
    )
    model_api=GRUModel_api(
        vocab_size=data_module.vocab_size,
        nums_hidden=32,
        nums_layers=1,
        lr=4
    )

    checkpoint_callback=pl.callbacks.ModelCheckpoint(
        monitor='val_perplexity',
        dirpath='checkPoint-logs/RNNModel_v3',
        filename='RNNModel_v3_{epoch:02d}_{val_perplexity:.2f}',
        #save_top_k=3, # save the top 3 models
        mode='min',
    )
    trainer = pl.Trainer(
        max_epochs=50,
        gradient_clip_algorithm='norm', #clip gradients by norm, equivalent to clip_gradients(self, grad_clip_val, model)
        gradient_clip_val=1,
        accelerator='cpu',
        #devices=1,
        logger=TensorBoardLogger('tensorBoard-logs/', name='RNNModel_v3'),
        callbacks=[checkpoint_callback]
                        )
    trainer.fit(model, data_module) 
    #trainer.fit(model_api, data_module)

                                                                   

GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

  | Name         | Type | Params | Mode
---------------------------------------------
  | other params | n/a  | 6.8 K  | n/a 
---------------------------------------------
6.8 K     Trainable params
0         Non-trainable params
6.8 K     Total params
0.027     Total estimated model params size (MB)
0         Modules in train mode
0         Modules in eval mode


Epoch 3:   0%|          | 0/10 [00:00<?, ?it/s, v_num=1, train_loss_step=2.810, train_perplexity_step=16.70, val_loss=2.790, val_perplexity=16.20, train_loss_epoch=2.840, train_perplexity_epoch=17.00]         
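
The 6.8 K parameter count in the summary above can be reproduced by hand from the shapes declared in `Lit_GRUModel.__init__`. The arithmetic below is a sketch that assumes `vocab_size=28` (per the data-loading comments) together with `nums_hidden=32` from the workflow cell.

```python
vocab_size, nums_hidden = 28, 32  # vocab_size is assumed from the data-loading comments

# Each of the three blocks (reset gate, update gate, candidate state) has
# an input weight, a hidden weight, and a bias.
per_block = vocab_size * nums_hidden + nums_hidden * nums_hidden + nums_hidden
# Output projection w_hy plus bias b_y.
output_layer = nums_hidden * vocab_size + vocab_size

total = 3 * per_block + output_layer
size_mb = total * 4 / 1e6  # float32 uses 4 bytes per parameter

print(per_block, output_layer, total)  # 1952 924 6780
print(round(size_mb, 3))               # 0.027
```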

## TensorBoard

In [23]:
%load_ext tensorboard
%tensorboard --logdir pytorch/tensorBoard-logs/RNNModel_v1

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard
