
# Following the Dynamic Quantization Tutorial
[(beta) Dynamic Quantization on an LSTM Word Language Model, PyTorch tutorial](https://tutorials.pytorch.kr/advanced/dynamic_quantization_tutorial.html)

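Quantization converts a model's float weights and activations to integers, which shrinks the model and speeds up inference (testing). With dynamic quantization, the weights are converted ahead of time and the activations are quantized on the fly during inference.

- float -> int (here, 32-bit float -> 8-bit integer)
- Model size ↓, speed ↑ (faster), accuracy (loss) roughly unchanged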
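
To make the float -> int step concrete, here is a minimal pure-Python sketch of affine (scale and zero-point) quantization to the int8 range. It only illustrates the general idea and is not PyTorch's actual kernel; the helper names `quantize_affine` and `dequantize_affine` and the sample values are made up for this note.

```python
def quantize_affine(values, qmin=-128, qmax=127):
    """Map floats to integers in [qmin, qmax] using one scale and one zero point."""
    # The represented range must include 0.0 so that zero maps to an exact integer.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 if all values are 0
    zero_point = int(round(qmin - lo / scale))
    q = [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_affine(q, scale, zero_point):
    """Recover approximate floats from the stored integers."""
    return [(x - zero_point) * scale for x in q]


weights = [-0.1, -0.05, 0.0, 0.03, 0.1]  # made-up sample values
q, scale, zp = quantize_affine(weights)
restored = dequantize_affine(q, scale, zp)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 values:", q)
print("scale:", scale, "zero point:", zp)
print("largest round-trip error:", max_error)
```

In this sketch each restored value stays within about one quantization step (`scale`) of the original, which gives some intuition for why the loss can remain close after quantization.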

In [None]:
!pip install torchtext torchdata 

In [1]:
import os
from io import open
import time

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchtext.datasets import WikiText2

In [2]:
class LSTMModel(nn.Module):
    def __init__(self, ntoken, ninp, nhid, nlayers, dropout=0.5):
        super(LSTMModel, self).__init__()
        self.drop = nn.Dropout(dropout)
        self.encoder = nn.Embedding(ntoken, ninp)
        self.rnn = nn.LSTM(ninp, nhid, nlayers, dropout=dropout)
        self.decoder = nn.Linear(nhid, ntoken)

        self.init_weights()

        self.nhid = nhid
        self.nlayers = nlayers

    def init_weights(self):
        initrange = 0.1
        self.encoder.weight.data.uniform_(-initrange, initrange)
        self.decoder.bias.data.zero_()
        self.decoder.weight.data.uniform_(-initrange, initrange)

    def forward(self, input, hidden):
        emb = self.drop(self.encoder(input))
        output, hidden = self.rnn(emb, hidden)
        output = self.drop(output)
        decoded = self.decoder(output)
        return decoded, hidden

    def init_hidden(self, bsz):
        weight = next(self.parameters())
        return (weight.new_zeros(self.nlayers, bsz, self.nhid),
                weight.new_zeros(self.nlayers, bsz, self.nhid))


[ShardingFilterIterDataPipe documentation](https://www.ccoderun.ca/programming/doxygen/pytorch/classtorch_1_1utils_1_1data_1_1datapipes_1_1iter_1_1grouping_1_1ShardingFilterIterDataPipe.html#a671003f20672f8f6b74d2c03394ed685)

In [3]:
dataset = WikiText2(root='.data', split=('train', 'valid', 'test'))
dataset

(ShardingFilterIterDataPipe,
 ShardingFilterIterDataPipe,
 ShardingFilterIterDataPipe)

In [4]:
for i, n in enumerate(dataset[0]) :
  print(n)
  if i == 10 :
    break

 

 = Valkyria Chronicles III = 

 

 Senjō no Valkyria 3 : <unk> Chronicles ( Japanese : 戦場のヴァルキュリア3 , lit . Valkyria of the Battlefield 3 ) , commonly referred to as Valkyria Chronicles III outside Japan , is a tactical role @-@ playing video game developed by Sega and Media.Vision for the PlayStation Portable . Released in January 2011 in Japan , it is the third game in the Valkyria series . <unk> the same fusion of tactical and real @-@ time gameplay as its predecessors , the story runs parallel to the first game and follows the " Nameless " , a penal military unit serving the nation of Gallia during the Second Europan War who perform secret black operations and are pitted against the Imperial unit " <unk> Raven " . 

 The game began development in 2010 , carrying over a large portion of the work done on Valkyria Chronicles II . While it retained the standard features of the series , it also underwent multiple adjustments , such as making the game more <unk> for series newcomers . 

In [5]:
class Dictionary(object):
    def __init__(self):
        self.word2idx = {}
        self.idx2word = []

    def add_word(self, word):
        if word not in self.word2idx:
            self.idx2word.append(word)
            self.word2idx[word] = len(self.idx2word) - 1
        return self.word2idx[word]

    def __len__(self):
        return len(self.idx2word)


class Corpus(object):
    def __init__(self, dataset):
        self.dictionary = Dictionary()
        self.train = self.tokenize(dataset[0])
        self.valid = self.tokenize(dataset[1])
        self.test = self.tokenize(dataset[2])

    def tokenize(self, dataset):
        """Tokenize the text."""
        # Add words to the dictionary
        for line in dataset:
            words = line.split() + ['<eos>']
            for word in words:
                self.dictionary.add_word(word)

        # Tokenize the contents
        idss = []
        for line in dataset:
            words = line.split() + ['<eos>']
            ids = []
            for word in words:
                ids.append(self.dictionary.word2idx[word])
            idss.append(torch.tensor(ids).type(torch.int64))
        ids = torch.cat(idss)

        return ids

corpus = Corpus(dataset)



In [6]:
corpus.train, corpus.dictionary

(tensor([ 0,  1,  2,  ..., 15,  0,  0]),
 <__main__.Dictionary at 0x7fde527d0710>)

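The original tutorial loads pretrained parameters into the model. This notebook skips that step and uses a randomly initialized model, because the only goal here is to check whether quantization keeps the loss the same, reduces the evaluation time, and shrinks the size of the saved model.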

In [7]:
ntokens = len(corpus.dictionary)

model = LSTMModel(
    ntoken = ntokens,
    ninp = 768,
    nhid = 256,
    nlayers = 5,
)

model.eval()
print(model)

LSTMModel(
  (drop): Dropout(p=0.5, inplace=False)
  (encoder): Embedding(33278, 768)
  (rnn): LSTM(768, 256, num_layers=5, dropout=0.5)
  (decoder): Linear(in_features=256, out_features=33278, bias=True)
)


In [8]:
input_ = torch.randint(ntokens, (1, 1), dtype=torch.long)
hidden = model.init_hidden(1)
temperature = 1.0
num_words = 1000

with open('/out.txt', 'w') as outf:
    with torch.no_grad():  # do not track gradient history
        for i in range(num_words):
            output, hidden = model(input_, hidden)
            word_weights = output.squeeze().div(temperature).exp().cpu()
            word_idx = torch.multinomial(word_weights, 1)[0]
            input_.fill_(word_idx)

            word = corpus.dictionary.idx2word[word_idx]

            outf.write(str(word.encode('utf-8')) + ('\n' if i % 20 == 19 else ' '))

            if i % 100 == 0:
                print('| Generated {}/{} words'.format(i, num_words))

with open('/out.txt', 'r') as outf:
    all_output = outf.read()
    print(all_output)

| Generated 0/1000 words
| Generated 100/1000 words
| Generated 200/1000 words
| Generated 300/1000 words
| Generated 400/1000 words
| Generated 500/1000 words
| Generated 600/1000 words
| Generated 700/1000 words
| Generated 800/1000 words
| Generated 900/1000 words
b'straw' b'Promise' b'Disaster' b'incoherent' b'Gravity' b'Alton' b'Scholarship' b'trout' b'clearances' b'shirts' b'170' b'550' b'Levy' b'eclipse' b'exploration' b'elders' b'Tr\xc3\xa4umerei' b'creating' b'Facelift' b'sustaining'
b'Bend' b'faintest' b'Sub' b'Stent' b'spacecraft' b'albedo' b'Monmouthshire' b'audiobook' b'Lion' b'MSF' b'barrows' b'slightest' b'religions' b'35th' b'sings' b'Humphrey' b'Relief' b'outraged' b'Havelange' b'hp'
b'mortar' b'naya' b'invade' b'graphite' b'vessels' b'devil' b'LP' b'Sussex' b'fittoni' b'go' b'assassinate' b'Sami' b'sits' b'Cabrera' b'chloride' b'Mets' b'Accolades' b'guardian' b'hairstyle' b'appointments'
b'Acting' b'Trinsey' b'intervene' b'respiratory' b'armed' b'globe' b'usurpers' b'

# Defining helper functions

In [9]:
bptt = 25
criterion = nn.CrossEntropyLoss()
eval_batch_size = 1

# Build the test dataset
def batchify(data, bsz):
    # Work out how cleanly the dataset can be divided into bsz parts.
    nbatch = data.size(0) // bsz
    # Trim off the leftover elements (remainders) that do not fit evenly.
    data = data.narrow(0, 0, nbatch * bsz)
    # Split the data evenly across the bsz batches.
    return data.view(bsz, -1).t().contiguous()

test_data = batchify(corpus.test, eval_batch_size)

# Evaluation functions
def get_batch(source, i):
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i+seq_len]
    target = source[i+1:i+1+seq_len].reshape(-1)
    return data, target

def repackage_hidden(h):
    """Wrap hidden states in new tensors detached from their gradient history."""
    if isinstance(h, torch.Tensor):
        return h.detach()
    else:
        return tuple(repackage_hidden(v) for v in h)

def evaluate(model_, data_source):
    # Run in evaluation mode, which disables dropout.
    model_.eval()
    total_loss = 0.
    hidden = model_.init_hidden(eval_batch_size)
    with torch.no_grad():
        for i in range(0, data_source.size(0) - 1, bptt):
            data, targets = get_batch(data_source, i)
            output, hidden = model_(data, hidden)
            hidden = repackage_hidden(hidden)
            output_flat = output.view(-1, ntokens)
            total_loss += len(data) * criterion(output_flat, targets).item()
    return total_loss / (len(data_source) - 1)
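
The windowing done by `get_batch` and `evaluate` is easier to see on a tiny sequence. The sketch below redoes the same slicing on a plain Python list instead of a tensor; `get_window` and the ten-token sequence are hypothetical stand-ins, not part of the notebook.

```python
def get_window(tokens, i, bptt):
    """List-based analogue of get_batch: inputs and targets shifted by one."""
    seq_len = min(bptt, len(tokens) - 1 - i)
    data = tokens[i:i + seq_len]
    target = tokens[i + 1:i + 1 + seq_len]
    return data, target


tokens = list(range(10))  # stand-in for a column of token ids
bptt = 4

# Same loop bounds as evaluate(): step through the data in strides of bptt.
windows = [get_window(tokens, i, bptt) for i in range(0, len(tokens) - 1, bptt)]

for data, target in windows:
    print(data, "->", target)

# Every token except the first is predicted exactly once.
predicted = sum(len(data) for data, _ in windows)
print(predicted, "predictions for", len(tokens), "tokens")
```

The last window is shorter than `bptt`, and the total number of predictions is `len(tokens) - 1`. This is why `evaluate` weights each batch loss by `len(data)` and divides by `len(data_source) - 1`.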

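# Testing dynamic quantization

I applied dynamic quantization to the model and compared the result with the original model.

What changes: the layers of the specified types are replaced.
- rnn (LSTM): LSTM -> DynamicQuantizedLSTM
- decoder (Linear): Linear -> DynamicQuantizedLinear(..., dtype=torch.qint8, qscheme=torch.per_tensor_affine)

Size (149 MB -> 114 MB) and evaluation time (211 s -> 100 s) both decreased, while the loss stayed the same to three decimal places (10.416).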

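I also looked through the [quantize_dynamic docs](https://pytorch.org/docs/stable/generated/torch.quantization.quantize_dynamic.html):

1. It converts a float model into a dynamically quantized (weights-only) model.
2. It replaces the specified modules with their dynamic weight-only quantized versions and returns the quantized model.
3. The simplest usage is to pass the `dtype` argument, which can be `float16` or `qint8`. By default, weight-only quantization is applied to layers with large weights.
4. Finer-grained control is possible through the `qconfig_spec` and `mapping` parameters.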
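
The second point can be checked on a toy model. The sketch below assumes a PyTorch build with a quantized CPU backend available; `TinyModel` is a hypothetical stand-in for the notebook's `LSTMModel`, kept small so it runs instantly. It passes only `nn.Linear` in the set of module types, so the embedding should be left alone.

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical toy model, used only to show which layers get swapped."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Embedding(10, 8)
        self.decoder = nn.Linear(8, 10)

    def forward(self, x):
        return self.decoder(self.encoder(x))


float_model = TinyModel().eval()

# Only module types listed in the set are replaced; nn.Embedding is not listed.
quantized = torch.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)

encoder_unchanged = type(quantized.encoder) is nn.Embedding
decoder_swapped = type(quantized.decoder) is not nn.Linear
# quantize_dynamic works on a copy by default, so the float model keeps its layers.
original_untouched = type(float_model.decoder) is nn.Linear

tokens = torch.tensor([[1, 2, 3]])
out = quantized(tokens)  # inputs and outputs are still float tensors

print(quantized)
print(out.shape, out.dtype)
```

The quantized model is called exactly like the float one, which is why the `evaluate` helper in this notebook can be reused for both models.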

In [10]:
import torch.quantization

quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8
)
print(quantized_model)

LSTMModel(
  (drop): Dropout(p=0.5, inplace=False)
  (encoder): Embedding(33278, 768)
  (rnn): DynamicQuantizedLSTM(768, 256, num_layers=5, dropout=0.5)
  (decoder): DynamicQuantizedLinear(in_features=256, out_features=33278, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
)


In [13]:
print(model)

LSTMModel(
  (drop): Dropout(p=0.5, inplace=False)
  (encoder): Embedding(33278, 768)
  (rnn): LSTM(768, 256, num_layers=5, dropout=0.5)
  (decoder): Linear(in_features=256, out_features=33278, bias=True)
)


In [11]:
def print_size_of_model(model):
    torch.save(model.state_dict(), "temp.p")
    print('Size (MB):', os.path.getsize("temp.p")/1e6)
    os.remove('temp.p')

print_size_of_model(model)
print_size_of_model(quantized_model)

Size (MB): 149.069856
Size (MB): 114.07785
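
The size drops by only about a quarter rather than by 4x, and a back-of-the-envelope count suggests why: the embedding, which is not in the set of quantized module types, holds most of the parameters. The sketch below is an estimate under stated assumptions (only the LSTM and Linear weight matrices shrink to 1 byte per element; biases and the embedding stay float32; serialization overhead and stored scales and zero points are ignored). It is not a description of the exact file layout.

```python
ntoken, ninp, nhid, nlayers = 33278, 768, 256, 5  # values from the printed model

embedding = ntoken * ninp
# Each LSTM layer has 4 gates; layer 0 reads ninp inputs, later layers read nhid.
lstm_weights = sum(
    4 * nhid * ((ninp if layer == 0 else nhid) + nhid) for layer in range(nlayers)
)
lstm_biases = nlayers * 2 * 4 * nhid  # bias_ih and bias_hh for every layer
linear_weights = nhid * ntoken
linear_bias = ntoken

FLOAT32, INT8 = 4, 1  # bytes per element

float_bytes = FLOAT32 * (
    embedding + lstm_weights + lstm_biases + linear_weights + linear_bias
)
# Assumption: only the LSTM and Linear weight matrices are stored as int8.
quant_bytes = FLOAT32 * (embedding + lstm_biases + linear_bias) + INT8 * (
    lstm_weights + linear_weights
)

print(f"float32 estimate:   {float_bytes / 1e6:.2f} MB")
print(f"quantized estimate: {quant_bytes / 1e6:.2f} MB")
print(f"embedding share of quantized size: {FLOAT32 * embedding / quant_bytes:.0%}")
```

Both estimates land within a fraction of a megabyte of the measured sizes above, so under these assumptions the remaining size is dominated by the unquantized embedding.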


In [12]:
# Note: the quantized model runs single-threaded, so the thread count is set
# to 1 to make this a single-threaded comparison.

torch.set_num_threads(1)

def time_model_evaluation(model, test_data):
    s = time.time()
    loss = evaluate(model, test_data)
    elapsed = time.time() - s
    print('''loss: {0:.3f}\nelapsed time (seconds): {1:.1f}'''.format(loss, elapsed))

time_model_evaluation(model, test_data)
time_model_evaluation(quantized_model, test_data)

loss: 10.416
elapsed time (seconds): 211.6
loss: 10.416
elapsed time (seconds): 100.9
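
The measured numbers can be turned into ratios with a few lines of arithmetic. The sketch below uses only the values printed in this notebook. It also compares the loss with `ln(ntokens)`, the cross-entropy of a uniform guess over the vocabulary, since the model here was never trained.

```python
import math

size_float, size_quant = 149.069856, 114.07785  # MB, from the size cell
time_float, time_quant = 211.6, 100.9  # seconds, from the timing cell
loss, ntokens = 10.416, 33278

size_reduction = 1 - size_quant / size_float
speedup = time_float / time_quant
uniform_loss = math.log(ntokens)  # loss of guessing every word with equal probability

print(f"size reduction: {size_reduction:.1%}")
print(f"speedup: {speedup:.2f}x")
print(f"uniform-guess loss: {uniform_loss:.3f} vs measured {loss:.3f}")
```

The measured loss sits very close to the uniform-guess value, which is what one would expect from randomly initialized weights. The matching losses therefore show that quantization did not disturb this untrained model's outputs, but a pretrained model, as in the original tutorial, would probably be a stronger test of whether accuracy is preserved.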
