# Character Recurrent Neural Network
- Mimicing Shakespeare's writing style
- Long short-term memory(LSTM)

![alt text](./LSTM.png)

## 1. Settings
### 1) Import required libraries

In [2]:
import torch
import torch.nn as nn
from torch.autograd import Variable

In [3]:
import unidecode
import string
import random
import re
import time, math

## 2) Hyperparameter

In [151]:
num_epochs = 5000
print_every = 100
plot_every = 10
chunk_len = 200
hidden_size = 100
batch_size =1
num_layers = 1
lr = 0.002
NUM_STEPS = 500
DATA_PATH = './data/abstract.txt'

## 2. Data
### 1) Prepare characters

In [12]:
all_characters = string.printable
n_characters = len(all_characters)
print(all_characters)
print('num_chars = ', n_characters)

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ 	

num_chars =  100


In [33]:
def vocab_encode(text, vocab):
    return [vocab.index(x) + 1 for x in text if x in vocab]


def vocab_decode(array, vocab):
    return ''.join([vocab[x - 1] for x in array])

In [34]:
vocab = (" $%'()+,-./0123456789:;=?ABCDEFGHIJKLMNOPQRSTUVWXYZ""\\^_abcdefghijklmnopqrstuvwxyz{|}")

83

In [88]:
def read_data(filename, all_characters, window=NUM_STEPS, overlap=NUM_STEPS // 2):
    for text in open(filename, encoding='utf-8'):
        text = vocab_encode(text, all_characters)
        for start in range(0, len(text) - window, overlap):
            chunk = text[start: start + window]
            chunk += [0] * (window - len(chunk))
            return chunk

In [89]:
vocab_decode(read_data(DATA_PATH,all_characters),all_characters)

'ABSTRACT Recently, many modifications to the McCulloch/Pitts model have been proposed where both learning and forgetting occur. Given that the network never saturates (ceases to function effectively due to an overload of information), the learning updates can continue indefinitely. For these networks, we need to introduce performance measmes in addition to the information capacity to evaluate the different networks. We mathematically define quantities such as the plasticity of a network, the eff'

In [90]:
aa = file.split('\n')

In [91]:
aa[1]

'Abstract MURPHY consists of a camera looking at a robot arm, with a connectionist network architecture situated in between. By moving its arm through a small, representative sample of the 1 billion possible joint configurations, MURPHY learns the relationships, backwards and forwards, between the positions of its joints and the state of its visual field. MURPHY can use its internal model in the forward direction to "envision" sequences of actions for planning purposes, such as in grabbing a visually presented object, or in the reverse direction to "imitate", with its arm, autonomous activity in its visual field. Furthermore, by taking explicit advantage of continuity in the mappings between visual space and joint space, MURPHY is able to learn non-linear mappings with only a single layer of modifiable weights. '

In [126]:
def cleaner(text):
    text = re.sub('[^\w ]', '', text)
    return text

In [132]:
cleaner('aa[1]λhk!#%')

TypeError: expected string or bytes-like object

### 2) Get text data

In [93]:
file = unidecode.unidecode(open(DATA_PATH).read())
file_len = len(file)
print('file_len =', file_len)

file_len = 5315132


In [94]:
file[1]

'B'

## 3. Functions for text processing
### 1) Random Chunk

In [108]:
def random_chunk():
    start_index = random.randint(0, file_len - chunk_len)
    print(start_index)
    end_index = start_index + chunk_len + 1
    return file[start_index:end_index]

print(random_chunk())

5146557
: since queue-regret cannot be larger than classical regret, results for the standard MAB problem give algorithms that ensure queue-regret increases no more than logarithmically in time. Our paper show


### 2) Character to tensor

In [135]:
def char_tensor(string):
    tensor = torch.zeros(len(string)).long()
    for c in range(len(string)):
        tensor[c] = all_characters.index(string[c])
    return Variable(tensor).cuda(3)

print(char_tensor('ABCdef'))

Variable containing:
 36
 37
 38
 13
 14
 15
[torch.cuda.LongTensor of size 6 (GPU 3)]



In [136]:
def char_tensor2(string):
    tensor = torch.zeros(len(string)).long()
    for c in range(len(string)):
        print(string[c])
        tensor[c] = all_characters.index(string[c])
    return Variable(tensor).cuda(3)

print(char_tensor('ABCdef'))

Variable containing:
 36
 37
 38
 13
 14
 15
[torch.cuda.LongTensor of size 6 (GPU 3)]



### 3) Chunk into input & label

In [137]:
def random_training_set():    
    chunk = random_chunk()
    inp = char_tensor(chunk[:-1])
    target = char_tensor(chunk[1:])
    return inp, target

## 3. Model & Optimizer
### 1) Model

In [138]:
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(RNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        self.encoder = nn.Embedding(input_size, hidden_size)
        self.rnn = nn.LSTM(hidden_size,hidden_size,num_layers)
        self.decoder = nn.Linear(hidden_size, output_size)
    def forward(self, input, hidden,cell):
        out = self.encoder(input.view(1,-1))
        out,(hidden,cell) = self.rnn(out,(hidden,cell))
        out = self.decoder(out.view(batch_size,-1))
        return out,hidden,cell
    def init_hidden(self):
        hidden = Variable(torch.zeros(num_layers,batch_size,hidden_size)).cuda(3)
        cell = Variable(torch.zeros(num_layers,batch_size,hidden_size)).cuda(3)
        return hidden,cell
model = RNN(n_characters, hidden_size, n_characters, num_layers).cuda(3)

In [139]:
inp = char_tensor("A")
print(inp)
hidden,cell = model.init_hidden()
print(hidden.size())

out,hidden,cell = model(inp,hidden,cell)
print(out.size())

Variable containing:
 36
[torch.cuda.LongTensor of size 1 (GPU 3)]

torch.Size([1, 1, 100])
torch.Size([1, 100])


### 2) Loss & Optimizer

In [140]:
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
loss_func = nn.CrossEntropyLoss()

### 3) Test function

In [149]:
def test():
    start_str = "A"
    inp = char_tensor(start_str)
    hidden,cell = model.init_hidden()
    x = inp
    print(start_str,end="")
    for i in range(500):
        output,hidden,cell = model(x,hidden,cell)
        output_dist = output.data.view(-1).div(0.8).exp()
        top_i = torch.multinomial(output_dist, 1)[0]
        predicted_char = all_characters[top_i]
        print(predicted_char,end="")
        x = char_tensor(predicted_char)

## 4. Train

In [155]:
for i in range(num_epochs):
    total = char_tensor(cleaner(aa[random.randint(0,len(aa))]))
    inp = total[:-1]
    label = total[1:]
    hidden,cell = model.init_hidden()
    loss = 0
    optimizer.zero_grad()
    for j in range(chunk_len-1):
        x  = inp[j]
        y_ = label[j]
        y,hidden,cell = model(x,hidden,cell)
        loss += loss_func(y,y_)
    loss.backward()
    optimizer.step()
    if i % 100 == 0:
        print("\n",loss/chunk_len,"\n")
        test()
        print("\n\n")


 Variable containing:
 0.9900
[torch.cuda.FloatTensor of size 1 (GPU 3)]
 

Abstract Computed in the firsted scene such that the convalientanize two socies iss a training primation to fittee fther for feedbade examples that generalizing are include the similar to visual and show that a large of coonential regularized learning is ans hierarchical scale framework similarization of a generalized the nonparameter connection tegorments of linear data is the consinigned in or the computation sharacking deised on different learning that a recon two a sample a class that Markov 



 Variable containing:
 1.2056
[torch.cuda.FloatTensor of size 1 (GPU 3)]
 

Abstract We feature Networks have regret are untroussion when tasks Interaly be learning estimators the which lour learning of the signition sets in a filt action of Genepure has workor the conventional expledgg communicating method by additionally inversion of over discrelation to palession but propose addression In difficults of a use bro

Abstract We propose pringlity of a deteted by selectivable problem of a linear complection of the called regression that we present their tach function is a compared by sample we study is the programmination of reseverally vection to yearce classify occlusing the method an approximate multiple approximate signal lateration and assumple lateral shown even The classifial lique one different can be impromanizing clusterion The presentation from the problem of conjugbe to the multiplace an agent metr



 Variable containing:
 0.9394
[torch.cuda.FloatTensor of size 1 (GPU 3)]
 

Abstract Hyppirication in presented in the Discale continuous classification explore softs of the turnoluted expercently stimul not humas models neural networks that robinal and polutation This many a shoving out that this paper the either algorithm of expression that computive methods for abt settine simple approximation in large data bultable sequence Bayesiar archave in prove as the problated used We prodemented 

Abstract We problem of observing achieving factions gnolis bandit as optimization have sconer of the particular and computers parameters that iteolodeomics to the intrievine learning a goal for similarity contain in the a nonsimized by experiments of relation in mapsing exammined to the integrate the nonnegences of the task is a multiclass latent and solution We introduce in learner internading simultain computer with the problem of the high we be explore shordererest is a studies of mapse or com



 Variable containing:
 1.4331
[torch.cuda.FloatTensor of size 1 (GPU 3)]
 

Abstract This paper we single of sets of set of the problem of the statistical and effecty multiple estimated for weights of the firsting on signal based on the in the existing functions learning or decisions of localization of regularization of a setting using the canters of extracting learning when the neural networks in componenves visual in two singletermation methods or strobutions that data features to a matri

IndexError: list index out of range