<a href="https://colab.research.google.com/github/vladimiralencar/DeepLearning-LANA/blob/master/LSTM/StackedLSTMs_Tweets.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Generating Tweets with Stacked LSTMs

We will use tweets from U.S. President Donald Trump to train a 2-layer LSTM model and then have the model generate tweets automatically.

The dataset is available on the data science competition site <a href="https://www.kaggle.com/kingburrito666/better-donald-trump-tweets">Kaggle</a> and was extracted from Donald Trump's Twitter account: https://twitter.com/realDonaldTrump

### 1. Feature Engineering

Raw text cannot be fed directly into an LSTM model. We first need to engineer features from it before moving on to the modeling step.

In [0]:
# Imports
import numpy as np
import pandas as pd

In [0]:
# Load the dataset
data = pd.read_csv('tweets.csv')

In [0]:
data.head()

Unnamed: 0,Date,Time,Tweet_Text,Type,Media_Type,Hashtags,Tweet_Id,Tweet_Url,twt_favourites_IS_THIS_LIKE_QUESTION_MARK,Retweets,Unnamed: 10,Unnamed: 11
0,16-11-11,15:26:37,Today we express our deepest gratitude to all ...,text,photo,ThankAVet,7.97e+17,https://twitter.com/realDonaldTrump/status/797...,127213,41112,,
1,16-11-11,13:33:35,Busy day planned in New York. Will soon be mak...,text,,,7.97e+17,https://twitter.com/realDonaldTrump/status/797...,141527,28654,,
2,16-11-11,11:14:20,Love the fact that the small groups of protest...,text,,,7.97e+17,https://twitter.com/realDonaldTrump/status/797...,183729,50039,,
3,16-11-11,2:19:44,Just had a very open and successful presidenti...,text,,,7.97e+17,https://twitter.com/realDonaldTrump/status/796...,214001,67010,,
4,16-11-11,2:10:46,A fantastic day in D.C. Met with President Oba...,text,,,7.97e+17,https://twitter.com/realDonaldTrump/status/796...,178499,36688,,


All we need is the **Tweet_Text** field. Let's combine all rows into a single text corpus by concatenating the tweets, separated by two newlines:

In [0]:
text = '\n\n'.join(data['Tweet_Text'].values)
print(text[:400])

Today we express our deepest gratitude to all those who have served in our armed forces. #ThankAVet https://t.co/wPk7QWpK8Z

Busy day planned in New York. Will soon be making some very important decisions on the people who will be running our government!

Love the fact that the small groups of protesters last night have passion for our great country. We will all come together and be proud!

Just h


To reduce the size of our feature space and the training time, we remove rare characters (those that occur fewer than 300 times in the corpus):

In [0]:
from collections import Counter
import re

In [0]:
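# Drop every character that appears fewer than 300 times in the corpus.
# Filtering by membership avoids building a regex character class, which
# can break on characters such as '\' or '^'.
cntr = Counter(text)
rare = {c for c, n in cntr.items() if n < 300}
text = ''.join(c for c in text if c not in rare)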

Here is what the beginning of the corpus looks like:

In [0]:
print(text[:1000])

Today we express our deepest gratitude to all those who have served in our armed forces. #ThankAVet https://t.co/wPk7QWpK8Z

Busy day planned in New York. Will soon be making some very important decisions on the people who will be running our government!

Love the fact that the small groups of protesters last night have passion for our great country. We will all come together and be proud!

Just had a very open and successful presidential election. Now professional protesters, incited by the media, are protesting. Very unfair!

A fantastic day in D.C. Met with President Obama for first time. Really good meeting, great chemistry. Melania liked Mrs. O a lot!

Happy 241st birthday to the U.S. Marine Corps! Thank you for your service!! https://t.co/Lz2dhrXzo4

Such a beautiful and important evening! The forgotten man and woman will never be forgotten again. We will all come together as never before

Watching the returns at 9:45pm.
#ElectionNight #MAGA__ https://t.co/HfuJeRZbod

RT @IvankaT

The corpus has 857,177 characters, 78 of which are unique:

In [0]:
print('Total Characters in Corpus: {:,d}'.format(len(text)))
chars = sorted(list(set(text)))
print('Total Unique Characters:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

Total Characters in Corpus: 857,177
Total Unique Characters: 78


Now, let's cut the text into semi-redundant sequences of *maxlen* characters, each paired with the character that follows it, so that it can be fed into an LSTM model:

In [0]:
maxlen = 50
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of Sequences: {:,d}'.format(len(sentences)))

Number of Sequences: 285,709
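
To make the windowing step concrete, here is a minimal sketch on a toy string. The names `toy_text`, `toy_maxlen` and `toy_step` are illustrative only and are not part of the notebook; the window length is shortened to 5 so the output is readable.

```python
# Toy illustration of the windowing step (small values chosen for readability).
toy_text = "hello world!"
toy_maxlen = 5   # window length (the notebook uses 50)
toy_step = 3     # stride between window starts (same as the notebook)

windows = []
targets = []
for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    windows.append(toy_text[i: i + toy_maxlen])   # input: toy_maxlen characters
    targets.append(toy_text[i + toy_maxlen])      # label: the character that follows

for w, t in zip(windows, targets):
    print(repr(w), '->', repr(t))

# The number of windows is ceil((len(text) - maxlen) / step).
expected = -(-(len(toy_text) - toy_maxlen) // toy_step)
print(len(windows), expected)
```

The same formula applied to the real corpus gives (857,177 - 50) / 3 = 285,709, which matches the count printed above. Because the stride (3) is smaller than the window length, consecutive windows overlap, which is what "semi-redundant" refers to.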


Next, let's vectorize the sequences as one-hot arrays:

In [0]:
# np.bool was removed in recent NumPy releases; the built-in bool is equivalent here
X = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

In [0]:
X[0]

array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ..., 
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]], dtype=bool)
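
The array above is hard to read at full size, so here is a small sketch of the same one-hot encoding on a three-character vocabulary. The `toy_*` names are assumptions made for this illustration; the logic mirrors the vectorization loop above.

```python
import numpy as np

toy_chars = sorted(set("abca"))            # ['a', 'b', 'c']
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

toy_sentence = "cab"
# One row per time step, one column per vocabulary character.
onehot = np.zeros((len(toy_sentence), len(toy_chars)), dtype=bool)
for t, ch in enumerate(toy_sentence):
    onehot[t, toy_char_indices[ch]] = True

print(onehot.astype(int))

# Each row has exactly one True, so argmax recovers the character index.
decoded = ''.join(toy_indices_char[int(i)] for i in onehot.argmax(axis=1))
print(decoded)
```

In the notebook, `X` stacks one such matrix per sequence (shape `(n_sequences, maxlen, n_chars)`), and `y` holds a single one-hot row per sequence for the next character.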

### 2. Generative Model

In [0]:
import random
import sys
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import LSTM
from keras.optimizers import RMSprop

Using TensorFlow backend.


Let's create some reusable functions that generate text with our generative model.

In [0]:
cntr = Counter(text)
cntr_sum = sum(cntr.values())
char_probs = list(map(lambda c: cntr[c] / cntr_sum, chars))

In [0]:
def sample(preds):
    preds = np.asarray(preds).astype('float64')
    preds = preds / np.sum(preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
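
As a quick sanity check of the sampling step, the sketch below repeats the `sample` function and draws from a fixed toy distribution many times. The toy probabilities, the number of draws and the fixed seed are assumptions made only for this demonstration.

```python
import numpy as np

def sample(preds):
    # Same logic as the notebook's sample(): renormalize, then draw one index.
    preds = np.asarray(preds).astype('float64')
    preds = preds / np.sum(preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

np.random.seed(0)  # fixed seed so the check is repeatable (demo assumption)
toy_preds = [0.7, 0.2, 0.1]
draws = [sample(toy_preds) for _ in range(5000)]

# Empirical frequency of each index; should be close to toy_preds.
freqs = np.bincount(draws, minlength=len(toy_preds)) / len(draws)
print(freqs)
```

Drawing from the distribution rather than always taking the most likely character keeps the generated text varied. The cast to `float64` followed by renormalization is a common precaution, since `np.random.multinomial` can reject probabilities whose sum slightly exceeds 1 because of rounding.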

In [0]:
def generate(model, length, seed=''):
    
    if len(seed) != 0:
        sys.stdout.write(seed)
    
    generated = seed
    sentence = seed[-maxlen:]  # keep at most maxlen characters of context
    
    for _ in range(length):
        x = np.zeros((1, maxlen, len(chars)))

        padding = maxlen - len(sentence)
        
        # Left-pad short contexts with the corpus character frequencies (priors)
        for p in range(padding):
            x[0, p] = char_probs
            
        # One-hot encode the current context
        for t, char in enumerate(sentence):
            x[0, padding + t, char_indices[char]] = 1.

        preds = model.predict(x, verbose=0)[0]
        next_index = sample(preds)
        next_char = indices_char[next_index]

        # Append the new character and keep only the last maxlen characters
        sentence = (sentence + next_char)[-maxlen:]
        generated += next_char
        
        sys.stdout.write(next_char)
        sys.stdout.flush()
        
    return generated
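
The input construction inside `generate` can be looked at in isolation. The sketch below uses a hypothetical helper, `build_input`, and a toy three-character vocabulary (neither is part of the notebook) to show how a seed shorter than `maxlen` is left-padded with the character frequencies while the seed itself is one-hot encoded.

```python
import numpy as np

def build_input(sentence, maxlen, chars, char_probs):
    """Hypothetical helper: encode `sentence` as a (1, maxlen, len(chars)) array."""
    char_indices = {c: i for i, c in enumerate(chars)}
    sentence = sentence[-maxlen:]            # never exceed the window length
    x = np.zeros((1, maxlen, len(chars)))
    padding = maxlen - len(sentence)
    x[0, :padding] = char_probs              # padded steps carry the prior distribution
    for t, ch in enumerate(sentence):
        x[0, padding + t, char_indices[ch]] = 1.0   # real steps are one-hot
    return x

toy_chars = ['a', 'b', 'c']
toy_probs = [0.5, 0.3, 0.2]                  # assumed toy frequencies
x = build_input('cab', maxlen=5, chars=toy_chars, char_probs=toy_probs)
print(x[0])
```

Every row sums to 1, so padded steps look to the network like a "blurred" character rather than an all-zero vector it never saw during training. This is one plausible reading of the padding choice, not a statement of the author's intent.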

Now, let's build the graph of our neural network. Then we train the model and print a sample after every epoch. At the end of training, we save the model so that it can be reloaded quickly in the future.

In [0]:
from os.path import isfile
from keras.models import load_model

MODEL_PATH = 'stacked-lstm-2-layers-128-hidden.h5'

if isfile(MODEL_PATH):
    model = load_model(MODEL_PATH)
else:
    N_HIDDEN = 128

    # Model: two stacked LSTM layers followed by a softmax over the vocabulary
    model = Sequential()
    model.add(LSTM(N_HIDDEN, dropout = 0.1, input_shape = (maxlen, len(chars)), return_sequences = True))
    model.add(LSTM(N_HIDDEN, dropout = 0.1))
    model.add(Dense(len(chars), activation = 'softmax'))

    # Optimizer
    optimizer = RMSprop(lr = 0.01)
    
    # Compile
    model.compile(loss = 'categorical_crossentropy', optimizer = optimizer)

    # Train one epoch at a time and print a sample after each epoch
    for iteration in range(1, 40):
        print('\n')
        print('-' * 50)
        print('\nIteration', iteration)
        model.fit(X, y, batch_size=3000, epochs=1)

        print('\n-------------------- Tweet Generated by the Model in This Iteration ---------------------\n')

        rand = np.random.randint(len(text) - maxlen)
        seed = text[rand:rand + maxlen]
        generate(model, 400, seed)

    model.save(MODEL_PATH)
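
To get a feel for the size of this network, the sketch below counts its trainable parameters by hand, assuming the standard LSTM formulation with one bias vector per weight block and the 78-character vocabulary reported earlier. The helper names `lstm_params` and `dense_params` are illustrative and not part of Keras.

```python
def lstm_params(n_inputs, n_units):
    # Four weight blocks (input, forget and output gates plus the cell candidate),
    # each with input weights, recurrent weights and a bias vector.
    return 4 * (n_inputs + n_units + 1) * n_units

def dense_params(n_inputs, n_units):
    # Weight matrix plus one bias per output unit.
    return n_inputs * n_units + n_units

n_chars, n_hidden = 78, 128
layer1 = lstm_params(n_chars, n_hidden)    # first LSTM reads one-hot characters
layer2 = lstm_params(n_hidden, n_hidden)   # second LSTM reads the first layer's outputs
output = dense_params(n_hidden, n_chars)   # softmax over the vocabulary
print(layer1, layer2, output, layer1 + layer2 + output)
```

Under these assumptions the total is roughly 248 thousand parameters. Calling `model.summary()` on the trained model should report comparable per-layer numbers. The first layer needs `return_sequences = True` so that the second LSTM receives one vector per time step instead of only the final state.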

Now let's try out the model!

Using the first sentence of this <a href='https://twitter.com/realDonaldTrump/status/890764622852173826'>tweet</a> as the seed, let's try to continue Trump's sentence and see what interesting things our model can say:

In [0]:
sample_tweet_start = 'Go Republican Senators, Go!'
_ = generate(model, 200, sample_tweet_start)

Go Republican Senators, Go! National in feight better!

""@election: @realDonaldTrump what is rient, senution unfilter support are usituliated. http://t.co/uVeXkEax9G

RT @LSo: Debate @RealDonaldTrump it wascys got negtrandelic

In [0]:
sample_tweet_start = 'immigration'
for i in range(10):
    _ = generate(model, 200, sample_tweet_start)
    print('\n=========================================\n')

immigration They Mr. Weranshoun The Medians I should be ward. http://t.co/UTING:_! Ticketers"__

"@Separmtlims Republicans #ImWithYou #POTUSS" ۪p scrunges will findle and Massachusers @realDonaldTrump is. They a

immigration is sntaining Carsention #trumpTannel:
https://t.coV34uch #Debates2016 @ChrisWe Bost says Trump 4nether @Chiler2s2015
@RuberNBGH TO LOVISDINGA tHATER TRUMP IN TEE care Cruz even 2006 We #Supiccompenne

immigration the mess! This will find supporters! #MakeAmericaGAKE4 @DanScavino"
http://t.co/ eiculing

Just release wao Rulie on #Washingtonjoa Usens, #MakeAmericaGreat informerage great known!   https://t.coNQT

immigration: just openly perhams why will dest ROBINGEcTNOUR

I will be in the numbers.

Thank you for stall Crooked Hillary were allowed Melania_ - 10py have cruzy incredib job @realDonaldTrump Trump way! #Trum

immigration #MakeAmericaG120: @realDonaldTrump: The #VoteTrumpSPRO HIGED
https://t.co/aXUGE on #Trump2016

RIb @AMPRESIORTORALISTERROUPR DU FOR!

"@

In [0]:
sample_tweet_start = 'America'
for i in range(10):
    _ = generate(model, 200, sample_tweet_start)
    print('\n=========================================\n')

America Trump:
"I the payls will making slogchy #DrainThe http://tttpripadies if he was fancering" will voterst Carson, Jurinacive :  Obama Turnout:

Thank you. Thank. Many missies: https:// 8d16/NIFL TOOAL 

America

Hook luty it quehtion Sweeke Ted Cruzicins @nytimes"

"@menity_ #MakeAmerican Marco RubioS1:
#Trump2016: When I extweps #DemDebate Bearmins who very anyover Irencelgen @realDonald Trump will no open

American Be great readled readly #JebBush https://Thcrporian scollins his duning DNC hes insteasled will not.This will be will cribilast Old #MAGA!_ https://t.cv/NT @gratent is a liar if he care Trump starte

America

 #DrainTheroors #makesee @realDong
Bills is, applannel Lynotk2016

Ordeq XVCYX ! #Trump2016

Great So like it is the Outer @CNN They Alty incompeteD

Congraind!"

"@kegyropwerz Heads a dast? Used ha

America

Just interview: Get Trumps #IALILA:
https://C6/7/6 Ghean http://t.clussed Mr Trump: Hers fically in a black. I purter did

Canaan GOP:
https://@realDong #M

In [0]:
sample_tweet_start = 'China'
for i in range(10):
    _ = generate(model, 200, sample_tweet_start)
    print('\n=========================================\n')

China: http:/021 #Crooked Get will helpechates Trump @onlyZ MOME #RT https:/250 milliss in making @NYDamshorder

Gies When ROGTAMINYHY: https:_ https:_

Carron 4 Dem flazico4 TMATERIRINAS @realDon. He said

China delement 10P ta5 US @glue_

Unsela"

"@thay one officelost off, Really I begendands. They 11 in joz

Why http:/0016 #MAPRICA @Payly They Washers:_ #MAGA__https:

Puring Hillaryduch:_ #MakeAme 8 38 da

China Trump Hillary: #MakeAmerics:

NIVEME Trump: Opher 11 #IthYor, we can other Is finess"  Thank"

I will lead 9u, Dont #AZPricary #GOPDebare should Trugp  CLAN Emergemen, helliss #Shown, @megyn Trump Yo

China on The Only #Trump #Crooked  @nbcsh:
#Orisbicare? http:/15/https:_.
https:_"

Thanks VOTE Posencusion? @thes!

"@Manically Mix Henderson off. the care @Girmy we can stand whilen.", Ohio is ofd, JOBHI

Chinac 12/3 Enducals 130

#MakeAmerics is himses 4nmight not records #BigLeblyymary presides went really voter #Americs the Pail #Inter #DemDebate and I WILL!

Thanks https

In [0]:
sample_tweet_start = 'Democratic'
for i in range(10):
    _ = generate(model, 200, sample_tweet_start)
    print('\n=========================================\n')

Democratic

I hoped a join me today Gomgents with @Savun_Luol:
#Trump2016 @ChrisChemOch: Im #INPoinits

The biggds.  NOT POLL
#ENCrugs__ JOEN TEARSROMDSS HELL TRUMP is a propest all after or #NikeUnePPROVA #Rig

Democratic

Thanks. Not an and emmils:

Thank you cannot leave 4000 hell /parth up by ScareUSA BFFATING GUNDER POLL! Scare, Days Changes http://t.cohTdail Hance Prazears Alama at 4pm! #TrumpTraMWer1

"@JomTush:

Democraticon: AL Mack  32 Looking People crieded I can she watching went Barnon Marcorsin #IowaCaucus #CrookedHillary is he benothing #SNL #Waptings #VoteTrump"  Truegsher, you feak POTUS YOU "WESS... MAKE AMER

Democratic: #greatAsurc2 is plaznific

I۪d endended just release the endsamencied Obama cant want to President Rubio
#Debate"

"@JheWoFee6 lEED AARIVE @davilkimorcation: Obama Team #DranDayTrump #Trump2016

We 

Democratics۪: Trump"

Great. We are I spent people media ems  POTUS R! https://t.cPMWSING USFAhTORA

THENS TO VEDEY SHE WEAR TRUMP SUSIN! https://t.co/a