
unixpickle/tweetenc


tweetenc

The goal of this project is to derive the latent variables behind tweets. To do this, I will be experimenting with various forms of sequence-to-sequence auto-encoders.
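A sequence-to-sequence auto-encoder is trained to reproduce its own input. The encoder reads the tweet and the decoder regenerates it from the encoder's latent vector. The sketch below shows only how one training example could be laid out for that setup. It assumes a character-level model (the garbled names in the reconstructions below hint at this, but the README does not say so), and the `START`/`END` tokens and helper names are hypothetical, not taken from the repository.

```python
# Hypothetical special tokens; the repository's actual vocabulary is not documented.
START, END = "\x02", "\x03"


def build_vocab(tweets):
    """Map every character seen in the corpus (plus START/END) to an integer id."""
    chars = sorted(set("".join(tweets)) | {START, END})
    return {c: i for i, c in enumerate(chars)}


def make_example(tweet, vocab):
    """Return (encoder_input, decoder_input, decoder_target) id sequences.

    The encoder reads the whole tweet. The decoder is trained with teacher
    forcing to reproduce the same tweet, so its input is the tweet prefixed
    with START and its target is the tweet followed by END.
    """
    ids = [vocab[c] for c in tweet]
    encoder_input = ids
    decoder_input = [vocab[START]] + ids
    decoder_target = ids + [vocab[END]]
    return encoder_input, decoder_input, decoder_target


if __name__ == "__main__":
    vocab = build_vocab(["I hate my job.", "today will be a good day"])
    enc, dec_in, dec_out = make_example("I hate my job.", vocab)
    print(len(enc), len(dec_in), len(dec_out))  # 14 15 15
```

Because the target is the input itself, no labels are needed. Whatever the encoder squeezes into its latent vector is, by construction, the information the decoder needs to rebuild the tweet.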

Data

You can find some tweet data here. It was originally intended for sentiment analysis, but it can be repurposed for this project. However, it is slightly biased: only tweets containing emoticons were included.

Results

I trained a model with 768 LSTM cells per layer and a bottleneck layer of 1024 neurons. After a day of training on a Titan X, the model reaches a cost of about 0.7 nats. It reconstructs tweets fairly well:
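A nat is the unit of cross-entropy measured with the natural logarithm. The README does not say whether the 0.7-nat cost is per character or per tweet. Assuming it is the decoder's average per-character cross-entropy (an assumption, not stated in the source), the sketch below converts it to bits and to perplexity. The function names are illustrative.

```python
import math


def cross_entropy_nats(target_probs):
    """Average negative log-likelihood, in nats, of the probabilities the
    model assigned to the correct symbol at each step."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


def nats_to_bits(nats):
    """1 nat = 1 / ln(2) bits, about 1.4427 bits."""
    return nats / math.log(2)


def perplexity(nats):
    """Effective number of equally likely choices per step."""
    return math.exp(nats)


if __name__ == "__main__":
    cost = 0.7  # reported cost, assumed here to be per character
    print(round(nats_to_bits(cost), 3))  # 1.01
    print(round(perplexity(cost), 3))    # 2.014
```

Under that assumption, 0.7 nats is roughly 1 bit per character. At each step the model is about as uncertain as a choice between two equally likely characters. For comparison, `cross_entropy_nats([0.5, 0.5])` is ln 2, about 0.693 nats.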

| Original | Reconstructed |
|---|---|
| I hate my job. | I hate my job. |
| today will be a good day | today will be a good day |
| Well, that's my musical day set then. | Well, that's my musical days then sleep. |
| @unixpickle I am not sure if you're serious... | @inupciline I am so tired your superfure... ok. |

You can also use the model to interpolate between two tweets. Based on this paper, I suspect I would get better interpolations with a variational auto-encoder (something I am looking into). For now, here are the results:

| Interpolation | Decoded tweet |
|---|---|
| 0.000 | I hate my job. |
| 0.167 | I hate my 10 one. |
| 0.333 | I have my toddlering. |
| 0.500 | I have my folding trashes |
| 0.667 | I love my friends at hang |
| 0.833 | I love my friends and family |
| 1.000 | I love my friends and family |

| Interpolation | Decoded tweet |
|---|---|
| 0.000 | I hate my job. |
| 0.167 | I hate my job. |
| 0.333 | I had to be my agile. |
| 0.500 | Ita do we had my big lonely |
| 0.667 | today we blit a good hand |
| 0.833 | today will be a good day |
| 1.000 | today will be a good day |
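The interpolation values in the tables above are evenly spaced at multiples of 1/6, from 0 to 1. The README does not say how the intermediate latent vectors are produced. The sketch below assumes plain linear interpolation between the two tweets' latent codes. Each resulting vector would then go through the decoder, which is not shown. The helper names and the toy 2-D vectors are hypothetical stand-ins for the 1024-dimensional bottleneck codes.

```python
def lerp(z_a, z_b, t):
    """Linear interpolation between two latent vectors: (1 - t) * z_a + t * z_b."""
    return [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interpolation_path(z_a, z_b, steps=7):
    """Evenly spaced points from z_a (t = 0) to z_b (t = 1), inclusive."""
    return [(i / (steps - 1), lerp(z_a, z_b, i / (steps - 1))) for i in range(steps)]


if __name__ == "__main__":
    # Toy 2-D latents standing in for the encoder's bottleneck codes.
    for t, z in interpolation_path([0.0, 1.0], [1.0, -1.0]):
        print(f"{t:.3f}", [round(v, 3) for v in z])
```

With `steps=7` the `t` values print as 0.000, 0.167, ..., 1.000, matching the rows above. A plain auto-encoder does not constrain the space between training codes, so midpoints can decode to gibberish. This is the gap a variational auto-encoder is meant to address.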
