The goal of this project is to learn the latent variables behind tweets. To do this, I am experimenting with various forms of sequence-to-sequence auto-encoders.
You can find some tweet data here. It was intended for sentiment analysis, but it can be repurposed for this task. Note that it is slightly biased: only tweets containing emoticons were collected.
I trained a model with 768 LSTM cells per layer and a bottleneck layer of 1024 neurons. After a day of training on a Titan X, the cost gets down to about 0.7 nats. The model is fairly good at reconstructions; a rough sketch of the architecture follows the table:
| Original | Reconstructed |
| --- | --- |
| I hate my job. | I hate my job. |
| today will be a good day | today will be a good day |
| Well, that's my musical day set then. | Well, that's my musical days then sleep. |
| @unixpickle I am not sure if you're serious... | @inupciline I am so tired your superfure... ok. |
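To make the setup concrete, here is a minimal PyTorch sketch of this kind of seq2seq auto-encoder. The layer sizes match the numbers above, but the module names, the single-layer LSTMs, the character-level embedding, and the teacher-forced decoding are simplifying assumptions, not the exact training code:

```python
import torch
import torch.nn as nn

class TweetAutoencoder(nn.Module):
    """Sketch: LSTM encoder -> fixed-size bottleneck -> LSTM decoder."""

    def __init__(self, vocab_size, hidden_size=768, bottleneck_size=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.encoder = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        # Bottleneck: compress the final encoder state into one vector.
        self.to_latent = nn.Linear(hidden_size, bottleneck_size)
        self.from_latent = nn.Linear(bottleneck_size, hidden_size)
        self.decoder = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def encode(self, tokens):
        _, (h, _) = self.encoder(self.embed(tokens))
        return self.to_latent(h[-1])           # (batch, bottleneck_size)

    def decode(self, latent, tokens):
        # Teacher-forced decoding, conditioned on the latent vector.
        h0 = self.from_latent(latent).unsqueeze(0)
        c0 = torch.zeros_like(h0)
        out, _ = self.decoder(self.embed(tokens), (h0, c0))
        return self.out(out)                   # per-step logits

    def forward(self, tokens):
        return self.decode(self.encode(tokens), tokens)
```

Minimizing `nn.CrossEntropyLoss` on the per-step logits measures cost in nats (it uses the natural log), which is consistent with the 0.7-nat figure above.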
You can also use the model to interpolate between two tweets. Based on this paper, I suspect that I would get better interpolations with a variational auto-encoder (something I am looking into). For now, here is what the current model produces (a code sketch of the interpolation procedure follows the tables):
| Interpolation | Output |
| --- | --- |
| 0.000 | I hate my job. |
| 0.167 | I hate my 10 one. |
| 0.333 | I have my toddlering. |
| 0.500 | I have my folding trashes |
| 0.667 | I love my friends at hang |
| 0.833 | I love my friends and family |
| 1.000 | I love my friends and family |

| Interpolation | Output |
| --- | --- |
| 0.000 | I hate my job. |
| 0.167 | I hate my job. |
| 0.333 | I had to be my agile. |
| 0.500 | Ita do we had my big lonely |
| 0.667 | today we blit a good hand |
| 0.833 | today will be a good day |
| 1.000 | today will be a good day |
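Here is how that interpolation can be implemented on top of the `TweetAutoencoder` sketch above: encode both tweets, blend the latent vectors linearly, and greedily decode each mixture. The `start_id`/`end_id` tokens and the greedy decoding loop are my own assumptions, not the project's actual sampling code:

```python
import torch

@torch.no_grad()
def greedy_decode(model, latent, start_id, end_id, max_len=80):
    """Decode one token at a time from a latent vector (batch size 1)."""
    h = model.from_latent(latent).unsqueeze(0)  # initial decoder state
    c = torch.zeros_like(h)
    tok = torch.full((latent.size(0), 1), start_id, dtype=torch.long)
    ids = []
    for _ in range(max_len):
        out, (h, c) = model.decoder(model.embed(tok), (h, c))
        tok = model.out(out[:, -1]).argmax(dim=-1, keepdim=True)
        if tok.item() == end_id:
            break
        ids.append(tok.item())
    return ids

@torch.no_grad()
def interpolate(model, tokens_a, tokens_b, start_id, end_id, steps=7):
    """Linearly blend two latent codes and decode each mixture."""
    za, zb = model.encode(tokens_a), model.encode(tokens_b)
    for i in range(steps):
        t = i / (steps - 1)
        z = (1 - t) * za + t * zb  # linear blend of latent vectors
        print(f"{t:.3f} | {greedy_decode(model, z, start_id, end_id)}")
```

With `steps=7`, the mixing weights come out to the 0.000 through 1.000 values shown in the tables above.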