
Recurrent Weighted Average

This is a re-implementation of the architecture described in the paper "Machine Learning on Sequential Data Using a Recurrent Weighted Average."
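
The RWA replaces the usual gated state update with a weighted running average over every time-step seen so far: the hidden state is h_t = f(n_t / d_t), where n_t = n_{t-1} + z_t * exp(a_t) and d_t = d_{t-1} + exp(a_t). Below is a minimal NumPy sketch of one step of that recurrence, following the paper's equations; the variable names and shapes are illustrative, not taken from this repo's code.

```python
import numpy as np

def rwa_step(x, h, n, d, Wu, bu, Wg, bg, Wa):
    """One step of the recurrent weighted average (illustrative sketch).

    (n, d) are the running numerator and denominator of the weighted
    average; h is the hidden state.
    """
    xh = np.concatenate([x, h])
    u = Wu @ x + bu      # embedding of the current input
    g = Wg @ xh + bg     # bounded "value" gate
    a = Wa @ xh          # attention logit (a bias here would cancel in n/d)
    z = u * np.tanh(g)   # this time-step's contribution
    w = np.exp(a)        # unnormalized weight; naive exp can overflow, so
                         # real implementations track a running max of a
    n = n + w * z        # accumulate weighted contributions
    d = d + w            # accumulate weights
    h = np.tanh(n / d)   # hidden state = f(running weighted average)
    return h, n, d
```

Here Wu has shape (hidden, input) while Wg and Wa have shape (hidden, input + hidden); the paper starts from a learned initial hidden state with n and d zeroed.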

Hypotheses

As a sequence gets longer, the running average could become increasingly "saturated": each new time-step carries less and less weight relative to the accumulated history. This might make it harder for the network to form short-term memories late in a sequence, so it might do poorly at precise tasks like character-level text prediction.
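
To see why the average saturates, note that with roughly equal weights the average after step t moves by (z_t - avg_{t-1}) / t, so a single new step's influence decays like 1/t. A toy simulation (not from this repo) makes that concrete:

```python
import numpy as np

# Feed random contributions into a running average with equal weights
# (exp(a_t) = 1 for every step). The amount the average moves per step
# shrinks roughly like 1/t -- the "saturation" described above.
rng = np.random.default_rng(0)
n = d = prev = 0.0
for t in range(1, 1001):
    z = rng.standard_normal()  # this step's contribution
    n += z
    d += 1.0
    avg = n / d
    if t in (1, 10, 100, 1000):
        print(t, abs(avg - prev))  # per-step movement of the average
    prev = avg
```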

If the above concern is actually an issue, perhaps the long-term benefits of RWAs could still be leveraged by stacking an RWA on top of an LSTM.
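
One way to realize that stack, sketched abstractly (both step functions below are hypothetical placeholders, not this repo's API):

```python
def hybrid_step(x, lstm_state, rwa_state, lstm_step, rwa_step):
    # Hypothetical LSTM->RWA composition: the LSTM's output at each
    # step becomes the RWA's input, so the long-term average is taken
    # over features that already encode short-term context.
    y, lstm_state = lstm_step(x, lstm_state)  # short-term memory
    h, rwa_state = rwa_step(y, rwa_state)     # long-term weighted average
    return h, lstm_state, rwa_state
```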

Results

Here are the experiments I have run:

  • char-rnn - RWAs can learn to model language character-by-character, although LSTMs are faster and better.
  • sentiment - A hybrid LSTM-RWA model learns to predict the sentiment of tweets faster than a plain LSTM.
