Do you even science, bro? Using RNNs to predict scientific titles.

RNN Science titles

A recurrent neural network with long short-term memory (LSTM) that generates titles of scientific articles. It uses karpathy's char-rnn as the base implementation.

I used every title from the to generate models of scientific papers. Each subtopic (over 160!) generated a unique model.


First presented at DC Hack && Tell Round 26: The Curious Camaraderie of Code.

Raw data

The titles used as input and the auto-generated titles can be found in the /samples directory.


Size matters. The most important takeaway is that the quality of the model follows an almost perfect exponential decay in the log of the raw character count. From this, we can predict with high confidence what the accuracy on a new data set will be with this particular set of hyperparameters. Additionally, we can see that there is a lower limit of about 0.75, which is independent of the size of the training set.
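The relationship above can be sketched as a curve fit. This is a minimal sketch under my own reading of the claim, not the analysis used for the plot: it assumes the quality score behaves like floor + a * exp(-b * x), where x is the log of the character count and the floor is the 0.75 limit mentioned above. The function names, the parameters a and b, and the sample points are all hypothetical; the points are synthetic and generated from the assumed curve, not measured results.

```python
import math

def fit_decay(log_sizes, qualities, floor=0.75):
    """Fit quality = floor + a * exp(-b * x), with x the log character count.

    With the floor held fixed, log(quality - floor) is linear in x, so an
    ordinary least-squares line gives slope -b and intercept log(a).
    """
    ys = [math.log(q - floor) for q in qualities]  # every quality must exceed the floor
    n = len(log_sizes)
    mean_x = sum(log_sizes) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in log_sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_sizes, ys))
    slope = sxy / sxx
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope

def predict(log_size, a, b, floor=0.75):
    """Predicted quality for a data set with the given log character count."""
    return floor + a * math.exp(-b * log_size)

# Synthetic placeholder points (NOT results from this repo), generated from
# the assumed curve so the fit can be checked against known parameters.
TRUE_A, TRUE_B = 40.0, 0.9
xs = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5]
qs = [predict(x, TRUE_A, TRUE_B) for x in xs]

a, b = fit_decay(xs, qs)
```

With real measurements in place of the synthetic points, `predict(x, a, b)` would estimate the score for a new subtopic from its character count alone, which is the kind of prediction described above.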


Due to space constraints, the models are not stored in this repo.

You can recreate these models by using the pre-wrangled input files in the /samples directory with the following hyperparameters:

 1024 RNN size
 2 layers
 30 max epochs
 0.5 dropout
 50 sequence length
 0.002 learning rate
 0.97 decay
 50 batch size


Place all journal titles, newline-separated, in files with the extension *.txt in the raw_input directory.

Run python src/ to convert the titles to a limited character map (uses unidecode) and copy them into the input directory.
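The conversion step can be sketched roughly as follows. This is not the repo's script: to stay self-contained it swaps in the standard library's `unicodedata` for unidecode, which means characters with no ASCII decomposition are dropped rather than transliterated. The function names are hypothetical, and the directory names are taken from the steps above.

```python
import unicodedata
from pathlib import Path

def to_ascii(text):
    """Reduce text to plain ASCII by stripping accents.

    Stand-in for unidecode: NFKD splits accented letters into a base letter
    plus combining marks, and the ASCII encode step discards the marks
    (along with any character that has no ASCII form at all).
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

def convert_directory(raw_dir="raw_input", out_dir="input"):
    """Clean every *.txt file in raw_dir and write it to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(raw_dir).glob("*.txt")):
        lines = src.read_text(encoding="utf-8").splitlines()
        cleaned = [to_ascii(line).strip() for line in lines]
        cleaned = [line for line in cleaned if line]  # drop titles that end up empty
        dst = out / src.name
        dst.write_text("\n".join(cleaned) + "\n", encoding="ascii")
        written.append(dst)
    return written
```

A smaller character set means fewer output classes for the character-level model, which is presumably why the titles are normalized before training.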

Use python to build the scripts that run the torch commands.
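A script builder along these lines might look like the sketch below. It is not the repo's own builder. It assumes karpathy's char-rnn is invoked as `th train.lua`, that each subtopic lives in its own directory under `input` (char-rnn reads `input.txt` from the `-data_dir` it is given), and that "0.97 decay" in the hyperparameter list refers to char-rnn's `-learning_rate_decay` option. The function names, the `cv` checkpoint root, and the topic name in the usage line are hypothetical.

```python
import shlex
from pathlib import PurePosixPath

# Hyperparameters from the list above, keyed by char-rnn's train.lua flags.
HYPERPARAMS = {
    "rnn_size": 1024,
    "num_layers": 2,
    "max_epochs": 30,
    "dropout": 0.5,
    "seq_length": 50,
    "learning_rate": 0.002,
    "learning_rate_decay": 0.97,  # assumption: this is what "0.97 decay" means
    "batch_size": 50,
}

def train_command(data_dir, checkpoint_dir):
    """Build one `th train.lua` command line for a single subtopic."""
    parts = ["th", "train.lua",
             "-data_dir", str(data_dir),
             "-checkpoint_dir", str(checkpoint_dir)]
    for flag, value in HYPERPARAMS.items():
        parts += ["-" + flag, str(value)]
    # Quote each piece so unusual topic names stay shell-safe.
    return " ".join(shlex.quote(part) for part in parts)

def build_script(topics, data_root="input", checkpoint_root="cv"):
    """Return the text of a shell script that trains one model per topic."""
    lines = ["#!/bin/sh"]
    for topic in topics:
        lines.append(train_command(PurePosixPath(data_root) / topic,
                                   PurePosixPath(checkpoint_root) / topic))
    return "\n".join(lines) + "\n"
```

For example, `build_script(["example_topic"])` returns a two-line script whose second line trains on `input/example_topic` and saves checkpoints under `cv/example_topic`.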