Serapis is a sentence identifier and modeling pipeline, built for Wordnik


Serapis is an acquisition, mining, and modeling pipeline that takes undefined words from Wordnik's dictionary, finds occurrences of them across the web, and determines whether a given sentence in which the word occurs offers an in-context definition for that word.

For example, this sentence is a free-range definition, or "FRD", for "cheeseor":

The term “cheeseors” describes flighted globules of intergalactic cheese, 
known to be the scourge of the asteroid belt.

Pipeline Schematic


Serapis passes a standard message format between the stages of an AWS Lambda pipeline.
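
As a rough illustration of what a shared message format between Lambda stages might look like, here is a minimal sketch. The field names (`word`, `sentences`, `text`, `url`, `contains_word`) and the `detect_stage` function are assumptions made up for this example; they are not taken from the Serapis code base. Only the `handler(event, context)` shape follows the usual AWS Lambda convention for Python.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Sentence:
    # Hypothetical fields; the real Serapis message may differ.
    text: str
    url: str
    contains_word: Optional[bool] = None


@dataclass
class Message:
    word: str
    sentences: List[Sentence] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: dict) -> "Message":
        # Rebuild nested Sentence objects from plain dictionaries.
        sentences = [Sentence(**s) for s in data.get("sentences", [])]
        return cls(word=data["word"], sentences=sentences)


def detect_stage(event: dict, context=None) -> dict:
    """Hypothetical pipeline stage: marks sentences that mention the word.

    Takes the shared message as a dict and returns the same shape, so the
    next stage can consume it without any translation.
    """
    message = Message.from_dict(event)
    for sentence in message.sentences:
        sentence.contains_word = message.word.lower() in sentence.text.lower()
    return asdict(message)


if __name__ == "__main__":
    event = {
        "word": "cheeseor",
        "sentences": [
            {"text": "The term cheeseors describes flighted globules.",
             "url": "http://example.com/a"},
        ],
    }
    print(json.dumps(detect_stage(event), indent=2))
```

Because every stage reads and writes the same structure, stages can be reordered or tested in isolation by feeding them a hand-written message.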

Modeling Sentences

High-level details on the feature development and production can be found in the slides from Clare's presentation on building this system.

The system requires a model that scores sentences for FRDness. Each score consists of a binary classification and a classification confidence. The data for this model is not included here.
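
To make the shape of such a score concrete, here is a small sketch of a scorer that returns a binary label together with a confidence. The cue-phrase rule below is a toy placeholder invented for this example, not the Serapis model, and the names `FRDScore`, `CUES`, and `score_sentence` are hypothetical.

```python
from typing import NamedTuple


class FRDScore(NamedTuple):
    is_frd: bool       # binary classification
    confidence: float  # confidence in that classification, between 0 and 1


# Toy cue phrases standing in for a trained model (an assumption).
CUES = ("the term", "describes", "is defined as", "refers to")


def score_sentence(sentence: str, word: str, threshold: float = 0.5) -> FRDScore:
    """Placeholder scorer: fraction of cue phrases present in the sentence."""
    text = sentence.lower()
    if word.lower() not in text:
        p_frd = 0.0
    else:
        hits = sum(1 for cue in CUES if cue in text)
        p_frd = hits / len(CUES)
    is_frd = p_frd >= threshold
    # Report confidence in whichever label was chosen.
    confidence = p_frd if is_frd else 1.0 - p_frd
    return FRDScore(is_frd, confidence)


if __name__ == "__main__":
    example = ("The term cheeseors describes flighted globules of "
               "intergalactic cheese.")
    print(score_sentence(example, "cheeseor"))
```

A real model would replace the cue-phrase rule, but downstream stages could still rely on the same two-part result.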

Setup and Troubleshooting

Please read the Wiki for help with setting up your code base. Use the pipes at your own risk.


Serapis was created for Wordnik by Clare Corthell and Manuel Ebert in 2015/2016.