Serapis is a sentence identifier and modeling pipeline built for Wordnik
Python
Latest commit 6f8214f Jun 9, 2016 @maebert maebert API Endpoint
misc
serapis API Endpoint Jun 9, 2016
temp_models
.gitignore Add more domains to blacklist May 10, 2016
LICENSE.md Create LICENSE.md Mar 21, 2016
README.md sm delt Mar 21, 2016
add.py Histograms Jun 9, 2016
all_json_formatted.csv
circle.yml
fabfile.py
lambda_handler.py
lambda_simulator.py
requirements-dev.txt
requirements.txt
split_wordlist.py

Serapis

Serapis is an acquisition, mining, and modeling pipeline that takes undefined words from Wordnik's dictionary, finds occurrences of them across the web, and determines whether a given sentence in which the word occurs offers an in-context definition for that word.

For example, this sentence is a free-range definition, or "FRD", for "cheeseor":

The term “cheeseors” describes flighted globules of intergalactic cheese, 
known to be the scourge of the asteroid belt.
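
As a rough illustration of the "find occurrences" step, the sketch below pulls candidate sentences that mention a target word out of a block of text. It is not taken from the Serapis code base: the function name `sentences_containing`, the naive sentence splitter, and the simple plural handling are all assumptions made for this example.

```python
import re


def sentences_containing(text, word):
    """Return the sentences in `text` that mention `word` or a simple plural of it.

    Hypothetical helper for illustration only; not part of Serapis.
    """
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Match the word itself plus an optional "s"/"es" suffix, case-insensitively.
    pattern = re.compile(r"\b" + re.escape(word) + r"(?:s|es)?\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]


page_text = (
    'The term "cheeseors" describes flighted globules of intergalactic cheese, '
    "known to be the scourge of the asteroid belt. Nothing else of note here."
)
candidates = sentences_containing(page_text, "cheeseor")
```

Here `candidates` holds only the first sentence, which would then be passed on for FRD scoring.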

Pipeline Schematic

(Figure: pipeline layout)

Serapis uses a standard message format throughout its AWS Lambda pipeline.
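
The actual message schema is not documented in this README, so the following is only a sketch of what a shared message passed between Lambda stages might look like. The field names (`word`, `urls`, `sentences`, `stages`) and the function names are hypothetical; only the `(event, context)` handler signature is the standard AWS Lambda convention for Python.

```python
import json


def make_message(word, **payload):
    """Build a hypothetical message envelope that every stage reads and extends."""
    message = {"word": word}
    message.update(payload)
    return message


def detect_stage(event, context=None):
    """Example stage written in the AWS Lambda handler shape (event, context).

    It copies the incoming message, records that it ran, and returns the
    enriched message for the next stage. The fields are assumptions.
    """
    message = dict(event)  # shallow copy so the caller's message is not mutated
    message.setdefault("sentences", [])
    message["stages"] = message.get("stages", []) + ["detect"]
    return message


incoming = make_message("cheeseor", urls=["http://example.com/cheese"])
outgoing = detect_stage(incoming)
serialized = json.dumps(outgoing, sort_keys=True)  # messages must stay JSON-serializable
```

Keeping a single envelope that each stage extends, rather than a different payload per stage, means any stage can be tested locally by handing it a plain dictionary.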

Modeling Sentences

High-level details on the feature development and production can be found in the slides from Clare's presentation on building this system.

The system requires a model to score sentences for FRDness. Each score consists of a binary classification and a classification confidence. The data for this model is not included here.
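
Because the model data is not included, the sketch below only illustrates the shape of a score (a binary label plus a confidence). The cue-phrase heuristic is a stand-in invented for this example and is not the Serapis model; `FRDScore`, `CUES`, and `score_sentence` are hypothetical names.

```python
import re
from collections import namedtuple

# Shape of a score as described above: binary classification plus confidence.
FRDScore = namedtuple("FRDScore", ["is_frd", "confidence"])

# Stand-in cue phrases; an assumption for this example, not learned features.
CUES = [
    r"\bthe term\b",
    r"\bdescribes\b",
    r"\bis defined as\b",
    r"\brefers to\b",
    r"\bknown as\b",
]


def score_sentence(sentence, threshold=0.5):
    """Return an FRDScore from a toy heuristic (NOT the real classifier)."""
    hits = sum(1 for cue in CUES if re.search(cue, sentence, re.IGNORECASE))
    # Two or more cues saturate the toy confidence at 1.0.
    confidence = min(1.0, hits / 2.0)
    return FRDScore(is_frd=confidence >= threshold, confidence=confidence)


frd = score_sentence("The term cheeseors describes flighted globules of intergalactic cheese.")
not_frd = score_sentence("I had lunch near the asteroid belt.")
```

A trained classifier would replace `score_sentence` while returning the same two values, so downstream stages would not need to change.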

Setup and Troubleshooting

Please read the Wiki for help with setting up your code base. Use the pipes at your own risk.

Contribution

Serapis was created for Wordnik by the summer.ai team, Clare Corthell and Manuel Ebert, in 2015/2016.