Deep Sentence is a deep-learning-based engine that summarizes texts from multiple sources into a single short summary.
- Python 3.5
- psycopg2 requirements
The scraper module also relies on html-extractor-miniserver, which is available at http://extractor.deepsentence.com
Set up a new virtualenv environment if you want, then simply run
make
Copy .env.example to .env, and modify the variables to suit your needs.
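To illustrate how the values in .env could be read from Python, here is a minimal sketch. It assumes the file contains simple KEY=VALUE lines; the helper names parse_env_lines and load_env_file are hypothetical and not part of this project, which may load the file through another tool.

```python
import os


def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path=".env"):
    """Load variables from `path` into os.environ.

    Hypothetical helper: variables already set in the environment are kept.
    """
    with open(path) as f:
        values = parse_env_lines(f)
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

Calling load_env_file() from the project root before starting a script would make the variables available through os.environ.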
To start the scraper, run
scrapy crawl line_news
If you want a shell to experiment with the responses, run
scrapy shell ARTICLE_URL --spider=line_news
To train the model, you will first need to download the word2vec word embeddings. You can get them at the following URL: http://www.cl.ecei.tohoku.ac.jp/~m-suzuki/jawiki_vector/entity_vector.tar.bz2
Alternatively, run make download_models to download them for you.
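For reference, the manual download can be sketched in Python as follows. This is only an approximation of what a download step might do: the target directory "models" and the function names are assumptions, and the actual make download_models target may behave differently.

```python
import os
import tarfile
import urllib.request

# URL taken from this README; everything else below is an assumption.
EMBEDDINGS_URL = (
    "http://www.cl.ecei.tohoku.ac.jp/~m-suzuki/jawiki_vector/entity_vector.tar.bz2"
)


def extract_archive(archive_path, target_dir):
    """Extract a .tar.bz2 archive and return the sorted entries of target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:bz2") as archive:
        archive.extractall(target_dir)
    return sorted(os.listdir(target_dir))


def download_embeddings(target_dir="models", url=EMBEDDINGS_URL):
    """Download the archive (unless already present) and extract it."""
    os.makedirs(target_dir, exist_ok=True)
    archive_path = os.path.join(target_dir, os.path.basename(url))
    if not os.path.exists(archive_path):
        urllib.request.urlretrieve(url, archive_path)
    return extract_archive(archive_path, target_dir)
```

Skipping the download when the archive already exists avoids fetching the large file again on repeated runs.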
The web application lives in deep_sentence/webapp.
To install dependencies, run make prepare_web.
You can then start the application by running make dev_webapp. If you do not have foreman, you can start the app with make debug_app and start webpack (in another shell) with make webpack_watch.
Run
make write_dependencies
to regenerate requirements.txt.
Please be sure to run this from a clean environment, and only add needed dependencies.
You can access the database as follows:
psql -h public-db.claudetech.com -p 5433 -U deep_sentence
To use it from Python, set DATABASE_URL to the following value:
postgres://deep_sentence:PASSWORD@public-db.claudetech.com:5433/deep_sentence
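As a rough sketch of how DATABASE_URL could be consumed, the snippet below splits the URL into the keyword arguments accepted by psycopg2.connect. The helper names parse_database_url and connect are hypothetical; the project itself may read the variable in a different way.

```python
import os
from urllib.parse import urlparse


def parse_database_url(url):
    """Split a postgres:// URL into psycopg2.connect keyword arguments."""
    parts = urlparse(url)
    return {
        "dbname": parts.path.lstrip("/"),
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
    }


def connect():
    """Open a connection using the DATABASE_URL environment variable."""
    # Imported here so the parsing helper works without psycopg2 installed.
    import psycopg2

    return psycopg2.connect(**parse_database_url(os.environ["DATABASE_URL"]))
```

Keeping the parsing separate from the connection makes it easy to check the URL without reaching the database.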
See deployment/README.md for more information about how to set up a node.