Have you ever searched for a scientific paper you downloaded earlier and failed to find it because it was saved as dsjkgqwtqoy843175.pdf?
This project analyzes all your PDF files and displays them on a web page listing their titles and authors.
- All *.pdf files in srcdir are found
- Those files are converted to HTML with the great pdf2htmlEX library - https://github.com/coolwanglu/pdf2htmlEX (because PDF is hard to parse directly)
- The most probable title is found in each file
- The article is looked up at https://scholar.google.com by that title, and the result is saved to LMDB
- Stored results are presented by a Flask web page
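
The first and third steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: `find_pdfs` and `guess_title` are hypothetical names, the recursive search is an assumption, and "the largest font on the first page is the title" is a common heuristic swapped in here because the real title detection is not described.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def find_pdfs(srcdir: str) -> List[Path]:
    """Return every *.pdf file under srcdir, sorted for stable output.

    Assumption: subdirectories are searched too (rglob is recursive).
    """
    return sorted(p for p in Path(srcdir).rglob("*.pdf") if p.is_file())


def guess_title(lines: Iterable[Tuple[str, float]]) -> str:
    """Pick the text set in the largest font as the most probable title.

    `lines` holds hypothetical (text, font_size) pairs taken from the first
    page of the converted HTML; on a tie the earlier line wins.
    """
    best_text, best_size = "", float("-inf")
    for text, size in lines:
        text = text.strip()
        # Skip blank lines; only a strictly larger font replaces the best.
        if text and size > best_size:
            best_text, best_size = text, size
    return best_text
```

A caller would loop over `find_pdfs(srcdir)`, convert each file to HTML, extract the text and font sizes from it, and pass those pairs to `guess_title` before looking the title up.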
This project is in alpha; see the todo list below.
Please report all problems as GitHub issues.
git clone git@github.com:tivvit/articleOrganizer.git
cd articleOrganizer
docker-compose up -d web
- wait until the scan is finished
- go to http://localhost:5050
- Without Docker you are on your own (consult the Dockerfile)
- project is based on https://github.com/coolwanglu/pdf2htmlEX
- python3 (pip3 install -r requirements.txt)
- analyze papers
python3 main.py
- run web server
python3 web.py
Edit conf.yaml:
- srcdir: directory with articles
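
As a rough illustration, a minimal conf.yaml might look like the fragment below. The path is a placeholder, and srcdir is the only option named here; any other keys the project reads are not shown.

```yaml
# Hypothetical example path; point this at the folder holding your PDFs.
srcdir: /path/to/articles
```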
- Check my other project for fast PDF reading (https://github.com/tivvit/pdf-book-reader)
- add keywords
- toggle abstract
- secure paths from user (no folder in path to pdf)
- "read that" plugin
- do not keep html option
- no abstracts in db option
- generate - file watcher process
- Images to docker hub with simple readme (without docker-compose)
- explain all configuration options
- add publish year
Feel free to contribute.
© 2016 Vít Listík
Released under the MIT license