Have you ever searched for a scientific paper you downloaded earlier and
could not find it because it was saved as dsjkgqwtqoy843175.pdf?
This project analyzes all your PDF files and displays them as a web page listing their titles and authors.
- All *.pdf files are found in srcdir
- Those files are converted to HTML with the great library https://github.com/coolwanglu/pdf2htmlEX (because PDF is not good for parsing)
- The most probable title is found in the file
- The article is looked up at https://scholar.google.com based on the title and the result is saved to LMDB
- Stored results are presented by a Flask web page
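
The first three steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names are hypothetical, searching subdirectories is an assumption, and the title heuristic (largest font size wins) is only one plausible way to find the "most probable title". The `--dest-dir` option is taken from the pdf2htmlEX command-line help; check your installed version.

```python
import subprocess
from pathlib import Path


def find_pdfs(srcdir):
    """Return all *.pdf files under srcdir, sorted for stable output.

    Assumption: subdirectories are searched too; the project itself may
    only look at the top level of srcdir.
    """
    return sorted(p for p in Path(srcdir).rglob("*.pdf") if p.is_file())


def build_convert_command(pdf_path, dest_dir):
    """Build a pdf2htmlEX command line that writes <name>.html into dest_dir."""
    pdf_path = Path(pdf_path)
    return [
        "pdf2htmlEX",
        "--dest-dir", str(dest_dir),
        str(pdf_path),
        pdf_path.stem + ".html",
    ]


def convert(pdf_path, dest_dir):
    """Run the conversion; raises CalledProcessError if pdf2htmlEX fails."""
    subprocess.run(build_convert_command(pdf_path, dest_dir), check=True)
    return Path(dest_dir) / (Path(pdf_path).stem + ".html")


def pick_title(candidates):
    """Pick the most probable title from (text, font_size) pairs.

    Hypothetical heuristic: ignore very short strings, then take the text
    with the largest font size; ties go to the earliest candidate.
    """
    usable = [
        (text.strip(), size)
        for text, size in candidates
        if len(text.strip()) >= 5
    ]
    if not usable:
        return None
    return max(usable, key=lambda item: item[1])[0]
```

Extracting the `(text, font_size)` pairs from the generated HTML is left out here, since it depends on how pdf2htmlEX lays out its output.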
This project is an alpha version; see the todo section.
Please report all problems as GitHub issues.
- git clone git@github.com:tivvit/articleOrganizer.git
- docker-compose up -d web
- wait until the scan is finished
- go to http://localhost:5050
- Without Docker you are on your own here (consult the Dockerfile)
- the project is based on https://github.com/coolwanglu/pdf2htmlEX
- python3 (pip3 install -r requirements.txt)
- analyze papers: python3 main.py
- run web server: python3 web.py
Edit conf.yaml:
- srcdir: directory with articles
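
As a rough illustration of how this configuration might be read and checked, here is a small sketch. It assumes PyYAML is installed and that `srcdir` is the only required key; the function names `load_config` and `validate_config` are hypothetical and not taken from the project.

```python
from pathlib import Path


def load_config(path="conf.yaml"):
    """Parse the YAML config file into a dict (empty file gives {})."""
    # PyYAML is a third-party package (assumed here); it is imported lazily
    # so that validate_config can be used without it.
    import yaml

    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh) or {}


def validate_config(cfg):
    """Return srcdir as a Path, or raise ValueError with a readable message."""
    srcdir = cfg.get("srcdir")
    if not srcdir:
        raise ValueError("conf.yaml must set 'srcdir'")
    path = Path(srcdir).expanduser()
    if not path.is_dir():
        raise ValueError(f"srcdir is not a directory: {path}")
    return path
```

Calling `validate_config(load_config())` before a scan would fail early with a clear message if `srcdir` is missing or points to a directory that does not exist.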
- Check my other project for fast PDF reading (https://github.com/tivvit/pdf-book-reader)
- add keywords
- toggle abstract
- secure paths from user (no folder in path to pdf)
- "read that" plugin
- do not keep html option
- no abstracts in db option
- generate - file watcher process
- Images to docker hub with simple readme (without docker-compose)
- explain all configuration options
- add publish year
Feel free to contribute.
© 2016 Vít Listík
Released under MIT license