Paper Downloader

There are many papers on the internet, and we need to extract findings from them. To do so, we sometimes need to label the papers, either manually or automatically. For example, we use Label Studio to label papers manually.

Before labeling, the papers must be prepared for the NLP pipeline or the labeling pipeline. This project prepares the metadata of papers, downloads their PDF files, and converts the PDF files to HTML files.

If you want to deploy the full pipeline, you can use the prophet-studio project, which is composed of the label-studio and paper-downloader projects.

Use `paper-downloader` as a command line tool

Install the paper downloader

pip install git+https://github.com/yjcyxky/paper-downloader.git
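After installation, it can help to confirm that the command is reachable. The sketch below assumes the package puts a `pdownloader` executable on the PATH, as the commands in this README suggest; the helper name `is_on_path` is hypothetical and not part of paper-downloader.

```python
import shutil


def is_on_path(command: str = "pdownloader") -> bool:
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(command) is not None


if is_on_path():
    print("pdownloader found at", shutil.which("pdownloader"))
else:
    print("pdownloader not found; check that pip's script directory is on PATH")
```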

Prepare papers for the NLP pipeline

Fetch metadata

pdownloader fetch-metadata -d 3 -o metadata/file.json -c config/pmids_config.json
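The command writes its result to the path given with `-o`. The structure of that file is not documented here, so the following sketch only assumes that it is valid JSON (as the `.json` extension suggests) and reports its top-level type and size. `summarize_json` is a hypothetical helper, not part of paper-downloader.

```python
import json
from pathlib import Path


def summarize_json(path: str) -> tuple[str, int]:
    """Load a JSON file and return (top-level type name, number of items).

    Scalars (numbers, strings, booleans, null) are reported with a size of 1.
    """
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    size = len(data) if isinstance(data, (list, dict)) else 1
    return type(data).__name__, size
```

For example, `summarize_json("metadata/file.json")` could be called before the next step to confirm that the file exists and parses.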

Fetch PDFs

pdownloader fetch-pdf -m metadata/file.json -o ./pdf
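Downloads can fail quietly, for instance when a server returns an HTML error page that is saved under a `.pdf` name. A PDF file normally begins with the bytes `%PDF-`, so a quick header check can flag suspect files before conversion. This is a minimal sketch that assumes the PDFs are written directly into the output directory with a `.pdf` extension, which this README does not state; `split_by_pdf_header` is a hypothetical helper.

```python
from pathlib import Path


def split_by_pdf_header(pdf_dir: str) -> tuple[list[str], list[str]]:
    """Return (valid, suspect) file names among the *.pdf files in pdf_dir.

    A file counts as valid when its first five bytes are the PDF magic "%PDF-".
    """
    valid, suspect = [], []
    for path in sorted(Path(pdf_dir).glob("*.pdf")):
        with path.open("rb") as handle:
            header = handle.read(5)
        (valid if header == b"%PDF-" else suspect).append(path.name)
    return valid, suspect
```

Calling `split_by_pdf_header("./pdf")` returns two lists of file names; anything in the second list is worth re-downloading or inspecting by hand.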

PDF to HTML

pdownloader pdf2html -p ./pdf -h ./html
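The three steps above can also be chained from a script. The sketch below uses only the subcommands and flags shown in this README; the function names, the parameter names, and the `d_value` name for the `-d` argument (whose meaning is not stated here) are assumptions made for illustration.

```python
import subprocess


def build_commands(config, metadata, pdf_dir, html_dir, d_value="3"):
    """Return the three pdownloader invocations from this README as argv lists."""
    return [
        ["pdownloader", "fetch-metadata", "-d", str(d_value), "-o", metadata, "-c", config],
        ["pdownloader", "fetch-pdf", "-m", metadata, "-o", pdf_dir],
        ["pdownloader", "pdf2html", "-p", pdf_dir, "-h", html_dir],
    ]


def run_pipeline(commands):
    """Run each command in order; stop at the first non-zero exit status."""
    for argv in commands:
        print("Running:", " ".join(argv))
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
```

Calling `run_pipeline(build_commands("config/pmids_config.json", "metadata/file.json", "./pdf", "./html"))` would run the same three commands in sequence. Passing argv lists rather than a shell string avoids quoting problems with paths that contain spaces.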

Build the Docker image

git clone https://github.com/yjcyxky/paper-downloader.git

cd paper-downloader
bash build-docker.sh

