
umegbewe/pdf-link-extract

PDF-Link-Extract

A simple Python tool that extracts links from PDF files.

MIT License

Running the Project

Setup

Clone the project

  git clone https://github.com/umegbewe/pdf-link-extract

Install the pikepdf and PyMuPDF Python libraries

  pip3 install pikepdf PyMuPDF

Go to the project directory

  cd pdf-link-extract

Specify the PDF to scan on line 3 of the script:

  file = "(pdfname).pdf"

Run:

  python3 pdflinkscraper1.py

or:

  python3 pdflinkscraper2.py

NB

# pdflinkscraper1.py extracts clickable links (the link annotations embedded in the PDF). This is the more accurate option.

# pdflinkscraper2.py extracts links by matching the PDF's text against a regular expression (see line 5 of pdflinkscraper2.py).

# You need at least Python 3 and pip installed.
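The two approaches described above can be sketched as follows. This is a minimal illustration, not the code in either script: both extraction functions use PyMuPDF (`fitz`), while pdflinkscraper1.py may read annotations through pikepdf instead. `URL_PATTERN` is a hypothetical stand-in for the regex on line 5 of pdflinkscraper2.py, and every function name here (`uri_targets`, `regex_targets`, `clickable_links`, `text_links`) is made up for the example.

```python
import re

# Hypothetical URL pattern; the real regex is on line 5 of pdflinkscraper2.py.
URL_PATTERN = re.compile(r"https?://[^\s<>()\"']+")
TRAILING_PUNCTUATION = ".,;:!?"


def dedupe(items):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def uri_targets(link_dicts):
    """Approach 1 helper: keep external URIs from PyMuPDF-style link dicts."""
    # Entries without a non-empty "uri" key (e.g. internal page jumps) are skipped.
    return dedupe(d["uri"] for d in link_dicts if d.get("uri"))


def regex_targets(text):
    """Approach 2 helper: find URL-looking strings in plain text."""
    # Strip sentence punctuation that the pattern swallows at the end of a URL.
    matches = (m.rstrip(TRAILING_PUNCTUATION) for m in URL_PATTERN.findall(text))
    return dedupe(matches)


def clickable_links(path):
    """Collect the targets of clickable link annotations on every page."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    with fitz.open(path) as doc:
        links = [link for page in doc for link in page.get_links()]
    return uri_targets(links)


def text_links(path):
    """Extract each page's text, then scan it with URL_PATTERN."""
    import fitz  # PyMuPDF

    with fitz.open(path) as doc:
        text = "\n".join(page.get_text() for page in doc)
    return regex_targets(text)
```

For example, `print(clickable_links("example.pdf"))` prints the link targets found in a (placeholder) file named example.pdf. One plausible reason the clickable approach is more accurate: a text regex only sees what is printed on the page, so it misses links whose visible label differs from their target and can truncate URLs that wrap across lines.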
