A Python script that checks the similarity of PDF files in the active directory and stores the results in a CSV file. Project inspired by diff-pdf.
pip install -r requirements.py
- Install all required dependencies.
- Copy cspdf.py into the directory that contains the PDF files to be compared.
- Run the cspdf.py script.
- Note: This script works on PDF files only. If you have Word documents, convert them to PDF first.
- Check the similarity of all PDF files in the current active directory
python cspdf.py -a -o comparison.csv
- Compare one PDF file against all other PDF files in the current active directory
python cspdf.py -t a.pdf -o comparison.csv
- Check similarity including image comparison (slow processing)
# Just add the -i or --image argument
python cspdf.py -i -t a.pdf -o comparison.csv
- Get help
python cspdf.py -h
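The two comparison modes above (every pair with -a, one target against the rest with -t) can be sketched roughly as follows. This is not the actual cspdf.py code: the names compare_pairs and write_csv and the CSV column names are hypothetical, text similarity is assumed to come from difflib.SequenceMatcher in the standard library, and plain strings stand in for text extracted from PDFs.

```python
import csv
import io
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, Iterator, Optional, Tuple


def text_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] using difflib.SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def compare_pairs(
    texts: Dict[str, str], target: Optional[str] = None
) -> Iterator[Tuple[str, str, float]]:
    """Yield (file_a, file_b, ratio) rows.

    With target=None every unordered pair is compared (like -a);
    otherwise only the target against each other file (like -t).
    """
    names = sorted(texts)
    if target is None:
        pairs = combinations(names, 2)
    else:
        pairs = ((target, other) for other in names if other != target)
    for a, b in pairs:
        yield a, b, round(text_similarity(texts[a], texts[b]), 4)


def write_csv(rows, stream) -> None:
    """Write a header and the comparison rows to an open text stream."""
    writer = csv.writer(stream)
    writer.writerow(["file_a", "file_b", "text_similarity"])  # hypothetical columns
    writer.writerows(rows)


# Sample strings stand in for text extracted from real PDF files.
texts = {
    "a.pdf": "the quick brown fox",
    "b.pdf": "the quick brown fox",
    "c.pdf": "an entirely different sentence",
}
buffer = io.StringIO()
write_csv(compare_pairs(texts), buffer)
print(buffer.getvalue())
```

Passing target="a.pdf" to compare_pairs restricts the rows to a.pdf against every other file, which is the cheaper option when only one document is under suspicion.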
- Text similarity with Sequence Matcher
- Image similarity with Structural Similarity Index (SSIM)
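To give a feel for the image measure, here is a minimal pure-Python sketch of SSIM computed over a whole image as a single window. It is a simplification and an assumption about how the idea works, not the script's own code: libraries such as scikit-image compute SSIM over local sliding windows and average the results. The function name global_ssim and the flat lists of grayscale pixel values are hypothetical.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 255.0) -> float:
    """Single-window SSIM between two equally sized grayscale images.

    x and y are flat sequences of pixel intensities in [0, data_range].
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and have the same size")

    n = len(x)
    # Stabilising constants from the standard SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((p - mu_x) * (p - mu_x) for p in x) / n
    var_y = sum((q - mu_y) * (q - mu_y) for q in y) / n
    cov = sum((p - mu_x) * (q - mu_y) for p, q in zip(x, y)) / n

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


# A tiny 2x2 checkerboard and its inverse, flattened row by row.
page = [0, 255, 0, 255]
inverted = [255, 0, 255, 0]

print(global_ssim(page, page))      # identical images score (approximately) 1
print(global_ssim(page, inverted))  # inverted structure scores below 0
```

A score near 1 means the two images share brightness, contrast, and structure; values fall toward 0 or below as they diverge. Working over local windows, as the full method does, is part of why image comparison is the slow option.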
Made by Zavier, enjoyy ✨