highlight-extractor

A quick script to extract the freehand highlights I make on PDF files with a Kobo Elipsa.

Description

The Kobo Elipsa can export highlights made in documents, but only if those highlights were not made with the stylus that comes with the eReader. Since reading and highlighting PDFs was the entire reason I bought the thing, I wrote some Python to extract my freehand highlights from all of the papers I read.

Usage

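1. Connect your Elipsa to your computer.
2. Copy the contents of the mounted drive into a folder on your computer.
3. Plop this script (highlight_extractor.py) into the new folder.
4. Create a virtualenv and install the requirements (pip install -r requirements.txt).
5. Run the script with python highlight_extractor.py.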
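Step 2 can also be done from Python. The sketch below uses the standard library's shutil.copytree; copy_device is a hypothetical helper (not part of highlight_extractor.py), and the mount point /Volumes/KOBOeReader (a typical macOS location) and destination folder are assumptions to adjust for your system.

```python
import shutil
from pathlib import Path


def copy_device(mount_point: Path, destination: Path) -> Path:
    """Copy everything on the mounted eReader into a local working folder.

    Hypothetical helper: the mount point is an assumption and depends on
    your operating system (e.g. /Volumes/KOBOeReader on macOS).
    """
    if not mount_point.is_dir():
        raise FileNotFoundError(f"eReader not mounted at {mount_point}")
    # dirs_exist_ok (Python 3.8+) allows re-running the copy into the same folder.
    shutil.copytree(mount_point, destination, dirs_exist_ok=True)
    return destination


# Example (hypothetical paths):
# copy_device(Path("/Volumes/KOBOeReader"), Path.home() / "kobo-backup")
```

With no ignore filter, copytree also copies hidden directories such as .kobo, so the local folder mirrors the whole drive.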

Output

This creates an annotations directory containing a sub-directory for each file in which a highlight is detected. Each file's directory is further divided into one sub-directory per page, which holds PNG clips of the highlights on that page. There is also an index.html file that lists each document's title followed by all of its highlights as embedded images.
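To illustrate how an index like this can be built from the directory layout, here is a minimal sketch. It is not the script's actual code: build_index is hypothetical, and it assumes the layout annotations/<document>/<page>/<clip>.png, uses the document directory name as the title, and writes index.html into the annotations directory.

```python
import html
from pathlib import Path


def build_index(annotations_dir: Path) -> Path:
    """Write index.html listing each document followed by its highlight clips.

    Assumed layout: annotations/<document>/<page>/<clip>.png.
    """
    parts = ["<!DOCTYPE html>", "<html><body>"]
    for doc_dir in sorted(p for p in annotations_dir.iterdir() if p.is_dir()):
        title = html.escape(doc_dir.name)
        parts.append(f"<h2>{title}</h2>")
        # Sort numeric page names numerically ("2" before "10"), others by name.
        pages = sorted(
            (p for p in doc_dir.iterdir() if p.is_dir()),
            key=lambda p: (0, int(p.name)) if p.name.isdigit() else (1, p.name),
        )
        for page_dir in pages:
            for clip in sorted(page_dir.glob("*.png")):
                # Relative paths keep the index working if the folder is moved.
                src = html.escape(clip.relative_to(annotations_dir).as_posix())
                alt = f"{title}, page {html.escape(page_dir.name)}"
                parts.append(f'<img src="{src}" alt="{alt}">')
    parts.append("</body></html>")
    index = annotations_dir / "index.html"
    index.write_text("\n".join(parts), encoding="utf-8")
    return index
```

Because only directories are iterated at the top level, re-running the sketch simply overwrites the existing index.html.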

Future Plans

- Clean up code
- Add tests
- Add a CLI
- Package the project
- Put it on PyPI
- Stitch clips together
- OCR the clips to make highlights indexable/searchable
- Add a GUI
