willmooney3/highlight-extractor

highlight-extractor

A quick script to extract the highlights I make on PDF files using a Kobo Elipsa.

Description

The Kobo Elipsa lets you export highlights made in documents, as long as those highlights are not made with the stylus that comes with the eReader. Since reading and highlighting PDFs was the entire point of my buying the thing, I wrote some Python to make sure my freehand highlights could be extracted from all of the papers I read.

Usage

  1. Connect your Elipsa to your computer
  2. Copy the mounted drive into a folder
  3. Plop this script in the new folder
  4. Create a virtualenv and install the requirements
  5. Run the script with python highlight_extractor.py
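
Step 2 can also be done from Python rather than a file manager. The following is only a sketch using the standard library: `copy_device` is a hypothetical helper and not part of this script, and the mount point and destination folder are whatever your system and preferences dictate.

```python
import shutil
from pathlib import Path


def copy_device(mount_point, dest):
    """Copy everything under mount_point into dest and return dest as a Path.

    Hypothetical helper for illustration; the script itself only expects
    to be run from inside the copied folder.
    """
    src = Path(mount_point)
    if not src.is_dir():
        raise FileNotFoundError(f"mount point not found: {src}")
    dest = Path(dest)
    # dirs_exist_ok (Python 3.8+) allows re-running the copy into the same folder
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

For example, `copy_device("/path/to/mounted/elipsa", "elipsa-dump")` would mirror the device into `elipsa-dump`, where the script can then be placed and run.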

Output

The script creates an annotations directory with a sub-directory for each file in which a highlight is detected. Each file directory is then divided by page, and each page holds PNG clips of its highlights. An index.html file lists the document titles, each followed by that document's highlights as embedded images.
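
To make the layout concrete, here is a rough sketch of how an index page could be assembled from such a directory tree. It is not the script's actual code: `build_index` is a hypothetical name, and the sketch assumes that the document sub-directories sit directly under annotations, that clips are .png files somewhere beneath them, and that image paths are written relative to the annotations directory.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_index(annotations_dir):
    """Return HTML with each document title followed by its highlight clips.

    Illustrative only; the directory layout is an assumption based on the
    description above.
    """
    root = Path(annotations_dir)
    parts = ["<!DOCTYPE html>", "<html>", "<body>"]
    for doc_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        parts.append(f"<h2>{html.escape(doc_dir.name)}</h2>")
        # rglob finds clips whether or not they sit in per-page sub-directories
        for clip in sorted(doc_dir.rglob("*.png")):
            rel = clip.relative_to(root).as_posix()
            src = html.escape(quote(rel), quote=True)
            parts.append(f'<img src="{src}" alt="highlight">')
    parts += ["</body>", "</html>"]
    return "\n".join(parts)
```

Note that the sort is lexicographic, so a page directory named 10 comes before one named 2 unless the names are zero-padded.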

Future Plans

  1. Clean up code
  2. Add tests
  3. Add CLI
  4. Package
  5. Put on PyPI
  6. Stitch clips together
  7. OCR to make highlights indexable/searchable
  8. Add GUI
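
As a sketch of what the planned CLI might look like, the following uses argparse from the standard library. None of this exists in the script yet: the `source` argument, the `--output` flag, and their defaults are assumptions made for illustration, chosen to match the current behaviour of running from inside the copied folder and writing to an annotations directory.

```python
import argparse


def build_parser():
    """Build a hypothetical command-line interface for the extractor."""
    parser = argparse.ArgumentParser(
        prog="highlight_extractor",
        description="Extract freehand highlights from PDFs copied off a Kobo Elipsa.",
    )
    # Assumed argument: folder holding the copied device contents
    parser.add_argument(
        "source",
        nargs="?",
        default=".",
        help="folder containing the copied drive (default: current directory)",
    )
    # Assumed flag: where the clips and index.html are written
    parser.add_argument(
        "-o",
        "--output",
        default="annotations",
        help="directory to write highlights into (default: annotations)",
    )
    return parser
```

With this shape, running the tool with no arguments would behave like the current script, while something like `highlight_extractor elipsa-dump -o notes` would read from and write to other folders.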
