Watches a directory for PDF documents and automatically does OCR on them.

xperimental/autoocr

autoocr

autoocr is a small tool that watches a directory for PDF files and runs OCR on them using pdfsandwich. After processing, the files are moved to an output location.

Usage

You can build the tool yourself if you have Go installed, but using the provided Docker image is usually much easier.

The docker image has two pre-defined volumes:

  • /data/input
  • /data/output

The process watches the input directory for new files and writes the resulting files to the output directory. In most cases, running the image is as simple as:

docker run --name autoocr \
  -v /path/to/input:/data/input \
  -v /path/to/output:/data/output \
  xperimental/autoocr:latest

The autoocr executable has a few options that can also be passed to the container:

Usage of autoocr:
      --delay duration        Processing delay after receiving watch events. (default 5s)
  -i, --input string          Directory to use for input. (default "input")
      --keep-original         Keep backup of original file. (default true)
      --languages string      OCR Languages to use. (default "deu+eng")
      --log-format string     Logging format to use. (default "plain")
      --log-level string      Logging level to show. (default "info")
  -o, --output string         Directory to use for output. (default "output")
      --pdf-sandwich string   Path to pdfsandwich utility. (default "pdfsandwich")
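
As a rough sketch of passing these options to the container, the flags can be appended after the image name. This assumes the image's entrypoint is the autoocr executable, so that trailing arguments are forwarded to it; that is not confirmed here. The host paths are placeholders and the flag values are examples only. The input and output directories are set explicitly in case appending arguments replaces the image's default arguments.

```shell
# Placeholder host paths; replace them with your own directories.
INPUT_DIR=/path/to/input
OUTPUT_DIR=/path/to/output

# Arguments after the image name are assumed to reach the autoocr executable.
docker run --name autoocr \
  -v "$INPUT_DIR":/data/input \
  -v "$OUTPUT_DIR":/data/output \
  xperimental/autoocr:latest \
  --input /data/input \
  --output /data/output \
  --languages eng \
  --delay 10s \
  --keep-original=false \
  --log-level debug
```

If a container named autoocr already exists from an earlier run, remove it first with "docker rm autoocr" or pick a different name.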

Acknowledgements

This software would not be possible without pdfsandwich, tesseract, ImageMagick, and others. Thanks for providing those!
