autoocr

autoocr is a small tool that watches a directory for PDF files and runs OCR on them using pdfsandwich. After processing, the files are moved to an output location.

Usage

You can build the tool yourself if you have Go installed, but using the provided Docker image is generally much easier.
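
If you prefer to build from source, a minimal sketch could look like the following. It assumes you are in the root of a checkout of this repository, that a Go toolchain is installed, and that pdfsandwich is available on your PATH at runtime. The binary name and the directory names are only examples.

```shell
# Assumption: run from the root of a repository checkout (where go.mod lives).
go build -o autoocr .

# Assumption: pdfsandwich is installed and on PATH.
# The directory names match the documented defaults for --input and --output.
mkdir -p input output
./autoocr --input input --output output
```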

The Docker image has two predefined volumes:

/data/input
/data/output

The process watches the input directory for new files and writes the resulting files to the output directory. Running the image should generally be as easy as:

docker run --name autoocr \
  -v /path/to/input:/data/input \
  -v /path/to/output:/data/output \
  xperimental/autoocr:latest

The autoocr executable has a few options that can also be passed to the container:

Usage of autoocr:
      --delay duration        Processing delay after receiving watch events. (default 5s)
  -i, --input string          Directory to use for input. (default "input")
      --keep-original         Keep backup of original file. (default true)
      --languages string      OCR Languages to use. (default "deu+eng")
      --log-format string     Logging format to use. (default "plain")
      --log-level string      Logging level to show. (default "info")
  -o, --output string         Directory to use for output. (default "output")
      --pdf-sandwich string   Path to pdfsandwich utility. (default "pdfsandwich")
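
As an illustration, the flags can be appended after the image name. The sketch below processes English-only documents, waits longer before processing, and does not keep a backup of the original. It assumes the image's entrypoint is the autoocr executable, so that trailing arguments are forwarded to it. The host paths are placeholders.

```shell
# Assumption: arguments after the image name are passed to autoocr.
# --input and --output are set explicitly so the volume paths are still used
# in case overriding the arguments replaces the image's defaults.
docker run --name autoocr \
  -v /path/to/input:/data/input \
  -v /path/to/output:/data/output \
  xperimental/autoocr:latest \
  --input /data/input \
  --output /data/output \
  --languages eng \
  --delay 10s \
  --keep-original=false
```

The language codes are joined with "+", as in the default "deu+eng".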

Acknowledgements

This software would not be possible without the existence of pdfsandwich, tesseract, ImageMagick and others. Thanks for providing those!
