techiewonk/awesome-ocr

# Awesome OCR

This list links to software tools, libraries, and literature related to Optical Character Recognition (OCR).

Contributions are welcome, as is feedback.

1. Software

1.1. OCR engines

  • tesseract - The definitive Open Source OCR engine, Apache 2.0
  • EasyOCR - OCR engine built on PyTorch by JaidedAI, Apache 2.0
  • ocropus - OCR engine based on LSTM, Apache 2.0
  • ocropus 0.4 - Older v0.4 state of Ocropus, with tesseract 2.04 and iulib, C++
  • kraken - Ocropus fork with sane defaults
  • gocr - OCR engine under the GNU Public License led by Joerg Schulenburg.
  • Ocrad - The GNU OCR. GPL
  • ocular - Machine-learning OCR for historic documents
  • SwiftOCR - fast and simple OCR library written in Swift
  • attention-ocr - OCR engine using visual attention mechanisms
  • RWTH-OCR - The RWTH Aachen University Optical Character Recognition System
  • simple-ocr-opencv and its fork - A simple pythonic OCR engine using opencv and numpy
  • Calamari - OCR Engine based on OCRopy and Kraken
  • doctr - A seamless & high-performing OCR library powered by Deep Learning

1.2. Older and possibly abandoned OCR engines

  • Clara OCR - Open source OCR in C, GPL
  • Cuneiform - CuneiForm OCR was developed by Cognitive Technologies
  • Eye - an experimental Java OCR (image-to-text) application
  • kognition - An omnifont OCR software for KDE
  • OCRchie - Modular Optical Character Recognition Software
  • ocre - o.c.r. easy
  • xplab - A GTK 2 tool for pattern matching
  • hebOCR - Hebrew character recognition library (previously named hocr, see Wikipedia article) GPL

1.3. OCR file formats

1.3.1. hOCR

  • hocr-tools - Tools for doing various useful things with hOCR files, Apache 2.0
  • hocr-spec - hOCR 1.2 specification
  • ocr-transform - CLI tool to convert between hOCR and ALTO, MIT
  • hocr-parser - hOCR Specification Python Parser
  • hOCRTools - hOCR to ALTO conversion XSLT
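
As a rough illustration of what the hOCR tools above work with: hOCR embeds OCR results in HTML, storing layout data such as `bbox x0 y0 x1 y1` in each element's `title` attribute. The sketch below uses only the Python standard library to pull word text and bounding boxes out of a small hand-written sample. The names `HocrWordParser`, `extract_words`, and `SAMPLE` are hypothetical and not part of any listed project, and the parser assumes word elements are `span`s without nested `span`s.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class HocrWordParser(HTMLParser):
    """Collects (text, bbox) pairs for elements with class "ocrx_word".

    Simplifying assumption: word elements are <span>s with no nested <span>.
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._in_word = False
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "ocrx_word" in classes:
            match = BBOX_RE.search(attr_map.get("title") or "")
            self._bbox = tuple(int(g) for g in match.groups()) if match else None
            self._in_word = True
            self._text = []

    def handle_data(self, data):
        if self._in_word:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._in_word and tag == "span":
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._in_word = False


def extract_words(hocr):
    """Return a list of (word_text, (x0, y0, x1, y1)) tuples."""
    parser = HocrWordParser()
    parser.feed(hocr)
    return parser.words


# Hand-written sample for illustration only, not real OCR output.
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 800 600'>
  <span class='ocr_line' title='bbox 10 20 300 50'>
    <span class='ocrx_word' title='bbox 10 20 120 50; x_wconf 95'>Hello</span>
    <span class='ocrx_word' title='bbox 130 20 300 50; x_wconf 91'>world</span>
  </span>
</div>
"""

if __name__ == "__main__":
    for text, bbox in extract_words(SAMPLE):
        print(text, bbox)
```

For real files, the dedicated parsers and converters listed in this section are likely a better fit, since they handle the full range of hOCR classes and properties.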

1.3.2. ALTO XML

1.3.3. TEI

  • TEI-OCR - TEI customization for OCR generated layout and content information
  • TEI SIG on Libraries - Best Practices for TEI in Libraries
  • GDZ - METS/TEI-based GDZ document format

1.3.4. PAGE XML

  • PAGE-XML Schema - XML schema of the PAGE XML format along with documentation and examples
  • omni:us Pages Format (OPF) - XML schema very similar to PAGE XML that has some additional features.
  • py-pagexml - Python library for handling PAGE XML and OPF files.
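
For orientation, PAGE XML describes layout regions and text lines as polygons (a `points` attribute on `Coords`) alongside their transcription (`TextEquiv/Unicode`). The following standard-library sketch reads text lines from a tiny hand-written document. The helper names (`local_name`, `parse_points`, `read_text_lines`) and the sample are hypothetical, the namespace URI is only one of several schema versions, and the code matches elements by local name so that it does not depend on a particular version.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration; real files carry many more attributes.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page1.png" imageWidth="800" imageHeight="600">
    <TextRegion id="r1">
      <Coords points="10,20 300,20 300,50 10,50"/>
      <TextLine id="l1">
        <Coords points="10,20 300,20 300,50 10,50"/>
        <TextEquiv><Unicode>Hello world</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points):
    """Turn 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def read_text_lines(xml_text):
    """Return (line_id, text, polygon) for every TextLine element."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        polygon, text = None, None
        # Only direct children are inspected, so Word-level content is skipped.
        for child in element:
            name = local_name(child.tag)
            if name == "Coords":
                polygon = parse_points(child.get("points", ""))
            elif name == "TextEquiv":
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        text = sub.text or ""
        lines.append((element.get("id"), text, polygon))
    return lines


if __name__ == "__main__":
    for line_id, text, polygon in read_text_lines(SAMPLE):
        print(line_id, text, polygon)
```

Matching on local names trades strictness for convenience; validating against the published schema is the safer route when the exact version matters.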

1.4. OCR CLI

  • OCRmyPDF - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
  • Pdf2PdfOCR - A tool to OCR a PDF (or supported images) and add a text "layer" (a "pdf sandwich") in the original file making it a searchable PDF. GUI included. Tesseract and cuneiform supported.
  • Ocrocis - Project manager interface for Ocropy, see also external project homepage
  • tesseract-recognize - Tesseract-based tool that outputs result in Page XML format (docker image).

2. Deskewing and Dewarping

2.1. OCR GUI

  • moz-hocr-editor - Firefox Addon for editing hOCR files (discontinued)
  • qt-box-editor - QT4 editor of tesseract-ocr box files.
  • ocr-gt-tools - Client-Server application for editing OCR ground truth.
  • Paperwork - Using scanners and OCR to grep paper documents the easy way.
  • Paperless - Scan, index, and archive all of your paper documents.
  • gImageReader - gImageReader is a simple Gtk/Qt front-end to tesseract-ocr.
  • VietOCR - A Java/.NET GUI frontend for the Tesseract OCR engine, including jTessBoxEditor, a graphical Tesseract box data editor
  • PoCoTo - Fast interactive batch corrections of complete OCR error series in OCR'ed historical documents.
  • OCRFeeder - GTK graphical user interface that allows the users to correct characters or bounding boxes, ODT export and more.
  • PRImA PAGE Viewer - Java based viewer for PAGE XML files (layout + text content). Also supports ALTO XML, FineReader XML, and HOCR.
  • LAREX - A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.
  • archiscribe - Web application for transcribing OCR ground truth from Archive.org. Deployed instance available at https://archiscribe.jbaiter.de/, results are available in @jbaiter/archiscribe-corpus.
  • nw-page-editor - Simple app for visual editing of Page XML files. Provides desktop and server docker-based versions.

3. Text detection and localization

  • DB
  • DeepReg
  • CornerText - (paper:2018) - Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation
  • RRPN - (paper:2018) - Arbitrary-Oriented Scene Text Detection via Rotation Proposals
  • MASTER-TF - (paper:2021) - TensorFlow reimplementation of "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" (Pattern Recognition 2021).
  • MaskTextSpotterV3 - (paper:2020) - Mask TextSpotter v3 is an end-to-end trainable scene text spotter that adopts a Segmentation Proposal Network (SPN) instead of an RPN.
  • TextFuseNet - (paper:2020) A PyTorch implementation of "TextFuseNet: Scene Text Detection with Richer Fused Features".
  • SATRN - (paper:2020) - Official Tensorflow Implementation of Self-Attention Text Recognition Network (SATRN) (CVPR Workshop WTDDLE 2020).
  • cvpr20-scatter-text-recognizer - (paper:2020) - Unofficial implementation of CVPR 2020 paper "SCATTER: Selective Context Attentional Scene Text Recognizer"
  • seed - (paper:2020, https://arxiv.org/pdf/2005.10977.pdf) - Implementation of the paper "SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition"
  • vedastr - A scene text recognition toolbox based on PyTorch
  • AutoSTR - (paper:2020) Efficient Backbone Search for Scene Text Recognition
  • Decoupled-attention-network - (paper:2019) Pytorch implementation for "Decoupled attention network for text recognition".
  • Bi-STET - (paper:2020) Implementation of Bidirectional Scene Text Recognition with a Single Decoder
  • kiss - (paper:2019)
  • Deformable Text Recognition - (paper:2019)
  • MaskTextSpotter - (paper:2019)
  • CUTIE - (paper:2019)
  • AttentionOCR - (paper:2019)
  • crpn - (paper:2019)
  • Scene-Text-Detection-with-SPECNET - Repository for Scene Text Detection with Supervised Pyramid Context Network with tensorflow.
  • Character-Region-Awareness-for-Text-Detection
  • Real-time-Scene-Text-Detection-and-Recognition-System - End-to-end pipeline for real-time scene text detection and recognition.
  • ocr_attention - Robust Scene Text Recognition with Automatic Rectification.
  • masktextspotter.caffee2 - The code of "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes".
  • InceptText-Tensorflow - An implementation of the algorithm in the paper "IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection".
  • textspotter - An End-to-End TextSpotter with Explicit Alignment and Attention
  • RRD - RRD: Rotation-Sensitive Regression for Oriented Scene Text Detection.
  • crpn - Corner-based Region Proposal Network.
  • SSTDNet - Implement 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight'.
  • R2CNN - caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection.
  • RRPN - Source code of RRPN ---- Arbitrary-Oriented Scene Text Detection via Rotation Proposals
  • Tensorflow_SceneText_Oriented_Box_Predictor - Modifies the TensorFlow Object Detection API code to predict oriented bounding boxes. It can be used for scene text detection.
  • DeepSceneTextReader - A C++ project deploying a deep scene text reading pipeline with TensorFlow, using frozen TensorFlow graphs. The detector finds text locations in natural scene images, and the recognizer reads the word in each detected bounding box.
  • DeRPN - A novel region proposal network for more general object detection ( including scene text detection ).
  • Bartzi/see - SEE: Towards Semi-Supervised End-to-End Scene Text Recognition
  • Bartzi/stn-ocr - Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition
  • beacandler/R2CNN - caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection
  • HsiehYiChia/Scene-text-recognition - Scene text detection and recognition based on Extremal Region(ER)
  • R2CNN_Faster-RCNN_Tensorflow - Rotational region detection based on Faster-RCNN.
  • corner - Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation
  • Corner_Segmentation_TextDetection - Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation.
  • TextSnake.pytorch - A PyTorch implementation of ECCV2018 Paper: TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes
  • AON - Implementation for CVPR 2018 text recognition Paper by Tensorflow: "AON: Towards Arbitrarily-Oriented Text Recognition"
  • pixel_link - Implementation of our paper 'PixelLink: Detecting Scene Text via Instance Segmentation' in AAAI2018
  • seglink - An implementation of the SegLink algorithm from the paper "Detecting Oriented Text in Natural Images by Linking Segments" (=> pixel_link)
  • SSTD - Single Shot Text Detector with Regional Attention
  • MORAN_v2 - MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition
  • Curve-Text-Detector - This repository provides train&test code, dataset, det.&rec. annotation, evaluation script, annotation tool, and ranking table.
  • HCIILAB/DeRPN - A novel region proposal network for more general object detection ( including scene text detection ).
  • TextField - TextField: Learning A Deep Direction Field for Irregular Scene Text Detection (TIP 2019)
  • tensorflow-TextMountain - TextMountain: Accurate Scene Text Detection via Instance Segmentation
  • Bartzi/see - Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"
  • bgshih/aster - Recognizing cropped text in natural images.
  • ReceiptParser - A fuzzy receipt parser written in Python.

3.1. OCR Preprocessing

4. Segmentation

4.1. Line Segmentation

4.2. Character Segmentation

4.3. Word Segmentation

4.4. Document Segmentation

4.5. Form Segmentation

5. Handwritten

6. Table detection

7. Language detection

  • lingua - The most accurate natural language detection library for Java and other JVM languages, suitable for long and short text alike
  • langdetect
  • whatthelang - Lightning Fast Language Prediction
  • wiki-lang-detect

7.1. OCR as a Service

7.2. OCR evaluation

7.3. OCR libraries by programming language

7.3.1. Crystal

7.3.2. Elixir

  • tesseract_ocr - Elixir library wrapping the tesseract executable.

7.3.3. Go

  • gosseract - Golang OCR library, wrapping Tesseract-ocr.

7.3.4. Java

  • Tess4J - Java Native Access bindings to Tesseract.
  • tess-two - Tools for compiling Tesseract on Android and Java API.

7.3.5. .Net

7.3.6. Object Pascal

7.3.7. PHP

7.3.8. Python

  • pytesseract - A Python wrapper for Google Tesseract.
  • pyocr - A Python wrapper for Tesseract and Cuneiform.
  • ocrodjvu - A library and standalone tool for doing OCR on DjVu documents, wrapping Cuneiform, gocr, ocrad, ocropus and tesseract
  • tesserocr - A Python wrapper for the tesseract-ocr API

7.3.9. Javascript

  • ocracy - pure javascript lstm rnn implementation based on ocropus
  • gocr.js - Javascript port (emscripten) of gocr
  • ocrad.js - Javascript port (emscripten) of ocrad
  • tesseract.js - Javascript port (emscripten) of Tesseract
  • node-tesseract-ocr - A simple wrapper for the Tesseract OCR package.
  • node-tesseract-native - C++ module for node providing OCR with tesseract and leptonica.

7.3.10. Ruby

  • rtesseract - Ruby library wrapping the tesseract and imagemagick executables.
  • ruby-tesseract - Native Tesseract bindings for Ruby MRI and JRuby
  • ocr_space - API wrapper for free ocr service ocr.space. Includes CLI

7.3.11. Rust

  • tesseract.rs - Rust bindings for tesseract OCR.
  • leptess - Productive and safe Rust bindings/wrappers for tesseract and leptonica.

7.3.12. R

7.3.13. Swift

  • Tesseract OCR iOS - Swift and Objective-C wrapper for Tesseract OCR.
  • SwiftOCR - Fast and simple OCR library written in Swift. Optimized for recognizing short, one line long alphanumeric codes.

7.4. OCR training tools

  • glyph-miner - A system for extracting glyphs from early typeset prints
  • ocrodeg - Document image degradation for OCR data augmentation

8. Datasets

8.1. Ground Truth

  • Rescribe - Transcriptions of Caroline Minuscule Manuscripts, PDM 1.0

9. Video Text Spotting

10. Font detection

  • typefont - The first open-source library that detects the font of a text in an image.

11. Optical Character Recognition Engines and Frameworks

12. Awesome lists

13. Proprietary OCR Engines

14. Cloud based OCR Engines (SaaS)

15. File formats and tools

  • nw-page-editor - Simple app for visual editing of Page XML files
  • hocr
  • alto
  • PageXML
  • ocr-fileformat - Validate and transform various OCR file formats
  • hocr-tools - Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.

16. Datasets

17. Data augmentation and Synthetic data generation

18. Pre OCR Processing

19. Post OCR Correction

20. Benchmarks

21. misc

  • ocrodeg - a small Python library implementing document image degradation for data augmentation for handwriting recognition and OCR applications.
  • scantailor - Scan Tailor is an interactive post-processing tool for scanned pages.
  • jlsutherland/doc2text - Helps researchers fix extraction errors and get the highest-quality text possible from their PDFs.
  • mauvilsa/nw-page-editor - Simple app for visual editing of Page XML files.
  • Transkribus - Transkribus is a comprehensive platform for the digitisation, AI-powered recognition, transcription and searching of historical documents.
  • http://projectnaptha.com/
  • https://github.com/4lex4/scantailor-advanced
  • open-semantic-search - Open Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
  • ocrserver - A simple OCR API server, seriously easy to be deployed by Docker, on Heroku as well
  • cosc428-structor - ~1000 book pages + OpenCV + python = page regions identified as paragraphs, lines, images, captions, etc.
  • nidaba - An expandable and scalable OCR pipeline
  • https://github.com/MaybeShewill-CV/CRNN_Tensorflow
  • OCRmyPDF

22. Literature

22.1. OCR-related publication and link lists

22.2. Blog Posts and Tutorials

22.3. OCR Showcases

  • abbyy-finereader-ocr-senate - Using OCR to parse scanned Senate Financial Disclosure forms.
  • cvOCR - An OCR system for recognizing resume or cv text, implemented in Python and C and based on tesseract
  • MathOCR - A printed scientific document recognition system, pre-alpha

22.4. Academic articles

22.4.1. 2011 and before

22.4.2. 2012

22.4.3. 2013

22.4.4. 2014

22.4.5. 2015

22.4.6. 2016

22.4.7. 2017

22.4.8. 2018

22.4.9. 2019

22.4.10. 2020
