Home

Jochre is an OCR package based on supervised machine learning techniques. It has been applied to several languages, including Yiddish, Occitan and Alsacien.

There are four phases:

Annotation - annotating a training/evaluation corpus using JochreWeb
Training - constructing the OCR model
Evaluation - evaluating the accuracy of the OCR model
Analysis - using an existing model to analyse new scanned pages

Annotation requires the JochreWeb application.

Training and evaluation require a Jochre database constructed using JochreWeb.

Analysis requires a model constructed during training, but does not require the database that was used to construct it.

During analysis (and evaluation), Jochre performs the following steps:

Segmentation: break up the images into paragraphs, rows, groups (representing words) and shapes (representing letters). This uses ad-hoc statistical algorithms.
Guessing: apply the model to guess the n most probable words for each group (this list is known as the "beam").
Post-processing: use a lexicon to rerank the words in the beam and select the most likely analysis.
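The guessing and post-processing steps can be illustrated with a minimal sketch. It assumes that the model returns, for each shape, a probability for every candidate letter, and that the lexicon maps known words to corpus frequencies. The classes `Shape` and `Group`, the functions `guess_beam` and `rerank_with_lexicon`, and the scoring rule (summed letter log-probabilities, plus a log-frequency bonus for known words or a fixed penalty for unknown ones) are hypothetical and are not Jochre's actual API or scoring.

```python
import heapq
import math
from dataclasses import dataclass, field

# Hypothetical, simplified segmentation output (not Jochre's API):
# a group (word) is a sequence of shapes (letters).


@dataclass
class Shape:
    # Assumed model output: probability of each candidate letter for this shape.
    letter_probs: dict[str, float]


@dataclass
class Group:
    shapes: list[Shape] = field(default_factory=list)


def guess_beam(group: Group, beam_width: int) -> list[tuple[str, float]]:
    """Guessing: keep the beam_width most probable words for a group.

    A word's score is the sum of its letters' log-probabilities, which
    assumes shapes are independent (a simplification for illustration).
    """
    beam = [("", 0.0)]
    for shape in group.shapes:
        # Extend every partial word in the beam with every candidate letter.
        candidates = [
            (prefix + letter, score + math.log(p))
            for prefix, score in beam
            for letter, p in shape.letter_probs.items()
            if p > 0
        ]
        # Prune back to the beam_width best partial words.
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[1])
    return beam


def rerank_with_lexicon(
    beam: list[tuple[str, float]],
    lexicon: dict[str, int],
    unknown_penalty: float = 5.0,
) -> list[tuple[str, float]]:
    """Post-processing: rerank the beam using lexicon word frequencies.

    Known words get a bonus of log(1 + frequency); unknown words get a
    fixed penalty. This weighting is a hypothetical choice.
    """

    def combined(entry: tuple[str, float]) -> float:
        word, score = entry
        freq = lexicon.get(word, 0)
        return score + (math.log1p(freq) if freq > 0 else -unknown_penalty)

    return sorted(beam, key=combined, reverse=True)


if __name__ == "__main__":
    # Toy group of three shapes with made-up letter probabilities.
    group = Group([
        Shape({"c": 0.6, "e": 0.4}),
        Shape({"a": 0.7, "o": 0.3}),
        Shape({"t": 0.9, "l": 0.1}),
    ])
    beam = guess_beam(group, beam_width=3)
    print([word for word, _ in beam])  # ['cat', 'eat', 'cot']
    print(rerank_with_lexicon(beam, {"cot": 20})[0][0])  # cot
```

In the toy example the letter probabilities alone rank "cat" first, but the lexicon promotes "cot", the only beam entry it knows. Scoring in log space avoids numeric underflow when many small letter probabilities are multiplied along a long word.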

See Installation for Jochre installation instructions.
