Developing an Optical Character Recognition system

Objective

To build and evaluate an optical character recognition system that can process scanned book pages and turn them into text.

The figures below show classification accuracy using a nearest neighbour classifier with PCA-based dimensionality reduction. Each successive page is lower in quality because more noise has been added.

Noise value    Accuracy
0              98%
0.1            98%
0.2            92%
0.3            78%
0.4            63%
0.5            51%
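
As a rough illustration of the nearest neighbour and PCA approach named above, the sketch below reduces feature vectors with PCA and labels each test vector with the label of its closest training vector. It is not the code in system.py: the function names (fit_pca, project, nearest_neighbour), the use of NumPy, and the choice of Euclidean distance are all assumptions made for the example.

```python
import numpy as np


def fit_pca(train: np.ndarray, n_components: int):
    """Return the training mean and the top principal axes (as columns).

    Hypothetical helper: `train` has one flattened character image per row.
    """
    mean = train.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components].T


def project(data: np.ndarray, mean: np.ndarray, axes: np.ndarray) -> np.ndarray:
    """Centre the data with the training mean and project onto the PCA axes."""
    return (data - mean) @ axes


def nearest_neighbour(train_feats, train_labels, test_feats):
    """Label each test vector with the label of its closest training vector."""
    # Squared Euclidean distance between every test and training vector.
    diffs = test_feats[:, None, :] - train_feats[None, :, :]
    dists = np.einsum("ijk,ijk->ij", diffs, diffs)
    return train_labels[np.argmin(dists, axis=1)]
```

In this sketch the PCA axes are fitted on the training data only and then reused for the test pages, so the test data does not influence the reduced feature space.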

Make sure you are in the repository root (the directory containing requirements.txt), then run:

pip install -r requirements.txt

Then run the evaluation script:

bash run_evaluate.sh

The script prints the percentage of correctly classified characters for each page.
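
For reference, a per-page score of this kind can be computed as in the minimal sketch below. The helper name percent_correct is hypothetical and the real evaluate.py may compute its score differently.

```python
import numpy as np


def percent_correct(predicted, expected) -> float:
    """Percentage of positions where the predicted label equals the true label."""
    predicted = np.asarray(predicted)
    expected = np.asarray(expected)
    if predicted.shape != expected.shape:
        raise ValueError("predicted and expected must have the same shape")
    if predicted.size == 0:
        raise ValueError("cannot score an empty page")
    # The mean of the boolean matches is the fraction of correct labels.
    return float(100.0 * np.mean(predicted == expected))
```

For example, percent_correct(list("abcd"), list("abcf")) scores a page where three of the four characters match.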
