test_PhD

Code by Nguyen Tuan Nam

1. Introduction

This repository reimplements the paper "Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification" (https://arxiv.org/abs/1704.03557). The system classifies document types on the RVL_CDIP and Tobacco_3482 datasets and is written in Python 3.

2. Installation

This software depends on NumPy, Keras, TensorFlow, Matplotlib, and opencv-python, which must be installed before use. The simplest way to install them is with pip:

	# sudo pip3 install -r requirements.txt
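
To confirm that the installation succeeded, a small check along the following lines can be run. It is only a sketch and not part of the repository: the function name `missing_modules` is hypothetical, and the list uses the import names of the packages above (opencv-python is imported as `cv2`).

```python
from importlib.util import find_spec

# Import names of the dependencies listed above (assumption: default package layouts).
REQUIRED_MODULES = ["numpy", "keras", "tensorflow", "matplotlib", "cv2"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located in the current environment."""
    # find_spec only looks the module up; it does not import it.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```

An empty result means every dependency can be located by the interpreter; it does not verify version compatibility.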

We also provide a Dockerfile for deploying the environment.

3. Usage

3.1. Data

Download the RVL_CDIP dataset (https://www.cs.cmu.edu/~aharley/rvl-cdip/) and the Tobacco dataset (https://www.kaggle.com/patrickaudriaz/tobacco3482jpg), then extract all downloaded files (rvl-cdip.tar.gz, labels_only.tar.gz, tobacco3482jpg.zip) into the same folder as the source code.
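
The extraction step can also be scripted. The sketch below assumes the three archives have already been downloaded into the source folder; `extract_all` is a hypothetical helper, not something the repository provides. It relies on `shutil.unpack_archive` from the standard library, which handles both `.tar.gz` and `.zip` files.

```python
import shutil
from pathlib import Path

# Archive names as given in the text above.
ARCHIVES = ["rvl-cdip.tar.gz", "labels_only.tar.gz", "tobacco3482jpg.zip"]


def extract_all(source_dir=".", archives=ARCHIVES):
    """Unpack each archive found in source_dir into source_dir.

    Returns the list of archive names that were actually extracted.
    """
    source_dir = Path(source_dir)
    extracted = []
    for name in archives:
        archive = source_dir / name
        if archive.is_file():  # skip archives that have not been downloaded
            shutil.unpack_archive(str(archive), str(source_dir))
            extracted.append(name)
    return extracted
```

Calling `extract_all()` from the source folder returns the names of the archives it found and unpacked, so a missing download is easy to spot.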

After that, run create_dataset.py with the following command:

	# python3 create_dataset.py

This command moves all RVL_CDIP images with the same label into the same folder and removes every image from the RVL_CDIP training set that also appears in Tobacco_3482.
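
The regrouping described above can be sketched as follows. This is an illustration under stated assumptions, not the actual contents of create_dataset.py: it assumes each line of a label file has the form `<relative image path> <label>`, and it takes the set of overlapping images as an argument (`exclude_stems`, matched by file name without extension) because the README does not say how the overlap with Tobacco_3482 is detected. The name `group_by_label` and all parameters are hypothetical.

```python
import shutil
from pathlib import Path


def group_by_label(label_file, image_root, out_root, exclude_stems=frozenset()):
    """Move each image listed in label_file into out_root/<label>/.

    Assumption: every non-empty line is '<relative image path> <label>'.
    Images whose file stem is in exclude_stems are deleted instead of moved.
    Returns a (moved, removed) pair of counts.
    """
    image_root, out_root = Path(image_root), Path(out_root)
    moved = removed = 0
    for line in Path(label_file).read_text().splitlines():
        if not line.strip():
            continue
        # Split on the last whitespace so paths containing spaces still work.
        rel_path, label = line.rsplit(maxsplit=1)
        src = image_root / rel_path
        if not src.is_file():
            continue  # listed in the label file but not present on disk
        if src.stem in exclude_stems:
            src.unlink()  # drop images that overlap with the other dataset
            removed += 1
            continue
        dest_dir = out_root / label
        dest_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest_dir / src.name))
        moved += 1
    return moved, removed
```

Passing the exclusion set in from outside keeps the sketch independent of whatever matching rule is used to decide that two images are the same document.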
