
A Deep Learning-Based Compatibility Score for Reconstruction of Strip-Shredded Text Documents

Thiago M. Paixão, Rodrigo F. Berriel, Maria C. S. Boeres, Claudine Badue, Alberto F. De Souza, and Thiago Oliveira-Santos

Paper presented at the 31st Conference on Graphics, Patterns and Images (SIBGRAPI 2018). The manuscript is available on the IEEE Xplore platform and in the SIBGRAPI Digital Library Archive.

BibTeX

@inproceedings{paixao2018deep,
  title={A deep learning-based compatibility score for reconstruction of strip-shredded text documents},
  author={Paixao, Thiago M and Berriel, Rodrigo F and Boeres, Maria CS and Badue, Claudine and De Souza, Alberto F and Oliveira-Santos, Thiago},
  booktitle={2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)},
  pages={87--94},
  year={2018},
  organization={IEEE}
}

Abstract

The use of paper-shredder machines (mechanical shredding) to destroy documents can be illicitly motivated when the purpose is hiding evidence of fraud and other sorts of crimes. Therefore, reconstructing such documents is of great value for forensic investigation, but it is admittedly a stressful and time-consuming task for humans. To address this challenge, several computational techniques have been proposed in literature, particularly for documents with text-based content. In this context, a critical challenge for automated reconstruction is to measure properly the fitting (compatibility) between paper shreds (strips), which has been observed to be the main limitation of literature on this topic. The main contribution of this paper is a deep learning-based compatibility score to be applied in the reconstruction of strip-shredded text documents. Since there is no abundance of real-shredded data, we propose a training scheme based on digital simulated-shredding of documents from a well-known OCR database. The proposed score was coupled to a black-box optimization tool, and the resulting system achieved an average accuracy of 94.58% in the reconstruction of mechanically-shredded documents.
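To make the idea of digital simulated shredding concrete, the following is a minimal sketch, not the code in this repository's `dataset.py`. It assumes an image is a plain list of pixel rows and that shredding means cutting it into vertical strips of near-equal width; the names `shred`, `shuffle_strips`, `restore_order`, and `reassemble` are hypothetical and chosen only for illustration.

```python
import random
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of grayscale pixel values (assumed representation)


def shred(image: Image, num_strips: int) -> List[Image]:
    """Cut an image into `num_strips` vertical strips of near-equal width."""
    width = len(image[0])
    if not 1 <= num_strips <= width:
        raise ValueError("num_strips must be between 1 and the image width")
    # Strip i covers columns bounds[i] .. bounds[i + 1] - 1.
    bounds = [i * width // num_strips for i in range(num_strips + 1)]
    return [[row[lo:hi] for row in image] for lo, hi in zip(bounds, bounds[1:])]


def shuffle_strips(strips: List[Image], seed: Optional[int] = None) -> Tuple[List[Image], List[int]]:
    """Return the strips in random order, plus the original index of each one."""
    order = list(range(len(strips)))
    random.Random(seed).shuffle(order)
    return [strips[i] for i in order], order


def restore_order(shuffled: List[Image], order: List[int]) -> List[Image]:
    """Undo `shuffle_strips` given the recorded original indices."""
    restored: List[Image] = [[] for _ in shuffled]
    for position, original_index in enumerate(order):
        restored[original_index] = shuffled[position]
    return restored


def reassemble(strips: List[Image]) -> Image:
    """Concatenate strips left to right, row by row."""
    height = len(strips[0])
    return [[pixel for strip in strips for pixel in strip[r]] for r in range(height)]
```

In this sketch, the recorded `order` plays the role of the ground truth that a reconstruction method tries to recover from the shuffled strips alone.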


Reproducing the experiments

Under construction.

The experiments can be reproduced with NVIDIA Docker container technology. After installing Docker in your environment, make sure you can run Docker containers as a non-root user (check this guide for additional information). Then, run the following bash commands in a terminal:

  1. Clone the project repository and enter the project directory:
git clone https://github.com/thiagopx/deeprec-sib18.git
cd deeprec-sib18
  2. Build the container (defined in docker/Dockerfile):
bash build.sh

The container includes all dependencies except the LocalSolver optimizer, which requires a license. You should install LocalSolver locally in /opt/localsolver.

  3. Generate the training samples:
bash dataset.sh
  4. Train the models:
bash train.sh    # for SqueezeNet
bash train-mn.sh # for MobileNet
  5. Run the experiment (includes all methods/architectures):
bash test.sh

The results will be placed in the results directory.

Important: test.sh creates a volume from the local /opt/localsolver directory, so LocalSolver must be installed there before you run it.

  6. Generate the graphs:
bash graphs.sh

By default, the graphs will be placed in the graphs directory.
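
The steps above can also be chained from a small Python driver. The sketch below is illustrative only and is not part of the repository: the script names and the /opt/localsolver path come from the steps above, while `STEPS`, `missing_prerequisites`, `pipeline_commands`, and `run_pipeline` are hypothetical helper names. It assumes it is run from the repository root, and its prerequisite check only verifies that a `docker` executable is on the PATH and that the LocalSolver directory exists; it does not check the license or non-root Docker permissions.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Sequence, Tuple

# Step names are hypothetical labels; the scripts and their order follow the README.
STEPS: List[Tuple[str, List[str]]] = [
    ("build", ["build.sh"]),
    ("dataset", ["dataset.sh"]),
    ("train", ["train.sh", "train-mn.sh"]),  # SqueezeNet, then MobileNet
    ("test", ["test.sh"]),
    ("graphs", ["graphs.sh"]),
]


def missing_prerequisites(localsolver_dir: str = "/opt/localsolver") -> List[str]:
    """Return human-readable problems; an empty list means the basic checks passed."""
    problems = []
    if shutil.which("docker") is None:
        problems.append("docker executable not found on PATH")
    if not Path(localsolver_dir).is_dir():
        problems.append(f"LocalSolver directory not found: {localsolver_dir}")
    return problems


def pipeline_commands(skip: Sequence[str] = ()) -> List[List[str]]:
    """Build the ordered list of commands, leaving out any step named in `skip`."""
    return [
        ["bash", script]
        for name, scripts in STEPS
        if name not in skip
        for script in scripts
    ]


def run_pipeline(repo_dir: str = ".", skip: Sequence[str] = ()) -> None:
    """Run each script in order and stop at the first one that fails."""
    problems = missing_prerequisites()
    if problems:
        raise RuntimeError("; ".join(problems))
    for command in pipeline_commands(skip):
        print("running:", " ".join(command))
        subprocess.run(command, cwd=repo_dir, check=True)
```

For example, `run_pipeline(skip=("build", "dataset"))` would start directly at training, assuming the container and training samples already exist. Because `check=True` raises on a non-zero exit status, a failed step stops the later ones from running on incomplete outputs.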
