A Deep Learning-Based Compatibility Score for Reconstruction of Strip-Shredded Text Documents

Thiago M. Paixão, Rodrigo F. Berriel, Maria C. S. Boeres, Claudine Badue, Alberto F. De Souza, and Thiago Oliveira-Santos

Paper presented at the 31st Conference on Graphics, Patterns and Images (SIBGRAPI 2018). The manuscript is available on the IEEE Xplore platform and in the SIBGRAPI Digital Library Archive.

BibTeX

@inproceedings{paixao2018deep,
  title={A deep learning-based compatibility score for reconstruction of strip-shredded text documents},
  author={Paixao, Thiago M and Berriel, Rodrigo F and Boeres, Maria CS and Badue, Claudine and De Souza, Alberto F and Oliveira-Santos, Thiago},
  booktitle={2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)},
  pages={87--94},
  year={2018},
  organization={IEEE}
}

Abstract

The use of paper-shredder machines (mechanical shredding) to destroy documents can be illicitly motivated when the purpose is hiding evidence of fraud and other sorts of crimes. Therefore, reconstructing such documents is of great value for forensic investigation, but it is admittedly a stressful and time-consuming task for humans. To address this challenge, several computational techniques have been proposed in literature, particularly for documents with text-based content. In this context, a critical challenge for automated reconstruction is to measure properly the fitting (compatibility) between paper shreds (strips), which has been observed to be the main limitation of literature on this topic. The main contribution of this paper is a deep learning-based compatibility score to be applied in the reconstruction of strip-shredded text documents. Since there is no abundance of real-shredded data, we propose a training scheme based on digital simulated-shredding of documents from a well-known OCR database. The proposed score was coupled to a black-box optimization tool, and the resulting system achieved an average accuracy of 94.58% in the reconstruction of mechanically-shredded documents.


Reproducing the experiments

Under construction.

The experiments can be reproduced with NVIDIA Docker container technology. After installing Docker in your environment, make sure you are able to run Docker containers as a non-root user (check this guide for additional information). Then, run the following bash commands in a terminal:

  1. Clone the project repository and enter the project directory:
git clone https://github.com/thiagopx/deeprec-sib18.git
cd deeprec-sib18
  2. Build the container (defined in docker/Dockerfile):
bash build.sh

The container includes all dependencies except the LocalSolver optimizer, which requires a license. You should install LocalSolver locally in /opt/localsolver.

  3. Generate the training samples:
bash dataset.sh
  4. Train the models:
bash train.sh    # for SqueezeNet
bash train-mn.sh # for MobileNet
  5. Run the experiment (includes all methods/architectures):
bash test.sh

The results will be placed in the results directory.

Important: test.sh mounts the local /opt/localsolver directory as a volume, so LocalSolver must be installed there before running this step.

  6. Generate the graphs:
bash graphs.sh

By default, the graphs will be placed in the graphs directory.
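
The steps above (after cloning) can also be chained in a small wrapper. The following is only a sketch and is not part of the repository: the function names `check_localsolver` and `run_pipeline` and the variables `LOCALSOLVER_DIR` and `PIPELINE_STEPS` are hypothetical. It assumes the scripts are run from the repository root, that both models should be trained before test.sh, and it checks for LocalSolver up front so that a missing installation is reported before the long training steps rather than at test.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with the repository.

# Directory that test.sh expects LocalSolver in (overridable for a dry run).
LOCALSOLVER_DIR="${LOCALSOLVER_DIR:-/opt/localsolver}"

# Scripts named in the steps above, in execution order.
PIPELINE_STEPS="build.sh dataset.sh train.sh train-mn.sh test.sh graphs.sh"

# Return 0 if the LocalSolver directory exists, 1 otherwise.
check_localsolver() {
  if [ -d "$LOCALSOLVER_DIR" ]; then
    return 0
  fi
  echo "LocalSolver not found in $LOCALSOLVER_DIR" >&2
  return 1
}

# Run each script with bash from the current directory,
# stopping at the first step that fails.
run_pipeline() {
  check_localsolver || return 1
  for step in $PIPELINE_STEPS; do
    echo "==> $step"
    bash "$step" || { echo "step failed: $step" >&2; return 1; }
  done
}
```

To use it, save the block to a file in the repository root, load it with `source`, and call `run_pipeline`. Stopping at the first failure avoids running test.sh against models that were never trained.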
