This repository contains the datasets and source code used in our Pattern Recognition (2020) paper.
Main dependencies:
- python==3.6
- tensorflow-gpu==1.11.0
- scikit-image==0.14.0
- scikit-learn==0.23.1
- opencv-python==3.3.1.11
- numba==0.51.2
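The pins above can be checked against the active environment. The following is a minimal sketch and not part of the repository: the names PINNED, installed_versions, and find_mismatches are hypothetical, and it assumes that importlib.metadata (standard library from Python 3.8) or the importlib_metadata backport is importable, which on the pinned Python 3.6 means the backport would have to be installed.

```python
import sys

# Pins copied from the dependency list above.
PINNED = {
    "tensorflow-gpu": "1.11.0",
    "scikit-image": "0.14.0",
    "scikit-learn": "0.23.1",
    "opencv-python": "3.3.1.11",
    "numba": "0.51.2",
}
PINNED_PYTHON = (3, 6)


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    try:
        from importlib import metadata  # standard library from Python 3.8
    except ImportError:
        import importlib_metadata as metadata  # assumed backport on 3.6/3.7
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def find_mismatches(pinned, installed):
    """Return {name: (expected, found)} for every pin that is not met exactly."""
    return {
        name: (expected, installed.get(name))
        for name, expected in pinned.items()
        if installed.get(name) != expected
    }


if __name__ == "__main__":
    if sys.version_info[:2] != PINNED_PYTHON:
        print("python: expected 3.6, found %d.%d" % sys.version_info[:2])
    mismatches = find_mismatches(PINNED, installed_versions(PINNED))
    for name, (expected, found) in mismatches.items():
        print("%s: expected %s, found %s" % (name, expected, found))
```

Run inside the activated environment, the script prints one line per package whose installed version differs from its pin and prints nothing when every pin is met.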
For a fully automatic setup of the virtual environment (tested on Ubuntu 18.04), set the variable BASE_DIR in scripts/install.sh to a valid directory, then run source scripts/install.sh from the repository root directory. BASE_DIR indicates where the additional directories (envs, concorde, qsopt) will be created.
You need sudo privileges to run the installation script properly.
By default, the virtual environment is created at $BASE_DIR/envs/deeprec-pr20. When it finishes, the script automatically activates the newly created environment.
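To confirm that the installation produced the layout described above, a small check along these lines may help. It is only a sketch: the helper names expected_layout and missing_paths are hypothetical, and the directory names are taken from the description above rather than from scripts/install.sh itself.

```python
from pathlib import Path

# Directory names taken from the setup description above.
SUBDIRS = ("envs", "concorde", "qsopt")
ENV_NAME = "deeprec-pr20"


def expected_layout(base_dir):
    """Return the paths the install script is described as creating under base_dir."""
    base = Path(base_dir)
    paths = [base / name for name in SUBDIRS]
    paths.append(base / "envs" / ENV_NAME)
    return paths


def missing_paths(base_dir):
    """Return the expected directories that do not exist under base_dir."""
    return [p for p in expected_layout(base_dir) if not p.is_dir()]
```

Calling missing_paths with the value assigned to BASE_DIR should return an empty list after a successful installation.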
The datasets include (i) the integral documents from which the (small) training samples are extracted and (ii) the mechanically shredded document collections S-MARQUES (D1), S-ISRI-OCR (D2), and S-CDIP (D3) used in the tests. To download them, run bash scripts/get_dataset.sh.
This creates a datasets directory in the repository root directory.
To download the results, run bash scripts/get_results.sh.
This creates a results directory in the repository root directory with three subdirectories (one for each experiment).
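The two download steps can also be driven from Python and verified afterwards. This is a hedged sketch: fetch and subdirectories are hypothetical helpers, and the only facts assumed about the scripts are the ones stated above (they are run with bash from the repository root and create datasets and results there).

```python
import subprocess
from pathlib import Path


def fetch(repo_root, script, target):
    """Run a download script from the repository root and return the directory it creates."""
    root = Path(repo_root)
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(["bash", str(Path("scripts") / script)], cwd=str(root), check=True)
    out_dir = root / target
    if not out_dir.is_dir():
        raise FileNotFoundError("expected %s after running %s" % (out_dir, script))
    return out_dir


def subdirectories(path):
    """Return the sorted names of the immediate subdirectories of path."""
    return sorted(p.name for p in Path(path).iterdir() if p.is_dir())
```

For example, fetch(".", "get_dataset.sh", "datasets") followed by subdirectories(fetch(".", "get_results.sh", "results")) should yield a list with three entries, one per experiment.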
To run a reconstruction demo, use python demo.py. By default, the script uses a pretrained model available in the traindata directory. Here is an example of the demo script's output:
For details on the parameters, run python demo.py --help.
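When the demo is launched from another Python program, the same invocation can be wrapped with subprocess. The sketch below is illustrative only: demo_command and demo_help are hypothetical helpers, and --help is the only option used because it is the only one named here.

```python
import subprocess
import sys


def demo_command(*args):
    """Build the command line for demo.py using the current interpreter."""
    return [sys.executable, "demo.py"] + list(args)


def demo_help(repo_root="."):
    """Return the text printed by demo.py --help, run from repo_root."""
    result = subprocess.run(
        demo_command("--help"),
        cwd=repo_root,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        check=True,
    )
    return result.stdout
```

Using sys.executable keeps the call inside the currently activated environment instead of whichever python happens to be first on PATH.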

