This repository contains a Docker container environment for processing PDF files. The main functionality uses the `pdfcrop` and `pdftk` tools to extract each page from a PDF. By default, `pdfcrop` automatically crops excess whitespace from each page; margins can also be adjusted manually.
- PDF Splitting: Use `pdftk` to split a single PDF file into multiple single-page PDF files.
- Automatic Cropping: By default, `pdfcrop` automatically crops excess whitespace from each page. Margins can be adjusted manually.
- Ensure Docker and VS Code are installed on your machine.
- Install the Remote - Containers extension in VS Code.
- Clone this repository to your local machine.
- Open the repository folder in VS Code.
- Open the project in a container using the Remote - Containers extension in VS Code.
- Place the PDF file you wish to process in the `data` folder.
- Adjust the `margins` in the `process_pdf.sh` script if manual cropping is required.
- Run `./scripts/process_pdf.sh` in the VS Code terminal.
- The processed PDF files will be saved in the `output` folder.
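The steps above could be implemented along the following lines. This is only a minimal sketch of what a script like `process_pdf.sh` might do, not the repository's actual script: the variable names `DATA_DIR`, `OUTPUT_DIR`, and `MARGINS`, the function name `process_pdf`, and the output file naming are all assumptions made for illustration. It relies on the `burst` operation of `pdftk` and the `--margins` option of `pdfcrop`.

```shell
#!/bin/sh
# Hypothetical sketch only; the repository's real scripts/process_pdf.sh may differ.
set -eu

# Assumed variable names, overridable from the environment.
DATA_DIR="${DATA_DIR:-data}"
OUTPUT_DIR="${OUTPUT_DIR:-output}"
# Passed to pdfcrop --margins as "<left> <top> <right> <bottom>" (unit: bp).
MARGINS="${MARGINS:-0 0 0 0}"

# Split one PDF into single-page files with pdftk, then crop each page with pdfcrop.
process_pdf() {
  input="$1"
  name="$(basename "$input" .pdf)"
  tmp_dir="$(mktemp -d)"

  # burst writes one file per page; %03d expands to the page number.
  pdftk "$input" burst output "$tmp_dir/${name}_%03d.pdf"

  for page in "$tmp_dir"/*.pdf; do
    [ -e "$page" ] || continue
    # Crop whitespace, leaving the requested margins around the content.
    pdfcrop --margins "$MARGINS" "$page" "$OUTPUT_DIR/$(basename "$page")"
  done

  rm -rf "$tmp_dir"
}

mkdir -p "$OUTPUT_DIR"
for pdf in "$DATA_DIR"/*.pdf; do
  [ -e "$pdf" ] || continue   # skip the literal pattern when no PDFs are present
  process_pdf "$pdf"
done
```

In this sketch, a value such as `MARGINS="10 10 10 10"` would keep 10 bp of whitespace on every side of each cropped page, while the default of zeros crops tightly to the content.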
- `.devcontainer`: Contains Docker configuration files.
- `scripts`: Holds the scripts for processing PDF files.
- `data`: Place the original PDF files here.
- `output`: Processed PDF files are stored here.
Feel free to contribute to the project through Pull Requests or Issues!