The Transaction Extractor, or 'Transtractor', aspires to be a universal library for extracting transaction data from PDF bank statements. Key features:
- Written in Rust (fast)
- Python API (user friendly)
- AI-free (lightweight)
- Rules-based extraction (100% predictable and accurate)
Transtractor is available on PyPI and can be installed with pip:
pip install transtractor
Requirements: Python 3.9 or higher
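Before installing, it can help to confirm that the interpreter meets the stated requirement. The snippet below is a minimal sketch using only the standard library. The helper names `meets_requirement` and `is_installed` are hypothetical and not part of Transtractor; only the 3.9 minimum and the package name come from the text above.

```python
import importlib.util
import sys

# Minimum version taken from the requirement stated above.
MIN_PYTHON = (3, 9)


def meets_requirement(version=None):
    """Return True if the given (or running) Python version is at least 3.9."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


def is_installed(package="transtractor"):
    """Return True if the import system can locate the named top-level package."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("Python OK:", meets_requirement())
    print("transtractor installed:", is_installed())
```

Running the script after `pip install transtractor` should report both checks as true if the installation succeeded.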
- Install Rust: Download and install Rust from rustup.rs
- Install Maturin: Install the Python build tool for Rust extensions
pip install maturin
- Build and install Transtractor: Clone the repository and build
git clone https://github.com/gravytoast/transtractor.git
cd transtractor
maturin develop --release
- Import and initialise the parser
from transtractor import Parser
parser = Parser()
- Convert PDF to CSV: All CSV files are written in a standard format
parser.parse('statement.pdf').to_csv('statement.csv')
- Convert PDF to DataFrame: Load into a DataFrame for analysis
import pandas as pd
data = parser.parse('statement.pdf').to_pandas_dict()
df = pd.DataFrame(data)
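The single-file calls above extend naturally to a folder of statements. The following is a rough sketch rather than part of the library: `convert_folder` and `main` are hypothetical helpers, the folder names `statements` and `csv` are placeholders, and the broad `except Exception` is an assumption, since the exception raised for an unsupported statement is not described here. Only `Parser()`, `parse()` and `to_csv()` are taken from the steps above.

```python
from pathlib import Path


def convert_folder(parser, pdf_dir, csv_dir):
    """Convert every *.pdf in pdf_dir to a CSV of the same name in csv_dir.

    `parser` is any object exposing parse(path).to_csv(path), as the
    Transtractor Parser does in the steps above.
    Returns (converted, failed): a list of output CSV paths and a list of
    (pdf_path, error_message) pairs.
    """
    pdf_dir, csv_dir = Path(pdf_dir), Path(csv_dir)
    csv_dir.mkdir(parents=True, exist_ok=True)
    converted, failed = [], []
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        csv_path = csv_dir / (pdf_path.stem + ".csv")
        try:
            parser.parse(str(pdf_path)).to_csv(str(csv_path))
            converted.append(csv_path)
        except Exception as exc:  # assumption: the error type is not documented here
            failed.append((pdf_path, str(exc)))
    return converted, failed


def main():
    # Imported lazily so the helper above can be reused without Transtractor.
    from transtractor import Parser

    # 'statements' and 'csv' are placeholder folder names.
    converted, failed = convert_folder(Parser(), "statements", "csv")
    print(f"Converted {len(converted)} statement(s), {len(failed)} failed")
```

Calling `main()` converts each PDF in turn and records failures instead of stopping at the first statement that cannot be parsed.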
See the documentation maintained on Read the Docs.
See the documentation for a current list of supported statements. You may also create your own parsing configuration files by following these instructions and load them with:
from transtractor import Parser
parser = Parser()
parser.load('my_config.json')
parser.parse('statement.pdf').to_csv('statement.csv')
New and well-tested configuration files are especially welcome. Please submit a pull request that adds them to the python/transtractor/configs directory, or email them to develop@transtractor.net.
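When working with several custom configuration files, a small loader can catch malformed JSON before the files reach the parser. This is a sketch under assumptions: `load_configs` is a hypothetical helper, and it assumes `parser.load()` may be called once per file to register multiple configurations, which the text above does not confirm. It checks JSON syntax only, not the configuration schema.

```python
import json
from pathlib import Path


def load_configs(parser, config_dir):
    """Load every *.json file in config_dir via parser.load(path).

    Files that are not well-formed JSON are skipped and reported, so a typo
    in one file does not prevent the others from loading.
    Returns (loaded, skipped): a list of loaded paths and a list of
    (path, error_message) pairs.
    """
    loaded, skipped = [], []
    for path in sorted(Path(config_dir).glob("*.json")):
        try:
            # Syntax check only; the configuration schema is not validated here.
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            skipped.append((path, str(exc)))
            continue
        # Assumption: load() can be called repeatedly, once per config file.
        parser.load(str(path))
        loaded.append(path)
    return loaded, skipped
```

With Transtractor installed, this could be called as `load_configs(Parser(), 'my_configs')`, where `my_configs` is a placeholder directory name.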