
The Transtractor

Universal PDF bank statement parsing

The Transaction Extractor, or 'Transtractor', aspires to be a universal library for extracting transaction data from PDF bank statements. Key features:

  • Written in Rust (fast)
  • Python API (user friendly)
  • AI-free (lightweight)
  • Rules-based extraction (deterministic, predictable output)

Installation

Install from PyPI

Transtractor is available on PyPI and can be installed with pip:

pip install transtractor

Requirements: Python 3.9 or higher

Compile from source

  1. Install Rust: Download and install Rust from rustup.rs

  2. Install Maturin: Install the Python build tool for Rust extensions

    pip install maturin

  3. Build and install Transtractor: Clone the repository and build

    git clone https://github.com/gravytoast/transtractor.git
    cd transtractor
    maturin develop --release
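
After either installation route, a quick sanity check can confirm that the interpreter meets the stated Python 3.9 requirement and that the package is visible to it. The following is a minimal sketch using only the standard library; the helper name `check_environment` is hypothetical and not part of Transtractor.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in the requirements above


def check_environment(package="transtractor"):
    """Report whether the interpreter is new enough and the package can be found."""
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        # find_spec returns None when no top-level package of that name is installed
        "package_found": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```

If `package_found` is false after `maturin develop --release`, the build was probably installed into a different virtual environment from the one running the check.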

Basic usage

  1. Import and initialise the parser

    from transtractor import Parser
    
    parser = Parser()

  2. Convert PDF to CSV: All CSV files are written in a standard format

    parser.parse('statement.pdf').to_csv('statement.csv')

  3. Convert PDF to DataFrame: Load into a DataFrame for analysis

    import pandas as pd
    
    data = parser.parse('statement.pdf').to_pandas_dict()
    df = pd.DataFrame(data)
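
The single-file calls above extend naturally to a whole folder of statements. The sketch below relies only on the `parse(...).to_csv(...)` chain shown in step 2. The helper name `convert_folder` is hypothetical, and because the exception types raised by the library are not assumed here, any failure is caught and reported.

```python
from pathlib import Path


def convert_folder(parser, src_dir, dst_dir):
    """Convert every PDF in src_dir to a CSV of the same name in dst_dir.

    `parser` is any object exposing parse(path).to_csv(path), such as the
    Parser instance created in step 1. Returns (converted, failed) lists.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    converted, failed = [], []
    for pdf in sorted(src.glob("*.pdf")):
        target = dst / (pdf.stem + ".csv")
        try:
            parser.parse(str(pdf)).to_csv(str(target))
            converted.append(target)
        except Exception as exc:  # library error types are not assumed
            failed.append((pdf, exc))
    return converted, failed
```

With Transtractor installed, `convert_folder(Parser(), 'statements', 'csv')` would attempt each PDF in turn. Collecting failures instead of stopping at the first one makes it easier to see which statements lack a matching configuration.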

Advanced usage

See the documentation maintained on Read the Docs.

Supported statements

See the documentation for a current list of supported statements. You may also create your own parsing configuration file by following these instructions and load it as follows:

from transtractor import Parser

parser = Parser()
parser.load('my_config.json')
parser.parse('statement.pdf').to_csv('statement.csv')
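
When writing a configuration by hand, it can help to catch a missing file or malformed JSON before it reaches the parser. The following sketch wraps the `parser.load(...)` call shown above. The helper name `load_checked` is hypothetical, and it checks only that the file is syntactically valid JSON, not that it follows Transtractor's configuration schema.

```python
import json
from pathlib import Path


def load_checked(parser, config_path):
    """Verify the config file exists and parses as JSON, then pass it to parser.load()."""
    path = Path(config_path)
    if not path.is_file():
        raise FileNotFoundError(f"Config file not found: {path}")
    with path.open(encoding="utf-8") as fh:
        json.load(fh)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    parser.load(str(path))
    return path
```

With Transtractor installed, `load_checked(parser, 'my_config.json')` would replace the direct `parser.load('my_config.json')` call.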

Contributions

New and well-tested configuration files are especially welcome. Please submit a pull request that adds them to the python/transtractor/configs directory, or email them to develop@transtractor.net.

About

A Python/Rust library for PDF bank statement extraction.
