vikrant-pune/Project-Jugnu

Project Jugnu: Firefly III Data Ingestion Tools

A suite of generic Python tools for automatically routing, sanitizing, and bulk-ingesting raw bank statements and Android SMS backup XML exports into Firefly III.

This repository was built to abstract the complex logic of Indian personal banking (such as arbitrary-length BBPS alerts or multiline NEFT transactions) into an extensible, open-source rule engine.

Features

  • Generic Regex Extractor: Decoupled bank mappings let you configure custom regex templates for SMS extraction (config.py) or CSV merging (accounts.json) without editing the Python code.
  • Smart SMS XML Parsing (sms_parser.py): Automatically maps an XML backup of SMS texts to the Firefly III transactions endpoint. Features automatic duplicate suppression, fuzzy bank matching, and execution-date cross-checks.
  • CSV Healing Engine (csv_healer.py): Analyzes raw bank CSV/Excel ledger exports and reconciles them against earlier SMS extractions in Firefly III, updating external IDs and merging dates.
  • Pandas Sanitizer (cleanData.py): General pipeline processor for bulk-stripping bad encoding or junk marketing rows from bank exports before ingestion.

🚀 Quick Start Setup

1. Requirements

# Clone the repository
git clone https://github.com/vikrant-pune/Project-Jugnu.git
cd Project-Jugnu

# Install required dependencies
pip install -r requirements.txt

2. Configure Environment Secrets (.env)

Keep your API tokens out of the codebase. Rename .env.example to .env (or create a new .env file) in the project root:

FIREFLY_URL="http://your.firefly.local:8080"
FIREFLY_TOKEN="eyJ0eXAiOiJKV1QiLCJhb..."

Note: git ignores your .env file, so your token is not committed with the code.
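
As an illustration of how a script might read these two values, here is a minimal sketch that uses only the standard library. The helper names load_env_file and get_firefly_settings are hypothetical and are not taken from the Jugnu codebase, which may well use a package such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY="value" lines into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_firefly_settings(path=".env"):
    """Return (url, token); real environment variables take precedence."""
    file_values = load_env_file(path)
    url = os.environ.get("FIREFLY_URL") or file_values.get("FIREFLY_URL")
    token = os.environ.get("FIREFLY_TOKEN") or file_values.get("FIREFLY_TOKEN")
    if not url or not token:
        raise RuntimeError("FIREFLY_URL and FIREFLY_TOKEN must both be set")
    # Drop a trailing slash so API paths can be appended consistently.
    return url.rstrip("/"), token
```

Failing early when either value is missing avoids sending unauthenticated requests to Firefly III later in a run.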

3. Configure Account Targets (accounts.json)

The scripts use accounts.json to map each bank folder to a Firefly III internal account ID and a parsing template. An accounts.example.json is provided as a template. Rename it to accounts.json and configure:

{
  "CSV_FOLDERS": {
    "IDFC": {"id": 16, "bank_template": "idfc"},
    "HDFC": {"id": 1,  "bank_template": "hdfc"}
  }
}
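
To make the mapping concrete, the sketch below reads a file shaped like the example above and pairs each configured folder with its account ID and template. The function names load_csv_folders and resolve_folders are hypothetical; the real scripts may read the file differently.

```python
import json
from pathlib import Path


def load_csv_folders(config_path="accounts.json"):
    """Return the CSV_FOLDERS mapping from accounts.json (empty if absent)."""
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)
    return config.get("CSV_FOLDERS", {})


def resolve_folders(data_root, csv_folders):
    """Yield (folder_path, account_id, template) for configured folders that exist."""
    for folder_name, settings in csv_folders.items():
        folder = Path(data_root) / folder_name
        # Folders named in the config but missing on disk are skipped.
        if folder.is_dir():
            yield folder, settings["id"], settings["bank_template"]
```

With the example configuration and a directory actual-data/IDFC on disk, resolve_folders("./actual-data", load_csv_folders()) would yield one entry for IDFC with ID 16 and template "idfc".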

🛠 Usage

Ingesting SMS Backups

If you use an Android SMS backup app to export an XML file containing your daily bank transaction texts:

python sms_parser.py ./actual-data

The script loops over all .xml files in that folder, applies waterfall regex routing based on the templates in config.py to extract merchants and amounts, then bulk-imports the results into Firefly III.
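
The routing step can be sketched as follows. This is not the code in sms_parser.py: it assumes that "waterfall" means templates are tried in order and the first match wins, and that the backup uses sms elements with address, date (milliseconds since the Unix epoch), and body attributes, as the common Android backup apps produce. The TEMPLATES list, route_message, and parse_sms_xml are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical templates, shaped like the BANK_REGEX_TEMPLATES entries in config.py.
TEMPLATES = [
    {
        "name": "Debit Alert",
        "pattern": r"Debited\s+Rs\.([\d,]+)\s+from\s+Account\s+Ending\s+(\d+)",
        "extract": lambda m: {"amount": m.group(1).replace(",", ""), "type": "withdrawal"},
    },
    {
        "name": "Credit Alert",
        "pattern": r"Credited\s+Rs\.([\d,]+)",
        "extract": lambda m: {"amount": m.group(1).replace(",", ""), "type": "deposit"},
    },
]


def route_message(body, templates=TEMPLATES):
    """Try each template in order; the first match wins (assumed waterfall)."""
    for template in templates:
        match = re.search(template["pattern"], body)
        if match:
            return {"rule": template["name"], **template["extract"](match)}
    return None  # no template matched, so the message is skipped


def parse_sms_xml(xml_text):
    """Yield one record per <sms> element whose body matches a template."""
    root = ET.fromstring(xml_text)
    for sms in root.iter("sms"):
        parsed = route_message(sms.get("body", ""))
        if parsed is None:
            continue
        # Assumed: 'date' holds milliseconds since the Unix epoch.
        timestamp = int(sms.get("date", "0")) / 1000
        parsed["date"] = datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()
        parsed["sender"] = sms.get("address", "")
        yield parsed
```

Ordering matters in this design: more specific patterns should come before general ones, since a broad pattern placed first would shadow the rest.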

Reconciling Official Bank Ledgers (Healer)

End of the month? Export the official .csv or .xlsx statement from your bank platform and place it in a subfolder (e.g. actual-data/IDFC/my statement.csv).

python csv_healer.py ./actual-data

csv_healer.py will:

  1. Crawl the IDFC folder based on your accounts.json setup.
  2. Apply the parsing rules matching the "idfc" template.
  3. Query Firefly III to find any existing transactions recorded previously from SMS during that month.
  4. "Heal" them by attaching exact external reference numbers, correcting merchant titles, and tagging them as "Reconciled".
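
Steps 3 and 4 above could be sketched roughly as below. This is an illustration under stated assumptions, not the logic in csv_healer.py: it matches on equal amount and a date within two days, and the field names (amount, date, reference, description, tags) and the functions find_sms_match and heal are hypothetical. No API call is made; the sketch only builds the fields that would be updated.

```python
from datetime import date


def find_sms_match(ledger_row, sms_transactions, max_days=2):
    """Return the first unreconciled SMS transaction with the same amount
    and a date within max_days of the ledger row, or None."""
    for txn in sms_transactions:
        if "Reconciled" in txn.get("tags", []):
            continue  # already healed in an earlier run
        same_amount = abs(float(txn["amount"]) - float(ledger_row["amount"])) < 0.005
        close_date = abs((txn["date"] - ledger_row["date"]).days) <= max_days
        if same_amount and close_date:
            return txn
    return None


def heal(txn, ledger_row):
    """Build the updated fields for a matched transaction."""
    return {
        "external_id": ledger_row["reference"],
        "description": ledger_row["description"],
        "date": ledger_row["date"].isoformat(),
        "tags": sorted(set(txn.get("tags", [])) | {"Reconciled"}),
    }


# Example: an SMS-derived entry dated one day before the bank's ledger date.
sms_entry = {"amount": "1250.00", "date": date(2024, 1, 3), "tags": ["sms"]}
ledger = {"amount": "1250", "date": date(2024, 1, 4),
          "reference": "UTR123", "description": "ACME STORE"}
matched = find_sms_match(ledger, [sms_entry])
update = heal(matched, ledger) if matched else None
```

The date tolerance is there because an SMS alert and the bank's posting date can differ by a day or more; the right window depends on the bank.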

⚙️ Extending the Parser (Bank Rules)

To support a new bank, just add a new template definition to BANK_REGEX_TEMPLATES in config.py:

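"MY_BANK": [
    {
        "name": "Debit Alert",
        # Group 1: amount (may contain commas); group 2: trailing account digits.
        "pattern": r'Debited\s+Rs\.([\d,]+)\s+from\s+Account\s+Ending\s+(\d+)',
        # Receives the regex match object and returns the transaction fields.
        "extract": lambda m: {
            "amount": m.group(1).replace(',', ''),  # "1,250" -> "1250"
            "type": "withdrawal",
            "reference": "N/A"
        }
    }
]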
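
Before adding a template, it can help to check it against a real message from your bank. The sketch below runs the rule above against a made-up sample SMS; check_template and the sample text are hypothetical and not part of the project.

```python
import re

MY_BANK = [
    {
        "name": "Debit Alert",
        "pattern": r'Debited\s+Rs\.([\d,]+)\s+from\s+Account\s+Ending\s+(\d+)',
        "extract": lambda m: {
            "amount": m.group(1).replace(',', ''),
            "type": "withdrawal",
            "reference": "N/A"
        }
    }
]


def check_template(rule, sample_text):
    """Return the extracted fields, or None if the pattern does not match."""
    match = re.search(rule["pattern"], sample_text)
    return rule["extract"](match) if match else None


# Made-up sample message; replace it with a real alert from your bank.
sample = "Debited Rs.12,500 from Account Ending 4321 on 01-Jan"
fields = check_template(MY_BANK[0], sample)
print(fields)  # {'amount': '12500', 'type': 'withdrawal', 'reference': 'N/A'}
```

A None result means the pattern needs adjusting, for example if your bank writes "INR" instead of "Rs." or puts a space after the currency marker.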

Security & Privacy Note

All personally identifiable information (PII), such as names, explicit account numbers, and tokens, has been stripped from the core codebase. Jugnu operates purely on decoupled configuration mapping. Run git status frequently to make sure you don't accidentally commit your .env or accounts.json!
