A suite of intelligent, generic Python tools for automatically routing, sanitizing, and bulk-ingesting raw bank statements and Android SMS Backup XML exports into Firefly III.
This repository was built specifically to abstract complex Indian personal-banking logic (such as arbitrary-length BBPS alerts or multiline NEFT transactions) into an extensible, open-source rule engine.
- Generic Regex Extractor: Decoupled bank mappings let you configure custom regex templates for SMS extraction (`config.py`) or CSV merging (`accounts.json`) without modifying the Python code.
- Smart SMS XML Parsing (`sms_parser.py`): Automatically maps an XML backup of SMS texts to Firefly III transactions/endpoints. Features automatic duplicate suppression, fuzzy bank matching, and execution-date cross-checks.
- CSV Healing Engine (`csv_healer.py`): Analyzes raw bank CSV/Excel ledger exports and reconciles them against transactions previously extracted from SMS into Firefly III, updating external IDs and correcting transaction dates.
- Pandas Sanitizer (`cleanData.py`): General pipeline processor for bulk-stripping bad encoding or junk marketing rows from bank exports before ingestion.
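The fuzzy bank matching mentioned above can be approximated with the standard library's `difflib`. This is a hypothetical illustration, not the logic inside `sms_parser.py`: it assumes Indian SMS sender IDs of the form `VM-HDFCBK` (a two-letter prefix plus a bank header), and the `BANK_SENDER_ALIASES` table and `match_bank` helper are invented for the example.

```python
import difflib
import re
from typing import Dict, List, Optional

# Hypothetical alias table: bank key (as used in accounts.json) -> SMS sender headers.
BANK_SENDER_ALIASES: Dict[str, List[str]] = {
    "HDFC": ["HDFCBK", "HDFCBN"],
    "IDFC": ["IDFCFB"],
}


def normalize_sender(address: str) -> str:
    """Uppercase the sender and drop an assumed two-letter prefix such as 'VM-'."""
    return re.sub(r"^[A-Z]{2}-", "", address.strip().upper())


def match_bank(address: str, cutoff: float = 0.75) -> Optional[str]:
    """Return the bank key for an SMS sender, or None if nothing is close enough."""
    sender = normalize_sender(address)
    alias_to_bank = {
        alias: bank
        for bank, aliases in BANK_SENDER_ALIASES.items()
        for alias in aliases
    }
    if sender in alias_to_bank:  # exact hit, no fuzzy search needed
        return alias_to_bank[sender]
    # get_close_matches ranks candidates by SequenceMatcher ratio, dropping those below cutoff
    close = difflib.get_close_matches(sender, list(alias_to_bank), n=1, cutoff=cutoff)
    return alias_to_bank[close[0]] if close else None
```

An exact alias hit short-circuits the fuzzy search; the `cutoff` of 0.75 is an arbitrary starting point that you would tune against your own backup.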
- Python 3.8+
- A Firefly III instance, running locally or on a remote server.
```bash
# Clone the repository
git clone https://github.com/yourusername/jugnu.git
cd jugnu

# Install required dependencies
pip install -r requirements.txt
```

You MUST keep your API tokens out of the codebase! Rename `.env.example` to `.env` (or create a new `.env` file) in the project root:
```bash
FIREFLY_URL="http://your.firefly.local:8080"
FIREFLY_TOKEN="eyJ0eXAiOiJKV1QiLCJhb..."
```

Note: your `.env` file is ignored by git, so it stays out of version control.
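To illustrate how these two values might be consumed, here is a minimal stdlib-only sketch. It assumes a simple `KEY="value"` line format (no multi-line values or variable expansion); the project itself may use a library such as python-dotenv instead, and `parse_env`, `load_env`, and `about_request` are hypothetical helpers. The request targets Firefly III's `/api/v1/about` endpoint and is built but not sent.

```python
import urllib.request
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, e.g. "http://..."
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env(path: str = ".env") -> Dict[str, str]:
    """Read and parse a .env file from disk."""
    with open(path, encoding="utf-8") as handle:
        return parse_env(handle.read())


def about_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET to Firefly III's /api/v1/about."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/about",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.api+json",
        },
    )
```

Passing the result to `urllib.request.urlopen(...)` would perform the call; a successful response is a quick way to confirm that `FIREFLY_URL` and `FIREFLY_TOKEN` are correct before running any import.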
The scripts map each bank's statement folder to its Firefly III internal account ID and parsing template via `accounts.json`. An `accounts.example.json` file is provided as a template; rename it to `accounts.json` and configure:
```json
{
  "CSV_FOLDERS": {
    "IDFC": {"id": 16, "bank_template": "idfc"},
    "HDFC": {"id": 1, "bank_template": "hdfc"}
  }
}
```

If you use an Android SMS backup app to export an XML file containing your daily bank transaction texts, run:
```bash
python sms_parser.py ./actual-data
```

The script loops over every `.xml` file in that folder, applies waterfall regex routing based on the templates in `config.py` to extract merchants and amounts, and then bulk-imports the results into Firefly III.
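The XML-to-transaction flow described above can be sketched as follows. This is an illustration under assumptions, not `sms_parser.py` itself: it assumes the backup stores messages as `<sms address=... body=... date=...>` elements (the layout written by the SMS Backup & Restore app), and the `TEMPLATES` table, `iter_sms`, and `route` are hypothetical names. "Waterfall" here means the templates are tried in order and the first match wins.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List, Optional

# Hypothetical template table in the same shape as BANK_REGEX_TEMPLATES.
AMOUNT = r"Rs\.?\s*([\d,]+(?:\.\d{1,2})?)"
TEMPLATES: Dict[str, List[dict]] = {
    "MY_BANK": [
        {
            "name": "Debit Alert",
            "pattern": r"Debited\s+" + AMOUNT + r"\s+from\s+Account\s+Ending\s+(\d+)",
            "extract": lambda m: {"amount": m.group(1).replace(",", ""), "type": "withdrawal"},
        },
        {
            "name": "Credit Alert",
            "pattern": r"Credited\s+" + AMOUNT,
            "extract": lambda m: {"amount": m.group(1).replace(",", ""), "type": "deposit"},
        },
    ]
}


def iter_sms(xml_text: str) -> Iterator[Dict[str, str]]:
    """Yield sender, body and timestamp for every <sms> element in the backup."""
    root = ET.fromstring(xml_text)
    for sms in root.iter("sms"):
        yield {
            "address": sms.get("address", ""),
            "body": sms.get("body", ""),
            "date": sms.get("date", ""),  # assumed: epoch milliseconds
        }


def route(body: str, templates: List[dict]) -> Optional[dict]:
    """Waterfall routing: try templates in order and return the first extraction."""
    for template in templates:
        match = re.search(template["pattern"], body, flags=re.IGNORECASE)
        if match:
            result = template["extract"](match)
            result["template"] = template["name"]
            return result
    return None  # no template matched (e.g. OTPs or promotional texts)
```

Because the first match wins, ordering matters: put the most specific patterns first so that a generic one does not swallow messages meant for it.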
End of the month? Export the official `.csv` or `.xlsx` statement from your bank's portal and place it in the subfolder whose name matches its `CSV_FOLDERS` key in `accounts.json` (e.g., `actual-data/IDFC/my statement.csv`).
```bash
python csv_healer.py ./actual-data
```

`csv_healer.py` will:
- Crawl the `IDFC` folder based on your `accounts.json` setup.
- Apply the parsing rules matching the `"idfc"` template.
- Query Firefly III for transactions previously recorded from SMS during that month.
- "Heal" them by attaching exact external reference numbers, correcting merchant titles, and tagging them as "Reconciled".
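One plausible way to pair a statement row with an earlier SMS-derived transaction is to require an exact amount match within a small date window. The sketch below assumes that rule with a hypothetical two-day tolerance and only builds the update fields; the `Txn` class, `find_sms_match`, and `heal` are invented for illustration, and the real `csv_healer.py` may match differently and still has to send the update to Firefly III.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Dict, List, Optional


@dataclass
class Txn:
    """Minimal stand-in for a transaction (hypothetical, not Firefly's schema)."""
    posted: date
    amount: Decimal
    description: str
    external_id: Optional[str] = None


def find_sms_match(row: Txn, sms_txns: List[Txn], max_days: int = 2) -> Optional[Txn]:
    """Pick the unreconciled SMS transaction with the same amount, closest in date."""
    candidates = [
        t for t in sms_txns
        if t.external_id is None
        and t.amount == row.amount
        and abs((t.posted - row.posted).days) <= max_days
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda t: abs((t.posted - row.posted).days))


def heal(row: Txn, sms_txns: List[Txn]) -> Optional[Dict[str, object]]:
    """Return update fields for the matched SMS transaction, or None if unmatched."""
    match = find_sms_match(row, sms_txns)
    if match is None:
        return None
    match.external_id = row.external_id  # claim it so a duplicate charge cannot reuse it
    return {
        "external_id": row.external_id,
        "description": row.description,   # statement narration replaces the SMS guess
        "date": row.posted.isoformat(),   # statement date treated as authoritative
        "tags": ["Reconciled"],
    }
```

Marking the matched SMS transaction with the statement's reference means two identical charges in the same week cannot both heal the same record.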
To support a new bank, add a new template definition to `BANK_REGEX_TEMPLATES` in `config.py`:

```python
# Entry inside the BANK_REGEX_TEMPLATES dict in config.py
"MY_BANK": [
    {
        "name": "Debit Alert",
        # Matches e.g. "Debited Rs.1,250.00 from Account Ending 1234";
        # the optional decimal group lets amounts with paise match too.
        "pattern": r'Debited\s+Rs\.?\s*([\d,]+(?:\.\d{1,2})?)\s+from\s+Account\s+Ending\s+(\d+)',
        # Build the transaction fields from the regex match
        "extract": lambda m: {
            "amount": m.group(1).replace(',', ''),  # strip thousands separators
            "type": "withdrawal",
            "reference": "N/A"
        }
    }
]
```

All personally identifiable information (PII), such as names, full account numbers, and tokens, has been stripped from the core codebase. Jugnu operates purely on decoupled configuration mapping. Run `git status` frequently to make sure you don't accidentally commit your `.env` or `accounts.json`!
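Before adding a template to `BANK_REGEX_TEMPLATES`, it can help to check it against a few redacted sample messages. The `check_template` helper below is hypothetical and not part of Jugnu; it only verifies that the pattern compiles, that it matches each sample, and that `extract` returns a positive numeric amount and one of Firefly III's basic transaction types (`withdrawal`, `deposit`, `transfer`).

```python
import re
from decimal import Decimal, InvalidOperation
from typing import List

# Basic Firefly III transaction types a bank SMS/CSV template would normally emit.
VALID_TYPES = {"withdrawal", "deposit", "transfer"}


def check_template(template: dict, samples: List[str]) -> List[str]:
    """Return human-readable problems; an empty list means every sample passed."""
    name = template.get("name", "<unnamed>")
    try:
        regex = re.compile(template["pattern"])
    except re.error as exc:
        return [f"{name}: bad pattern ({exc})"]
    problems: List[str] = []
    for sample in samples:
        match = regex.search(sample)
        if match is None:
            problems.append(f"{name}: no match for {sample!r}")
            continue
        data = template["extract"](match)
        if data.get("type") not in VALID_TYPES:
            problems.append(f"{name}: unexpected type {data.get('type')!r}")
        try:
            amount = Decimal(str(data.get("amount", "")))
        except InvalidOperation:
            problems.append(f"{name}: amount {data.get('amount')!r} is not numeric")
            continue
        # is_finite() runs first so NaN/Infinity never reach the comparison
        if not amount.is_finite() or amount <= 0:
            problems.append(f"{name}: amount {amount} should be a positive number")
    return problems
```

Running this against a handful of real (redacted) alerts from your bank catches the most common failure, a pattern that silently never matches, before a full import.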