# Credit Card Statement Parser (HDFC Bank)

A Python PDF parser that extracts key details from HDFC Bank credit card statements using `pdfminer.six` and regular expressions.

It was written for an assignment that requires extracting five key data points:
- Cardholder Name
- Card Number (masked)
- Statement Date
- Payment Due Date
- Total Amount Due
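The regexes return all five fields as plain strings. If downstream code needs real dates and amounts, a small typed container helps. This is a minimal sketch and not part of the original notebook: `StatementSummary`, `parse_date`, `parse_amount` and `from_raw` are hypothetical names. It assumes dates are printed day-first as DD/MM/YYYY and amounts use comma thousands separators, which matches the sample output in this README.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass(frozen=True)
class StatementSummary:
    """Hypothetical typed container for the five extracted fields."""
    cardholder_name: str
    card_number: str          # kept masked, exactly as printed on the statement
    statement_date: date
    payment_due_date: date
    total_amount_due: Decimal


def parse_date(raw: str) -> date:
    # Assumes day-first DD/MM/YYYY, as in "14/10/2025"
    return datetime.strptime(raw, "%d/%m/%Y").date()


def parse_amount(raw: str) -> Decimal:
    # Strip thousands separators; Decimal avoids binary floating-point rounding
    return Decimal(raw.replace(",", ""))


def from_raw(fields: dict[str, str]) -> StatementSummary:
    """Convert the parser's string dict to typed values; a date or amount of "N/A" raises."""
    return StatementSummary(
        cardholder_name=fields["Cardholder Name"],
        card_number=fields["Card Number"],
        statement_date=parse_date(fields["Statement Date"]),
        payment_due_date=parse_date(fields["Payment Due Date"]),
        total_amount_due=parse_amount(fields["Total Amount Due"]),
    )
```

With typed dates, simple checks become arithmetic. For example, a statement dated 14/10/2025 with a due date of 03/11/2025 gives a 20-day payment window via `(payment_due_date - statement_date).days`. Failing loudly on `"N/A"` is a deliberate choice here, so a missed field is not silently carried forward as a placeholder.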


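Install the PDF text-extraction library (the leading `!` runs a shell command from a notebook cell):

```python
!pip install pdfminer.six
```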

```text
Collecting pdfminer.six
  Downloading pdfminer_six-20251107-py3-none-any.whl.metadata (4.2 kB)
Downloading pdfminer_six-20251107-py3-none-any.whl (5.6 MB)
Installing collected packages: pdfminer.six
Successfully installed pdfminer.six-20251107
```


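Upload the statement PDF (`google.colab.files` is only available inside Google Colab):

```python
import re

from pdfminer.high_level import extract_text
from google.colab import files  # Colab-only helper for uploading files

# Open Colab's file picker; the chosen PDF is saved to the working directory
uploaded = files.upload()
pdf_path = next(iter(uploaded))  # filename of the first (only) uploaded file
```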
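Because `files.upload()` comes from `google.colab`, the upload step only runs inside Colab. A minimal sketch for running the parser locally as well, assuming the PDF path is passed as the first command-line argument: `get_pdf_path` is a hypothetical helper, not part of the original notebook. It uses the Colab upload when `google.colab` can be imported and falls back to `argv` otherwise, returning a filename string like the notebook's `pdf_path`.

```python
def get_pdf_path(argv: list[str]) -> str:
    """Hypothetical helper: Colab upload if available, else the first CLI argument."""
    try:
        from google.colab import files  # importable only inside Colab
    except ImportError:
        # Outside Colab: expect `python <script> STATEMENT.pdf`
        if len(argv) < 2:
            raise SystemExit("usage: python <script> STATEMENT.pdf")
        return argv[1]
    # Inside Colab: open the upload dialog; uploaded files land in the working directory
    uploaded = files.upload()
    return next(iter(uploaded))
```

A local script would then call `extract_text(get_pdf_path(sys.argv))` after importing `sys`.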


Extract the text and apply one regular expression per field; any field that is not found is reported as `"N/A"`:

```python
import json

text = extract_text(pdf_path)

# Cardholder name: uppercase words after "Name :". A literal space (not \s)
# keeps the match from running onto the next line, since \s also matches "\n".
name_match = re.search(r'Name\s*:\s*([A-Z][A-Z ]*)', text)
cardholder_name = name_match.group(1).strip() if name_match else "N/A"

# Masked card number such as "5268 73XX XXXX 0005"; again, stay on one line
card_number_match = re.search(r'Card No\s*:\s*([\dX][\dX ]*)', text)
card_number = card_number_match.group(1).strip() if card_number_match else "N/A"

statement_date_match = re.search(r'Statement Date[:\s]+([\d/]+)', text)
statement_date = statement_date_match.group(1) if statement_date_match else "N/A"

# Payment Due Date and Total Amount Due share a header row; their values sit
# together on the data line below it, so one pattern captures both
due_and_amount_match = re.search(
    r'Payment Due Date.*?\n\s*([\d/]+)\s+([\d,]+\.\d{2})', text, re.DOTALL
)
if due_and_amount_match:
    payment_due_date = due_and_amount_match.group(1)
    total_amount_due = due_and_amount_match.group(2)
else:
    payment_due_date = "N/A"
    total_amount_due = "N/A"

# Collect the five fields and print them as JSON
parsed_output = {
    "Cardholder Name": cardholder_name,
    "Card Number": card_number,
    "Statement Date": statement_date,
    "Payment Due Date": payment_due_date,
    "Total Amount Due": total_amount_due
}

print(json.dumps(parsed_output, indent=4))
```


Output:

```json
{
    "Cardholder Name": "ABHIJEET SHRIMANT JADHAV",
    "Card Number": "5268 73XX XXXX 0005",
    "Statement Date": "14/10/2025",
    "Payment Due Date": "03/11/2025",
    "Total Amount Due": "8,501.00"
}
```
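Since `extract_text` already returns a plain string, wrapping the regexes in a function makes the parsing step testable without a PDF. The sketch below is an illustration, not the notebook's own code: `parse_statement_text`, `FIELD_PATTERNS`, `DUE_AND_AMOUNT` and `SAMPLE` are hypothetical names, and the sample text is invented to imitate the layout the regexes expect (labels followed by a colon, and a "Payment Due Date" header row followed by a data line). A real HDFC statement may extract differently. The name and card-number patterns use a literal space rather than `\s` in their character classes, because `\s` also matches newlines.

```python
import json
import re

# Single-line fields. The name and card-number classes use a literal space
# instead of \s, because \s also matches "\n" and could pull in the next line.
FIELD_PATTERNS = {
    "Cardholder Name": re.compile(r"Name\s*:\s*([A-Z][A-Z ]*)"),
    "Card Number": re.compile(r"Card No\s*:\s*([\dX][\dX ]*)"),
    "Statement Date": re.compile(r"Statement Date[:\s]+([\d/]+)"),
}

# Due date and total amount share the data line under the header row
DUE_AND_AMOUNT = re.compile(
    r"Payment Due Date.*?\n\s*([\d/]+)\s+([\d,]+\.\d{2})", re.DOTALL
)


def parse_statement_text(text: str) -> dict[str, str]:
    """Hypothetical helper: pull the five fields out of extracted statement text."""
    result = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[field] = match.group(1).strip() if match else "N/A"
    match = DUE_AND_AMOUNT.search(text)
    result["Payment Due Date"] = match.group(1) if match else "N/A"
    result["Total Amount Due"] = match.group(2) if match else "N/A"
    return result


# Invented text that imitates the layout the regexes expect (not a real statement)
SAMPLE = (
    "Name : JANE DOE\n"
    "Email : jane@example.com\n"
    "Card No: 1234 56XX XXXX 7890\n"
    "Statement Date:14/10/2025\n"
    "Payment Due Date    Total Amount Due\n"
    "03/11/2025    8,501.00\n"
)

print(json.dumps(parse_statement_text(SAMPLE), indent=4))
```

In the notebook, the equivalent call is `parse_statement_text(extract_text(pdf_path))`. A class such as `[A-Z\s]+` would instead return `"JANE DOE\nE"` on this sample, since it crosses the newline into `Email`.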


## Future Enhancements
- Extract full transaction history from statements
- Enhance parser robustness using layout-aware tools like `PyMuPDF`
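For the transaction-history item, one possible starting point is a line-anchored regex. This sketch rests on an unverified assumption: that each transaction extracts as a single line of the form `DD/MM/YYYY description amount`, with an optional trailing `Cr` for credits. The real layout should be checked against `extract_text` output first. `Transaction`, `TXN_LINE` and `extract_transactions` are hypothetical names, and the sample lines are invented.

```python
import re
from dataclasses import dataclass
from decimal import Decimal

# Assumed line format: "DD/MM/YYYY DESCRIPTION 1,234.56" with optional " Cr".
# [ \t] instead of \s keeps every part of the match on a single line.
TXN_LINE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})[ \t]+(?P<desc>.+?)[ \t]+"
    r"(?P<amount>[\d,]+\.\d{2})(?:[ \t]+(?P<cr>Cr))?[ \t]*$",
    re.MULTILINE,
)


@dataclass(frozen=True)
class Transaction:
    date: str
    description: str
    amount: Decimal  # negative for credits such as payments or refunds


def extract_transactions(text: str) -> list[Transaction]:
    """Hypothetical helper: return every line that looks like a transaction."""
    txns = []
    for m in TXN_LINE.finditer(text):
        amount = Decimal(m["amount"].replace(",", ""))
        if m["cr"]:
            amount = -amount  # credits reduce the balance
        txns.append(Transaction(m["date"], m["desc"], amount))
    return txns


# Invented lines; only the two that start with a date should match
SAMPLE_TXNS = (
    "Date Description Amount\n"
    "02/10/2025 AMAZON PAY INDIA 1,499.00\n"
    "05/10/2025 PAYMENT RECEIVED THANK YOU 5,000.00 Cr\n"
)

for txn in extract_transactions(SAMPLE_TXNS):
    print(txn)
```

Storing credits as negative amounts means the statement's transactions can be summed directly; a layout-aware extractor would still be needed if descriptions wrap onto a second line.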
