# Llama Adaptive Bank Statement Extraction

This notebook uses a two-turn conversation with Llama 3.2 Vision to:
1. **Turn 1**: Classify bank statement structure (flat vs date-grouped)
2. **Turn 2**: Apply structure-specific extraction prompt

Because the structure is classified first, the extraction prompt can be matched to the statement layout instead of applying one generic prompt to every statement.

In [1]:
from pathlib import Path
import random
import yaml

import numpy as np
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(42)
print("✅ Imports loaded")

✅ Imports loaded


In [2]:
# Load model
model_id = "/home/jovyan/nfs_share/models/Llama-3.2-11B-Vision-Instruct"
print("🔧 Loading model...")

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

print("✅ Model loaded")

🔧 Loading model...


✅ Model loaded


In [3]:
# Load extraction prompts from YAML
prompt_file = Path("/home/jovyan/nfs_share/tod/LMM_POC/prompts/generated/llama_bank_statement_prompt.yaml")

with open(prompt_file, 'r') as f:
    prompt_data = yaml.safe_load(f)

flat_prompt = prompt_data['prompts']['bank_statement_flat']['prompt']
grouped_prompt = prompt_data['prompts']['bank_statement_date_grouped']['prompt']

print("✅ Extraction prompts loaded")
print(f"  - Flat table prompt: {len(flat_prompt)} chars")
print(f"  - Date-grouped prompt: {len(grouped_prompt)} chars")

✅ Extraction prompts loaded
  - Flat table prompt: 3056 chars
  - Date-grouped prompt: 3282 chars
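
The lookups in the cell above imply a nested layout: a top-level `prompts` mapping whose entries each hold a `prompt` string. A minimal sketch of validating that layout before use, so that a missing key fails with a clear message, might look like this. The dict below is a hypothetical stand-in for what `yaml.safe_load` returns, the prompt texts are placeholders, and `get_prompts` is an illustrative helper, not part of the notebook.

```python
# Hypothetical stand-in for the dict returned by yaml.safe_load();
# the prompt texts are placeholders, not the real prompts.
example_prompt_data = {
    "prompts": {
        "bank_statement_flat": {"prompt": "<flat prompt text>"},
        "bank_statement_date_grouped": {"prompt": "<date-grouped prompt text>"},
    }
}

# Keys the notebook reads from the YAML file.
REQUIRED_PROMPTS = ("bank_statement_flat", "bank_statement_date_grouped")


def get_prompts(prompt_data):
    """Return {name: prompt text}; raise KeyError if a required prompt is missing or empty."""
    prompts = prompt_data.get("prompts") or {}
    result = {}
    for name in REQUIRED_PROMPTS:
        entry = prompts.get(name) or {}
        text = entry.get("prompt")
        if not text:
            raise KeyError(f"Missing prompt text for '{name}'")
        result[name] = text
    return result


prompts = get_prompts(example_prompt_data)
print(sorted(prompts))
```

Failing early here is cheaper than discovering a missing prompt after the model has already run the classification turn.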


In [4]:
# Select test image (change this to test different images)
# Options: image_003.png (flat), image_008.png (date-grouped), image_009.png (flat)

image_name = "image_008.png"  # Change to test different structures
image_path = f"/home/jovyan/nfs_share/tod/LMM_POC/evaluation_data/{image_name}"

image = Image.open(image_path)
images = [image]  # Store as list for multi-turn

print(f"✅ Image loaded: {image_name}")
print(f"  Size: {image.size}")

✅ Image loaded: image_008.png
  Size: (900, 1320)


In [5]:
# Chat function using the working multi-turn pattern
def chat_with_mllm(model, processor, prompt, images, messages=None, max_new_tokens=2000, do_sample=False):
    """Multi-turn chat function following Medium article pattern."""
    # None (not a mutable [] default) marks "no history yet"
    if not messages:
        # First turn: the image placeholder goes in front of the text prompt
        messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": prompt}]}]
    else:
        # Later turns: text only; the image is already part of the history
        messages.append({"role": "user", "content": [{"type": "text", "text": prompt}]})

    text = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(images=images, text=text, return_tensors="pt").to(model.device)

    generation_args = {"max_new_tokens": max_new_tokens, "do_sample": do_sample}
    generate_ids = model.generate(**inputs, **generation_args)

    # Trim the prompt tokens from the output. Special tokens (including the
    # end-of-turn token) are dropped at decode time rather than by slicing off
    # the last token, which would lose a real token when max_new_tokens is hit.
    generate_ids = generate_ids[:, inputs['input_ids'].shape[1]:]
    generated_texts = processor.decode(
        generate_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=False
    )

    messages.append({"role": "assistant", "content": [{"type": "text", "text": generated_texts}]})

    return generated_texts, messages

print("✅ Chat function defined")

✅ Chat function defined
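
To make the shape of the conversation history concrete, here is a small model-free sketch of how the message list grows. The helper names `add_user_turn` and `add_assistant_reply` are hypothetical, and the assistant reply is hard-coded rather than generated. It illustrates the one property the chat function relies on: the image placeholder appears only in the first user message.

```python
def add_user_turn(messages, prompt):
    """Append a user message; only the first one carries the image placeholder."""
    content = [{"type": "text", "text": prompt}]
    if not messages:
        content.insert(0, {"type": "image"})
    messages.append({"role": "user", "content": content})
    return messages


def add_assistant_reply(messages, reply):
    """Append the assistant's reply so the next turn sees the full history."""
    messages.append({"role": "assistant", "content": [{"type": "text", "text": reply}]})
    return messages


history = []
add_user_turn(history, "Classify the structure.")
add_assistant_reply(history, "GROUPED")  # hard-coded stand-in for model output
add_user_turn(history, "Extract the fields.")

roles = [m["role"] for m in history]
image_slots = sum(1 for m in history for c in m["content"] if c["type"] == "image")
print(roles, image_slots)
```

Keeping a single image placeholder matters because the same one-element `images` list is passed to the processor on every turn.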


## Turn 1: Classify Bank Statement Structure

In [6]:
# Structure classification prompt
classification_prompt = """Analyze this bank statement's structure.

Answer with ONLY one word:
- FLAT (if transactions are in a continuous table with columns: Date | Description | Debit | Credit | Balance)
- GROUPED (if transactions are organized under date headers like "Thu 04 Sep 2025" with transactions listed below)

Answer (one word only):"""

messages = []
print("💬 Turn 1: Classifying structure...")
structure_answer, messages = chat_with_mllm(model, processor, classification_prompt, images, messages, max_new_tokens=50)

print("\n" + "="*60)
print("STRUCTURE CLASSIFICATION:")
print("="*60)
print(structure_answer.strip())
print("="*60)

# Parse answer
structure_type = structure_answer.strip().upper()
if "FLAT" in structure_type:
    detected_structure = "flat"
elif "GROUPED" in structure_type or "DATE" in structure_type:
    detected_structure = "date_grouped"
else:
    detected_structure = "flat"  # Default fallback

print(f"\n🎯 Detected structure: {detected_structure}")

💬 Turn 1: Classifying structure...





STRUCTURE CLASSIFICATION:
GROUPED

🎯 Detected structure: date_grouped
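
The parsing step above uses substring checks and silently falls back to `flat`. A stricter variant, sketched below under the assumption that an ambiguous answer should be surfaced rather than guessed, matches whole words and returns a caller-supplied default when the answer names neither or both structures. `parse_structure` is a hypothetical helper, and unlike the cell above it does not treat a bare "DATE" as a vote for the grouped layout.

```python
import re


def parse_structure(answer, default=None):
    """Map a free-text model answer to 'flat' or 'date_grouped'.

    Returns `default` when the answer contains neither keyword, or both.
    """
    # Whole-word matching: "INFLATED" does not count as "FLAT".
    words = set(re.findall(r"[A-Z]+", answer.upper()))
    is_flat = "FLAT" in words
    is_grouped = "GROUPED" in words
    if is_flat and not is_grouped:
        return "flat"
    if is_grouped and not is_flat:
        return "date_grouped"
    return default


for raw in ["GROUPED", " flat.\n", "FLAT or GROUPED", "unsure"]:
    print(repr(raw), "->", parse_structure(raw))
```

Passing `default="flat"` reproduces the notebook's fallback, while leaving it as `None` lets the caller log or retry the classification turn.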


## Turn 2: Apply Structure-Specific Extraction

In [7]:
# Select appropriate prompt based on detected structure
if detected_structure == "date_grouped":
    extraction_prompt = grouped_prompt
    prompt_name = "Date-Grouped"
else:
    extraction_prompt = flat_prompt
    prompt_name = "Flat Table"

print(f"💬 Turn 2: Applying {prompt_name} extraction prompt...")
print(f"  Prompt length: {len(extraction_prompt)} chars")

extraction_result, messages = chat_with_mllm(
    model, processor, extraction_prompt, images, messages, max_new_tokens=2000
)

print("\n" + "="*60)
print(f"EXTRACTION RESULT ({prompt_name}):")
print("="*60)
print(extraction_result)
print("="*60)

💬 Turn 2: Applying Date-Grouped extraction prompt...
  Prompt length: 3282 chars

EXTRACTION RESULT (Date-Grouped):
DOCUMENT_TYPE: BANK_STATEMENT
STATEMENT_DATE_RANGE: 08/08/2025 to 07/09/2025
TRANSACTION_DATES: 07/09/2025 | 08/09/2025 | 04/09/2025 | 03/09/2025 | 02/09/2025 | 01/09/2025 | 31/08/2025 | 29/08/2025 | 27/08/2025 | 26/08/2025 | 24/08/2025 | 23/08/2025 | 22/08/2025 | 20/08/2025 | 19/08/2025 | 18/08/2025 | 17/08/2025 | 14/08/2025 | 13/08/2025 | 11/08/2025 | 10/08/2025 | 09/08/2025 | 06/08/2025 | 05/08/2025 | 04/08/2025
LINE_ITEM_DESCRIPTIONS: EFTPOS Cash Out PRICELINE PHARMACY MACKAY QLD | EFTPOS Purchase OFFICEWORKS BUSINESS ROCKHAMPTO... | Mortgage Repayment MORT 8103P16754533 NAB | OSKO Payment to MIKE CHEN 809B4B5773133 | Direct Debit 1539SP4 1608203 MWF 42730 | Salary Payment ATO 28782P60782100 | DD INSURANCE ACME CORP PTY LTD 82321942151853 | Auto Payment UTILITIES AGL 74294 0P85943944 | DIRECT CREDIT SALARY 6080P67812986 | BPAY Payment BILLER60279 CRN 9813018480 | Prof
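
The result above follows a `FIELD_NAME: value | value | ...` layout, so it can be turned into a dictionary for downstream checks. A minimal sketch follows, assuming every field sits on one line and values never contain a literal `|`. `parse_extraction` is a hypothetical helper, and the sample is an abbreviated copy of the output above.

```python
def parse_extraction(text):
    """Parse 'FIELD_NAME: a | b | c' lines into {field: [values]}."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        key = key.strip()
        # Treat only upper-case, underscore-separated keys as field names.
        if not sep or not key or not key.replace("_", "").isupper():
            continue
        fields[key] = [v.strip() for v in value.split("|") if v.strip()]
    return fields


# Abbreviated sample taken from the extraction output above.
sample = """DOCUMENT_TYPE: BANK_STATEMENT
STATEMENT_DATE_RANGE: 08/08/2025 to 07/09/2025
TRANSACTION_DATES: 07/09/2025 | 04/09/2025 | 03/09/2025"""

parsed = parse_extraction(sample)
print(parsed["TRANSACTION_DATES"])
```

With the fields in list form, it becomes straightforward to check that parallel fields such as dates and descriptions have the same number of entries.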

## Save Results

In [8]:
# Save extraction results
output_dir = Path("/home/jovyan/nfs_share/tod/LMM_POC/output")
output_dir.mkdir(parents=True, exist_ok=True)

output_file = output_dir / f"adaptive_extraction_{Path(image_name).stem}.txt"

with open(output_file, 'w', encoding='utf-8') as f:
    f.write(f"Image: {image_name}\n")
    f.write(f"Detected Structure: {detected_structure}\n")
    f.write(f"Prompt Used: {prompt_name}\n")
    f.write("\n" + "="*60 + "\n")
    f.write("EXTRACTION RESULT:\n")
    f.write("="*60 + "\n")
    f.write(extraction_result)

print(f"✅ Results saved to: {output_file}")

✅ Results saved to: /home/jovyan/nfs_share/tod/LMM_POC/output/adaptive_extraction_image_008.txt


## Summary & Analysis

In [9]:
# Show conversation flow
print("\n📊 ADAPTIVE EXTRACTION SUMMARY")
print("="*60)
print(f"Image: {image_name}")
print(f"Detected Structure: {detected_structure}")
print(f"Prompt Applied: {prompt_name}")
# Note: len(messages) counts individual messages (user + assistant), so two turns yield 4
print(f"Total Conversation Turns: {len(messages)}")
print("\nConversation Flow:")
for i, msg in enumerate(messages, 1):
    role = msg['role'].upper()
    content_preview = ""
    for c in msg['content']:
        if c['type'] == 'text':
            text = c['text'].strip()
            content_preview = text[:80] + "..." if len(text) > 80 else text
            break
        elif c['type'] == 'image':
            content_preview = "[IMAGE]"
    print(f"  {i}. {role}: {content_preview}")
print("="*60)

# Parse extraction to count fields
field_count = 0
for line in extraction_result.split('\n'):
    if ':' in line and not line.strip().startswith('#'):
        field_count += 1

print(f"\n✅ Extracted {field_count} fields")
print("✅ Multi-turn adaptive extraction complete!")


📊 ADAPTIVE EXTRACTION SUMMARY
Image: image_008.png
Detected Structure: date_grouped
Prompt Applied: Date-Grouped
Total Conversation Turns: 4

Conversation Flow:
  1. USER: Analyze this bank statement's structure.

Answer with ONLY one word:
- FLAT (if ...
  2. ASSISTANT: GROUPED
  3. USER: You are an expert document analyzer specializing in bank statement extraction.
E...
  4. ASSISTANT: DOCUMENT_TYPE: BANK_STATEMENT
STATEMENT_DATE_RANGE: 08/08/2025 to 07/09/2025
TRA...

✅ Extracted 5 fields
✅ Multi-turn adaptive extraction complete!
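
Two details of the summary above can be tightened: the reported count is the number of messages rather than turns, and the previews span several lines because the prompts contain newlines. A minimal sketch of both adjustments follows. `preview` and `count_turns` are hypothetical helpers, and the history is a hand-written, abbreviated stand-in for the real `messages` list.

```python
def preview(text, limit=80):
    """Collapse all whitespace to single spaces and truncate to `limit` characters."""
    flat = " ".join(text.split())
    return flat[:limit] + "..." if len(flat) > limit else flat


def count_turns(messages):
    """A turn is one user message plus its assistant reply, so count the replies."""
    return sum(1 for m in messages if m["role"] == "assistant")


# Abbreviated, hand-written history in the same shape the notebook builds.
example_messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Analyze this bank statement's structure.\n\nAnswer with ONLY one word:"},
    ]},
    {"role": "assistant", "content": [{"type": "text", "text": "GROUPED"}]},
    {"role": "user", "content": [{"type": "text", "text": "<extraction prompt>"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "DOCUMENT_TYPE: BANK_STATEMENT"}]},
]

print(f"messages={len(example_messages)}, turns={count_turns(example_messages)}")
first_text = example_messages[0]["content"][1]["text"]
print(preview(first_text))
```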


## Test Different Images

To test different bank statement structures, change `image_name` in cell 4:
- `image_003.png` - Flat table format
- `image_008.png` - Date-grouped format
- `image_009.png` - Flat table format

After rerunning from cell 4 onward, the notebook will:
1. Detect the structure
2. Apply the appropriate extraction prompt
3. Show the results
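
The three steps above can also be wrapped in a function and run over several images in one pass. The sketch below shows only the control flow: `run_adaptive` and `make_stub_chat` are hypothetical names, the stub replaces the real model call with canned answers, and the prompts are placeholders. In the notebook itself, the chat callable would load the image and call `chat_with_mllm`.

```python
CLASSIFY_PROMPT = "<classification prompt>"  # placeholder for the Turn 1 prompt


def run_adaptive(image_name, chat_fn, prompts):
    """Run classify-then-extract for one image.

    chat_fn(prompt, history) -> (reply, history) is a hypothetical callable
    standing in for the model call.
    """
    history = []
    answer, history = chat_fn(CLASSIFY_PROMPT, history)
    structure = "date_grouped" if "GROUPED" in answer.upper() else "flat"
    result, history = chat_fn(prompts[structure], history)
    return {"image": image_name, "structure": structure, "result": result}


def make_stub_chat(structure_answer):
    """Build a fake chat function: canned answer on turn 1, echo of the prompt on turn 2."""
    def chat_fn(prompt, history):
        reply = structure_answer if not history else f"extracted with: {prompt}"
        history = history + [("user", prompt), ("assistant", reply)]
        return reply, history
    return chat_fn


# Canned classification answers matching the structures listed above.
canned = {"image_003.png": "FLAT", "image_008.png": "GROUPED", "image_009.png": "FLAT"}
prompts = {"flat": "<flat prompt>", "date_grouped": "<date-grouped prompt>"}

results = [run_adaptive(name, make_stub_chat(ans), prompts) for name, ans in canned.items()]
for r in results:
    print(r["image"], r["structure"])
```

Injecting the chat callable keeps the routing logic testable without a GPU; swapping the stub for a wrapper around the real model should not require changes to `run_adaptive`.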