# InternVL3-2B Document-Type-Aware Adaptive Extraction

**Llama-style explicit multi-stage processing** for transparency and debugging:

1. **Stage 0**: Document Type Classification (INVOICE/RECEIPT/BANK_STATEMENT)
2. **Stage 1**: Structure Classification (if BANK_STATEMENT: FLAT/GROUPED)
3. **Stage 2**: Document-Type-Aware Extraction (using appropriate prompt)

**Key Features**:
- Saves intermediate VLM responses (`doctype_classification`, `structure_classification`, `extraction_raw`)
- Multi-turn chat capability for conversation history
- Llama-compatible CSV output for model comparison
- Explicit stage-by-stage progress display

**Pattern**: Follows `llama_batch_adaptive.ipynb` for consistency.

Outputs are compatible with `model_comparison.ipynb`.
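The three stages reduce to a small routing decision: classify the document, classify the structure only for bank statements, then pick a prompt. A minimal sketch of that routing, assuming the prompt-key naming that appears in this notebook's output (`internvl3_invoice`, `internvl3_bank_statement_date_grouped`); `select_prompt_key` is a hypothetical helper for illustration, not part of the project's modules:

```python
from typing import Optional

# Structure labels used by this notebook's structure_counts dictionary
BANK_STRUCTURES = {"flat", "date_grouped"}


def select_prompt_key(document_type: str, structure_type: Optional[str] = None) -> str:
    """Map a Stage 0 / Stage 1 result to the prompt key used in Stage 2."""
    doc = document_type.strip().upper()
    if doc == "BANK_STATEMENT":
        # Stage 1 only runs for bank statements, so a structure label is required here
        if structure_type not in BANK_STRUCTURES:
            raise ValueError(f"Unknown bank statement structure: {structure_type!r}")
        return f"internvl3_bank_statement_{structure_type}"
    if doc in {"INVOICE", "RECEIPT"}:
        return f"internvl3_{doc.lower()}"
    # Fail loudly rather than silently reusing a prompt from another document
    raise ValueError(f"Unsupported document type: {document_type!r}")


print(select_prompt_key("INVOICE"))
print(select_prompt_key("BANK_STATEMENT", "date_grouped"))
```

Raising on an unknown type is one possible design choice; it keeps a misclassified document from being extracted with the wrong prompt.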

## 1. Path Setup

In [1]:
# Path setup for V100 systems - ensures proper module resolution
import sys
import os
from pathlib import Path
os.environ['EVALUATION_METHOD'] = 'order_aware_f1'  # alternatives: 'f1', 'kieval', 'correlation'


# Get the notebook's directory
notebook_path = Path().absolute()
print(f"📂 Current directory: {notebook_path}")

# Ensure the project root is in the Python path
if str(notebook_path) not in sys.path:
    sys.path.insert(0, str(notebook_path))
    print(f"✅ Added {notebook_path} to sys.path")

# Verify common module can be found
try:
    import common
    print(f"✅ Common module found at: {common.__file__ if hasattr(common, '__file__') else 'built-in'}")
except ImportError as e:
    print(f"❌ Common module not found: {e}")
    print("📋 Current sys.path:")
    for p in sys.path[:5]:  # Show first 5 paths
        print(f"   - {p}")

print("✅ Path setup complete - proceed to imports")

📂 Current directory: /home/jovyan/nfs_share/tod/LMM_POC
✅ Added /home/jovyan/nfs_share/tod/LMM_POC to sys.path
✅ Common module found at: /home/jovyan/nfs_share/tod/LMM_POC/common/__init__.py
✅ Path setup complete - proceed to imports


## 1a. Imports

**IMPORTANT**: If you encounter import errors on V100 systems, re-run the path setup cell above first; it ensures proper module resolution.

In [2]:
# Enable autoreload for module changes
%load_ext autoreload
%autoreload 2

# Standard library imports
import gc
import json
import sys
import time
import warnings
from datetime import datetime
from pathlib import Path

# Add current directory to path to ensure proper module resolution
notebook_dir = Path.cwd()
if str(notebook_dir) not in sys.path:
    sys.path.insert(0, str(notebook_dir))

# Third-party imports
import numpy as np
import pandas as pd
import torch
from IPython.display import display
from rich import print as rprint
from rich.console import Console
from rich.progress import track
from transformers import AutoModel, AutoTokenizer

# Project-specific imports - only what we actually use
from models.document_aware_internvl3_processor import (
    DocumentAwareInternVL3HybridProcessor,
)
from common.gpu_optimization import emergency_cleanup
from common.extraction_parser import discover_images
from common.evaluation_metrics import load_ground_truth

print("✅ All imports loaded successfully")
print("✅ InternVL3 Hybrid Processor imported")
print(f"📂 Working directory: {notebook_dir}")
print("🔬 ADAPTIVE MODE: Explicit multi-stage processing for transparency")
warnings.filterwarnings('ignore')

✅ All imports loaded successfully
✅ InternVL3 Hybrid Processor imported
📂 Working directory: /home/jovyan/nfs_share/tod/LMM_POC
🔬 ADAPTIVE MODE: Explicit multi-stage processing for transparency


## 2. Pre-emptive Memory Cleanup

**CRITICAL for V100**: Run this cell before loading the model to prevent OOM errors when switching between models.

In [3]:
# Pre-emptive V100 Memory Cleanup - run BEFORE model loading to prevent OOM errors
rprint("[bold red]🧹 PRE-EMPTIVE V100 MEMORY CLEANUP[/bold red]")
rprint("[yellow]Clearing any existing model caches before loading...[/yellow]")
rprint("[cyan]💡 This prevents OOM errors when switching between models on V100[/cyan]")

# Emergency cleanup to ensure clean slate
emergency_cleanup(verbose=True)

rprint("[green]✅ Memory cleanup complete - ready for model loading[/green]")
rprint("[dim]📋 Next: Configure settings and load the model[/dim]")

🚨 Running V100 emergency GPU cleanup...
🧹 Starting V100-optimized GPU memory cleanup...
   📊 Initial GPU memory: 0.00GB allocated, 0.00GB reserved
   ✅ Final GPU memory: 0.00GB allocated, 0.00GB reserved
   💾 Memory freed: 0.00GB
✅ V100-optimized memory cleanup complete
✅ V100 emergency cleanup complete
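The internals of `emergency_cleanup` are not shown in this notebook. As a rough sketch of what a cleanup step of this kind usually involves, the hypothetical `cleanup_gpu_memory` below collects Python garbage, releases PyTorch's cached CUDA blocks, and reports what is still held. It is an assumption about the general technique, not the project's implementation, and it falls back gracefully when PyTorch or CUDA is unavailable:

```python
import gc


def cleanup_gpu_memory() -> dict:
    """Collect garbage, release cached CUDA blocks, and report remaining usage in GB."""
    gc.collect()  # drop unreachable Python objects that may still reference tensors
    empty = {"cuda": False, "allocated_gb": 0.0, "reserved_gb": 0.0}
    try:
        import torch
    except ImportError:
        return empty
    if not torch.cuda.is_available():
        return empty
    torch.cuda.empty_cache()  # return unused cached blocks to the driver
    return {
        "cuda": True,
        "allocated_gb": torch.cuda.memory_allocated() / 1e9,
        "reserved_gb": torch.cuda.memory_reserved() / 1e9,
    }


print(cleanup_gpu_memory())
```

Note that `empty_cache` cannot free memory held by live tensors, so any previously loaded model must be deleted (or the kernel restarted) before this has an effect.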


## 3. Configuration

In [4]:
# Initialize console and environment configuration
console = Console()

# Environment-specific base paths
ENVIRONMENT_BASES = {
    'sandbox': '/home/jovyan/nfs_share/tod',
    'efs': '/efs/shared/PoC_data'
}
base_data_path = ENVIRONMENT_BASES['sandbox']

CONFIG = {
    # Model settings
    'MODEL_PATH': '/home/jovyan/nfs_share/models/InternVL3-2B',  # WARNING: hard-coded, environment-specific path
    # 'MODEL_PATH': '/home/jovyan/nfs_share/models/InternVL3-8B',
    # 'MODEL_PATH': '/efs/shared/PTM/InternVL3-2B',
    
    # Batch settings
    'DATA_DIR': f'{base_data_path}/evaluation_data',
    'GROUND_TRUTH': f'{base_data_path}/evaluation_data/ground_truth.csv',
    # 'OUTPUT_BASE': f'{base_data_path}/output',
    'OUTPUT_BASE': f'{base_data_path}/LMM_POC/output',
    'MAX_IMAGES': None,  # None for all, or set limit
    'DOCUMENT_TYPES': None,  # None for all, or ['invoice', 'receipt']
    'ENABLE_MATH_ENHANCEMENT': False,  # Disable mathematical correction for bank statements
    
    # Inference and evaluation mode
    'INFERENCE_ONLY': False,  # False = evaluation mode (loads ground truth); True = inference-only
    
    # Verbosity control
    'VERBOSE': True,
    'SHOW_PROMPTS': True,
    
    # InternVL3 optimization settings - NON-QUANTIZED TESTING
    # TESTING: Non-quantized performance after bug fixes (Rich recursion, prompt repetition)
    # This follows the official InternVL3 documentation pattern exactly
    'USE_QUANTIZATION': False,  # TESTING: Disabled to test non-quantized performance
    'DEVICE_MAP': 'auto',
    'MAX_NEW_TOKENS': 600,
    'TORCH_DTYPE': 'bfloat16',
    'LOW_CPU_MEM_USAGE': True,
    # Flash Attention: NOT supported on V100, only enable for modern GPUs
    'USE_FLASH_ATTN': False  # V100 compatible default
}

# Make GROUND_TRUTH conditional based on INFERENCE_ONLY mode
if CONFIG['INFERENCE_ONLY']:
    CONFIG['GROUND_TRUTH'] = None

# ============================================================================
# PROMPT CONFIGURATION - Explicit file and key mapping
# ============================================================================
# This configuration controls which prompt files and keys are used for each
# document type. You can explicitly override both the file and the key.
#
# Structure:
#   'extraction_files': Maps document types to YAML prompt files
#   'extraction_keys': (Optional) Maps document types to specific keys in those files
#
# If 'extraction_keys' is not specified for a document type, the key will be
# derived from the document type name (e.g., 'INVOICE' -> 'invoice')
#
# For bank statements, structure classification (_flat or _date_grouped) is 
# automatically appended UNLESS you provide a full key in 'extraction_keys'
# ============================================================================

PROMPT_CONFIG = {
    # Document type detection configuration
    'detection_file': 'prompts/document_type_detection.yaml',
    'detection_key': 'detection',
    
    # Extraction prompt file mapping (REQUIRED)
    'extraction_files': {
        'INVOICE': 'prompts/internvl3_prompts.yaml',
        'RECEIPT': 'prompts/internvl3_prompts.yaml', 
        'BANK_STATEMENT': 'prompts/internvl3_prompts.yaml'
    },
    
    # Extraction prompt key mapping (OPTIONAL - for explicit control)
    # Uncomment and configure to override automatic key derivation
    # 'extraction_keys': {
    #     'INVOICE': 'invoice',
    #     'RECEIPT': 'receipt',
    #     'BANK_STATEMENT': 'bank_statement',  # Will auto-append _flat or _date_grouped
    #     # Or specify full key to skip automatic structure suffix:
    #     # 'BANK_STATEMENT': 'bank_statement_flat',  # Forces flat table prompt
    # }
}

# Example configurations:
# ----------------------
# Use generated prompts (if you create InternVL3 generated versions):
#   'extraction_files': {
#       'INVOICE': 'prompts/generated/internvl3_invoice_prompt.yaml',
#       'RECEIPT': 'prompts/generated/internvl3_receipt_prompt.yaml',
#       'BANK_STATEMENT': 'prompts/generated/internvl3_bank_statement_prompt.yaml'
#   }
#
# Mix standard and custom prompts:
#   'extraction_files': {
#       'INVOICE': 'prompts/internvl3_prompts.yaml',
#       'RECEIPT': 'prompts/custom_receipt_prompt.yaml',
#       'BANK_STATEMENT': 'prompts/internvl3_prompts.yaml'
#   }
#
# Force specific bank statement structure:
#   'extraction_keys': {
#       'BANK_STATEMENT': 'bank_statement_flat'  # Ignores vision classification
#   }

# Field list required for DocumentAwareInternVL3HybridProcessor
UNIVERSAL_FIELDS = [
    "DOCUMENT_TYPE", "BUSINESS_ABN", "SUPPLIER_NAME", "BUSINESS_ADDRESS",
    "PAYER_NAME", "PAYER_ADDRESS", "INVOICE_DATE", "STATEMENT_DATE_RANGE",
    "LINE_ITEM_DESCRIPTIONS", "LINE_ITEM_QUANTITIES", "LINE_ITEM_PRICES",
    "LINE_ITEM_TOTAL_PRICES", "IS_GST_INCLUDED", "GST_AMOUNT", "TOTAL_AMOUNT",
    "TRANSACTION_DATES", "TRANSACTION_AMOUNTS_PAID", "TRANSACTION_AMOUNTS_RECEIVED",
    "ACCOUNT_BALANCE"
]

print("✅ Configuration set up successfully")
print(f"📂 Evaluation data: {CONFIG['DATA_DIR']}")
print(f"📊 Ground truth: {CONFIG['GROUND_TRUTH']}")
print(f"🤖 Model path: {CONFIG['MODEL_PATH']}")
print(f"📁 Output base: {CONFIG['OUTPUT_BASE']}")
print(f"📋 Universal fields: {len(UNIVERSAL_FIELDS)}")
print(f"🎯 Mode: {'Inference-only' if CONFIG['INFERENCE_ONLY'] else 'Evaluation mode'}")
print(f"⚙️  Quantization: {'ENABLED (8-bit)' if CONFIG['USE_QUANTIZATION'] else 'DISABLED (full precision)'}")
print(f"⚡ Flash Attention: {'ENABLED' if CONFIG['USE_FLASH_ATTN'] else 'DISABLED (V100 compatible)'}")
print("🔬 TESTING: Non-quantized InternVL3 performance after bug fixes")

✅ Configuration set up successfully
📂 Evaluation data: /home/jovyan/nfs_share/tod/evaluation_data
📊 Ground truth: /home/jovyan/nfs_share/tod/evaluation_data/ground_truth.csv
🤖 Model path: /home/jovyan/nfs_share/models/InternVL3-2B
📁 Output base: /home/jovyan/nfs_share/tod/LMM_POC/output
📋 Universal fields: 19
🎯 Mode: Evaluation mode
⚙️  Quantization: DISABLED (full precision)
⚡ Flash Attention: DISABLED (V100 compatible)
🔬 TESTING: Non-quantized InternVL3 performance after bug fixes
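The comments in `PROMPT_CONFIG` describe how a prompt file and key are chosen for each document type. The sketch below restates those rules as code. `resolve_prompt` is a hypothetical helper, and treating a key that already ends in `_flat` or `_date_grouped` as a "full key" is an assumption about how the processor decides to skip the automatic suffix:

```python
from typing import Optional, Tuple

# Assumed markers of a "full" bank statement key (see the PROMPT_CONFIG comments)
STRUCTURE_SUFFIXES = ("_flat", "_date_grouped")


def resolve_prompt(prompt_config: dict, document_type: str,
                   structure_type: Optional[str] = None) -> Tuple[str, str]:
    """Return (yaml_file, key) for a document type."""
    doc = document_type.upper()
    prompt_file = prompt_config["extraction_files"][doc]  # required mapping
    # Optional explicit key; otherwise derive it from the type name ('INVOICE' -> 'invoice')
    key = prompt_config.get("extraction_keys", {}).get(doc, doc.lower())
    # Bank statements get the structure suffix unless a full key was supplied
    if doc == "BANK_STATEMENT" and structure_type and not key.endswith(STRUCTURE_SUFFIXES):
        key = f"{key}_{structure_type}"
    return prompt_file, key


demo_config = {
    "extraction_files": {
        "INVOICE": "prompts/internvl3_prompts.yaml",
        "BANK_STATEMENT": "prompts/internvl3_prompts.yaml",
    },
}
print(resolve_prompt(demo_config, "INVOICE"))
print(resolve_prompt(demo_config, "BANK_STATEMENT", "date_grouped"))
```

Supplying `'extraction_keys': {'BANK_STATEMENT': 'bank_statement_flat'}` in this sketch returns the flat key regardless of the classified structure, which mirrors the "force specific bank statement structure" example in the configuration comments.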


## 4. Output Directory Setup

In [5]:
# Setup output directories - Handle both absolute and relative paths

# Convert OUTPUT_BASE to Path and handle absolute/relative paths
OUTPUT_BASE = Path(CONFIG['OUTPUT_BASE'])
if not OUTPUT_BASE.is_absolute():
    # If relative, make it relative to current working directory
    OUTPUT_BASE = Path.cwd() / OUTPUT_BASE

BATCH_TIMESTAMP = datetime.now().strftime("%Y%m%d_%H%M%S")

OUTPUT_DIRS = {
    'base': OUTPUT_BASE,
    'batch': OUTPUT_BASE / 'batch_results',
    'csv': OUTPUT_BASE / 'csv',
    'visualizations': OUTPUT_BASE / 'visualizations',
    'reports': OUTPUT_BASE / 'reports'
}

for dir_path in OUTPUT_DIRS.values():
    dir_path.mkdir(parents=True, exist_ok=True)

## 5. Model Loading (Direct Official Pattern)

**NON-QUANTIZED TESTING**: Loading InternVL3 without quantization using the official documentation pattern to test whether the recent bug fixes (Rich recursion, prompt repetition) resolved the underlying issues.

In [6]:
# Load InternVL3 model using DIRECT official pattern (bypassing wrapper)
# https://internvl.readthedocs.io/en/latest/internvl3.0/quick_start.html
rprint("[bold green]Loading InternVL3 model with official NON-QUANTIZED pattern...[/bold green]")
rprint("[cyan]🔬 Testing: Non-quantized performance after bug fixes[/cyan]")
rprint("[cyan]📖 Following: https://internvl.readthedocs.io/en/latest/internvl3.0/quick_start.html[/cyan]")

try:
    # Clear any existing CUDA cache
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        rprint("[blue]🧹 CUDA cache cleared[/blue]")
    
    # Load model using exact official pattern
    rprint("[cyan]📥 Loading model with official parameters...[/cyan]")
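    model = AutoModel.from_pretrained(
        CONFIG['MODEL_PATH'],
        torch_dtype=getattr(torch, CONFIG['TORCH_DTYPE']),  # 'bfloat16' -> torch.bfloat16
        low_cpu_mem_usage=CONFIG['LOW_CPU_MEM_USAGE'],
        use_flash_attn=CONFIG['USE_FLASH_ATTN'],  # False keeps V100 compatibility
        trust_remote_code=True,
        device_map=CONFIG['DEVICE_MAP']  # 'auto' distributes across available GPUs
    ).eval()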
    
    # Load tokenizer
    rprint("[cyan]📥 Loading tokenizer...[/cyan]")
    tokenizer = AutoTokenizer.from_pretrained(
        CONFIG['MODEL_PATH'],
        trust_remote_code=True,
        use_fast=False
    )
    
    # Set generation parameters
    model.config.max_new_tokens = CONFIG['MAX_NEW_TOKENS']
    
    # Display model information
    rprint("[green]✅ Model and tokenizer loaded successfully![/green]")
    
    # GPU memory check
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 1e9
        reserved = torch.cuda.memory_reserved() / 1e9
        total = torch.cuda.get_device_properties(0).total_memory / 1e9
        rprint(f"[blue]📊 GPU Memory: {allocated:.2f}GB allocated, {reserved:.2f}GB reserved, {total:.0f}GB total[/blue]")
        rprint(f"[blue]🔍 Memory usage: {(allocated/total*100):.1f}%[/blue]")
    
    # Model parameters
    param_count = sum(p.numel() for p in model.parameters())
    rprint(f"[blue]🔢 Model parameters: {param_count:,}[/blue]")
    rprint(f"[blue]🎯 Data type: {model.dtype}[/blue]")
    rprint(f"[blue]🖥️  Device: {next(model.parameters()).device}[/blue]")
    
    # Add device map diagnostic
    if hasattr(model, 'hf_device_map'):
        from collections import Counter
        device_distribution = Counter(model.hf_device_map.values())
        rprint(f"[blue]🔄 Model distribution: {dict(device_distribution)}[/blue]")
    else:
        rprint("[blue]📍 No device map found - model placed manually[/blue]")
    
    # Initialize the hybrid processor with loaded model components
    rprint("[cyan]🔧 Initializing document-aware processor...[/cyan]")
    hybrid_processor = DocumentAwareInternVL3HybridProcessor(
        field_list=UNIVERSAL_FIELDS,
        model_path=CONFIG['MODEL_PATH'],
        debug=CONFIG['VERBOSE'],
        pre_loaded_model=model,
        pre_loaded_tokenizer=tokenizer,
        prompt_config=PROMPT_CONFIG  # Single source of truth for configuration!
    )
    
    rprint("[bold green]✅ InternVL3 NON-QUANTIZED model ready for document-aware processing[/bold green]")
    rprint("[yellow]🔬 If you see gibberish responses, it confirms quantization is still needed for V100[/yellow]")
    rprint("[yellow]🎉 If responses are clean, it proves the bug fixes resolved the core issues![/yellow]")
    
except Exception as e:
    rprint(f"[red]❌ Error loading model: {e}[/red]")
    rprint("[yellow]💡 This may indicate that quantization is still required for V100 GPUs[/yellow]")
    raise

FlashAttention2 is not installed.


🎯 InternVL3 Hybrid processor initialized for 19 fields: DOCUMENT_TYPE → ACCOUNT_BALANCE
🔧 CUDA memory allocation configured: max_split_size_mb:64
💡 Using 64MB memory blocks to reduce fragmentation
📊 Initial CUDA state (Multi-GPU Total): Allocated=3.89GB, Reserved=3.95GB
🤖 Auto-detected batch size: 8 (GPU Memory: 275.5GB)
🎯 DOCUMENT AWARE REDUCTION: 19 fields (~34% fewer than original 29)
🎯 Generation config: max_new_tokens=2000, temperature=0.0, do_sample=False
✅ Using pre-loaded InternVL3 model and tokenizer
🔧 Device: cuda:0
💾 Model parameters: 2,088,957,440
🚀 V100 optimizations applied


## 5a. Multi-Turn Chat Function

InternVL3 equivalent of Llama's `chat_with_mllm` for maintaining conversation history across multiple stages:

In [7]:
def chat_with_internvl(model, tokenizer, prompt, pixel_values, messages=None, max_new_tokens=2000, do_sample=False):
    """
    Multi-turn chat with InternVL3 using conversation history.

    Similar to Llama's chat_with_mllm but adapted for InternVL3's chat method.
    Maintains conversation history across multiple queries on the same image.

    Args:
        model: InternVL3 model
        tokenizer: InternVL3 tokenizer
        prompt: Text prompt for this turn
        pixel_values: Preprocessed image tensor
        messages: Conversation history (list of [role, content] pairs) or None
        max_new_tokens: Maximum tokens to generate
        do_sample: Whether to use sampling

    Returns:
        Tuple of (response, updated_messages)
    """
    # Initialize conversation history on the first turn
    if messages is None:
        messages = []

    # model.chat() takes history as (question, answer) tuples, so pair up the
    # stored [role, content] turns (user, assistant, user, assistant, ...)
    history = [
        (messages[i][1], messages[i + 1][1])
        for i in range(0, len(messages) - 1, 2)
    ]

    # Only the first turn carries the <image> placeholder; later turns reuse it
    question = f'<image>\n{prompt}' if not history else prompt
    messages.append(['user', question])

    # Sampling parameters are only set when do_sample=True
    generation_config = {
        "max_new_tokens": max_new_tokens,
        "temperature": None if not do_sample else 0.6,
        "do_sample": do_sample,
        "top_p": 0.9 if do_sample else None,
        "pad_token_id": tokenizer.eos_token_id,
    }

    # Use InternVL3 chat with the prior turns as history
    response = model.chat(
        tokenizer,
        pixel_values,
        question,
        generation_config=generation_config,
        history=history or None,
        return_history=False
    )

    # Add response to history
    messages.append(['assistant', response])

    return response, messages

rprint("[green]✅ Multi-turn chat function defined[/green]")
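One detail worth checking when adapting a Llama-style message list: the role-tagged `[role, content]` turns kept in `messages` are not necessarily the shape that `model.chat` wants for its `history` argument. The sketch below assumes `history` is a list of `(question, answer)` tuples, which should be verified against the modeling code shipped with the checkpoint. `to_internvl_history` is a hypothetical helper:

```python
from typing import List, Tuple


def to_internvl_history(messages: List[List[str]]) -> List[Tuple[str, str]]:
    """Pair alternating user/assistant turns into (question, answer) tuples."""
    history = []
    pending_question = None
    for role, content in messages:
        if role == "user":
            pending_question = content
        elif role == "assistant" and pending_question is not None:
            history.append((pending_question, content))
            pending_question = None
    # A trailing user turn without a reply is the current question, not history
    return history


turns = [
    ["user", "<image>\nWhat type of document is this?"],
    ["assistant", "BANK_STATEMENT"],
    ["user", "Is the table flat or date-grouped?"],
]
print(to_internvl_history(turns))
```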

## 6. Image Discovery

In [8]:
# Discover and filter images - Handle both absolute and relative paths

# Convert DATA_DIR to Path and handle absolute/relative paths
data_dir = Path(CONFIG['DATA_DIR'])
if not data_dir.is_absolute():
    # If relative, make it relative to current working directory
    data_dir = Path.cwd() / data_dir

# Discover images from the resolved data directory
all_images = discover_images(str(data_dir))

# Conditionally load ground truth only when not in inference-only mode
ground_truth = {}
if not CONFIG['INFERENCE_ONLY'] and CONFIG['GROUND_TRUTH']:
    # Convert GROUND_TRUTH to Path and handle absolute/relative paths
    ground_truth_path = Path(CONFIG['GROUND_TRUTH'])
    if not ground_truth_path.is_absolute():
        # If relative, make it relative to current working directory
        ground_truth_path = Path.cwd() / ground_truth_path
    
    # Load ground truth from the resolved path
    ground_truth = load_ground_truth(str(ground_truth_path), verbose=CONFIG['VERBOSE'])
    
    rprint(f"[green]✅ Ground truth loaded for {len(ground_truth)} images[/green]")
else:
    rprint("[cyan]📋 Running in inference-only mode (no ground truth required)[/cyan]")

# Apply filters (only if ground truth is available)
if CONFIG['DOCUMENT_TYPES'] and ground_truth:
    filtered = []
    for img in all_images:
        img_name = Path(img).name
        if img_name in ground_truth:
            doc_type = ground_truth[img_name].get('DOCUMENT_TYPE', '').lower()
            if any(dt.lower() in doc_type for dt in CONFIG['DOCUMENT_TYPES']):
                filtered.append(img)
    all_images = filtered

if CONFIG['MAX_IMAGES']:
    all_images = all_images[:CONFIG['MAX_IMAGES']]

rprint(f"[bold green]Ready to process {len(all_images)} images[/bold green]")
rprint(f"[cyan]Data directory: {data_dir}[/cyan]")
if not CONFIG['INFERENCE_ONLY'] and CONFIG['GROUND_TRUTH']:
    rprint(f"[cyan]Ground truth: {ground_truth_path}[/cyan]")
rprint(f"[cyan]Mode: {'Inference-only' if CONFIG['INFERENCE_ONLY'] else 'Evaluation mode'}[/cyan]")
for i, img in enumerate(all_images[:5], 1):
    print(f"  {i}. {Path(img).name}")
if len(all_images) > 5:
    print(f"  ... and {len(all_images) - 5} more")

📊 Ground truth CSV loaded with 9 rows and 20 columns
📋 Available columns: ['image_file', 'DOCUMENT_TYPE', 'BUSINESS_ABN', 'BUSINESS_ADDRESS', 'GST_AMOUNT', 'INVOICE_DATE', 'IS_GST_INCLUDED', 'LINE_ITEM_DESCRIPTIONS', 'LINE_ITEM_QUANTITIES', 'LINE_ITEM_PRICES', 'LINE_ITEM_TOTAL_PRICES', 'PAYER_ADDRESS', 'PAYER_NAME', 'STATEMENT_DATE_RANGE', 'SUPPLIER_NAME', 'TOTAL_AMOUNT', 'TRANSACTION_AMOUNTS_PAID', 'TRANSACTION_DATES', 'TRANSACTION_AMOUNTS_RECEIVED', 'ACCOUNT_BALANCE']
✅ Using 'image_file' as image identifier column
✅ Ground truth mapping created for 9 images


  1. image_001.png
  2. image_002.png
  3. image_003.png
  4. image_004.png
  5. image_005.png
  ... and 4 more
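The filtering rules in the cell above (`DOCUMENT_TYPES` applies only when ground truth is loaded, `MAX_IMAGES` applies last) can be isolated into a small function for testing. This is a sketch with a hypothetical name, `filter_images`, and it assumes ground truth is a dict keyed by image file name as in the notebook:

```python
from pathlib import Path
from typing import Dict, List, Optional


def filter_images(images: List[str], ground_truth: Dict[str, dict],
                  document_types: Optional[List[str]] = None,
                  max_images: Optional[int] = None) -> List[str]:
    """Apply the document-type filter (if possible), then the image limit."""
    selected = list(images)
    if document_types and ground_truth:
        wanted = [dt.lower() for dt in document_types]
        selected = [
            img for img in selected
            # Images missing from ground truth have no type and are dropped
            if any(
                dt in ground_truth.get(Path(img).name, {}).get("DOCUMENT_TYPE", "").lower()
                for dt in wanted
            )
        ]
    if max_images:
        selected = selected[:max_images]
    return selected


demo_truth = {"image_001.png": {"DOCUMENT_TYPE": "INVOICE"}}
print(filter_images(["/data/image_001.png", "/data/image_002.png"], demo_truth, ["invoice"]))
```

Because the type filter depends on ground truth, it is silently skipped in inference-only mode; that matches the original cell's behavior.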


## 7. Multi-Stage Batch Processing

**Explicit multi-stage processing** (Llama-style transparency):
- **Stage 0**: Document Type Classification (INVOICE/RECEIPT/BANK_STATEMENT)
- **Stage 1**: Structure Classification (for BANK_STATEMENT only: FLAT/GROUPED)
- **Stage 2**: Document-Type-Aware Extraction (using appropriate prompt)

**Saves intermediate responses**:
- `doctype_classification`: Raw VLM response from document type detection
- `structure_classification`: Raw VLM response from structure classification
- `extraction_raw`: Raw VLM response from field extraction
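One way to keep the per-image record consistent across stages is a small dataclass that holds the three raw responses plus the parsed fields and flattens into a CSV row. The `StageRecord` class below is a hypothetical illustration of that shape, using the column names listed above; the notebook itself builds a plain dict:

```python
from dataclasses import asdict, dataclass, field


@dataclass
class StageRecord:
    image_file: str
    document_type: str
    doctype_classification: str              # raw Stage 0 response
    extraction_raw: str                      # raw Stage 2 response
    structure_type: str = "N/A"              # only set for bank statements
    structure_classification: str = "N/A"    # raw Stage 1 response
    extracted_fields: dict = field(default_factory=dict)

    def to_row(self) -> dict:
        """Flatten into one dict: core columns first, then one column per field."""
        row = asdict(self)
        fields = row.pop("extracted_fields")
        return {**row, **fields}


record = StageRecord(
    image_file="image_001.png",
    document_type="INVOICE",
    doctype_classification="INVOICE",
    extraction_raw="{}",
    extracted_fields={"TOTAL_AMOUNT": "10.00"},
)
print(record.to_row())
```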

In [None]:
# Multi-stage adaptive extraction with explicit stages (Llama-style)
results = []
processing_times = []
doctype_counts = {'INVOICE': 0, 'RECEIPT': 0, 'BANK_STATEMENT': 0}
structure_counts = {'flat': 0, 'date_grouped': 0}

rprint("\n[bold green]🚀 Starting multi-stage adaptive extraction...[/bold green]\n")

for idx, image_path in enumerate(track(all_images, description="Processing images"), 1):
    image_name = Path(image_path).name

    try:
        start_time = time.time()

        # Initialize conversation history and load image once
        messages = []
        pixel_values = hybrid_processor.load_image(str(image_path))

        # ===================================================================
        # STAGE 0: Document Type Classification
        # ===================================================================
        if CONFIG['VERBOSE']:
            rprint(f"\n[bold blue]Processing [{idx}/{len(all_images)}]: {image_name}[/bold blue]")
            rprint("[dim]Stage 0: Document type detection...[/dim]")

        classification_result = hybrid_processor.detect_and_classify_document(
            str(image_path), verbose=False
        )

        document_type = classification_result['document_type']
        doctype_answer = classification_result.get('raw_response', document_type)
        doctype_counts[document_type] = doctype_counts.get(document_type, 0) + 1

        # ===================================================================
        # STAGE 1: Structure Classification (for BANK_STATEMENT only)
        # ===================================================================
        structure_type = "N/A"
        structure_answer = "N/A"

        if document_type == "BANK_STATEMENT":
            if CONFIG['VERBOSE']:
                rprint("[dim]Stage 1: Bank statement structure classification...[/dim]")

            # Use vision-based structure classifier
            from common.vision_bank_statement_classifier import classify_bank_statement_structure_vision

            structure_type = classify_bank_statement_structure_vision(
                str(image_path),
                model=hybrid_processor,  # Pass the processor (has load_image method)
                processor=None,  # InternVL3 doesn't use separate processor
                verbose=False
            )

            structure_answer = structure_type  # For InternVL3, we have the parsed result directly
            structure_counts[structure_type] = structure_counts.get(structure_type, 0) + 1
            prompt_key = f"internvl3_bank_statement_{structure_type}"
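        elif document_type == "INVOICE":
            prompt_key = "internvl3_invoice"
        elif document_type == "RECEIPT":
            prompt_key = "internvl3_receipt"
        else:
            # Unexpected type: label it explicitly so a stale prompt_key from a
            # previous image is never reused
            prompt_key = f"internvl3_{document_type.lower()}"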

        # ===================================================================
        # STAGE 2: Document-Type-Aware Extraction
        # ===================================================================
        if CONFIG['VERBOSE']:
            rprint(f"[dim]Stage 2: Extraction using {prompt_key}...[/dim]")

        extraction_result = hybrid_processor.process_document_aware(
            str(image_path), classification_result, verbose=False
        )

        # Extract data and raw response
        extracted_fields = extraction_result.get('extracted_data', {})
        extraction_raw = extraction_result.get('raw_response', '')

        # Store comprehensive results (Llama-style structure)
        result = {
            'image_file': image_name,
            'document_type': document_type,
            'structure_type': structure_type,
            'prompt_used': prompt_key,
            'doctype_classification': doctype_answer.strip() if isinstance(doctype_answer, str) else str(doctype_answer),
            'structure_classification': structure_answer.strip() if isinstance(structure_answer, str) else str(structure_answer),
            'extraction_raw': extraction_raw,
            **extracted_fields  # Add all individual field columns
        }
        results.append(result)

        processing_time = time.time() - start_time
        processing_times.append(processing_time)

        structure_display = structure_type if structure_type != 'N/A' else 'direct'
        rprint(f"[green]✅ {image_name}: {document_type} ({structure_display}) - {processing_time:.2f}s[/green]")

    except Exception as e:
        rprint(f"[red]❌ {image_name}: Error - {e}[/red]")
        results.append({
            'image_file': image_name,
            'document_type': 'ERROR',
            'structure_type': 'ERROR',
            'error': str(e)
        })
        processing_times.append(0)

    finally:
        # Memory cleanup after each image
        if 'pixel_values' in locals():
            del pixel_values

        # Clear GPU cache
        if torch.cuda.is_available():
            torch.cuda.empty_cache()

        # Periodic garbage collection every 3 images
        if idx % 3 == 0:
            gc.collect()

console.rule("[bold green]Batch Processing Complete[/bold green]")

# Display summary statistics
rprint("\n[bold blue]📊 Document Type Classification Summary:[/bold blue]")
rprint(f"[cyan]  Invoices: {doctype_counts.get('INVOICE', 0)}[/cyan]")
rprint(f"[cyan]  Receipts: {doctype_counts.get('RECEIPT', 0)}[/cyan]")
rprint(f"[cyan]  Bank Statements: {doctype_counts.get('BANK_STATEMENT', 0)}[/cyan]")

if doctype_counts.get('BANK_STATEMENT', 0) > 0:
    rprint("\n[bold blue]📊 Bank Statement Structure Summary:[/bold blue]")
    rprint(f"[cyan]  Flat table: {structure_counts.get('flat', 0)}[/cyan]")
    rprint(f"[cyan]  Date-grouped: {structure_counts.get('date_grouped', 0)}[/cyan]")

## 8. Save Results (Llama-Compatible Format)

In [10]:
# Convert results to DataFrame (Llama-compatible structure)
df = pd.DataFrame(results)

# Save to CSV (compatible with model_comparison.ipynb pattern)
csv_output = OUTPUT_DIRS['csv'] / f"internvl3_adaptive_results_{BATCH_TIMESTAMP}.csv"
df.to_csv(csv_output, index=False)

rprint(f"[green]✅ CSV saved to: {csv_output}[/green]")
rprint(f"[cyan]  Rows: {len(df)}[/cyan]")
rprint(f"[cyan]  Columns: {len(df.columns)}[/cyan]")

# Show column names to verify Llama-compatible structure
rprint("\n[bold blue]📋 CSV Columns (Llama-compatible):[/bold blue]")
core_cols = ['image_file', 'document_type', 'structure_type', 'prompt_used',
             'doctype_classification', 'structure_classification', 'extraction_raw']
rprint(f"[cyan]Core columns: {', '.join(core_cols)}[/cyan]")
field_cols = [col for col in df.columns if col not in core_cols and col != 'error']
rprint(f"[cyan]Field columns ({len(field_cols)}): {', '.join(field_cols[:5])}{'...' if len(field_cols) > 5 else ''}[/cyan]")

# Save detailed JSON results
json_output = OUTPUT_DIRS['csv'] / f"internvl3_adaptive_results_{BATCH_TIMESTAMP}.json"
with open(json_output, 'w') as f:
    json.dump(results, f, indent=2)

rprint(f"[green]✅ JSON saved to: {json_output}[/green]")
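Successful rows and error rows carry different keys (error rows have `error` but no field columns). `pd.DataFrame` aligns them on the union of keys and `to_csv` writes the gaps as empty cells. The sketch below reproduces that alignment with only the standard library, which can be handy for checking the expected column layout without pandas; `rows_to_csv` is a hypothetical helper:

```python
import csv
import io
from typing import Dict, List


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Write rows with differing keys to CSV text, leaving missing cells empty."""
    # Collect columns in first-seen order so core columns stay at the front
    columns: List[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


demo_rows = [
    {"image_file": "a.png", "document_type": "INVOICE", "TOTAL_AMOUNT": "10.00"},
    {"image_file": "b.png", "document_type": "ERROR", "error": "boom"},
]
print(rows_to_csv(demo_rows))
```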

## 9. Display Sample Results

In [11]:
# Display sample results
console.rule("[bold blue]Sample Results[/bold blue]")

display_cols = ['image_file', 'document_type', 'structure_type', 'prompt_used']
rprint(df[display_cols].to_string(index=False))

## 10. Summary Statistics

In [12]:
print("\n📊 DOCUMENT-TYPE-AWARE ADAPTIVE EXTRACTION SUMMARY")
print("="*80)
print(f"Total images processed: {len(results)}")
print(f"Successful extractions: {len([r for r in results if 'error' not in r])}")
print(f"Errors: {len([r for r in results if 'error' in r])}")

print("\nDocument Type Classification:")
print(f"  Invoices: {doctype_counts.get('INVOICE', 0)}")
print(f"  Receipts: {doctype_counts.get('RECEIPT', 0)}")
print(f"  Bank Statements: {doctype_counts.get('BANK_STATEMENT', 0)}")

if doctype_counts.get('BANK_STATEMENT', 0) > 0:
    print("\nBank Statement Structure Classification:")
    print(f"  Flat table format: {structure_counts.get('flat', 0)}")
    print(f"  Date-grouped format: {structure_counts.get('date_grouped', 0)}")

print("\nPrompts Used:")
prompt_usage = {}
for result in results:
    if 'prompt_used' in result:
        prompt = result['prompt_used']
        prompt_usage[prompt] = prompt_usage.get(prompt, 0) + 1

for prompt, count in sorted(prompt_usage.items()):
    print(f"  {prompt}: {count}")

print("="*80)

# Field extraction statistics
if len(df) > 0:
    field_cols = [col for col in df.columns if col not in [
        'image_file', 'document_type', 'structure_type', 'prompt_used',
        'doctype_classification', 'structure_classification', 'extraction_raw', 'error'
    ]]

    if field_cols:
        print("\n📈 Field Extraction Coverage:")
        for field in field_cols:
            if field in df.columns:
                found_count = df[field].notna().sum()
                coverage = (found_count / len(df)) * 100
                print(f"  {field}: {found_count}/{len(df)} ({coverage:.1f}%)")


📊 DOCUMENT-TYPE-AWARE ADAPTIVE EXTRACTION SUMMARY
Total images processed: 9
Successful extractions: 9
Errors: 0

Document Type Classification:
  Invoices: 3
  Receipts: 3
  Bank Statements: 3

Bank Statement Structure Classification:
  Flat table format: 0
  Date-grouped format: 3

Prompts Used:
  internvl3_bank_statement_date_grouped: 3
  internvl3_invoice: 3
  internvl3_receipt: 3

📈 Field Extraction Coverage:
  DOCUMENT_TYPE: 9/9 (100.0%)
  BUSINESS_ABN: 6/9 (66.7%)
  SUPPLIER_NAME: 6/9 (66.7%)
  BUSINESS_ADDRESS: 6/9 (66.7%)
  PAYER_NAME: 6/9 (66.7%)
  PAYER_ADDRESS: 6/9 (66.7%)
  INVOICE_DATE: 6/9 (66.7%)
  LINE_ITEM_DESCRIPTIONS: 9/9 (100.0%)
  LINE_ITEM_QUANTITIES: 6/9 (66.7%)
  LINE_ITEM_PRICES: 6/9 (66.7%)
  LINE_ITEM_TOTAL_PRICES: 6/9 (66.7%)
  IS_GST_INCLUDED: 6/9 (66.7%)
  GST_AMOUNT: 6/9 (66.7%)
  TOTAL_AMOUNT: 6/9 (66.7%)
  STATEMENT_DATE_RANGE: 3/9 (33.3%)
  TRANSACTION_DATES: 3/9 (33.3%)
  TRANSACTION_AMOUNTS_PAID: 3/9 (33.3%)
  TRANSACTION_AMOUNTS_RECEIVED: 3/9 (33.3%)
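The 6/9 and 3/9 figures appear to reflect fields that only exist for some document types (three invoices plus three receipts, versus three bank statements), rather than extraction failures. Breaking coverage down by document type makes that visible. A minimal sketch over the `results` list of dicts, with a hypothetical helper name:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def coverage_by_type(rows: List[dict], field_name: str) -> Dict[str, Tuple[int, int]]:
    """Return {document_type: (rows with the field, total rows)}."""
    totals = defaultdict(int)
    found = defaultdict(int)
    for row in rows:
        doc_type = row.get("document_type", "UNKNOWN")
        totals[doc_type] += 1
        # A field counts as present when the key exists with a non-None value
        if row.get(field_name) is not None:
            found[doc_type] += 1
    return {doc_type: (found[doc_type], totals[doc_type]) for doc_type in totals}


demo = [
    {"document_type": "INVOICE", "BUSINESS_ABN": "12 345 678 901"},
    {"document_type": "BANK_STATEMENT", "STATEMENT_DATE_RANGE": "03/05/2025 to 10/05/2025"},
]
print(coverage_by_type(demo, "BUSINESS_ABN"))
```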

## 11. View Individual Extraction

Change `image_to_view` to view detailed extraction for a specific image:

In [13]:
# View detailed extraction for specific image
image_to_view = "image_003.png"  # Change this

result = next((r for r in results if r['image_file'] == image_to_view), None)

if result:
    print(f"\n🔍 Detailed Extraction: {image_to_view}")
    print("="*80)
    print(f"Document Type: {result['document_type']}")
    print(f"Structure Type: {result['structure_type']}")
    print(f"Prompt Used: {result.get('prompt_used', 'N/A')}")  # error rows have no prompt
    print("\nDocument Type Classification Response:")
    print(result.get('doctype_classification', 'N/A'))
    print("\nStructure Classification Response:")
    print(result.get('structure_classification', 'N/A'))
    print("\nExtraction Result:")
    extraction_display = result.get('extraction_raw', 'N/A')
    if len(extraction_display) > 1000:
        extraction_display = extraction_display[:1000] + "\n...[truncated]..."
    print(extraction_display)
    print("="*80)
else:
    print(f"Image {image_to_view} not found in results")


🔍 Detailed Extraction: image_003.png
Document Type: BANK_STATEMENT
Structure Type: date_grouped
Prompt Used: internvl3_bank_statement_date_grouped

Document Type Classification Response:
BANK_STATEMENT

Structure Classification Response:
date_grouped

Extraction Result:
```json
{
  "DOCUMENT_TYPE": "BANK_STATEMENT",
  "STATEMENT_DATE_RANGE": "03/05/2025 to 10/05/2025",
  "LINE_ITEM_DESCRIPTIONS": "EFTPOS PURCHASE WOOLWORTHS | INTEREST PAYMENT | REFUND PROCESSED | DIRECT CREDIT SALARY | ATM WITHDRAWAL ANZ ATM",
  "TRANSACTION_DATES": "03/05/2025 | 04/05/2025 | 05/05/2025 | 06/05/2025 | 07/05/2025 | 08/05/2025 | 09/05/2025 | 10/05/2025",
  "TRANSACTION_AMOUNTS_PAID": "288.03 | 22.50 | 114.66 | 3497.47 | 187.59 | 112.50 | 5.16 | 146.72"
}
```
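In the raw response above, `LINE_ITEM_DESCRIPTIONS` has 5 pipe-separated entries while `TRANSACTION_DATES` and `TRANSACTION_AMOUNTS_PAID` each have 8, so the columns do not line up row for row. A quick way to spot this is to parse the raw text and compare list lengths. The sketch below is a standalone illustration, not the project's `extraction_parser`; it assumes the response is JSON, optionally wrapped in a markdown code fence, with ` | ` as the list delimiter:

```python
import json
import re

FENCE = "`" * 3  # markdown code fence marker, built indirectly for readability


def parse_extraction(raw: str) -> dict:
    """Strip an optional json code fence, then split pipe-delimited values into lists."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, raw, flags=re.DOTALL)
    data = json.loads(match.group(1) if match else raw)
    parsed = {}
    for key, value in data.items():
        # Values without a pipe (including single-item lists) stay as plain strings
        if isinstance(value, str) and "|" in value:
            parsed[key] = [part.strip() for part in value.split("|")]
        else:
            parsed[key] = value
    return parsed


def list_lengths(parsed: dict) -> dict:
    """Length of every list-valued field, for spotting misaligned columns."""
    return {key: len(value) for key, value in parsed.items() if isinstance(value, list)}


sample = (
    FENCE + "json\n"
    '{"DOCUMENT_TYPE": "BANK_STATEMENT", '
    '"TRANSACTION_DATES": "03/05/2025 | 04/05/2025", '
    '"TRANSACTION_AMOUNTS_PAID": "288.03 | 22.50 | 114.66"}\n'
    + FENCE
)
print(list_lengths(parse_extraction(sample)))
```

Differing lengths in the result are a cue to inspect the prompt or the source image before trusting row-level alignment.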
