# Azure AI Content Understanding - Classifier and Analyzer Demo

This notebook demonstrates how to use Azure AI Content Understanding service to:
1. Create a classifier to categorize documents
2. Create a custom analyzer to extract specific fields
3. Combine classifier and analyzers to classify, optionally split, and analyze documents in a flexible processing pipeline

If you’d like to learn more before getting started, see the official documentation:
[Understanding Classifiers in Azure AI Services](https://learn.microsoft.com/en-us/azure/ai-services/content-understanding/concepts/classifier)

## Prerequisites
1. Ensure the Azure AI service is configured by following these [steps](../README.md#configure-azure-ai-service-resource).
2. Install the required packages to run the sample.


In [None]:
%pip install -r requirements.txt

## 1. Import Required Libraries

In [None]:
import json
import logging
import os
import sys
import uuid
from pathlib import Path

from dotenv import find_dotenv, load_dotenv
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

load_dotenv(find_dotenv())
logging.basicConfig(level=logging.INFO)

print("✅ Libraries imported successfully!")

## 2. Import Azure Content Understanding Client

The `AzureContentUnderstandingClient` class handles all API interactions with the Azure AI service.

In [None]:
try:
    from content_understanding_client import AzureContentUnderstandingClient
    print("✅ Azure Content Understanding Client imported successfully!")
except ImportError:
    print("❌ Error: Make sure 'content_understanding_client.py' is in the same directory as this notebook.")
    raise

## 3. Configure Azure AI Service Settings and Prepare the Sample

Update these settings to match your Azure environment:

- **AZURE_AI_ENDPOINT**: Your Azure AI service endpoint URL (or set it in the `.env` file)
- **AZURE_AI_API_VERSION**: The Azure AI API version to use. Default is "2025-05-01-preview".
- **AZURE_AI_API_KEY**: Your Azure AI service key, read from the `AZURE_CONTENT_UNDERSTANDING_SUBSCRIPTION_KEY` variable in the `.env` file (optional if using token authentication)
- **SAMPLE_CLAIMS_BUNDLE**: Path to the PDF document you want to process
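
The settings above can be checked before anything calls the service. This is a minimal sketch, assuming the same environment variable names the next cell reads; `check_settings` is a hypothetical helper, not part of the Content Understanding client or any SDK, and it never contacts Azure.

```python
import os
from pathlib import Path


def check_settings(env=None):
    """Return a list of configuration problems (empty when everything looks usable).

    Hypothetical helper: it only inspects environment variables and the local
    file system. The subscription key is not checked because it is optional
    when you use token-based authentication.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("AZURE_AI_ENDPOINT"):
        problems.append("AZURE_AI_ENDPOINT is not set")
    bundle = env.get("SAMPLE_CLAIMS_BUNDLE")
    if not bundle:
        problems.append("SAMPLE_CLAIMS_BUNDLE is not set")
    elif not Path(bundle).is_file():
        problems.append(f"SAMPLE_CLAIMS_BUNDLE does not point to a file: {bundle}")
    return problems


# Report anything that would make the later cells fail
for problem in check_settings():
    print(f"⚠️  {problem}")
```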

In [None]:
# Always refresh all variables from the .env file
load_dotenv(override=True)
# For authentication, use either token-based auth or a subscription key; only one of them is required
AZURE_AI_ENDPOINT = os.getenv("AZURE_AI_ENDPOINT")
# IMPORTANT: Replace with your actual subscription key or set it in the ".env" file if not using token auth
AZURE_AI_API_KEY = os.getenv("AZURE_CONTENT_UNDERSTANDING_SUBSCRIPTION_KEY")
AZURE_AI_API_VERSION = os.getenv("AZURE_AI_API_VERSION", "2025-05-01-preview")
SAMPLE_CLAIMS_BUNDLE = os.getenv("SAMPLE_CLAIMS_BUNDLE")
# Authentication: DefaultAzureCredential for token-based auth.
# This uses the current user's identity; it would become a managed identity
# once the code moves to an Azure Function.

# Set up credentials. Most credential types are excluded, so the sign-in
# typically comes from the Azure CLI (`az login`).
credential = DefaultAzureCredential(
    exclude_managed_identity_credential=True,
    exclude_client_secret_credential=True,
    exclude_environment_credential=True,
    exclude_workload_identity_credential=True,
    exclude_shared_token_cache_credential=True,
    exclude_azure_powershell_credential=True,
    exclude_azure_developer_cli_credential=True,
)
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")

# Path(None) would raise a TypeError, so fail early with a clear message instead
if not SAMPLE_CLAIMS_BUNDLE:
    raise ValueError("SAMPLE_CLAIMS_BUNDLE is not set. Add it to your .env file.")
file_location = Path(SAMPLE_CLAIMS_BUNDLE)

print("📋 Configuration Summary:")
print(f"   Endpoint: {AZURE_AI_ENDPOINT}")
print(f"   API Version: {AZURE_AI_API_VERSION}")
#print(f"   API KEY: {AZURE_AI_API_KEY}")
print(f"   Document: {file_location.name if file_location.exists() else '❌ File not found'}")

## 4. Define Classifier Schema

The classifier schema defines:
- **Categories**: Document types to classify (e.g., Legal, Medical)
  - **description (Optional)**: An optional field used to provide additional context or hints for categorizing or splitting documents. This can be helpful when the category name alone isn’t descriptive enough. If the category name is already clear and self-explanatory, this field can be omitted.

- **This classifier should identify these document types:**
  - **Completed_Claim_Form**
  - **HIPAA_Release**
  - **Signed_Physician_Statement**
  - **Pathology_Report**
  - **Doctor_Office_Visit_Report**
  - **Scanner_Report**
  - **Other_Document_Type**
  - **Itemized_Bill_for_Lab_Services**
  - **Itemized_Bill_for_Radiology_Services**
  - **Itemized_Bill_from_Other_Service_Providers_Type**
  - **UB04_Bill**

- **splitMode Options**: Defines how multi-page documents should be split before classification or analysis.
  - `"auto"`: Automatically split based on content.  
    For example, if two categories are defined as “invoice” and “application form”:
    - A PDF with only one invoice will be classified as a single document.
    - A PDF containing two invoices and one application form will be automatically split into three classified sections.
  - `"none"`: No splitting.  
    The entire multi-page document is treated as a single unit for classification and analysis.
  - `"perPage"`: Split by page.  
    Each page is treated as a separate document. This is useful when you’ve built custom analyzers designed to operate on a per-page basis.
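
Because `splitMode` is just one key in the schema, trying a different mode only means copying the schema and changing that key. A minimal sketch, assuming only the three mode values listed above; `with_split_mode` is a hypothetical helper and the one-category schema is an illustrative placeholder.

```python
import copy

# The three split modes described above
VALID_SPLIT_MODES = {"auto", "none", "perPage"}


def with_split_mode(schema, mode):
    """Return a copy of a classifier schema with a different splitMode.

    Hypothetical helper; the original schema is left unchanged.
    """
    if mode not in VALID_SPLIT_MODES:
        raise ValueError(f"splitMode must be one of {sorted(VALID_SPLIT_MODES)}, got {mode!r}")
    new_schema = copy.deepcopy(schema)
    new_schema["splitMode"] = mode
    return new_schema


base = {"categories": {"Invoice": {"description": "an invoice"}}, "splitMode": "auto"}
per_page = with_split_mode(base, "perPage")
print(per_page["splitMode"], base["splitMode"])  # perPage auto
```

A deep copy keeps the original schema intact, so the same category definitions can be registered as several classifiers that differ only in how they split.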

### Below is my schema definition

In [None]:
# Split automatically based on content, so each document in the bundle is classified separately.
# Define the classifier document categories and their descriptions.
classifier_schema = {
    "categories": {
        "Completed_Claim_Form": {"description": "a Completed Claim Form"},
        "HIPAA_Release": {"description": "a HIPAA Release"},
        "Signed_Physician_Statement": {"description": "a Signed Physician Statement"},
        "Itemized_Bill_for_Lab_Services": {"description": "an Itemized Bill from a laboratory for lab tests and services. This type includes Statements, Invoices, Account Summaries, and any document with dollar amounts on it sent from a lab."},
        "Itemized_Bill_for_Radiology_Services": {"description": "an Itemized Bill from a radiology department for imaging services. This type includes Statements, Invoices, Account Summaries, and any document with dollar amounts on it sent from a radiology provider."},
        "Itemized_Bill_from_Other_Service_Providers_Type": {"description": "an Itemized Bill from a provider other than a laboratory, a radiology provider, or a hospital. This type includes Statements, Invoices, Account Summaries, and any document with dollar amounts on it."},
        "UB04_Bill": {"description": "A special type of itemized bill. It will have the notation UB04, UB-04, or UB 04 on it."},
        "Pathology_Report": {"description": "a Pathology Report"},
        "Doctor_Office_Visit_Report": {"description": "a Doctor Office Visit Report containing a narrative of the visit, including symptoms, diagnosis, and treatment plan. It does not include any billing information."},
        "Scanner_Report": {"description": "a Scanner Report that lists issues with the scan of the documents"},
        "Other_Document_Type": {"description": "A document type other than the ones specified above"},
    },
    "splitMode": "auto",  # IMPORTANT: Automatically detect document boundaries. Change the mode to suit your needs.
}

# Print the categories with names padded so the descriptions line up in a second column
print("📄 Classifier DocTypes:")
name_width = max(len(category) for category in classifier_schema["categories"])
for category, details in classifier_schema["categories"].items():
    print(f"   • {category:<{name_width}} : {details['description'][:60]}...")

## 5. Initialize Content Understanding Client

Create the client that will communicate with Azure AI services.

⚠️ Important:
You must update the code below to match your Azure authentication method.
Look for the `# IMPORTANT` comments and modify those sections accordingly.
If you skip this step, the sample may not run correctly.

⚠️ Note: Using a subscription key works, but token-based authentication with Microsoft Entra ID (formerly Azure Active Directory) is safer and is strongly recommended for production environments.
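
One way to avoid editing the constructor call by hand is to choose exactly one authentication argument in code. A minimal sketch, assuming the client accepts the `subscription_key` and `token_provider` keyword arguments used in the cell below; `auth_kwargs` is a hypothetical helper that prefers token-based auth whenever a token provider is available.

```python
def auth_kwargs(api_key, token_provider):
    """Pick exactly one authentication argument for the client constructor.

    Hypothetical helper: token-based auth wins when a token provider exists;
    otherwise a non-empty subscription key is used.
    """
    if token_provider is not None:
        return {"token_provider": token_provider}
    if api_key:
        return {"subscription_key": api_key}
    raise ValueError("Provide a token provider or a subscription key.")
```

In this notebook that would look like `AzureContentUnderstandingClient(endpoint=AZURE_AI_ENDPOINT, api_version=AZURE_AI_API_VERSION, **auth_kwargs(AZURE_AI_API_KEY, token_provider))`, so the subscription key is only sent when no token provider is configured.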

In [None]:
# Initialize the Azure Content Understanding client
try:
    content_understanding_client = AzureContentUnderstandingClient(
        endpoint=AZURE_AI_ENDPOINT,
        api_version=AZURE_AI_API_VERSION,
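        # IMPORTANT: Token-based auth (recommended). Remove this argument if you authenticate with a subscription key.
        token_provider=token_provider,
        # IMPORTANT: Subscription-key auth. Remove this argument (or leave the key unset in .env) if you use token auth.
        subscription_key=AZURE_AI_API_KEY,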
    )
    print("✅ Content Understanding client initialized successfully!")
    print("   Ready to create classifiers and analyzers.")
except Exception as e:
    print(f"❌ Failed to initialize client: {e}")
    raise

## 9A. Create a Custom Analyzer (9A) for the general document types

Now let's create a schema for a custom analyzer that extracts specific fields from documents.
This analyzer will:
- Extract a common document field (the title on the first page) from the non-billing documents in the bundle, that is, everything except the claim form and the itemized bills
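
Analyzers 9A, 9B, and 9C all repeat the same field shape (`type`, `method`, `description`) and even the same title field. A small builder keeps them from drifting apart. This is a minimal sketch; `generated_field` and `TITLE_FIELD` are hypothetical names, and the dicts they produce mirror the field definitions used in the cells below.

```python
def generated_field(field_type, description):
    """Build one field definition in the shape used by the schemas in this notebook.

    Hypothetical helper; it only returns a plain dict.
    """
    return {"type": field_type, "method": "generate", "description": description}


# Shared title field, reused by every analyzer
TITLE_FIELD = generated_field(
    "string",
    "This is the title of the document. It will typically be the line of text with the "
    "largest sized font near the top of the page. The value should be \"None\" if there is "
    "no title or it cannot be determined.",
)

fields_9A = {"title_on_first_page_of_document": TITLE_FIELD}
fields_9C = {
    "title_on_first_page_of_document": TITLE_FIELD,
    "Patient_First_Name": generated_field(
        "string", "The first name of the patient. This is usually on the first page of the document."
    ),
}
```

Sharing `TITLE_FIELD` means a wording change to the title prompt reaches all three analyzers at once.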


In [None]:
# Define analyzer schema with custom fields
analyzer_schema_9A = {
    "description": "Analyzer_with_document_fields - extracts key document information from a bundle of documents in a single pdf submitted for claims",
    "baseAnalyzerId": "prebuilt-documentAnalyzer",  # Built on top of the general document analyzer
    "config": {
        "returnDetails": True,
        "enableLayout": True,          # Extract layout information
        "enableBarcode": False,        # Skip barcode detection
        "enableFormula": False,        # Skip formula detection
        "estimateFieldSourceAndConfidence": True, # Set to True if you want to estimate the field location (aka grounding) and confidence
        "disableContentFiltering": False,
    },
    "fieldSchema": {
        "fields": {
            "title_on_first_page_of_document": {
                "type": "string",
                "method": "generate",
                "description": "This is the title of the document. It will typically be the line of text with the largest sized font near the top of the page. The value should be \"None\" if there is no title or it cannot be determined."
            }
        }
    }
}

# Generate unique analyzer ID
analyzer_id_9A = "Analyzer_with_document_fields_9A_" + str(uuid.uuid4())

# Create the analyzer
try:
    print(f"🔨 Creating custom analyzer: {analyzer_id_9A}")
    print("\n📋 Analyzer will extract:")
    for field_name, field_info in analyzer_schema_9A["fieldSchema"]["fields"].items():
        print(f"   • {field_name}: {field_info['description']}")
    
    response = content_understanding_client.begin_create_analyzer(analyzer_id_9A, analyzer_schema_9A)
    result = content_understanding_client.poll_result(response)
    
    print("\n✅ Analyzer_with_document_fields created successfully!")
    print(f"   Analyzer ID 9A: {analyzer_id_9A}")
    
except Exception as e:
    print(f"\n❌ Error creating analyzer: {e}")
    analyzer_id_9A = None  # Set to None if creation failed

## 9B. Create a Custom Analyzer (9B) for the billed expenses

In [None]:
# Define analyzer schema with custom fields
analyzer_schema_9B = {
    "description": "Analyzer_with_document_fields - extracts key document information from a bundle of documents in a single pdf submitted for claims",
    "baseAnalyzerId": "prebuilt-documentAnalyzer",  # Built on top of the general document analyzer
    "config": {
        "returnDetails": True,
        "enableLayout": True,          # Extract layout information
        "enableBarcode": False,        # Skip barcode detection
        "enableFormula": False,        # Skip formula detection
        "estimateFieldSourceAndConfidence": True, # Set to True if you want to estimate the field location (aka grounding) and confidence
        "disableContentFiltering": False,
    },
    "fieldSchema": {
        "fields": {
            "title_on_first_page_of_document": {
                "type": "string",
                "method": "generate",
                "description": "This is the title of the document. It will typically be the line of text with the largest sized font near the top of the page. The value should be \"None\" if there is no title or it cannot be determined."
            },
            # Array of expense line items; each item is an object with the sub-fields below
            "Expenses": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "Expense_Amount": {
                            "type": "number",
                            "method": "generate",
                            "description": "The amount of an expense item billed to the patient or insurance company. These are charges for procedures, professional services, lab tests performed, and other medical services. They will be numeric with 2 decimal places. Keep the 2 decimal places even if they are .00. They will typically be on the document pages in a tabular layout with the expensed dollar amounts all in the same column. The other columns to extract (ICD code, CPT code, etc.) are usually in the same table, on the same line as the amount. Only capture positive amounts that are actual charges (not totals, subtotals, adjustments, refunds, negative values, or amounts that are zero). All dollar amounts for expenses must be captured. The document may contain multiple pages of expenses within a single document."
                        },
                        "ICD_Code": {
                            "type": "string",
                            "method": "generate",
                            "description": "The ICD code associated with the expense, if there is one. If there is no ICD code, use \"\". The ICD code is usually on the same line of the table as the amount."
                        },
                        "Date": {
                            "type": "date",
                            "method": "generate",
                            "description": "The date of the expense. The date is usually on the same line of the table as the amount. Format the date as mm/dd/yyyy."
                        },
                        "Expense_Description": {
                            "type": "string",
                            "method": "generate",
                            "description": "The description of the expense. This may be the procedure name. It is usually on the same line of the table as the amount."
                        },
                        "Surgeon_Name_or_Provider": {
                            "type": "string",
                            "method": "generate",
                            "description": "The surgeon or provider, if this expense was a surgical procedure."
                        },
                        "CPT_Code": {
                            "type": "string",
                            "method": "generate",
                            "description": "The CPT code associated with the expense. It is usually on the same line of the table as the amount."
                        },
                        "Ref_Page": {
                            "type": "number",
                            "method": "generate",
                            "description": "The bundle page the expense was found on. This is the page number from the stamped header at the top of the page. It will be on the line of text that starts with Page XX of YY, where XX is the current page number and YY is the total number of pages in the document bundle."
                        },
                        "Drug_Name": {
                            "type": "string",
                            "method": "generate",
                            "description": "If the expense charge was for a drug, put the drug name here. If it was not for a drug, put N/A in this field."
                        },
                        "Expense_Type": {
                            "type": "string",
                            "method": "generate",
                            "description": "Categorize each expense into one of four categories based on the description, ICD-10 code, CPT code, or other context. The 4 categories are: 1. Cancer_History_Expenses, 2. Diagnostic_Tests_and_Labs_Expenses, 3. Surgical_Events_Expenses, 4. Cancer_Treatment_Expenses. Put every expense into one of the four. If it was for an exam, a lab test, or another diagnostic test that diagnosed cancer or remission, make it a #1. If it was for a lab or diagnostic test, make it a #2. If it was for a surgical procedure, make it a #3. Everything else is a #4. Use the full name, not just the number, when filling in this field."
                        }
                    },
                    "method": "generate"
                },
                "method": "generate",
                "description": "Expenses are charges billed to either the patient or the insurance company. They are single charges for a procedure, test, or other medical service. They do not include payments, adjustments, refunds, total balances, or subtotals. Other than these exceptions, all other dollar amounts may be an expense and should be reviewed."
            }
        },
    }
}

# Generate unique analyzer ID
analyzer_id_9B = "Analyzer_with_document_fields_9B_" + str(uuid.uuid4())

# Create the analyzer
try:
    print(f"🔨 Creating custom analyzer: {analyzer_id_9B}")
    print("\n📋 Analyzer will extract:")
    for field_name, field_info in analyzer_schema_9B["fieldSchema"]["fields"].items():
        print(f"   • {field_name}: {field_info['description']}")

    response = content_understanding_client.begin_create_analyzer(analyzer_id_9B, analyzer_schema_9B)
    result = content_understanding_client.poll_result(response)
    
    print("\n✅ Analyzer_with_document_fields created successfully!")
    print(f"   Analyzer ID: {analyzer_id_9B}")

except Exception as e:
    print(f"\n❌ Error creating analyzer: {e}")
    analyzer_id_9B = None  # Set to None if creation failed

## 9C. Create a Custom Analyzer (9C) for the patient information

In [None]:
# Define analyzer schema with custom fields
analyzer_schema_9C = {
    "description": "Analyzer_with_document_fields - extracts key document information from a bundle of documents in a single pdf submitted for claims",
    "baseAnalyzerId": "prebuilt-documentAnalyzer",  # Built on top of the general document analyzer
    "config": {
        "returnDetails": True,
        "enableLayout": True,          # Extract layout information
        "enableBarcode": False,        # Skip barcode detection
        "enableFormula": False,        # Skip formula detection
        "estimateFieldSourceAndConfidence": True, # Set to True if you want to estimate the field location (aka grounding) and confidence
        "disableContentFiltering": False,
    },
    "fieldSchema": {
        "fields": {
            "title_on_first_page_of_document": {
                "type": "string",
                "method": "generate",
                "description": "This is the title of the document. It will typically be the line of text with the largest sized font near the top of the page. The value should be \"None\" if there is no title or it cannot be determined."
            },
            "Patient_First_Name": {
                "type": "string",
                "method": "generate",
                "description": "The first name of the patient. This is usually on the first page of the document."
            },
            "Patient_Last_Name": {
                "type": "string",
                "method": "generate",
                "description": "The last name of the patient. This is usually on the first page of the document."
            },
            "DOB": {
                "type": "string",
                "method": "generate",
                "description": "The date of birth (DOB) of the patient. This is usually on the first page of the document. Put it into YYYY-MM-DD format."
            },
            "Gender": {
                "type": "string",
                "method": "generate",
                "description": "The gender of the patient. This is usually on the first page of the document."
            },
            "Policy_Number": {
                "type": "string",
                "method": "generate",
                "description": "The policy number of the patient. This is usually on the first page of the document. If the field is missing, use \"\"."
            }
        }
    }
}

# Generate unique analyzer ID
analyzer_id_9C = "Analyzer_with_document_fields_9C_" + str(uuid.uuid4())

# Create the analyzer
try:
    print(f"🔨 Creating custom analyzer: {analyzer_id_9C}")
    print("\n📋 Analyzer will extract:")
    for field_name, field_info in analyzer_schema_9C["fieldSchema"]["fields"].items():
        print(f"   • {field_name}: {field_info['description']}")

    response = content_understanding_client.begin_create_analyzer(analyzer_id_9C, analyzer_schema_9C)
    result = content_understanding_client.poll_result(response)
    
    print("\n✅ Analyzer_with_document_fields created successfully!")
    print(f"   Analyzer ID: {analyzer_id_9C}")

except Exception as e:
    print(f"\n❌ Error creating analyzer: {e}")
    analyzer_id_9C = None  # Set to None if creation failed

## 10. Create an Enhanced Classifier with 3 Custom Analyzers - Create the Schema

Now we'll create a new classifier that classifies each document and routes it to one of our 3 custom analyzers: one for bills (9B), one for the claim form (9C), and one for everything else (9A). This combines classification with field extraction in one operation.


We are using the 3 analyzers created in the previous cells: 9A, 9B, and 9C.


### This is the schema for my Classifier

Note that for each document type I am specifying one of the 3 analyzers (9A, 9B, or 9C).
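
The enhanced schema repeats every description from section 4 and adds one `analyzerId` per category. An alternative is to derive it from the base schema plus a routing map. This is a minimal sketch; `attach_analyzers` is a hypothetical helper, and the `analyzer-9A`/`9B`/`9C` strings are placeholders standing in for `analyzer_id_9A`, `analyzer_id_9B`, and `analyzer_id_9C`.

```python
import copy


def attach_analyzers(classifier_schema, routing, default_analyzer_id):
    """Return a copy of a classifier schema with an analyzerId on every category.

    Hypothetical helper. `routing` maps category name -> analyzer ID; any category
    not listed falls back to `default_analyzer_id`.
    """
    enhanced = copy.deepcopy(classifier_schema)
    for category, details in enhanced["categories"].items():
        details["analyzerId"] = routing.get(category, default_analyzer_id)
    return enhanced


# Placeholder IDs for illustration; in the notebook these would be analyzer_id_9A/9B/9C
bills = ["Itemized_Bill_for_Lab_Services", "Itemized_Bill_for_Radiology_Services",
         "Itemized_Bill_from_Other_Service_Providers_Type", "UB04_Bill"]
routing = {name: "analyzer-9B" for name in bills}
routing["Completed_Claim_Form"] = "analyzer-9C"

# Abbreviated base schema with three of the notebook's categories
base_schema = {
    "categories": {
        "Completed_Claim_Form": {"description": "a Completed Claim Form"},
        "HIPAA_Release": {"description": "a HIPAA Release"},
        "UB04_Bill": {"description": "A special type of itemized bill."},
    },
    "splitMode": "auto",
}
enhanced = attach_analyzers(base_schema, routing, default_analyzer_id="analyzer-9A")
for category, details in enhanced["categories"].items():
    print(category, "->", details["analyzerId"])
```

With the full `classifier_schema` from section 4 as input, descriptions live in one place and only the routing needs to change when a new analyzer is added.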

In [None]:
# Split automatically based on content, so each document in the bundle is classified separately.
# Each category names the analyzer (9A, 9B, or 9C) that extracts its fields.
enhanced_classifier_with_document_metadata_and_fields_schema_9 = {
    "categories": {
        "Completed_Claim_Form": {"description": "a Completed Claim Form", "analyzerId": analyzer_id_9C},
        "HIPAA_Release": {"description": "a HIPAA Release", "analyzerId": analyzer_id_9A},
        "Signed_Physician_Statement": {"description": "a Signed Physician Statement", "analyzerId": analyzer_id_9A},
        "Pathology_Report": {"description": "a Pathology Report", "analyzerId": analyzer_id_9A},
        "Doctor_Office_Visit_Report": {"description": "a Doctor Office Visit Report containing a narrative of the visit, including symptoms, diagnosis, and treatment plan. It does not include any billing information.", "analyzerId": analyzer_id_9A},
        "Scanner_Report": {"description": "a Scanner Report that lists issues with the scan of the documents", "analyzerId": analyzer_id_9A},
        "Other_Document_Type": {"description": "A document type other than the ones specified above", "analyzerId": analyzer_id_9A},
        "Itemized_Bill_for_Lab_Services": {"description": "an Itemized Bill from a laboratory for lab tests and services. This type includes Statements, Invoices, Account Summaries, and any document with dollar amounts on it sent from a lab.", "analyzerId": analyzer_id_9B},
        "Itemized_Bill_for_Radiology_Services": {"description": "an Itemized Bill from a radiology department for imaging services. This type includes Statements, Invoices, Account Summaries, and any document with dollar amounts on it sent from a radiology provider.", "analyzerId": analyzer_id_9B},
        "Itemized_Bill_from_Other_Service_Providers_Type": {"description": "an Itemized Bill from a provider other than a laboratory, a radiology provider, or a hospital. This type includes Statements, Invoices, Account Summaries, and any document with dollar amounts on it.", "analyzerId": analyzer_id_9B},
        "UB04_Bill": {"description": "A special type of itemized bill. It will have the notation UB04, UB-04, or UB 04 on it.", "analyzerId": analyzer_id_9B},
    },
    "splitMode": "auto",  # IMPORTANT: Automatically detect document boundaries. Change the mode to suit your needs.
}
# Print the categories with names padded so the descriptions line up in a second column
print("📄 Classifier DocTypes:")
name_width = max(len(category) for category in enhanced_classifier_with_document_metadata_and_fields_schema_9["categories"])
for category, details in enhanced_classifier_with_document_metadata_and_fields_schema_9["categories"].items():
    print(f"   • {category:<{name_width}} : {details['description'][:60]}...")

## 11. Then, create the Classifier  

It takes two input parameters: the ID you want to give the classifier and the schema.

In [None]:
print(f"Using analyzer from prior cell 9A: {analyzer_id_9A}")
print(f"Using analyzer from prior cell 9B: {analyzer_id_9B}")
print(f"Using analyzer from prior cell 9C: {analyzer_id_9C}")

# Generate a unique enhanced classifier ID
classifier_id_9 = "classifier_based_on_doc_type_9_" + str(uuid.uuid4())
print(f"🔨 Creating classifier: {classifier_id_9}")

# Create the enhanced classifier only if all three analyzers were created successfully
if analyzer_id_9A and analyzer_id_9B and analyzer_id_9C:
    print("\n📋 Configuration:")
    print("   • Medical documents in claim bundle → Custom analyzer with field extraction")

    # Group the document types by the analyzer they are routed to (order: 9C, 9A, 9B)
    routing = {}
    for category, details in enhanced_classifier_with_document_metadata_and_fields_schema_9["categories"].items():
        routing.setdefault(details["analyzerId"], []).append(category)

    for analyzer_id, categories in routing.items():
        print(f"\n   • These document types use the classifier {classifier_id_9} and the custom analyzer {analyzer_id}:")
        for category in categories:
            print(f"      - {category}")

    try:
        response = content_understanding_client.begin_create_classifier(
            classifier_id_9, enhanced_classifier_with_document_metadata_and_fields_schema_9
        )
        result = content_understanding_client.poll_result(response)
        print("\n✅ Enhanced classifier created successfully!")
    except Exception as e:
        print(f"\n❌ Error creating enhanced classifier: {e}")
        classifier_id_9 = None  # Set to None so later cells skip classification
else:
    print("⚠️  Skipping enhanced classifier creation - one or more analyzers were not created successfully.")
    classifier_id_9 = None

## 12. Process Document with Enhanced Classifier - this is reading the PDF document

Let's process the documents again using our enhanced classifier.  

All documents will now have additional metadata fields extracted.
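
Classifying a whole bundle can take several minutes, which is why the next cell passes `timeout_seconds=720` (12 minutes) and `polling_interval_seconds=25` to `poll_result`. The generic loop below sketches what those two parameters control. It is NOT the implementation inside `content_understanding_client.py`; `get_status` is a hypothetical callable, and the terminal status names are assumptions.

```python
import time

# Assumed terminal states (compared case-insensitively); hypothetical for this sketch
TERMINAL_STATES = {"succeeded", "failed"}


def poll_until_done(get_status, timeout_seconds=720, polling_interval_seconds=25,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it reports a terminal state or the timeout expires.

    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status.lower() in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Operation still {status!r} after {timeout_seconds} seconds")
        sleep(polling_interval_seconds)
```

A longer interval means fewer status requests against the service; a longer timeout suits larger bundles.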

In [None]:
# Use the classifier that splits the bundle into documents and routes each one to its analyzer
enhanced_result = None  # Reset so later cells never display results from an earlier run
if classifier_id_9 and analyzer_id_9A and analyzer_id_9B and analyzer_id_9C:
    print(f"🔨 Using classifier: {classifier_id_9}")
    print(f"🔨 Using analyzers: {analyzer_id_9A}, {analyzer_id_9B}, and {analyzer_id_9C}")
    try:
        # Check that the document exists
        if not file_location.exists():
            raise FileNotFoundError(f"Document not found at {file_location}")

        # Process with the enhanced classifier
        print("📄 Processing document with enhanced classifier")
        print(f"   Document: {file_location.name}")
        print("\n⏳ Processing with classification + field extraction...")

        response = content_understanding_client.begin_classify(
            classifier_id=classifier_id_9, file_location=str(file_location)
        )
        # Large bundles can take several minutes: wait up to 12 minutes, checking every 25 seconds
        enhanced_result = content_understanding_client.poll_result(
            response, timeout_seconds=720, polling_interval_seconds=25
        )

        print("\n✅ Enhanced processing completed!")

    except Exception as e:
        print(f"\n❌ Error processing document: {e}")
else:
    print("⚠️  Skipping enhanced classification - the enhanced classifier or one of its analyzers was not created.")

## 13. View Enhanced Results with Extracted Fields - Now we can look at what was extracted

Let's see the classification results along with the extracted fields. 
-  All documents should have a classification
-  All documents should have a title (if one could be found)
-  The itemized bills should have extracted expenses.  
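
The three expectations above can also be checked automatically. A minimal sketch, assuming the result shape the next cell reads (`result` → `contents` → items with `category` and `fields`, expenses under `fields["Expenses"]["valueArray"]`); `review_results` and `BILL_CATEGORIES` are hypothetical names. A missing title is reported as something to review, not as an error, because a document may genuinely have no title.

```python
# Categories routed to the expense analyzer (9B)
BILL_CATEGORIES = {
    "Itemized_Bill_for_Lab_Services",
    "Itemized_Bill_for_Radiology_Services",
    "Itemized_Bill_from_Other_Service_Providers_Type",
    "UB04_Bill",
}


def review_results(enhanced_result):
    """Return human-readable items to review in a classify result (empty list if none)."""
    notes = []
    contents = enhanced_result.get("result", {}).get("contents", [])
    for i, content in enumerate(contents, 1):
        category = content.get("category")
        fields = content.get("fields", {})
        if not category:
            notes.append(f"Document #{i} has no category")
        title = fields.get("title_on_first_page_of_document", {}).get("valueString")
        if not title or title == "None":
            notes.append(f"Document #{i} ({category}) has no title")
        if category in BILL_CATEGORIES and not fields.get("Expenses", {}).get("valueArray"):
            notes.append(f"Document #{i} ({category}) is a bill but no expenses were extracted")
    return notes
```

After the next cell runs, `for note in review_results(enhanced_result or {}): print(note)` would list any documents worth a second look.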

In [None]:
# Display enhanced results
if 'enhanced_result' in locals() and enhanced_result:
    result_data = enhanced_result.get("result", {})
    contents = result_data.get("contents", [])

    print("📊 ENHANCED CLASSIFICATION RESULTS WITH DOCUMENT METADATA")
    print("=" * 70)
    print(f"\nTotal sections (documents) found: {len(contents)}")

    # String fields produced by analyzers 9A and 9C, with their display labels
    string_fields = {
        "title_on_first_page_of_document": "Document Title",
        "Patient_First_Name": "Patient_First_Name",
        "Patient_Last_Name": "Patient_Last_Name",
        "DOB": "DOB",
        "Gender": "Gender",
        "Policy_Number": "Policy_Number",
    }

    # Process each section
    for i, content in enumerate(contents, 1):
        print(f"\n{'=' * 70}")
        print(f"DOCUMENT #{i}")
        print(f"{'=' * 70}")

        start_page = content.get('startPageNumber', 0)
        end_page = content.get('endPageNumber', 0)
        print(f"\n📁 Type of Document: {content.get('category', 'Unknown')}")
        print(f"📄 Document Starting Page in Bundle: {content.get('startPageNumber', '?')}")
        print(f"📄 Document Ending Page in Bundle: {content.get('endPageNumber', '?')}")
        print(f"📄 Number of Pages in Document: {end_page - start_page + 1}")

        # Show the fields extracted by the analyzer this document was routed to
        for field_name, field_data in content.get('fields', {}).items():
            if field_name in string_fields:
                # .get() avoids a KeyError when the service returns no value for a field
                print(f"📄 {string_fields[field_name]}: {field_data.get('valueString')}")
            elif field_name == "Expenses":
                for expense in field_data.get('valueArray', []):
                    print(expense)
            else:
                print(field_data)
else:
    print("⚠️  No results to display - run the previous cell first.")
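
Printing each expense as a raw dict is hard to scan. The sketch below flattens every expense into a plain row. It assumes, without confirmation from this notebook, that each item in `valueArray` keeps its sub-fields under `valueObject` and that typed values sit under keys such as `valueString`, `valueNumber`, or `valueDate`; `field_value` and `expense_rows` are hypothetical helpers, so inspect the JSON in the next section to confirm the shape first.

```python
# Assumed keys holding typed values in the service response
VALUE_KEYS = ("valueString", "valueNumber", "valueDate", "valueInteger")


def field_value(field):
    """Return the typed value of one extracted field, or None if it has no value."""
    for key in VALUE_KEYS:
        if key in field:
            return field[key]
    return None


def expense_rows(content):
    """Turn the Expenses array of one classified document into a list of flat dicts."""
    rows = []
    for item in content.get("fields", {}).get("Expenses", {}).get("valueArray", []):
        # Assumption: object items keep their sub-fields under "valueObject"
        row = {name: field_value(sub) for name, sub in item.get("valueObject", {}).items()}
        row["Document_Category"] = content.get("category")
        rows.append(row)
    return rows
```

For example, `rows = [row for content in contents for row in expense_rows(content)]` collects every expense in the bundle; if pandas is installed, `pandas.DataFrame(rows)` turns that list of dicts into a table.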

## 14. View Enhanced Results with Extracted Fields - Another way to view it: the raw JSON

Let's see the classification results along with the extracted fields from the documents in the claim document bundle.

You can also see the full JSON result below.
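
The full result is long, so saving it to a file can be easier than scrolling through notebook output. A minimal sketch using only the standard library; `save_result` is a hypothetical helper, and an output path such as `output/enhanced_result.json` is just an example.

```python
import json
from pathlib import Path


def save_result(result, output_path):
    """Write a classify/analyze result to disk as pretty-printed JSON and return the path."""
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps non-ASCII text (e.g. names with accents) readable in the file
    output_path.write_text(json.dumps(result, indent=2, ensure_ascii=False), encoding="utf-8")
    return output_path
```

Calling `save_result(enhanced_result, "output/enhanced_result.json")` after the classification cell keeps a copy that can be compared across runs. Note that the file may contain patient information, so store it accordingly.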

In [None]:
print(json.dumps(enhanced_result, indent=2))

## 15. Summary and Next Steps

Congratulations! You've successfully:
1. ✅ Created a basic classifier to categorize documents
2. ✅ Created 3 custom analyzers to extract specific fields from specific types of documents
3. ✅ Combined them into an enhanced classifier for intelligent document processing

## 16. Let's count the expenses found

In [None]:
# I need a count of expenses found
if 'enhanced_result' in locals() and enhanced_result:
    result_data = enhanced_result.get("result", {})
    contents = result_data.get("contents", [])

    expense_count = 0
    for content in contents:
        fields = content.get('fields', {})
        if fields:
            expense_count += len(fields.get("Expenses", {}).get('valueArray', []))

    print(f"💰 Total expenses found: {expense_count}")
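
Beyond the count, the `Expense_Type` field from analyzer 9B makes it possible to total the billed amounts per category. A minimal sketch, assuming (as in the flattening example above, and not confirmed by this notebook) that each expense item keeps its sub-fields under `valueObject`, with amounts in `valueNumber` and categories in `valueString`; `totals_by_expense_type` is a hypothetical helper.

```python
from collections import defaultdict


def totals_by_expense_type(contents):
    """Sum Expense_Amount per Expense_Type across all classified documents.

    Items without an extracted amount are skipped; items without a type are
    grouped under "Unknown".
    """
    totals = defaultdict(float)
    for content in contents:
        for item in content.get("fields", {}).get("Expenses", {}).get("valueArray", []):
            sub_fields = item.get("valueObject", {})
            amount = sub_fields.get("Expense_Amount", {}).get("valueNumber")
            if amount is None:
                continue  # no amount extracted for this item
            expense_type = sub_fields.get("Expense_Type", {}).get("valueString") or "Unknown"
            totals[expense_type] += amount
    return dict(totals)
```

Using the `contents` list from the cell above, `for name, total in totals_by_expense_type(contents).items(): print(f"{name}: {round(total, 2)}")` prints one line per category; rounding to 2 decimals hides floating-point noise in the sums.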