# Sample Documents Classification

- Extract using Azure Document Intelligence Service

## Prerequisites
1. To run the code, install the following package. This notebook pins version 1.0.0 of `azure-ai-documentintelligence`: `pip install azure-ai-documentintelligence==1.0.0`.


! pip install azure-ai-documentintelligence==1.0.0

## Load all the API keys, parameters and login credentials

In [1]:
import os
import docintel

# Your Azure Document Intelligence Service Instance
DOC_INTEL_ENDPOINT = os.getenv('FORM_RECOGNIZER_ENDPOINT')
DOC_INTEL_API_KEY = os.getenv("FORM_RECOGNIZER_API_KEY")
# The model ID should match the custom classifier you have
# trained and deployed in your Azure Document Intelligence Service instance
# at the endpoint DOC_INTEL_ENDPOINT
MY_CLASSIFIER_MODEL_ID = 'docai-classifier-v2'

documentIntelligenceCredential = docintel.getDocumentIntelligenceCredential(DOC_INTEL_API_KEY)

documentIntelligenceClient = docintel.getDocumentIntelligenceClient(
                                endpoint=DOC_INTEL_ENDPOINT,
                                credential=documentIntelligenceCredential
                                )


Got Azure Form Recognizer API Key from environment variable
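The cell above reads its settings with `os.getenv`, which returns `None` when a variable is missing, so a misconfigured environment only surfaces later as a less obvious client error. A minimal sketch of an early check follows; `require_env` is a hypothetical helper introduced here for illustration and is not part of `docintel`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`; raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Hypothetical usage, mirroring the variable names used in this notebook:
# DOC_INTEL_ENDPOINT = require_env('FORM_RECOGNIZER_ENDPOINT')
# DOC_INTEL_API_KEY = require_env('FORM_RECOGNIZER_API_KEY')
```

Failing fast this way keeps the error message next to the cell that loads the configuration.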


In [2]:
MY_TEST_DOCUMENT = r'C:\Users\tibarar\OneDrive - Microsoft\Desktop\DocAI - LocalDocs\AutoInsuranceClaims\InsuranceClaim-WilliamWordworth.pdf'

classifierResult = docintel.classifyLocalDocument(
                    client=documentIntelligenceClient,
                    model=MY_CLASSIFIER_MODEL_ID,
                    file_path=MY_TEST_DOCUMENT
        )

## Get the categories by page from the Document Intelligence Service API response

In [3]:
# Use the threshold to filter low-confidence classifications from the response.
# This step is optional for your use case.
confidenceThreshold = 0.0

theCategories = docintel.getCategories(classifierResult, confidenceThreshold)
print(f'Categories: {theCategories}')

Categories: [{'category': 'auto-insurance-claim', 'confidence': 0.692, 'pages': '[1]'}, {'category': 'auto-insurance-claim', 'confidence': 0.508, 'pages': '[2]'}]
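The source of `docintel.getCategories` is not shown in this notebook. As a rough sketch of what such a helper might do, the function below walks a classification result and keeps entries at or above the threshold. It assumes the result exposes `documents`, each with `doc_type`, `confidence`, and `bounding_regions` carrying a `page_number`; the function name `get_categories_sketch` and the stand-in result built with `SimpleNamespace` are hypothetical and only mimic that shape.

```python
from types import SimpleNamespace


def get_categories_sketch(result, threshold=0.0):
    """Collect category, confidence, and pages for each classified document."""
    categories = []
    for doc in result.documents or []:
        # Skip classifications below the confidence threshold.
        if doc.confidence < threshold:
            continue
        pages = [region.page_number for region in (doc.bounding_regions or [])]
        categories.append({
            'category': doc.doc_type,
            'confidence': doc.confidence,
            'pages': str(pages),  # the output above shows pages as a string
        })
    return categories


# Stand-in object (assumption) with the confidences shown in the output above.
fake_result = SimpleNamespace(documents=[
    SimpleNamespace(doc_type='auto-insurance-claim', confidence=0.692,
                    bounding_regions=[SimpleNamespace(page_number=1)]),
    SimpleNamespace(doc_type='auto-insurance-claim', confidence=0.508,
                    bounding_regions=[SimpleNamespace(page_number=2)]),
])

print(get_categories_sketch(fake_result, threshold=0.6))
# [{'category': 'auto-insurance-claim', 'confidence': 0.692, 'pages': '[1]'}]
```

With a threshold of 0.6 only the first page survives, which illustrates why the notebook uses 0.0 when it wants to see every classification.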


## Classify documents from files in blob storage

In [2]:
test_file_url = os.getenv('BLOB_TEST_FILE_JEAN_GENET_SAS_URL')
classifierResult = docintel.classifyOnlineDocument(
                    client=documentIntelligenceClient,
                    model=MY_CLASSIFIER_MODEL_ID,
                    file_url=test_file_url
        )

In [3]:
# Use the threshold to filter low-confidence classifications from the response.
# This step is optional for your use case.
confidenceThreshold = 0.0

theCategories = docintel.getCategories(classifierResult, confidenceThreshold)
print(f'Categories: {theCategories}')

Categories: [{'category': 'auto-insurance-claim', 'confidence': 0.856, 'pages': '[1]'}, {'category': 'auto-insurance-claim', 'confidence': 0.57, 'pages': '[2]'}]
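The result lists one entry per page, so a downstream step may want a single summary per category. The sketch below is one possible way to do that, assuming the list-of-dicts shape printed above (with `pages` stored as a string such as `'[1]'`); `summarize_by_category` is a hypothetical helper, not part of `docintel`.

```python
import ast
from collections import defaultdict


def summarize_by_category(categories):
    """Merge per-page entries into one entry per category."""
    pages_by_category = defaultdict(list)
    min_confidence = {}
    for entry in categories:
        name = entry['category']
        # 'pages' is a string like '[1]', so parse it back into a list of ints.
        pages_by_category[name].extend(ast.literal_eval(entry['pages']))
        min_confidence[name] = min(min_confidence.get(name, 1.0), entry['confidence'])
    return [
        {'category': name, 'pages': sorted(pages), 'min_confidence': min_confidence[name]}
        for name, pages in pages_by_category.items()
    ]


sample = [
    {'category': 'auto-insurance-claim', 'confidence': 0.856, 'pages': '[1]'},
    {'category': 'auto-insurance-claim', 'confidence': 0.57, 'pages': '[2]'},
]
print(summarize_by_category(sample))
# [{'category': 'auto-insurance-claim', 'pages': [1, 2], 'min_confidence': 0.57}]
```

Reporting the minimum confidence is a conservative choice: a category is only as trustworthy as its weakest page.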
