# Extract data from a sample form filled in with handwritten text

## Prerequisites
1. To run the code, install the following package. This notebook pins version 3.3.0: `pip install azure-ai-formrecognizer==3.3.0`.


! pip install azure-ai-formrecognizer==3.3.0

## Load all the API keys, parameters and login credentials

In [1]:
import fr

# Your Azure Document Intelligence Service Instance
MY_FORM_RECOGNIZER_ENDPOINT = 'https://tr-docai-form-recognizer.cognitiveservices.azure.com/'
# The model id should match the custom model you have
# trained and deployed in your Azure Document Intelligence Service Instance
# with the endpoint MY_FORM_RECOGNIZER_ENDPOINT
MY_CLAIMS_MODEL_ID = 'claims-v2'

formRecognizerCredential = fr.getFormRecognizerCredential()

formRecognizerClient = fr.getDocumentAnalysisClient(
                            endpoint=MY_FORM_RECOGNIZER_ENDPOINT,
                            credential=formRecognizerCredential
                        )


Got Azure Form Recognizer API Key from environment variable
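
The `fr` helper module is not shown in this notebook. Judging from the output line above, the API key is read from an environment variable. The following is only a rough sketch of how such helpers might look, not the actual `fr` code: the variable name `FORM_RECOGNIZER_API_KEY` and the function names `read_api_key` and `build_client` are hypothetical, while `DocumentAnalysisClient` and `AzureKeyCredential` are the classes provided by the Azure SDK.

```python
import os

# Hypothetical environment variable name; the real `fr` module may use another.
API_KEY_ENV_VAR = "FORM_RECOGNIZER_API_KEY"


def read_api_key(env_var: str = API_KEY_ENV_VAR) -> str:
    """Return the API key stored in `env_var`, or raise if it is unset or empty."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


def build_client(endpoint: str, api_key: str):
    """Create a DocumentAnalysisClient for the given endpoint and key."""
    # Imported lazily so the key-loading helper works without the SDK installed.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    return DocumentAnalysisClient(
        endpoint=endpoint, credential=AzureKeyCredential(api_key)
    )
```

Keeping the key in an environment variable, rather than in the notebook itself, avoids committing secrets along with the code.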


## Document Extraction Examples

### Auto insurance claims form filled in by hand

- Custom-trained model
- Display label, data and confidence (document level and individual field level)
- Text, checkbox, and radio button fields
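
The cells in this section call `fr.extractResultFromLocalDocument`, a local helper whose source is not included here. As a hedged illustration of what such a helper might do, the sketch below assumes it wraps `DocumentAnalysisClient.begin_analyze_document` and derives the handwriting flag from `result.styles`. The function name `analyze_local_document` is hypothetical, and the real helper may differ.

```python
def analyze_local_document(client, model_id: str, filepath: str):
    """Analyze a local file; return (api_version, model_id, is_handwritten, result)."""
    with open(filepath, "rb") as f:
        # begin_analyze_document starts a long-running operation;
        # result() blocks until the service has finished the analysis.
        poller = client.begin_analyze_document(model_id, document=f)
        result = poller.result()

    # styles can be None or empty when no style information is returned.
    is_handwritten = any(
        style.is_handwritten for style in (result.styles or [])
    )
    return result.api_version, result.model_id, is_handwritten, result
```

Because the function only relies on the client's `begin_analyze_document` method, it can be exercised with a stub client in tests, without calling the service.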

#### Display labeled data

In [3]:
# Assumes the notebook is run from the notebook folder (Windows-style relative path)
MY_TEST_DOCUMENT = r'..\..\..\data\sample-claims-docs\testing\IC-handwritten-RobertFrost.pdf'

fr_api_version, model_id, is_handwritten, result = fr.extractResultFromLocalDocument(
                                                        client=formRecognizerClient,
                                                        model=MY_CLAIMS_MODEL_ID,
                                                        filepath=MY_TEST_DOCUMENT
                                                    )

print(f'Document Intelligence API version = {fr_api_version}\n \
        Document Extraction Model Id = {model_id}\n \
        Does document have any hand written text? {is_handwritten}\n'
     )
doc_count = len(result.documents)
print(f'Document count = {doc_count}')

for idx, document in enumerate(result.documents):
    print(f'Document {idx} ---------------')
    print(f'\tDocument extraction confidence = {document.confidence}')
    for name, field in document.fields.items():
        # Fall back to the raw text only when no typed value was extracted;
        # a plain truthiness check would also discard valid values such as 0.
        field_value = field.value if field.value is not None else field.content
        print("\t{}[type:{};conf:{}] = '{}'".format(name, field.value_type, field.confidence, field_value))


Document Intelligence API version = 2023-07-31
         Document Extraction Model Id = claims-v2
         Does document have any hand written text? True

Document count = 1
Document 0 ---------------
	Document extraction confidence = 0.992
	FormType[type:string;conf:0.948] = 'Auto Insurance Claim Document'
	Name[type:string;conf:0.965] = 'Robert Frost'
	Address[type:string;conf:0.969] = '100 Main Street, Lawrence, MA 01841'
	Phone[type:string;conf:0.948] = '+1 231 435 5612'
	Email[type:string;conf:0.982] = 'dummy5@5.com'
	PolicyNumber[type:string;conf:0.968] = 'TRI 029471329'
	IncidentDate[type:string;conf:0.98] = '8/02/2023'
	IncidentTime[type:string;conf:0.964] = '5 pm EST'
	IncidentLocation[type:string;conf:0.953] = '2 Wood Street, Lawrence, MA 01841'
	IncidentDescription[type:string;conf:0.916] = 'Two roads diverged in wood , and I - I took the one less traveled by. The other party did the same and we collided.'
	VehicleOwner[type:string;conf:0.978] = 'NA'
	VehicleMakeAndModel[type
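
The per-field confidence values printed above can be used to decide which fields a person should double-check. The sketch below is one possible approach, not part of the original notebook: the function name `fields_needing_review` and the 0.95 threshold are assumptions chosen for illustration, and the sample values are copied from the output above.

```python
def fields_needing_review(confidences: dict, threshold: float = 0.95) -> list:
    """Return (name, confidence) pairs below `threshold`, lowest first."""
    flagged = []
    for name, conf in confidences.items():
        # Treat a missing confidence as 0.0 so the field is always reviewed.
        score = 0.0 if conf is None else conf
        if score < threshold:
            flagged.append((name, score))
    return sorted(flagged, key=lambda pair: pair[1])


# Confidence values copied from the extraction output shown above.
sample = {
    "FormType": 0.948,
    "Name": 0.965,
    "Address": 0.969,
    "Phone": 0.948,
    "Email": 0.982,
    "IncidentDescription": 0.916,
}
print(fields_needing_review(sample))
# → [('IncidentDescription', 0.916), ('FormType', 0.948), ('Phone', 0.948)]
```

With a real result, the input dictionary could be built as `{name: field.confidence for name, field in document.fields.items()}`. A suitable threshold depends on the model and on how costly a wrong value is.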

#### View the extracted raw data: pages, lines, selection marks, and tables

In [4]:
for page in result.pages:
    for line_idx, line in enumerate(page.lines):
        # encode() returns bytes, so the text is printed as b'...' in the output
        print(
            "...Line # {} has text content '{}'".format(
                line_idx,
                line.content.encode("utf-8")
            )
        )

    # Selection marks cover checkboxes and radio buttons
    for selection_mark in page.selection_marks:
        print(
            "...Selection mark is '{}' and has a confidence of {}".format(
                selection_mark.state,
                selection_mark.confidence
            )
        )

for table_idx, table in enumerate(result.tables):
    print(
        "Table # {} has {} rows and {} columns".format(
            table_idx, table.row_count, table.column_count
        )
    )

    for cell in table.cells:
        print(
            "...Cell[{}][{}] has content '{}'".format(
                cell.row_index,
                cell.column_index,
                cell.content.encode("utf-8"),
            )
        )

print("----------------------------------------")

...Line # 0 has text content 'b'TR INSURED''
...Line # 1 has text content 'b'A Test P&C INSURANCE Company''
...Line # 2 has text content 'b'Auto Insurance Claim Document''
...Line # 3 has text content 'b'Customer Information''
...Line # 4 has text content 'b'Name Robert Frost''
...Line # 5 has text content 'b'Address 100 Main Street, Lawrence, MA 01841''
...Line # 6 has text content 'b'Phone Number +1 231 435 5612''
...Line # 7 has text content 'b'Email dummy5@5.com''
...Line # 8 has text content 'b'Policy Number TRI 029471329''
...Line # 9 has text content 'b'Incident Information''
...Line # 10 has text content 'b'Date of Incident 8/02/2023''
...Line # 11 has text content 'b'Time of Incident 5 pm EST''
...Line # 12 has text content 'b'Location of Incident''
...Line # 13 has text content 'b'2 Wood Street, Lawrence, MA 01841''
...Line # 14 has text content 'b'Description of Incident Two roads diverged in a wood , and I -''
...Line # 15 has text content 'b'I took the one less traveled by
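
The table loop in the cell above prints one cell at a time. If a row-by-row view is more convenient, the flat cell list can be rebuilt into a grid. The sketch below is an illustration only: `table_to_grid` is a hypothetical helper, it uses just the attributes already accessed in the cell above (`row_count`, `column_count`, `cells`, `row_index`, `column_index`, `content`), it ignores cells that span several rows or columns, and the demo table is a stand-in built with `SimpleNamespace` rather than real service output.

```python
from types import SimpleNamespace


def table_to_grid(table) -> list:
    """Rebuild a row-major grid of cell text from a table's flat cell list."""
    grid = [["" for _ in range(table.column_count)] for _ in range(table.row_count)]
    for cell in table.cells:
        grid[cell.row_index][cell.column_index] = cell.content
    return grid


# Stand-in table with made-up content, used only to demonstrate the helper.
demo_table = SimpleNamespace(
    row_count=2,
    column_count=2,
    cells=[
        SimpleNamespace(row_index=0, column_index=0, content="Item"),
        SimpleNamespace(row_index=0, column_index=1, content="Cost"),
        SimpleNamespace(row_index=1, column_index=0, content="Bumper"),
    ],
)
for row in table_to_grid(demo_table):
    print(row)
```

Cells that the service did not return are left as empty strings, so every row has the same length.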