In this example, you will learn how to extract structured data when parsing ACORD forms with Tensorlake DocumentAI. To learn more about ACORD form processing, [check out the Tensorlake docs](https://docs.tensorlake.ai/use-cases/insurance-financial-services/acord-form-processing).

In [None]:
!pip install tensorlake

In [None]:
# Import libraries (only the ones this notebook uses)
import time

from tensorlake.documentai import DocumentAI
from tensorlake.documentai.models import (
    ParsingOptions,
    StructuredExtractionOptions,
    ParseStatus,
)
from tensorlake.documentai.models.enums import ChunkingStrategy

In [None]:
%env TENSORLAKE_API_KEY=your_api_key

## Specify Structured Data Extraction

Define a simple JSON Schema that specifies the structured data you want extracted from the document.

In [None]:
# JSON schema to extract relevant data in a structured format
structured_schema = {
  "title": "Acord125ApplicantInfo",
  "type": "object",
  "properties": {
    "agencyName": { "type": "string" },
    "agencyContactName": { "type": "string" },
    "agencyPhone": { "type": "string" },
    "applicantName": { "type": "string" },
    "applicantAddress": { "type": "string" },
    "businessPhone": { "type": "string" },
    "fein": { "type": "string" },
    "policyNumber": { "type": "string" },
    "effectiveDate": { "type": "string" },
    "expirationDate": { "type": "string" },
    "typeOfOrganization": { "type": "string" },
    "linesOfBusiness": {
      "type": "array",
      "items": { "type": "string" }
    }
  }
}
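The schema above is flat: eleven string fields plus one `linesOfBusiness` array of strings, with no `required` list. Once results come back, it can be useful to check that an extracted record actually matches these types. The sketch below is a hypothetical stdlib-only helper (`check_record` is not part of the Tensorlake SDK) that handles only this subset of JSON Schema; for full validation, a dedicated JSON Schema validator library is the more robust choice.

```python
def check_record(record: dict, schema: dict) -> list[str]:
    """Return a list of type problems; an empty list means the record conforms.

    Hypothetical helper: supports only a flat object whose properties are
    "string" or "array" of "string". Missing fields are allowed because the
    schema declares no "required" list.
    """
    problems = []
    properties = schema.get("properties", {})
    for name, value in record.items():
        spec = properties.get(name)
        if spec is None:
            # Stricter than JSON Schema's default, which allows extra properties
            problems.append(f"{name}: not declared in schema")
        elif spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name}: expected string, got {type(value).__name__}")
        elif spec.get("type") == "array":
            item_type = spec.get("items", {}).get("type")
            if not isinstance(value, list):
                problems.append(f"{name}: expected array, got {type(value).__name__}")
            elif item_type == "string" and not all(isinstance(v, str) for v in value):
                problems.append(f"{name}: expected array of strings")
    return problems
```

For example, `check_record(record, structured_schema)` returns an empty list when every field present in `record` has the declared type. A field extracted as `null` is reported, since `"type": "string"` does not allow null values.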

## Parse the Document
To use the Tensorlake Python SDK, you need to:
1. Create a Tensorlake client.
2. Specify a file; in this case, a sample ACORD form.
3. Specify parsing options, including the page range and a chunking strategy.
4. Specify structured extraction options, which must include at least a schema name and a JSON schema.
5. Start the parsing job and wait until it completes successfully.
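Step 5 boils down to polling the job status until it leaves the pending and processing states. If you want to reuse that pattern with a timeout, here is a minimal stdlib sketch; `poll_until` is a hypothetical helper, not a Tensorlake SDK function, and the 5-second interval and 10-minute timeout are arbitrary defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    interval: float = 5.0,
    timeout: float = 600.0,
) -> T:
    """Call fetch() every `interval` seconds until is_done(result) is True.

    Raises TimeoutError if the job is still not done after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    result = fetch()
    while not is_done(result):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still not done after {timeout} seconds")
        time.sleep(interval)
        result = fetch()
    return result
```

With the objects from the next cell, a call could look like `poll_until(lambda: doc_ai.get_parsed_result(parse_id), lambda r: r.status not in (ParseStatus.PENDING, ParseStatus.PROCESSING))`. Using `time.monotonic()` keeps the deadline correct even if the system clock changes while waiting.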

In [None]:
# Create a Tensorlake client; it reads the `TENSORLAKE_API_KEY` environment variable you set above
doc_ai = DocumentAI()

# Reference to the 29-page ACORD Form (125, 823, and 140)
file_id = "https://pub-226479de18b2493f96b64c6674705dd8.r2.dev/acord_form_125_823_140.pdf"

# Configure parsing: which pages to process and how to chunk the markdown output
parsing_options = ParsingOptions(
    page_range="1-4",  # Restrict processing to pages 1-4 (the ACORD 125 form pages)
    chunking_strategy=ChunkingStrategy.PAGE,  # One markdown chunk per page
)

structured_extraction_options = StructuredExtractionOptions(
    schema_name="ACORD Form",
    json_schema=structured_schema
)

# Parse the document with the specified extraction options for structured data
parse_id = doc_ai.parse(
    file_id,
    parsing_options=parsing_options,
    structured_extraction_options=[structured_extraction_options],
)

# Poll until the job leaves the PENDING/PROCESSING states
result = doc_ai.get_parsed_result(parse_id)
while result.status in [ParseStatus.PENDING, ParseStatus.PROCESSING]:
    print(f"Parse job {parse_id} is {result.status}, waiting...")
    time.sleep(5)
    result = doc_ai.get_parsed_result(parse_id)

print(f"Parse job {parse_id} is {result.status}")
if result.status != ParseStatus.SUCCESSFUL:
    raise RuntimeError(f"Parse job {parse_id} did not complete successfully")

# Tensorlake Parsing Output

With a single DocumentAI API call, Tensorlake returns both the full document content as markdown and the structured data as JSON.
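Because both outputs arrive together, you may want to persist them side by side for review. The sketch below assumes you have already pulled the chunk texts out as plain strings (for example `[chunk.content for chunk in result.chunks]`) and that the structured data is JSON-serializable; `save_outputs` and the file names `document.md` and `structured_data.json` are hypothetical choices for illustration.

```python
import json
from pathlib import Path


def save_outputs(markdown_chunks: list[str], structured: object, out_dir: str) -> tuple[Path, Path]:
    """Hypothetical helper: write chunks to one Markdown file and structured data to JSON."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    md_path = out / "document.md"
    json_path = out / "structured_data.json"
    # Separate chunks with a horizontal rule so chunk (page) boundaries stay visible
    md_path.write_text("\n\n---\n\n".join(markdown_chunks), encoding="utf-8")
    json_path.write_text(json.dumps(structured, indent=2), encoding="utf-8")
    return md_path, json_path
```

If the SDK hands back the structured data as model objects rather than plain dictionaries, convert them to dictionaries first; otherwise `json.dumps` raises a `TypeError`.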

## Review the Markdown Chunks Output

In [None]:
# Print the markdown of the first four chunks (one chunk per page with page chunking)
for index, chunk in enumerate(result.chunks):
    print(f"Chunk {index}:")
    print(chunk.content)
    if index == 3:
        break

## Review the Structured Data Output

In [None]:
# Display the structured data extracted with the schema (the notebook renders the last expression)
result.structured_data
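To compare the extraction against the original form by eye, it helps to print each schema field on its own line, in schema order, and flag fields that came back empty. This is a hypothetical helper (`format_record`), and it assumes you pass it one extracted record as a plain dictionary; where that dictionary lives inside `result.structured_data` depends on your SDK version, so inspect the result object before relying on a particular path.

```python
def format_record(record: dict, schema: dict) -> str:
    """Render one extracted record as 'field: value' lines in schema order.

    Fields absent from the record (or null) are shown as '(missing)';
    array values are joined with ', '.
    """
    lines = []
    for name in schema.get("properties", {}):
        value = record.get(name)
        if value is None:
            shown = "(missing)"
        elif isinstance(value, list):
            shown = ", ".join(str(v) for v in value)
        else:
            shown = str(value)
        lines.append(f"{name}: {shown}")
    return "\n".join(lines)
```

Iterating over the schema rather than the record means every field you asked for is listed, so gaps in the extraction stand out instead of silently disappearing.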