# Enhance your analyzer with labeled data


> **Note:** Currently, this feature is only available when the analyzer scenario is `document`.

Labeled data is a set of samples tagged with one or more labels that add context or meaning. It is used to improve the analyzer's performance.

Go to [Azure AI Foundry]() and use the labeling tool to annotate your data.

This notebook demonstrates how, once you have labeled data, to create an analyzer from it and use that analyzer to analyze your files.



## Prerequisites
1. Ensure your Azure AI service is configured by following these [steps](../README.md#configure-azure-ai-service-resource).
1. Follow the steps in [Set labeled data](../docs/set_env_for_labeled_data.md) to add the training-data-related environment variables to `.env`.
1. Install the packages needed to run the sample.




```python
%pip install -r ../requirements.txt
```
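Before running the cells below, it can help to confirm that the environment variables this notebook reads are actually set. The following is a minimal sketch and not part of the sample's utilities: `REQUIRED_VARS` and `find_missing_env_vars` are hypothetical names, and treating an empty value as missing is an assumption. Run it after `.env` has been loaded with `load_dotenv(find_dotenv())`.

```python
import os
from typing import Mapping, Sequence

# Variables read later in this notebook. AZURE_AI_API_VERSION is omitted
# because the client cell falls back to a default when it is unset.
REQUIRED_VARS = (
    "AZURE_AI_ENDPOINT",
    "TRAINING_DATA_SAS_URL",
    "TRAINING_DATA_PATH",
)


def find_missing_env_vars(
    names: Sequence[str], env: Mapping[str, str] = os.environ
) -> list[str]:
    """Return the names that are unset or empty in `env` (assumption: empty counts as missing)."""
    return [name for name in names if not env.get(name)]
```

For example, `find_missing_env_vars(REQUIRED_VARS)` returns an empty list when every variable is set, and otherwise lists the ones still to add to `.env`.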


## Analyzer template
In this sample, we define a template for a [purchase order](../analyzer_templates/purchase_order.json). The fields defined in this template are the ones labeled in the training data.

```python
analyzer_template = '../analyzer_templates/purchase_order.json'
```
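Because labeled data is currently supported only for the `document` scenario, checking the template before creating the analyzer can catch a mismatch early. This sketch assumes the template JSON has a top-level `scenario` key and field definitions under `fieldSchema.fields`, keyed by field name; `summarize_template` and `check_labeled_data_support` are hypothetical helpers, not part of the sample.

```python
import json
from pathlib import Path


def summarize_template(path: str) -> tuple[str, list[str]]:
    """Return the template's scenario and its sorted field names.

    Assumes a top-level "scenario" key and fields under fieldSchema -> fields.
    """
    template = json.loads(Path(path).read_text(encoding="utf-8"))
    scenario = template.get("scenario", "")
    fields = template.get("fieldSchema", {}).get("fields", {})
    return scenario, sorted(fields)


def check_labeled_data_support(path: str) -> None:
    """Raise ValueError unless the template uses the `document` scenario."""
    scenario, _ = summarize_template(path)
    if scenario != "document":
        raise ValueError(
            f"Labeled data requires scenario 'document', got {scenario!r}"
        )
```

Usage: `scenario, field_names = summarize_template(analyzer_template)` lets you compare the template's fields with the ones you labeled.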

## Create Azure Content Understanding client
> The [AzureContentUnderstandingClient](../python/content_understanding_client.py) is a utility class that contains the functions for interacting with the Content Understanding service. Until the Content Understanding SDK is released, you can treat it as a lightweight SDK. Set **AZURE_AI_ENDPOINT** and, optionally, **AZURE_AI_API_VERSION** (it defaults to `2024-12-01-preview`) in your `.env` file with the information from your Azure AI Service. The code below authenticates with `DefaultAzureCredential` through a bearer token provider instead of an API key.

```python
import logging
import json
import os
import sys
from pathlib import Path
from dotenv import find_dotenv, load_dotenv
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

# import utility package from python samples root directory
parent_dir = Path.cwd().parent
sys.path.append(str(parent_dir))
from python.content_understanding_client import AzureContentUnderstandingClient

load_dotenv(find_dotenv())
logging.basicConfig(level=logging.INFO)

credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")

client = AzureContentUnderstandingClient(
    endpoint=os.getenv("AZURE_AI_ENDPOINT"),
    api_version=os.getenv("AZURE_AI_API_VERSION", "2024-12-01-preview"),
    token_provider=token_provider,
    x_ms_useragent="azure-ai-content-understanding-python/analyzer_training",
)
```
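`get_bearer_token_provider` returns a zero-argument callable that yields an access token string for the given scope. The helper below is a hypothetical illustration of how a client could turn that callable into request headers; it is not the actual code of `AzureContentUnderstandingClient`, and the `x-ms-useragent` header name is an assumption based on the client's `x_ms_useragent` parameter.

```python
from typing import Callable


def build_request_headers(
    token_provider: Callable[[], str], user_agent: str
) -> dict[str, str]:
    """Hypothetical sketch: build per-request headers from a token provider."""
    return {
        # Calling the provider on each request lets it refresh an expired token.
        "Authorization": f"Bearer {token_provider()}",
        # Assumed header name, mirroring the client's x_ms_useragent parameter.
        "x-ms-useragent": user_agent,
    }


def fake_token_provider() -> str:
    """Stand-in for the real provider; in the notebook, pass `token_provider`."""
    return "example-token"


headers = build_request_headers(
    fake_token_provider, "azure-ai-content-understanding-python/analyzer_training"
)
```

Calling the provider per request, rather than caching a token string once, keeps long-running notebooks working after the first token expires.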

## Create analyzer with defined schema
Before creating the custom fields analyzer, set the constant `ANALYZER_ID` to a business-related name. Here we generate a random name for demonstration purposes.

We use **TRAINING_DATA_SAS_URL** and **TRAINING_DATA_PATH**, which were set in the prerequisite step.
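If you want a readable but still unique ID instead of a purely random one, one option is to combine a business name with a short random suffix. This is a sketch under an assumption: restricting IDs to lowercase letters, digits, and hyphens is a conservative choice made here, not a documented service rule, and `make_analyzer_id` is a hypothetical helper.

```python
import re
import uuid


def make_analyzer_id(business_name: str) -> str:
    """Build a unique analyzer ID from a business-related name.

    Assumption: lowercase letters, digits, and hyphens are a safe character set.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", business_name.lower()).strip("-")
    if not slug:
        raise ValueError("business_name must contain letters or digits")
    # An 8-character random suffix avoids collisions between repeated runs.
    return f"{slug}-{uuid.uuid4().hex[:8]}"
```

For example, `make_analyzer_id("Purchase Order v2")` produces an ID of the form `purchase-order-v2-` followed by eight hex characters.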

```python
import uuid

ANALYZER_ID = "train-sample-" + str(uuid.uuid4())

response = client.begin_create_analyzer(
    ANALYZER_ID,
    analyzer_template_path=analyzer_template,
    training_storage_container_sas_url=os.getenv("TRAINING_DATA_SAS_URL"),
    training_storage_container_path_prefix=os.getenv("TRAINING_DATA_PATH"),
)
result = client.poll_result(response)
if result is not None and result.get("status") == "Succeeded":
    logging.info(f"Here is the analyzer detail for {result['result']['analyzerId']}")
    logging.info(json.dumps(result, indent=2))
else:
    # Creation failed or returned no result; surface it as a warning.
    logging.warning(
        "Analyzer creation did not succeed. Check your service configuration and deployment."
    )
```

```text
INFO:python.content_understanding_client:Analyzer train-sample-18473b27-6d27-4d51-8906-9e341ad3fb59 create request accepted.
INFO:python.content_understanding_client:Request 7a0f7689-2b41-4a5e-96bc-c7ef8cb72c5e in progress ...
INFO:python.content_understanding_client:Request 7a0f7689-2b41-4a5e-96bc-c7ef8cb72c5e in progress ...
INFO:python.content_understanding_client:Request 7a0f7689-2b41-4a5e-96bc-c7ef8cb72c5e in progress ...
INFO:python.content_understanding_client:Request 7a0f7689-2b41-4a5e-96bc-c7ef8cb72c5e in progress ...
INFO:python.content_understanding_client:Request 7a0f7689-2b41-4a5e-96bc-c7ef8cb72c5e in progress ...
INFO:python.content_understanding_client:Request result is ready after 13.72 seconds.
INFO:root:Here is the analyzer detail for train-sample-18473b27-6d27-4d51-8906-9e341ad3fb59
INFO:root:{
  "id": "7a0f7689-2b41-4a5e-96bc-c7ef8cb72c5e",
  "status": "Succeeded",
  "result": {
    "analyzerId": "train-sample-18473b27-6d27-4d51-8906-9e341ad3fb59",
    "description"
```
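The repeated `in progress ...` log lines come from `client.poll_result`, which waits for the long-running create operation to finish. Its implementation is not shown in this notebook. The following is a generic polling loop offered as a sketch under assumptions: `get_status` stands in for a call that fetches the operation status, and `NotStarted` and `Running` are assumed to be the only non-terminal status values.

```python
import time
from typing import Any, Callable


def poll_until_done(
    get_status: Callable[[], dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
) -> dict[str, Any]:
    """Call `get_status` until it reports a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = get_status()
        if result.get("status") not in ("NotStarted", "Running"):
            return result  # terminal, e.g. "Succeeded" or "Failed"
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Operation still {result['status']} after {timeout_s} seconds"
            )
        time.sleep(interval_s)
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock changes while waiting.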

## Use created analyzer to extract document content
After the analyzer is successfully created, we can use it to analyze our input files.

```python
response = client.begin_analyze(ANALYZER_ID, file_location='../data/purchase_order.jpg')
result = client.poll_result(response)

logging.info(json.dumps(result, indent=2))
```

```text
INFO:python.content_understanding_client:Analyzing file ../data/purchase_order.jpg with analyzer: train-sample-18473b27-6d27-4d51-8906-9e341ad3fb59
INFO:python.content_understanding_client:Request dced30f5-bb4d-473b-8b7a-13a7e29ed3ac in progress ...
INFO:python.content_understanding_client:Request dced30f5-bb4d-473b-8b7a-13a7e29ed3ac in progress ...
INFO:python.content_understanding_client:Request result is ready after 5.52 seconds.
INFO:root:{
  "id": "dced30f5-bb4d-473b-8b7a-13a7e29ed3ac",
  "status": "Succeeded",
  "result": {
    "analyzerId": "train-sample-18473b27-6d27-4d51-8906-9e341ad3fb59",
    "apiVersion": "2024-12-01-preview",
    "createdAt": "2024-12-09T23:59:16Z",
    "contents": [
      {
        "markdown": "Purchase Order\n\n\n# Hero Limited\n\nCompany Phone: 555-348-6512\nWebsite: www.herolimited.com\nEmail:\naccounts@herolimited.com\n\nPurchase Order\n\nDated As: 12/20/2020\nPurchase Order #: 948284\n\nShipped To\n\nVendor Name: Hillary Swank\nCompany Name: Higgly W
```
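The full JSON is verbose. To pull out only the extracted field values, a small helper can walk the result. This sketch assumes the layout `result` → `contents[i]` → `fields`, with each field storing its value under a key that starts with `value` (such as `valueString`); `extract_field_values` is a hypothetical helper, and nested array or object values are returned as-is rather than flattened.

```python
from typing import Any


def extract_field_values(analyze_result: dict[str, Any]) -> dict[str, Any]:
    """Flatten extracted fields into {field_name: value}.

    Assumes fields live under result -> contents[i] -> fields. Fields with no
    "value*" key map to None; later contents overwrite earlier same-named fields.
    """
    values: dict[str, Any] = {}
    for content in analyze_result.get("result", {}).get("contents", []):
        for name, field in content.get("fields", {}).items():
            value_keys = [key for key in field if key.startswith("value")]
            values[name] = field[value_keys[0]] if value_keys else None
    return values
```

Usage: `field_values = extract_field_values(result)` after the analyze cell gives a compact view to compare against the labeled fields.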

## Delete the existing analyzer in the Content Understanding service
This step is optional; it only prevents the test analyzer from remaining in your service. In real usage scenarios, you can keep the custom fields analyzer in your service and reuse it for subsequent requests.


```python
client.delete_analyzer(ANALYZER_ID)
```

```text
INFO:python.content_understanding_client:Analyzer train-sample-18473b27-6d27-4d51-8906-9e341ad3fb59 deleted.

<Response [204]>
```
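For throwaway runs like this one, the deletion step never executes if analysis raises partway through a scripted run. A `try`/`finally` wrapper guarantees cleanup. The sketch below assumes only the `begin_analyze`, `poll_result`, and `delete_analyzer` methods used in this notebook; `analyze_with_cleanup` is a hypothetical helper.

```python
from typing import Any


def analyze_with_cleanup(client: Any, analyzer_id: str, file_location: str) -> Any:
    """Analyze one file, then delete the analyzer even if analysis fails."""
    try:
        response = client.begin_analyze(analyzer_id, file_location=file_location)
        return client.poll_result(response)
    finally:
        # Runs on success and on exceptions, so the test analyzer never lingers.
        client.delete_analyzer(analyzer_id)
```

Usage: `result = analyze_with_cleanup(client, ANALYZER_ID, '../data/purchase_order.jpg')`. Skip this wrapper when you intend to keep the analyzer for later reuse.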