# Email Classification with Azure OpenAI and Form Recognizer
This notebook shows how to use Azure Form Recognizer (now Azure Document Intelligence), Azure OpenAI, and the Azure Python SDK to classify documents, here a set of sample auto-insurance emails.

## Prerequisites
1. To run the code, install the following packages. Use version 3.3.0 of `azure-ai-formrecognizer`, as pinned in the command below. The PDF-generation cell also imports `fpdf`, so that package must be available as well.


- > ! pip install azure-ai-formrecognizer==3.3.0
- > ! pip install openai

## Login to Azure Document Intelligence Service

- An admin client connection is needed to train (build) the classifier.
- A regular client connection is needed to classify user documents.

In [1]:
import fr

# Your Azure Document Intelligence Service Instance
MY_FORM_RECOGNIZER_ENDPOINT = 'https://tr-docai-form-recognizer.cognitiveservices.azure.com/'

formRecognizerCredential = fr.getFormRecognizerCredential()

formRecognizerClient = fr.getDocumentAnalysisClient(
                            endpoint=MY_FORM_RECOGNIZER_ENDPOINT,
                            credential=formRecognizerCredential
                        )
formRecognizerAdminClient = fr.getDocumentModelAdminClient(
                            endpoint=MY_FORM_RECOGNIZER_ENDPOINT,
                            credential=formRecognizerCredential
                        )


Got Azure Form Recognizer API Key from environment variable
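
The `fr` module is a local helper whose source is not shown in this notebook. As a rough sketch of what such a helper might do, the code below reads the key from an environment variable and builds both clients with the public `azure-ai-formrecognizer` classes. The variable name `FORM_RECOGNIZER_API_KEY` and the function names `read_api_key` and `make_clients` are hypothetical; the real helper may work differently.

```python
import os

# Hypothetical variable name; the notebook does not say which one `fr` reads.
KEY_ENV_VAR = "FORM_RECOGNIZER_API_KEY"


def read_api_key(environ=os.environ, var_name=KEY_ENV_VAR):
    """Return the API key from the environment, or raise a clear error."""
    key = environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


def make_clients(endpoint, api_key):
    """Build the analysis client and the administration client."""
    # Imported lazily so the key check above works without the SDK installed.
    from azure.ai.formrecognizer import (
        DocumentAnalysisClient,
        DocumentModelAdministrationClient,
    )
    from azure.core.credentials import AzureKeyCredential

    credential = AzureKeyCredential(api_key)
    analysis_client = DocumentAnalysisClient(endpoint=endpoint, credential=credential)
    admin_client = DocumentModelAdministrationClient(endpoint=endpoint, credential=credential)
    return analysis_client, admin_client
```

With the key set, `make_clients(MY_FORM_RECOGNIZER_ENDPOINT, read_api_key())` would return the pair of clients that the cell above obtains through `fr`.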


## Load all the AOAI API keys and model parameters

In [2]:
import aoai

MY_AOAI_ENDPOINT = 'https://tr-non-prod-gpt4.openai.azure.com/'
MY_AOAI_VERSION = '2023-07-01-preview'
MY_GPT_ENGINE = 'tr-gpt4'
MY_AOAI_EMBEDDING_ENGINE = 'tr-embedding-ada'

status = aoai.setupOpenai(aoai_endpoint=MY_AOAI_ENDPOINT, 
                 aoai_version=MY_AOAI_VERSION)
if status > 0:
    print("AOAI setup succeeded")
else:
    print("AOAI setup failed")


Got OPENAI API Key from environment variable
AOAI setup succeeded


#### Set the parameters

In [3]:
# TODO: Read from Blob Store
# Assumes the notebook is run from the notebook folder
MY_INPUT_DATA_FILE = r'..\..\..\data\sample-auto-insurance-emails\cleaned-emails-with-classes.json'
MY_OUTPUT_DATA_FOLDER = r'..\..\..\data\sample-auto-insurance-emails\output'

MY_BLOB_STORE_PATH = r'sample-auto-insurance-emails'
MY_BLOB_STORE_URL = r'https://trxdocaixblob.blob.core.windows.net/docai'
MY_BLOB_STORE_CONTAINER_SAS_URL = r'https://trxdocaixblob.blob.core.windows.net/docai?sp=racwdle&st=2023-10-19T12:05:17Z&se=2023-10-27T20:05:17Z&spr=https&sv=2022-11-02&sr=c&sig=H4kLRE7Q0xoFx3HLv6T3wL53RXiFEjeNMnlok7EHi7g%3D'

# The different classes
categories = ["PolicyCancellation","IncisoCancellation","PersonChange",
                "VINNumberChange","CoverageChange","SubsequenteRegister",
                "PaymentMethodChange","UseChange","DiscountChange","VehicleChange",
                "BillingChange","VehicleDataChange","Transactionoutofscope"]

## Create the email files and list files

<font color=red>You do NOT need to run this cell if the files were already generated from  
\DocAI\data\sample-auto-insurance-emails\cleaned-emails-with-classes.json.</font>
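
Each record in the input JSON carries one boolean flag per category, and the next cell uses those flags to build the per-class file lists. Before generating files, it can help to count how many emails each class has. The sketch below does that with plain Python. The default threshold of five samples per class reflects my understanding of the service's documented minimum for custom classifiers; treat it as an assumption and check the current documentation. The function names and the sample records in the usage note are illustrative only.

```python
from collections import Counter


def count_samples_per_category(records, categories):
    """Count how many records have each category flag set to True."""
    counts = Counter({category: 0 for category in categories})
    for record in records:
        for category in categories:
            if record.get(category) is True:
                counts[category] += 1
    return dict(counts)


def underrepresented(counts, minimum=5):
    """Return the categories with fewer than `minimum` samples, sorted by name."""
    return sorted(name for name, total in counts.items() if total < minimum)
```

For example, `underrepresented(count_samples_per_category(input_data, categories))` lists the classes that may need more training emails, where `input_data` is the list loaded from `MY_INPUT_DATA_FILE`.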


In [4]:
import os
import json
from fpdf import FPDF

with open(MY_INPUT_DATA_FILE, 'r', encoding='utf-8') as file:
    input_data = json.load(file)
   
for item in input_data:
    email_file_name = item['FileName']
    email_body = item['EmailBody']
    
    # Write email to the pdf file
    pdf = FPDF()
    pdf.compress = False
    pdf.accept_page_break()
    pdf.set_margins(left=30.0, top=30.0, right=-1)
    pdf.add_page()
    pdf.add_font(family='arial', fname=r'c:\WINDOWS\Fonts\arial.ttf', uni=True)
    pdf.set_font(family='arial', size=10)
    pdf.write(5,email_body)
    #pdf.cell(ln=10, h=0, align='L', w=0, txt=email_body, border=0)
    # Build the path with os.path.join to avoid backslash escapes inside the f-string
    pdf.output(os.path.join(MY_OUTPUT_DATA_FOLDER, 'emails', email_file_name), 'F')
    pdf.close()
    
    #f = open(f'{MY_OUTPUT_DATA_FOLDER}\emails\{email_file_name}', "a", encoding="utf-8")
    #f.write(email_body)
    #f.close()
    
    # Append this email's blob path to the file list of every category it belongs to
    for category in categories:
        if item[category] == True:
            file_path = f'{MY_BLOB_STORE_PATH}/output/emails/{email_file_name}'
            # The with-block closes the file after each append
            with open(os.path.join(MY_OUTPUT_DATA_FOLDER, f'{category}.jsonl'), "a") as f:
                f.write('{"file":"' + file_path + '"}\n')
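
Because the cell opens each `.jsonl` file in append mode, running it twice writes every entry twice. A small check like the one below can confirm that a generated file list is well formed and free of duplicates before it is uploaded. It is only a sketch: `read_file_list` and `find_duplicates` are hypothetical helper names, and the only format assumed is the one the cell writes, one `{"file": "<blob path>"}` object per line.

```python
import json


def read_file_list(lines):
    """Parse JSON Lines entries and return the list of "file" paths."""
    paths = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        if not isinstance(entry, dict) or not isinstance(entry.get("file"), str):
            raise ValueError(f"line {number}: expected an object with a 'file' string")
        paths.append(entry["file"])
    return paths


def find_duplicates(paths):
    """Return each path that appears more than once, in first-seen order."""
    seen, duplicates = set(), []
    for path in paths:
        if path in seen and path not in duplicates:
            duplicates.append(path)
        seen.add(path)
    return duplicates
```

A file can be checked by passing the open file object directly, for example `with open(path, encoding="utf-8") as handle: paths = read_file_list(handle)`.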

#### Load the sample-auto-insurance-emails folder to your blob store
Azure Document Intelligence Service reads the emails and the per-class file lists from this blob store to train the classifier.  
<b>TODO:</b> Automatically upload the files to the blob store.  

For now, manually upload the <b>sample-auto-insurance-emails</b> folder to the root of your container in your blob store.
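
One possible way to automate the upload is sketched below, assuming the `azure-storage-blob` package is installed and the container SAS URL grants write permission. `plan_uploads` and `upload_folder` are hypothetical helper names, not part of this notebook's `fr` module. The first function only maps local files to blob names, so it can be inspected before anything is sent.

```python
import os


def plan_uploads(local_root, blob_prefix):
    """Map every file under local_root to a blob name under blob_prefix."""
    plan = []
    for folder, _subfolders, file_names in os.walk(local_root):
        for file_name in file_names:
            local_path = os.path.join(folder, file_name)
            relative = os.path.relpath(local_path, local_root)
            # Blob names always use forward slashes, also on Windows.
            blob_name = "/".join([blob_prefix] + relative.split(os.sep))
            plan.append((local_path, blob_name))
    return sorted(plan)


def upload_folder(container_sas_url, local_root, blob_prefix):
    """Upload the planned files, overwriting blobs that already exist."""
    # Imported lazily so plan_uploads can be used without the SDK installed.
    from azure.storage.blob import ContainerClient

    container = ContainerClient.from_container_url(container_sas_url)
    for local_path, blob_name in plan_uploads(local_root, blob_prefix):
        with open(local_path, "rb") as data:
            container.upload_blob(name=blob_name, data=data, overwrite=True)
```

Here `local_root` would be the local `sample-auto-insurance-emails` folder and `blob_prefix` would be `MY_BLOB_STORE_PATH`, so that the blob names match the paths written to the `.jsonl` file lists.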

In [5]:
# Create the categoryFileMap, needed by Form Recognizer for training
categoryFileMap = {}
for category in categories:
    categoryFileMap[category] = f'{MY_BLOB_STORE_PATH}/output/{category}.jsonl'
result = fr.trainClassifier(
                            admin_client=formRecognizerAdminClient,
                            blob_url=MY_BLOB_STORE_CONTAINER_SAS_URL,
                            class_file_list=categoryFileMap
                           )
classifierId = result.classifier_id
print(f"Classifier ID: {classifierId}")
print(f"API version used to build the classifier model: {result.api_version}")
print(f"Classifier description: {result.description}")
print(f"Document Classifier expires on: {result.expires_on}")
print("Document classes used for training the model:")
for doc_type, details in result.doc_types.items():
    print(f"Document type: {doc_type}")
    print(f"Container source: {details.source.container_url}\n")

HttpResponseError: (InvalidArgument) Invalid argument.
Code: InvalidArgument
Message: Invalid argument.
Exception Details:	(InvalidContentSourceFormat) Invalid content source: Could not read build content.
	Code: InvalidContentSourceFormat
	Message: Invalid content source: Could not read build content.
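
The error output above only says that the service could not read the build content; it does not say why. One thing worth ruling out, offered as a guess rather than a diagnosis, is the SAS token itself: the URL in cell In [3] carries `se=2023-10-27T20:05:17Z`, so it stops working after that time. The sketch below inspects the `st`, `se`, and `sp` query parameters of a SAS URL. It assumes the full timestamp format seen in that URL, and the requirement for read and list permissions reflects my understanding of what training needs. `check_sas_url` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def _parse_time(value):
    """Parse a UTC SAS timestamp such as 2023-10-27T20:05:17Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def check_sas_url(sas_url, now=None):
    """Return a list of likely problems with the SAS URL (empty if none found)."""
    now = now or datetime.now(timezone.utc)
    query = parse_qs(urlsplit(sas_url).query)
    start = query.get("st", [None])[0]
    expiry = query.get("se", [None])[0]
    permissions = query.get("sp", [""])[0]

    problems = []
    if start and _parse_time(start) > now:
        problems.append("token is not valid yet")
    if expiry and _parse_time(expiry) < now:
        problems.append("token has expired")
    # Assumption: the service needs to read and list blobs in the container.
    for flag, label in (("r", "read"), ("l", "list")):
        if flag not in permissions:
            problems.append(f"missing {label} permission")
    return problems
```

Calling `check_sas_url(MY_BLOB_STORE_CONTAINER_SAS_URL)` before training gives a quick local check. An empty list does not prove the token is accepted, and other causes, such as file lists or blobs missing from the container, remain possible.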

In [None]:
print(f"Result: {result}")

In [None]:
fr.deleteClassifier(admin_client=formRecognizerAdminClient, classifier_id=classifierId)