# Using the AI Services JSON API

This Python notebook teaches developers and Digital Accelerators how to call AI Services programmatically.  This can be useful in the initial stages of developing an integration, or when testing an AI model recently deployed to AI Services.  In this notebook, we will submit a single job to AI Services, poll for updates, and fetch the result.

Run the cells in this notebook in order.  Two cells must be edited before they are run, and a third may need adjusting.  **These cells are marked with an "Action" note in bold.**  Once they have been updated, you should be able to run all cells in order and receive a good result from your AI model.

## The Process

Even if the underlying AI model can be called with a single HTTP request, the AI Services job API splits usage into several phases, across several endpoints:

1. Create a Job
2. Upload Document
3. Execute Job
4. Poll Job Status
5. Download Result
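As a quick reference, the phases above can be tied to the endpoint names used by the cells later in this notebook.  The sketch below is only a lookup table.  `JOB_PHASES`, `endpoint_url`, and the `example.invalid` root are illustrative names, not part of AI Services.

```python
# Phase -> (HTTP method, endpoint name), as used by the cells in this notebook.
JOB_PHASES = [
    ("Create a Job", "POST", "CreateJob"),
    ("Upload Document", "POST", "AddDocument"),
    ("Execute Job", "POST", "ExecuteJob"),
    ("Poll Job Status", "GET", "GetJobInformation"),
    ("Download Result", "GET", "GetJobSummary"),
]


def endpoint_url(api_root, endpoint):
    """Join an API root and an endpoint name with exactly one slash."""
    return f"{api_root.rstrip('/')}/{endpoint}"


# Placeholder root for illustration only; the real roots are set in the Credentials cell.
example_root = "https://example.invalid/api/Nlp"
for step, (phase, method, endpoint) in enumerate(JOB_PHASES, start=1):
    url = endpoint_url(example_root, endpoint)
    print(f"{step}. {phase}: {method} {url}")
```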

## Setup

### Software Dependencies

Along with the standard library and Jupyter helpers, this notebook depends on the [requests library](https://pypi.org/project/requests/), which the Python standard library documentation recommends as a higher-level HTTP client interface.

In [None]:
import base64
import json
import os
import time

from IPython.display import clear_output
import requests

### Credentials

These items must be received from an AI Services representative:
- A username.
- An application integration GUID.  This is a UUID.
- An application token.

Your credentials and "Document Type" will be for a specific environment.  Select that environment here.  If you're not sure, it's probably "prod".

**Action: This cell starts with nonsense credentials.  Change this cell to use your own values.**

In [None]:
USERNAME = "ChangeMe"
APP_GUID = "d66e7b22-37b6-4db9-8e1e-0a632c45e961"
APP_TOKEN = "CFFFFD6AE1D844378680614B13905B76"

ENVIRONMENT_ROOTS = {
    "prod": "https://ai.pwc.com/api/Nlp",
    "test": "https://ai-tst.pwcinternal.com/api/Nlp",
    "dev": "https://ai-dev.pwcinternal.com/api/Nlp"
}

api_root = ENVIRONMENT_ROOTS["prod"]
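If you would rather not keep real credentials in the notebook itself, one option is to read them from environment variables.  This is a minimal sketch.  The variable names `AI_SERVICES_USERNAME`, `AI_SERVICES_APP_GUID`, and `AI_SERVICES_APP_TOKEN`, and the helper `load_credentials`, are assumptions made for illustration and are not defined by AI Services.

```python
import os


def load_credentials(env=os.environ):
    """Read the three credentials from environment variables (hypothetical names)."""
    names = {
        "USERNAME": "AI_SERVICES_USERNAME",
        "APP_GUID": "AI_SERVICES_APP_GUID",
        "APP_TOKEN": "AI_SERVICES_APP_TOKEN",
    }
    # Fail early, listing every missing variable at once.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in names.items()}
```

With those variables set, `creds = load_credentials()` returns a dictionary whose `"USERNAME"`, `"APP_GUID"`, and `"APP_TOKEN"` entries could replace the hard-coded values in the cell above.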

### Configuration

This item must be received from an AI Services representative:
- A document type ID.  This is an integer.  "Document Type" is a bit of a misnomer, as this number actually identifies the AI asset you will be hitting.

Additionally:
- The file path to the document you would like to upload to AI Services.
- The file path where you would like the output of AI Services to go.

**Action: This cell starts with sample configuration to call an AIA Cost Segregation Classifier model.  You can use this configuration with the provided sample input file, or change this cell to use your own values.**

In [None]:
DOC_TYPE_ID = 1429
INPUT_FILE_PATH = "sample_input_file.csv"
OUTPUT_FILE_PATH = "output.xlsx"

### HTTP Headers

Let's package our credentials into an object that we'll attach to our HTTP calls.

In [None]:
ai_services_headers = {
    "Username": USERNAME,
    "ApplicationIntegrationGuid": APP_GUID,
    "ApplicationToken": APP_TOKEN
}
print(json.dumps(ai_services_headers, indent=2))

### Test API Connection

This cell will test the connection to AI Services.

In [None]:
test_call = requests.get(f"{api_root}/GetDocumentTypes", headers=ai_services_headers, verify=False)
status = test_call.status_code
if status == 200:
    print('Successful API call, your credentials must be good!')
else:
    print(f'Received unexpected status code {status}, printing response to help you debug')
    print(test_call.text)

## Pre-req: Find ID for Output Format

If you don't already know your output format, you can look up the available formats by querying AI Services for information about your AI asset's "Document Type".

In [None]:
get_document_types = requests.get(f"{api_root}/GetDocumentTypes", headers=ai_services_headers, verify=False)
response = get_document_types.json()

my_doc_type = next(info for info in response if info.get("DocumentTypeId") == DOC_TYPE_ID)
print(json.dumps(my_doc_type, indent=2))

We'll grab the output format ID for later.  

**Action: You may need to tweak this cell to use the desired output format.**

In [None]:
output_format = my_doc_type["OutputFormats"][1]
output_format_name = output_format["DisplayValue"]
output_format_id = output_format["OutputFormatId"]
print(f"The output format is '{output_format_name}' (ID {output_format_id})")
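Picking a format by list position can break if the order of `OutputFormats` changes.  As a hedged alternative, the sketch below selects a format by its `DisplayValue`.  The helper `find_output_format` and the sample data are made up for illustration; only the field names come from the cells above.

```python
def find_output_format(doc_type, display_value):
    """Return the output format whose DisplayValue matches, ignoring case."""
    formats = doc_type.get("OutputFormats", [])
    wanted = display_value.strip().lower()
    for fmt in formats:
        if str(fmt.get("DisplayValue", "")).strip().lower() == wanted:
            return fmt
    available = [fmt.get("DisplayValue") for fmt in formats]
    raise LookupError(f"No output format named {display_value!r}; available: {available}")


# Made-up sample data in the shape the cells above rely on.
sample_doc_type = {
    "DocumentTypeId": 1,
    "OutputFormats": [
        {"OutputFormatId": 10, "DisplayValue": "JSON"},
        {"OutputFormatId": 11, "DisplayValue": "Excel"},
    ],
}
chosen = find_output_format(sample_doc_type, "excel")
print(f"Chose '{chosen['DisplayValue']}' (ID {chosen['OutputFormatId']})")
```

In the notebook itself, `find_output_format(my_doc_type, "<your format name>")` could stand in for the indexed lookup, using whichever display name the previous cell printed for your document type.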

## 1. Create a Job

By using our document type ID and specifying the output format we expect, we can create a new job.  We grab the JobId from the response to use in the remaining steps.

In [None]:
create_job_json = {
    "DocumentTypeId": DOC_TYPE_ID,
    "OutputFormatId": output_format_id
}
create_job = requests.post(f"{api_root}/CreateJob", headers=ai_services_headers, verify=False, json=create_job_json)
response = create_job.json()
print(json.dumps(response, indent=2))

job_id = response["JobId"]
print(f"The job ID is {job_id}")

## 2. Upload Document

Documents must be uploaded as base64-encoded versions of the files.  Uploading a document does not trigger the execution of the job.  Note that you don't _have_ to have a file on disk to upload.  If you're able to encode your data without going to disk, more power to you.

Note that _very large_ files may run into some trouble here.  AI Services is loading the base64-encoded file as a string initially, and this can lead to issues at memory allocation time.  If you're regularly breaching 50MB or 100MB in file size, consider yourself warned.
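To see how large an upload will be before encoding it, you can compute the base64 size from the raw size: every 3 bytes become 4 characters, with padding.  The sketch below is illustrative.  The helper names are made up, and the 50MB default threshold simply mirrors the rough figure above rather than any documented AI Services limit.

```python
import base64
import os


def base64_size(raw_size):
    """Length in characters of the padded base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


def check_upload_size(path, warn_at_bytes=50 * 1024 * 1024):
    """Return (raw_size, encoded_size, is_large) for the file at path."""
    raw_size = os.path.getsize(path)
    return raw_size, base64_size(raw_size), raw_size >= warn_at_bytes


# Sanity check of the formula against the real encoder on a small payload.
payload = b"x" * 1000
print(base64_size(len(payload)), len(base64.b64encode(payload)))
```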

In [None]:
input_file_name = os.path.basename(INPUT_FILE_PATH)

encoded_input_file = ""
with open(INPUT_FILE_PATH, "rb") as input_file:
    encoded_input_file = base64.b64encode(input_file.read()).decode('ascii')
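If your data never touches disk, the same encoding works on in-memory bytes.  This is a minimal sketch: `build_document` is a hypothetical helper, and the sample CSV text is made up.  It produces the same `Name` and `Base64Content` fields that the upload request in this section expects.

```python
import base64


def build_document(name, data):
    """Build the "Document" part of an upload payload from in-memory data."""
    if isinstance(data, str):
        data = data.encode("utf-8")  # assumption: text is sent as UTF-8 bytes
    return {
        "Name": name,
        "Base64Content": base64.b64encode(data).decode("ascii"),
    }


csv_text = "id,description\n1,example row\n"  # made-up sample content
document = build_document("in_memory.csv", csv_text)
print(document["Name"], len(document["Base64Content"]))
```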

In [None]:
upload_document_json = {
    "JobId": job_id,
    "Document": {
        "Name": input_file_name,
        "Base64Content": encoded_input_file
    }
}

# Use a name that matches this step instead of reusing create_job from step 1.
add_document = requests.post(f"{api_root}/AddDocument", headers=ai_services_headers, verify=False, json=upload_document_json)
response = add_document.json()
print(json.dumps(response, indent=2))

## 3. Execute Job

This call starts the execution of the job.

In [None]:
execute_job = requests.post(f"{api_root}/ExecuteJob?jobId={job_id}", headers=ai_services_headers, verify=False)
response = execute_job.json()
print(json.dumps(response, indent=2))

status = response["Status"]
print(f'The current status of job {job_id} is "{status}"')

## 4. Poll Job Status

Now we watch the job status, hoping for a status of "Processing" to change to a status of "Ready".

This cell will check the job status periodically until the status is no longer "Processing".  Note that in some models, certain error cases result in a hanging job that will never update from "Processing".  In this case, the cell below will give up after about 30 minutes.

In [None]:
def get_status(jid):
    poll_job_status = requests.get(f"{api_root}/GetJobInformation?jobId={jid}", headers=ai_services_headers, verify=False)
    return poll_job_status.json()

wait_time = 10
start_time = time.time()
duration = 0
status = "Processing"
while status == "Processing" and duration < 1800:
    response = get_status(job_id)
    status = response["Status"]
    
    clear_output()
    duration = int(time.time() - start_time)
    print(f"The current status of job {job_id} is '{status}'.")
    print(f"{duration} seconds elapsed")
    print(json.dumps(response, indent=2))
    
    if status == "Processing":
        time.sleep(wait_time)
        wait_time = min(wait_time * 2, 300)
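The loop above ends quietly if the job is still "Processing" when the time limit is reached.  As a sketch of one alternative, the helper below uses the same doubling wait but raises an error on timeout.  `poll_until_done` and its parameters are hypothetical names, and the status fetcher, sleep function, and clock are passed in so the logic can be tried without calling AI Services.

```python
import time


def poll_until_done(fetch_status, timeout=1800, first_wait=10, max_wait=300,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it stops returning "Processing".

    Waits first_wait seconds after the first check, doubling up to max_wait.
    Raises TimeoutError if the job is still processing after timeout seconds.
    """
    wait = first_wait
    start = clock()
    while True:
        status = fetch_status()
        if status != "Processing":
            return status
        if clock() - start >= timeout:
            raise TimeoutError(f"Job still processing after {timeout} seconds")
        sleep(wait)
        wait = min(wait * 2, max_wait)


# Dry run with canned statuses and a sleep function that only records the waits.
canned = iter(["Processing", "Processing", "Ready"])
waits = []
final_status = poll_until_done(lambda: next(canned), sleep=waits.append)
print(final_status, waits)
```

In this notebook, the call would look like `poll_until_done(lambda: get_status(job_id)["Status"])`.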

## 5. Download Result

Finally, now that the job is complete, we can fetch the results. The results typically come back to us as base64 in a JSON object, which we will need to decode and stuff into a file.

Note of course that you don't _have_ to stuff the results into a file, if you've got some other way of using the data.

In [None]:
download_result = requests.get(f"{api_root}/GetJobSummary?jobId={job_id}&outputFormatId={output_format_id}", headers=ai_services_headers, verify=False)
response = download_result.json()
print(json.dumps(response, indent=2)[:128] + "...")
base64_encoded_output = response["Summary"]

### A brief note on output formats

Pretty much all output formats come back as a base64-encoded file.  The JSON output format is a known exception.  It comes back as JSON.  We don't explore that pure JSON output in this notebook, but suffice it to say you won't be able to decode it as base64.

In [None]:
try:
    binary_contents = base64.standard_b64decode(base64_encoded_output)
except Exception:
    print("I'm having trouble decoding the result from AI Services as base64.  Perhaps you selected an output format of JSON?")
    raise  # re-raise the original exception with its traceback intact

with open(OUTPUT_FILE_PATH, "wb") as output_file:
    output_file.write(binary_contents)

print(f"Job results written to {OUTPUT_FILE_PATH}")
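If you do need to handle the JSON output format as well, one possible approach is to try strict base64 decoding first and fall back to JSON.  This is only a sketch, under the assumption that a JSON result arrives in `Summary` either as an already-parsed object or as a JSON string; this notebook does not confirm which.  `decode_summary` is a hypothetical helper.

```python
import base64
import json


def decode_summary(summary):
    """Return ("binary", bytes) for base64 content, or ("json", value) otherwise."""
    if not isinstance(summary, str):
        # Assumption: a non-string Summary is JSON that has already been parsed.
        return "json", summary
    try:
        # validate=True rejects characters outside the base64 alphabet, such as "{".
        return "binary", base64.b64decode(summary, validate=True)
    except ValueError:
        return "json", json.loads(summary)


print(decode_summary(base64.b64encode(b"hello").decode("ascii")))
print(decode_summary('{"rows": 3}'))
```

Short JSON scalars such as `1234` are also valid base64, so where possible it is safer to branch on the output format you requested than to guess from the content.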