## BATCH OCR READ with Computer Vision API  <a name="RecognizeText"> </a>
Batch Read File
The Batch Read File interface uses state-of-the-art Optical Character Recognition (OCR) algorithms optimized for text-heavy documents. It can handle handwritten, printed, or mixed documents. When you call Batch Read File, the response contains a header called "Operation-Location". This header holds the URL you must pass to the Get Read Operation Result operation to access the OCR results.
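The Operation-Location value is a URL whose last path segment identifies the submitted operation, which can be useful for logging or for storing the job and polling it later. The minimal sketch below extracts that identifier with the standard library. It assumes the v2.0 shape `.../read/operations/{operationId}`. The sample URL and the helper name `operation_id_from_location` are hypothetical.

```python
from urllib.parse import urlparse


def operation_id_from_location(operation_location: str) -> str:
    """Return the last path segment of an Operation-Location URL.

    Assumes the URL ends in .../operations/{operationId}, the shape used by
    the v2.0 Read endpoints; adjust if your API version differs.
    """
    path = urlparse(operation_location).path.rstrip("/")
    if not path:
        raise ValueError("Operation-Location URL has no path")
    return path.rsplit("/", 1)[-1]


# Hypothetical example value; a real one comes from response.headers["Operation-Location"].
sample = ("https://example.cognitiveservices.azure.com/vision/v2.0/"
          "read/operations/1234abcd-0000-0000-0000-000000000000")
print(operation_id_from_location(sample))
```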

The result of a Batch Read File operation is not available immediately. The processing time depends on the length of the text and the number of pages, so you may need to wait before calling Get Read Operation Result. For text-heavy, multi-page images, the wait can be up to a few minutes.
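Because the wait can be long, polling with a timeout and gradually longer pauses is safer than an open-ended loop. The sketch below is a minimal example under some assumptions. `fetch` is any zero-argument function that returns the parsed JSON body, for example a lambda wrapping `requests.get(...).json()`. The statuses "Succeeded" and "Failed" are treated as terminal. The helper name `wait_for_operation` and the default timings are illustrative choices, not part of the API.

```python
import time
from typing import Any, Callable, Dict


def wait_for_operation(
    fetch: Callable[[], Dict[str, Any]],
    timeout: float = 300.0,
    interval: float = 1.0,
    max_interval: float = 10.0,
) -> Dict[str, Any]:
    """Call fetch() until the result reports a terminal status or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        # Assumed terminal statuses; anything else means "keep waiting".
        if result.get("status") in ("Succeeded", "Failed"):
            return result
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"operation not finished after {timeout} seconds")
        time.sleep(interval)
        # Back off gradually so long multi-page jobs are not polled every second.
        interval = min(interval * 2, max_interval)


# Simulated responses standing in for successive GET calls.
responses = iter([
    {"status": "NotStarted"},
    {"status": "Running"},
    {"status": "Succeeded", "recognitionResults": []},
])
result = wait_for_operation(lambda: next(responses), interval=0.01)
print(result["status"])
```

Injecting `fetch` keeps the polling logic independent of the HTTP client, so it can be tested without network access.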

Note: this technology is only available for English text.

Set `image_path` to the local image or PDF file to be recognized.

In [None]:
# IMPORTING THE LIBRARIES
%matplotlib inline
import time

import matplotlib.pyplot as plt
import requests
from matplotlib.patches import Polygon
from PIL import Image

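# OPTIONAL PROXY SETTINGS: only needed if your laptop reaches the internet
# through a proxy. Replace the placeholder addresses with your proxy's and
# pass proxies=proxyDict to the requests calls below.
proxy = "http://www.com:8080"
proxys = "https://www.com:8080"
proxyDict = {"http": proxy, "https": proxys}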

In [None]:
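subscription_key = 'your_key'  # replace with your Computer Vision key
assert subscription_key and subscription_key != 'your_key', "Set subscription_key to your own key"

image_path = "your_file.pdf"  # local PDF or image file to recognize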
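Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. One alternative is to read the key from an environment variable. The sketch below assumes a hypothetical variable name, `COMPUTER_VISION_SUBSCRIPTION_KEY`. Use whatever name you actually set.

```python
import os


def load_subscription_key(var_name: str = "COMPUTER_VISION_SUBSCRIPTION_KEY") -> str:
    """Read the API key from an environment variable (hypothetical name by default)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your key")
    return key
```

In the notebook this would replace the literal assignment, for example `subscription_key = load_subscription_key()`.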

In [None]:
# Replace the host with your own Computer Vision resource's endpoint.
vision_base_url = "https://PYTHONCOMPUTERVISION.cognitiveservices.azure.com/vision/v2.0/"
ocr_url = vision_base_url + "read/core/asyncBatchAnalyze"
print(ocr_url)
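Building `ocr_url` by string concatenation only works because `vision_base_url` ends with a slash. If you switch to `urllib.parse.urljoin`, the trailing slash matters even more, because `urljoin` drops the last path segment of a base URL that does not end in "/". The sketch below forces the slash before each join. The endpoint `my-resource` and the helper name `build_read_url` are hypothetical.

```python
from urllib.parse import urljoin


def build_read_url(endpoint: str, api_version_path: str = "vision/v2.0/") -> str:
    """Join a resource endpoint with the v2.0 Batch Read path.

    A trailing slash is forced on every prefix, because urljoin replaces the
    last path segment of a base URL that does not end with "/".
    """
    base = urljoin(endpoint.rstrip("/") + "/", api_version_path)
    return urljoin(base.rstrip("/") + "/", "read/core/asyncBatchAnalyze")


endpoint = "https://my-resource.cognitiveservices.azure.com"  # hypothetical resource name
print(build_read_url(endpoint))

# The pitfall: without the trailing slash, "v2.0" is silently dropped.
print(urljoin(endpoint + "/vision/v2.0", "read"))  # → https://my-resource.cognitiveservices.azure.com/vision/read
```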

In [None]:
# Read the local file named by image_path and submit it as raw bytes.
with open(image_path, "rb") as f:
    image_data = f.read()

headers = {
    "Ocp-Apim-Subscription-Key": subscription_key,
    "Content-Type": "application/octet-stream",
}

# Add proxies=proxyDict if you need the proxy settings defined above.
response = requests.post(ocr_url, headers=headers, data=image_data)
response.raise_for_status()

# Extracting text requires two API calls: one to submit the
# image for processing, the other to retrieve the text found in the image.

# Holds the URL used to retrieve the recognized text.
operation_url = response.headers["Operation-Location"]
In [None]:
operation_url

In [None]:
# The recognized text isn't immediately available, so poll until the
# operation finishes or fails. Add proxies=proxyDict if you need a proxy.
analysis = {}
while True:
    response_final = requests.get(operation_url, headers=headers)
    response_final.raise_for_status()
    analysis = response_final.json()
    print(analysis)
    if "recognitionResults" in analysis:
        break
    if analysis.get("status") == "Failed":
        break
    time.sleep(1)


In [None]:
analysis["recognitionResults"]

In [None]:

polygons = []
if "recognitionResults" in analysis:
    # Extract the recognized text, with bounding boxes, from the first page.
    polygons = [(line["boundingBox"], line["text"])
                for line in analysis["recognitionResults"][0]["lines"]]

# For responses with a single "recognitionResult" object instead of a list, use:
# polygons = [(line["boundingBox"], line["text"]) for line in analysis["recognitionResult"]["lines"]]
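The extraction cell reads only `recognitionResults[0]`, which is the first page. For multi-page PDFs, the sketch below walks every page. It assumes each entry of `recognitionResults` has a `lines` list whose items carry `boundingBox` and `text`, which is the same shape the notebook already relies on. It also uses a `page` field when one is present and otherwise falls back to the entry's 1-based position. The helper name `extract_lines` and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Line = Tuple[int, List[float], str]


def extract_lines(analysis: Dict[str, Any]) -> List[Line]:
    """Return (page number, bounding box, text) for every line on every page."""
    lines: List[Line] = []
    for index, page in enumerate(analysis.get("recognitionResults", [])):
        # Prefer the page's own number if present, else its 1-based position.
        page_number = page.get("page", index + 1)
        for line in page.get("lines", []):
            lines.append((page_number, line["boundingBox"], line["text"]))
    return lines


# Hypothetical two-page result with the shape the notebook reads.
sample_analysis = {
    "status": "Succeeded",
    "recognitionResults": [
        {"page": 1, "lines": [{"boundingBox": [0, 0, 2, 0, 2, 1, 0, 1], "text": "Page one"}]},
        {"page": 2, "lines": [{"boundingBox": [1, 1, 3, 1, 3, 2, 1, 2], "text": "Page two"}]},
    ],
}
for page_number, box, text in extract_lines(sample_analysis):
    print(page_number, text)
```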


The `polygons` list pairs each recognized line of text with its bounding box. Note that the Read API returns bounding boxes as **polygons** rather than **rectangles**. Each polygon _p_ is defined by its vertices, using the following convention:

<i>p</i> = [<i>x</i><sub>1</sub>, <i>y</i><sub>1</sub>, <i>x</i><sub>2</sub>, <i>y</i><sub>2</sub>, ..., <i>x</i><sub>N</sub>, <i>y</i><sub>N</sub>]
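Because each bounding box is a flat list of alternating x and y values, pairing the values gives the polygon's vertices. Taking their minima and maxima gives the enclosing axis-aligned rectangle, which is handy when a downstream tool only accepts rectangles. The following is a minimal plain-Python sketch. The helper names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def to_vertices(p: Sequence[float]) -> List[Point]:
    """Pair a flat [x1, y1, x2, y2, ..., xN, yN] list into (x, y) vertices."""
    if len(p) % 2:
        raise ValueError("bounding box must contain an even number of values")
    return [(p[i], p[i + 1]) for i in range(0, len(p), 2)]


def enclosing_rectangle(p: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (left, top, width, height) of the smallest axis-aligned box around p."""
    vertices = to_vertices(p)
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


# A slightly rotated quadrilateral (N = 4 vertices).
box = [10, 20, 110, 25, 108, 45, 8, 40]
print(to_vertices(box))          # [(10, 20), (110, 25), (108, 45), (8, 40)]
print(enclosing_rectangle(box))  # (8, 20, 102, 25)
```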

In [None]:
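# PLOTTING POLYGONS ON RECOGNIZED TEXT
# A PDF cannot be drawn directly, so a blank page image (which you provide)
# is used as the background. For an image input, Image.open(image_path) works instead.
image = Image.open("blank_page_as_background.png")

# Factors that stretch bounding-box coordinates onto the background image's
# pixels (PDF results are reported in inches). Adjust to your background image.
x_scale, y_scale = 105, 100

fig, ax = plt.subplots(figsize=(25, 25))
ax.imshow(image)
for bounding_box, text in polygons:
    # Pair the flat [x1, y1, x2, y2, ...] list into scaled (x, y) vertices.
    vertices = [(bounding_box[i] * x_scale, bounding_box[i + 1] * y_scale)
                for i in range(0, len(bounding_box), 2)]
    patch = Polygon(vertices, closed=True, fill=False, linewidth=2, color="y")
    ax.add_patch(patch)
    ax.text(vertices[0][0], vertices[0][1], text, fontsize=8, va="top")
_ = ax.axis("off")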
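The plotting cell uses fixed factors (105 and 100) to map bounding-box coordinates onto the background image. The sketch below derives the factors from the page size instead. It assumes that each page result reports `width` and `height` in the same unit as `boundingBox`, which for PDFs is typically inches, and that you know the background image's size in pixels. The helper name `scale_factors` and the sample numbers are hypothetical.

```python
from typing import Tuple


def scale_factors(page_width: float, page_height: float,
                  image_width_px: int, image_height_px: int) -> Tuple[float, float]:
    """Return (x_scale, y_scale) mapping page coordinates to background pixels."""
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")
    return image_width_px / page_width, image_height_px / page_height


# Hypothetical values: a US Letter page (8.5 x 11 inches) drawn on an
# 850 x 1100 pixel background gives 100 pixels per inch on both axes.
x_scale, y_scale = scale_factors(8.5, 11, 850, 1100)
print(x_scale, y_scale)  # 100.0 100.0
```

In the notebook, the page size would come from the entry in `analysis["recognitionResults"]` (if it carries those fields), and the pixel size would come from `image.size`, which PIL returns as `(width, height)`.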


In [None]:
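# Print the text and unscaled vertices of every line containing a keyword.
# The fragment "nspec" matches both "Inspection" and "inspection".
keyword = "nspec"
for bounding_box, text in polygons:
    vertices = [(bounding_box[i], bounding_box[i + 1]) for i in range(0, len(bounding_box), 2)]
    if keyword in text:
        print(text, vertices)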
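The search above relies on a partial fragment, `"nspec"`, to work around capitalization. A case-insensitive comparison with `str.casefold` avoids that trick. The minimal sketch below operates on `(bounding_box, text)` pairs, like the ones in `polygons`. The helper name `find_lines` and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

LineBox = Tuple[Sequence[float], str]


def find_lines(lines: Sequence[LineBox], keyword: str) -> List[LineBox]:
    """Return the (bounding box, text) pairs whose text contains keyword, ignoring case."""
    needle = keyword.casefold()
    return [(box, text) for box, text in lines if needle in text.casefold()]


sample_polygons = [
    ([0, 0, 1, 0, 1, 1, 0, 1], "Inspection report"),
    ([0, 2, 1, 2, 1, 3, 0, 3], "Signature"),
    ([0, 4, 1, 4, 1, 5, 0, 5], "Next INSPECTION due"),
]
for box, text in find_lines(sample_polygons, "inspection"):
    print(text)
```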