# GC AutoML

Send figure images to a Google Cloud AutoML Vision model to classify them as pathway or non-pathway.

[My Cloud Vision AutoML Models Dashboard](https://console.cloud.google.com/vision/models?project=api-project-453052878726&supportedpurview=project)

Documentation for:
* [AutoML Vision](https://cloud.google.com/vision/automl/docs) (top level)
* [Client Libraries](https://cloud.google.com/vision/automl/docs/client-libraries)
* [Python Client Library](https://googleapis.dev/python/automl/latest/index.html)
* [Quickstart](https://cloud.google.com/vision/automl/docs/quickstart)
* [Code Samples](https://cloud.google.com/vision/automl/docs/samples)


```python
import json
import os
import sys
from pathlib import Path

import magic  # python-magic, used to check each file's real type
from google.cloud import automl
from google.protobuf.json_format import MessageToDict
```

```python
# Point the Google Cloud client libraries at the service-account key file.
google_application_credentials_path = Path(
    Path.home(),
    ".credentials/api-project-453052878726-f42cadc718aa.json",
)

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(
    google_application_credentials_path
)
```
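If the key file is missing, the error only surfaces later, when a client such as `PredictionServiceClient` tries to load credentials. A minimal sketch of checking the path up front; `set_credentials` is a hypothetical helper written for this notebook, not part of any Google library.

```python
import os
from pathlib import Path


def set_credentials(key_path):
    """Set GOOGLE_APPLICATION_CREDENTIALS, failing early if the key file is missing.

    Hypothetical helper; not part of google-cloud-automl.
    """
    key_path = Path(key_path).expanduser()
    if not key_path.is_file():
        raise FileNotFoundError(f"Service-account key not found: {key_path}")
    # The client libraries read this variable when they look up credentials.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(key_path)
    return key_path


# Usage with the path defined above:
# set_credentials(google_application_credentials_path)
```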

```python
project_id = "api-project-453052878726"

# the model named pfocr_20191102_single_po_10k
model_id = "ICN8336211288774410240"

prediction_client = automl.PredictionServiceClient()

# Get the full resource name of the model.
model_full_id = automl.AutoMlClient.model_path(
    project_id, "us-central1", model_id
)
```
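`model_path` only formats a string; it does not contact the API. A sketch of the same layout, assuming the standard `projects/{project}/locations/{location}/models/{model}` resource-name format; `model_resource_name` is a hypothetical helper, useful mainly for checking what `model_full_id` should look like.

```python
def model_resource_name(project_id: str, location: str, model_id: str) -> str:
    """Format an AutoML model resource name (hypothetical helper).

    Mirrors the string layout that AutoMlClient.model_path produces.
    """
    return f"projects/{project_id}/locations/{location}/models/{model_id}"


print(model_resource_name("api-project-453052878726", "us-central1", "ICN8336211288774410240"))
# projects/api-project-453052878726/locations/us-central1/models/ICN8336211288774410240
```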

Before running the labeling cell below, deploy the model here:
https://console.cloud.google.com/vision/models?project=api-project-453052878726&supportedpurview=project

(When you're done using it, remember to remove the model deployment.)
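Deployment can also be done from Python. A minimal sketch, assuming the v2 `google-cloud-automl` library, whose `AutoMlClient` provides `get_model` and `deploy_model` (the latter returning a long-running operation). `ensure_deployed` is a hypothetical helper; it takes the client as an argument so it can be exercised without network access.

```python
def ensure_deployed(client, model_name, timeout=None):
    """Deploy an AutoML model unless it is already deployed.

    Assumes `client` is a google.cloud.automl.AutoMlClient (v2 library).
    Returns True if a deployment was started and completed, False if the
    model was already deployed.
    """
    model = client.get_model(name=model_name)
    # deployment_state is an enum; compare by member name so this helper
    # does not need to import the automl package itself.
    if model.deployment_state.name == "DEPLOYED":
        return False
    # deploy_model starts a long-running operation; result() blocks until it
    # finishes (or raises if `timeout` seconds elapse first).
    operation = client.deploy_model(name=model_name)
    operation.result(timeout=timeout)
    return True


# Usage (needs credentials and network access):
# from google.cloud import automl
# ensure_deployed(automl.AutoMlClient(), model_full_id)
```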

## Label All

```python
target_date = "20210513"
images_dir = Path(f"../data/images/{target_date}")

# Progress file, rewritten after every successful prediction.
forward_classified_images_count_path = (
    images_dir / "forward_classified_images_count.log"
)

# Collect candidate figures from images_dir and its subdirectories.
figure_paths = [
    f for ext in ("*.jpg", "*.jpeg", "*.png") for f in images_dir.rglob(ext)
]
total_figure_path_count = len(figure_paths)
print(f"total_figure_path_count: {total_figure_path_count}")

i = 0
for figure_path in figure_paths:
    automl_output_path = figure_path.with_name(
        f"{figure_path.stem}_automl.json"
    )

    # Don't classify the same figure more than once.
    if automl_output_path.exists():
        continue

    # Check the actual file content, not the extension; only JPEGs are sent.
    filetype = magic.from_file(str(figure_path))
    if "JPEG image data" not in filetype:
        print(f"Skipping {figure_path}. Not a valid JPG.")
        print(filetype)
        continue

    # Read the file.
    with open(figure_path, "rb") as content_file:
        content = content_file.read()

    image = automl.Image(image_bytes=content)
    payload = automl.ExamplePayload(image=image)

    # params holds optional domain-specific parameters; score_threshold
    # drops predictions that score below the given value. See
    # https://cloud.google.com/automl/docs/reference/rpc/google.cloud.automl.v1#predictrequest
    # params = {"score_threshold": "0.8"}
    params = {}

    request = automl.PredictRequest(
        name=model_full_id, payload=payload, params=params
    )
    try:
        response = prediction_client.predict(request=request)

        # for result in response.payload:
        #     print(f"Predicted class name: {result.display_name}")
        #     print(f"Predicted class score: {result.classification.score}")

        payloads = [MessageToDict(x._pb) for x in response.payload]

        if len(payloads) != 1:
            print(payloads)
            raise Exception(
                f"Got an unexpected number of payloads: {len(payloads)}"
            )

        with automl_output_path.open("w", encoding="utf8") as f:
            json.dump(payloads[0], f, ensure_ascii=False)

        if (i % 100) == 0:
            print(f"{i}")
        i += 1
        with open(forward_classified_images_count_path, "w") as f:
            f.write(f"{i} of {total_figure_path_count}\n")
    except Exception as e:
        # Log and move on so one bad image doesn't stop the whole run.
        print(f"failed for {figure_path}")
        print(f"Error: {type(e).__name__}: {e}")

print(f"Figures classified in last run: {i}")
```

```
total_figure_path_count: 124447
failed for ../data/images/20210513/PMC7404177__ijms-21-05147-g002.jpg
<p>Error: <class 'google.api_core.exceptions.InvalidArgument'></p>
failed for ../data/images/20210513/PMC8009799__467_2020_4588_Fig2_HTML.jpg
<p>Error: <class 'google.api_core.exceptions.InvalidArgument'></p>
failed for ../data/images/20210513/PMC7753985__JCMM-24-13949-g002.jpg
<p>Error: <class 'google.api_core.exceptions.InvalidArgument'></p>
failed for ../data/images/20210513/PMC7273476__BMRI2020-7532306.005.jpg
<p>Error: <class 'google.api_core.exceptions.InvalidArgument'></p>
failed for ../data/images/20210513/PMC6435557__bsr-38-bsr20180598-g5.jpg
<p>Error: <class 'google.api_core.exceptions.InvalidArgument'></p>
failed for ../data/images/20210513/PMC7932688__elife-65552-fig1.jpg
<p>Error: <class 'google.api_core.exceptions.InvalidArgument'></p>
Figures classified in last run: 0
```
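Each successful prediction is saved next to its image as `<stem>_automl.json`, holding one annotation payload converted by `MessageToDict` (which uses camelCase keys such as `displayName` and `classification.score` by default). A sketch for tallying predicted labels across a run; `summarize_predictions` and its threshold handling are illustrative assumptions, and the actual label names depend on how the model was trained.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_predictions(images_dir, threshold=0.0):
    """Count predicted labels in the *_automl.json files under images_dir.

    Hypothetical helper. Predictions scoring below `threshold` are counted
    under "<below threshold>" instead of their label.
    """
    counts = Counter()
    for path in Path(images_dir).rglob("*_automl.json"):
        with path.open(encoding="utf8") as f:
            payload = json.load(f)
        label = payload.get("displayName", "<missing>")
        # MessageToDict omits fields left at their default, so a 0.0 score
        # may be absent; treat a missing score as 0.0.
        score = payload.get("classification", {}).get("score", 0.0)
        if score >= threshold:
            counts[label] += 1
        else:
            counts["<below threshold>"] += 1
    return counts


# Usage: print(summarize_predictions(images_dir, threshold=0.8))
```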


When you're done using it, remove the model deployment here:
https://console.cloud.google.com/vision/models?project=api-project-453052878726&supportedpurview=project
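Undeploying can likewise be scripted. A minimal sketch, again assuming the v2 `google-cloud-automl` `AutoMlClient` with its `get_model` and `undeploy_model` methods; `ensure_undeployed` is a hypothetical helper that accepts the client as an argument.

```python
def ensure_undeployed(client, model_name, timeout=None):
    """Undeploy an AutoML model if it is currently deployed.

    Assumes `client` is a google.cloud.automl.AutoMlClient (v2 library).
    Returns True if an undeployment ran, False if nothing was deployed.
    """
    model = client.get_model(name=model_name)
    if model.deployment_state.name != "DEPLOYED":
        return False
    # undeploy_model is a long-running operation; wait for it so success is
    # only reported once the model is actually undeployed.
    client.undeploy_model(name=model_name).result(timeout=timeout)
    return True


# Usage (needs credentials and network access):
# from google.cloud import automl
# ensure_undeployed(automl.AutoMlClient(), model_full_id)
```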