# AutoML image classification model


This notebook was written as a Jupyter notebook on Vertex AI Workbench. You may need to adjust the code to run it in other environments (e.g., Colab).

**Important:** Using Vertex AI can run up significant costs. Be sure to estimate your resource usage before running these notebooks.

You can find the dataset used in this model on [Kaggle.com](https://www.kaggle.com/abhinavnayak/catsvdogs-transformed).


## Install libraries, import libraries, initialize AI Platform

In [1]:
! pip install google-cloud-aiplatform google-cloud-storage

Collecting google-cloud-storage
  Downloading google_cloud_storage-1.44.0-py2.py3-none-any.whl (106 kB)
Installing collected packages: google-cloud-storage
  Attempting uninstall: google-cloud-storage
    Found existing installation: google-cloud-storage 2.0.0
    Uninstalling google-cloud-storage-2.0.0:
      Successfully uninstalled google-cloud-storage-2.0.0
Successfully installed google-cloud-storage-1.44.0


In [2]:
from google.cloud import aiplatform as aip
from google.cloud import storage

In [9]:
PROJECT_ID = !gcloud config get-value project
PROJECT_ID = PROJECT_ID[0]

print(PROJECT_ID)

LOCATION = "us-central1"

aip.init(project=PROJECT_ID, location=LOCATION)

aggie-data-science-demo


## Define and import dataset

1. Create manifest file.
2. Save manifest file to Google Cloud Storage.
3. Import dataset.

The dataset manifest uses the JSON Lines (JSONL) format. Each line must be a single JSON object with the following structure (shown here across several lines for readability):

```
{
    "imageGcsUri": "GCS_URI",
    "classificationAnnotation": {
        "displayName": "LABEL"
    }
}
```
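
As a minimal sketch of the format above, the snippet below serializes records with `json.dumps` so that each one becomes a single line of valid JSON, then checks that every line parses back on its own. The helper name `manifest_line` and the `example-bucket` URIs are hypothetical and used only for illustration.

```python
import json


def manifest_line(gcs_uri, label):
    """Serialize one single-label classification record as one JSONL line."""
    record = {
        "imageGcsUri": gcs_uri,
        "classificationAnnotation": {"displayName": label},
    }
    # json.dumps emits compact, double-quoted JSON on a single line
    return json.dumps(record)


# Hypothetical bucket and object names, for illustration only
lines = [
    manifest_line("gs://example-bucket/images/cat1.jpg", "cat"),
    manifest_line("gs://example-bucket/images/dog1.jpg", "dog"),
]
manifest = "\n".join(lines)

# Every line must parse independently as a JSON object
for line in manifest.splitlines():
    parsed = json.loads(line)
    assert parsed["classificationAnnotation"]["displayName"] in {"cat", "dog"}

print(manifest)
```

Note that calling `str()` on a Python dict produces single-quoted keys and values, which is not valid JSON, so `json.dumps` is the safer choice when writing the manifest.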

In [13]:
# Create our training data
training_data = []

for n in range(1, 1001):
    training_data.append({
        "imageGcsUri": f"gs://2022-03-01-aggie-demo/train_transformed/cat{n}.jpg",
        "classificationAnnotation": {
            "displayName": "cat"
        }
    })
    training_data.append({
        "imageGcsUri": f"gs://2022-03-01-aggie-demo/train_transformed/dog{n}.jpg",
        "classificationAnnotation": {
            "displayName": "dog"
        }
    })


In [17]:
# Save the training data as a JSONL file in Cloud Storage
import json

bucket = storage.Client().bucket("2022-03-01-aggie-demo")
# json.dumps emits valid JSON (double quotes); str() would emit a Python dict repr
input_str = "\n".join(json.dumps(d) for d in training_data)
file_blob = bucket.blob("training_data.jsonl")
file_blob.upload_from_string(input_str)

In [18]:
# Create the image classification dataset
dataset = aip.ImageDataset.create(
    display_name="2022-03-01-aggie-demo",
    gcs_source=["gs://2022-03-01-aggie-demo/training_data.jsonl"],
    import_schema_uri=aip.schema.dataset.ioformat.image.single_label_classification,
    sync=True
)

dataset.wait()

INFO:google.cloud.aiplatform.datasets.dataset:Creating ImageDataset
INFO:google.cloud.aiplatform.datasets.dataset:Create ImageDataset backing LRO: projects/1017734441775/locations/us-central1/datasets/6929302598151307264/operations/5134214144840433664
INFO:google.cloud.aiplatform.datasets.dataset:ImageDataset created. Resource name: projects/1017734441775/locations/us-central1/datasets/6929302598151307264
INFO:google.cloud.aiplatform.datasets.dataset:To use this ImageDataset in another session:
INFO:google.cloud.aiplatform.datasets.dataset:ds = aiplatform.ImageDataset('projects/1017734441775/locations/us-central1/datasets/6929302598151307264')
INFO:google.cloud.aiplatform.datasets.dataset:Importing ImageDataset data: projects/1017734441775/locations/us-central1/datasets/6929302598151307264
INFO:google.cloud.aiplatform.datasets.dataset:Import ImageDataset data backing LRO: projects/1017734441775/locations/us-central1/datasets/6929302598151307264/operations/2675037642063609856
INFO:googl

## Train the model

In [None]:
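# Optional: to reuse an existing dataset in a later session, set this to the
# dataset's resource name (printed in the log output above) or its numeric ID.
img_dataset_id = ""
img_dataset = aip.ImageDataset(img_dataset_id)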

In [20]:
# Create the training job
job = aip.AutoMLImageTrainingJob(
    display_name="cats-and-dogs-training",
    model_type="CLOUD",
    prediction_type="classification",
    multi_label=False,
)

model = job.run(
    dataset=dataset,
    model_display_name="cats-and-dogs-model",
    budget_milli_node_hours=8000,
    disable_early_stopping=False,
    sync=True
)

model.wait()

INFO:google.cloud.aiplatform.training_jobs:No dataset split provided. The service will use a default split.
INFO:google.cloud.aiplatform.training_jobs:View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/3329944631998676992?project=1017734441775
INFO:google.cloud.aiplatform.training_jobs:AutoMLImageTrainingJob projects/1017734441775/locations/us-central1/trainingPipelines/3329944631998676992 current state:
PipelineState.PIPELINE_STATE_PENDING
INFO:google.cloud.aiplatform.training_jobs:AutoMLImageTrainingJob projects/1017734441775/locations/us-central1/trainingPipelines/3329944631998676992 current state:
PipelineState.PIPELINE_STATE_PENDING
INFO:google.cloud.aiplatform.training_jobs:AutoMLImageTrainingJob projects/1017734441775/locations/us-central1/trainingPipelines/3329944631998676992 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLImageTrainingJob projects/1017734441775/locations/us-central1/tr
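
The `budget_milli_node_hours=8000` argument in the training cell caps training at 8 node hours, since 1,000 milli node hours equal one node hour. The sketch below shows that arithmetic as a rough upper bound for cost estimation. The helper names and the price passed in are hypothetical placeholders, not actual Vertex AI pricing; look up the current rate before relying on any figure.

```python
def node_hours(budget_milli_node_hours):
    """Convert a training budget from milli node hours to node hours."""
    return budget_milli_node_hours / 1000


def max_training_cost(budget_milli_node_hours, price_per_node_hour):
    """Upper bound on training cost if the whole budget is consumed."""
    return node_hours(budget_milli_node_hours) * price_per_node_hour


budget = 8000  # value used in the training cell above

# PLACEHOLDER rate, NOT a real Vertex AI price: substitute the current one
placeholder_price_per_node_hour = 1.0

print(f"Budget: {node_hours(budget)} node hours")
print(f"Cost ceiling: {max_training_cost(budget, placeholder_price_per_node_hour)}")
```

Because the job is run with `disable_early_stopping=False`, training may finish before the full budget is used, so this figure should be read as a ceiling rather than a prediction.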

## Get an online prediction

1. Create an endpoint.
2. Deploy our model to the endpoint.
3. Get prediction!

In [21]:
# Create an endpoint and deploy your model to it
endpoint = model.deploy(
    deployed_model_display_name="cats-dogs-endpoint", sync=True
)

INFO:google.cloud.aiplatform.models:Creating Endpoint
INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/1017734441775/locations/us-central1/endpoints/1143113860887085056/operations/3628604494415134720
INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/1017734441775/locations/us-central1/endpoints/1143113860887085056
INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/1017734441775/locations/us-central1/endpoints/1143113860887085056')
INFO:google.cloud.aiplatform.models:Deploying model to Endpoint : projects/1017734441775/locations/us-central1/endpoints/1143113860887085056
INFO:google.cloud.aiplatform.models:Deploy Endpoint model backing LRO: projects/1017734441775/locations/us-central1/endpoints/1143113860887085056/operations/8240290512842522624
INFO:google.cloud.aiplatform.models:Endpoint model deployed. Resource name: projects/101773

In [25]:
# Get a prediction!
import base64

with open("brave_small.jpg", "rb") as f:
    file_content = f.read()
    
encoded_content = base64.b64encode(file_content).decode("utf-8")
response = endpoint.predict(instances=[{"content": encoded_content}])

for prediction in response.predictions:
    # Each prediction holds parallel lists with one entry per candidate label
    for label_id, name, confidence in zip(
        prediction["ids"], prediction["displayNames"], prediction["confidences"]
    ):
        print(f"Prediction ID: {label_id}")
        print(f"Prediction display name: {name}")
        print(f"Prediction confidence score: {confidence}")

Prediction ID: 1168482342918946816
Prediction display name: cat
Prediction confidence score: 1.0
Prediction ID: 5780168361346334720
Prediction display name: dog
Prediction confidence score: 4.0125416e-12
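
To reduce a prediction like the one above to a single answer, one option is to pick the label with the highest confidence. The sketch below assumes the response layout printed above (parallel `ids`, `displayNames`, and `confidences` lists). The helper name `top_prediction` is hypothetical, and the sample dict simply copies the values from the output above so the snippet runs without calling the endpoint.

```python
def top_prediction(prediction):
    """Return the (display name, confidence) pair with the highest confidence."""
    pairs = zip(prediction["displayNames"], prediction["confidences"])
    return max(pairs, key=lambda pair: pair[1])


# Sample values copied from the output above; a live call would use
# an element of response.predictions instead.
sample = {
    "ids": ["1168482342918946816", "5780168361346334720"],
    "displayNames": ["cat", "dog"],
    "confidences": [1.0, 4.0125416e-12],
}

label, score = top_prediction(sample)
print(f"{label} ({score:.3f})")
```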
