<a href="https://colab.research.google.com/github/vineet96/vertex-ai-class/blob/main/Covid_Detection_Image_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



**Installation**
Install the latest version of the Vertex AI SDK for Python.


In [1]:
import os

! pip3 install --upgrade --quiet google-cloud-aiplatform


**Only for Colab**

In [None]:
# Automatically restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

**Set up the GCP environment variables**

**Set up the project ID**

In [None]:
PROJECT_ID = ""  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}

**Set up the GCP region**

In [2]:
REGION = "us-central1"  # @param {type: "string"}

**Authenticate your Google Cloud account**

1. For a local JupyterLab instance, uncomment and run:

In [None]:
# ! gcloud auth login

2. For Colab, run:

In [3]:
from google.colab import auth
auth.authenticate_user()

### Import libraries

Next, import the Vertex AI SDK for Python, which is used throughout the tutorial.

In [4]:
import google.cloud.aiplatform as aiplatform

## Initialize the Vertex AI SDK for Python

Initialize the Vertex AI SDK for Python for your project and region.

In [5]:
aiplatform.init(project=PROJECT_ID, location=REGION)

**So far, we have set up the scaffolding for using the Vertex AI SDK for Python. It's time for the actual model training.**

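**Load the dataset**

Retrieve an existing Vertex AI image dataset by passing its ID (or full resource name) to `aiplatform.ImageDataset`. Replace `your-dataset-id` with the ID of your dataset. To create a new dataset instead, use `aiplatform.ImageDataset.create()`.

In [6]:
dataset = aiplatform.ImageDataset('your-dataset-id')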

### Create and run training pipeline

To train an AutoML model, you perform two steps: 1) create a training pipeline, and 2) run the pipeline.

#### Create training pipeline

An AutoML training pipeline is created with the `AutoMLImageTrainingJob` class, with the following parameters:

- `display_name`: The human-readable name for the `TrainingJob` resource.
- `prediction_type`: The type of task to train the model for.
  - `classification`: An image classification model.
  - `object_detection`: An image object detection model.
- `multi_label`: For a classification task, whether each image has a single label (`False`) or multiple labels (`True`).
- `model_type`: The type of model for deployment.
  - `CLOUD`: Deployment on Google Cloud.
  - `CLOUD_HIGH_ACCURACY_1`: Optimized for accuracy over latency for deployment on Google Cloud.
  - `CLOUD_LOW_LATENCY_1`: Optimized for latency over accuracy for deployment on Google Cloud.
  - `MOBILE_TF_VERSATILE_1`: Deployment on an edge device.
  - `MOBILE_TF_HIGH_ACCURACY_1`: Optimized for accuracy over latency for deployment on an edge device.
  - `MOBILE_TF_LOW_LATENCY_1`: Optimized for latency over accuracy for deployment on an edge device.
- `base_model`: (optional) Transfer learning from an existing `Model` resource; supported for image classification only.

The instantiated object is the DAG (directed acyclic graph) for the training job.
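A misspelled `model_type` (for example, one missing its version suffix) may not be caught until the training job is submitted. The sketch below is a hypothetical local check, `check_job_args`, that accepts only the prediction and model types listed above. It is not part of the Vertex AI SDK, and the service remains the authority on which values it accepts; newer model types may exist.

```python
# Values listed above for AutoML image training jobs (a hypothetical,
# client-side allow-list; the service may accept other values).
PREDICTION_TYPES = {"classification", "object_detection"}
CLOUD_MODEL_TYPES = {"CLOUD", "CLOUD_HIGH_ACCURACY_1", "CLOUD_LOW_LATENCY_1"}
EDGE_MODEL_TYPES = {
    "MOBILE_TF_VERSATILE_1",
    "MOBILE_TF_HIGH_ACCURACY_1",
    "MOBILE_TF_LOW_LATENCY_1",
}


def check_job_args(prediction_type: str, model_type: str, base_model=None) -> None:
    """Raise ValueError if the arguments fall outside the values listed above."""
    if prediction_type not in PREDICTION_TYPES:
        raise ValueError(f"unknown prediction_type: {prediction_type!r}")
    if model_type not in CLOUD_MODEL_TYPES | EDGE_MODEL_TYPES:
        raise ValueError(f"unknown model_type: {model_type!r}")
    # Transfer learning from a base model is listed as classification-only.
    if base_model is not None and prediction_type != "classification":
        raise ValueError("base_model is supported for image classification only")
```

Calling `check_job_args("classification", "CLOUD")` before constructing `AutoMLImageTrainingJob` returns silently, while a typo such as `"CLOUD_LOW_LATENCY_"` raises a `ValueError`.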

In [None]:
dag = aiplatform.AutoMLImageTrainingJob(
    display_name="covidmodel",
    prediction_type="classification",
    multi_label=False,
    model_type="CLOUD",
    base_model=None,
)

print(dag)

#### Run the training pipeline

Next, you run the DAG to start the training job by invoking the method `run`, with the following parameters:

- `dataset`: The `Dataset` resource to train the model on.
- `model_display_name`: The human-readable name for the trained model.
- `training_fraction_split`: The fraction of the dataset to use for training.
- `test_fraction_split`: The fraction of the dataset to use for testing (holdout data).
- `validation_fraction_split`: The fraction of the dataset to use for validation.
- `budget_milli_node_hours`: (optional) The maximum training budget, in milli node hours (1000 = 1 node hour).
- `disable_early_stopping`: If `False`, training may complete before the entire budget is used if the service believes it cannot further improve the model objective measurements. If `True`, the service does not stop training early.

The `run` method when completed returns the `Model` resource.

The training pipeline may take 20 minutes or longer to run, depending on the dataset size and the training budget.
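The split fractions and the budget are easy to get wrong by a factor. As we understand the Vertex AI fraction-split rules, the three fractions may sum to at most 1 (here they sum to exactly 1: 0.8 + 0.1 + 0.1), and the budget is expressed in thousandths of a node hour, so `budget_milli_node_hours=8000` corresponds to 8 node hours. The helpers below (`check_fractions` and `node_hours` are hypothetical names, not SDK functions) make both rules explicit before you start a long-running job.

```python
import math


def check_fractions(training: float, validation: float, test: float) -> None:
    """Raise ValueError if a fraction is outside [0, 1] or the total exceeds 1."""
    fractions = (training, validation, test)
    if any(not 0.0 <= f <= 1.0 for f in fractions):
        raise ValueError(f"each fraction must be between 0 and 1: {fractions}")
    total = sum(fractions)
    # Allow a small tolerance, since floating-point addition is inexact
    # (for example, 0.1 + 0.2 != 0.3 in Python).
    if total > 1.0 and not math.isclose(total, 1.0):
        raise ValueError(f"fractions must not sum to more than 1, got {total}")


def node_hours(budget_milli_node_hours: int) -> float:
    """Convert a budget in milli node hours to node hours (1000 -> 1.0)."""
    return budget_milli_node_hours / 1000
```

With the values used in the next cell, `check_fractions(0.8, 0.1, 0.1)` passes and `node_hours(8000)` returns `8.0`.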

In [None]:
model = dag.run(
    dataset=dataset,
    model_display_name="covidmodel",
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    budget_milli_node_hours=8000,
    disable_early_stopping=False,
)

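**Deploy the model**
To deploy the model, invoke the `deploy` method on the `Model` resource. When no endpoint is passed, it creates a new `Endpoint` resource, deploys the model to it, and returns the endpoint.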

In [None]:
endpoint = model.deploy()

**Online Prediction**

**Test image (X-ray) 1**

In [None]:
test_xray1 = 'gs://your-bucket/Covid_110.png'

**Test image (X-ray) 2**

In [21]:
test_xray2 = 'gs://your-bucket/Covid_128.png'

**Use TensorFlow's `tf.io.gfile.GFile` to read the content of the image stored in the Cloud Storage bucket, then base64-encode it for the prediction request.**

In [22]:
import base64

import tensorflow as tf

with tf.io.gfile.GFile(test_xray1, "rb") as f:
    content = f.read()

# The format of each instance should conform to the deployed model's prediction input schema.
instances = [{"content": base64.b64encode(content).decode("utf-8")}]
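Both online-prediction cells repeat the same encoding step. A small helper (the name `make_instances` is hypothetical) can build the request payload from raw image bytes in one place. It assumes, as these cells do, that the deployed model expects each instance as a `{"content": <base64 string>}` dictionary.

```python
import base64
from typing import Dict, Iterable, List


def make_instances(images: Iterable[bytes]) -> List[Dict[str, str]]:
    """Wrap each raw image as a {"content": <base64 text>} prediction instance."""
    # base64 turns binary image data into ASCII text that fits in a JSON request.
    return [{"content": base64.b64encode(image).decode("utf-8")} for image in images]
```

For example, `make_instances([b"abc"])` returns `[{"content": "YWJj"}]`, so `endpoint.predict(instances=make_instances([content]))` would cover the encode-and-predict pattern used in both cells.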

In [None]:
prediction = endpoint.predict(instances=instances)

print(prediction)
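`endpoint.predict` returns a response whose `predictions` attribute holds one result per instance. For AutoML image classification, each result is typically a mapping with parallel `displayNames` and `confidences` lists. Assuming that shape, the hypothetical helper below picks the most likely class; the `Covid` and `Normal` labels in the example are illustrative and depend on the labels in your dataset.

```python
from typing import Mapping, Tuple


def top_label(prediction: Mapping) -> Tuple[str, float]:
    """Return (display name, confidence) of the highest-confidence class."""
    names = prediction["displayNames"]
    scores = prediction["confidences"]
    if not names or len(names) != len(scores):
        raise ValueError("expected non-empty, parallel displayNames and confidences")
    # Index of the class with the largest confidence score.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return names[best], float(scores[best])
```

Applied to the response above, `top_label(prediction.predictions[0])` would give the predicted class for the first (and only) X-ray in the request.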

In [None]:
import base64

import tensorflow as tf

with tf.io.gfile.GFile(test_xray2, "rb") as f:
    content = f.read()

# The format of each instance should conform to the deployed model's prediction input schema.
instances = [{"content": base64.b64encode(content).decode("utf-8")}]

prediction = endpoint.predict(instances=instances)

print(prediction)

**Batch Prediction**

**Make the batch input file**
Now make a batch input file, which you will store in your Cloud Storage bucket. The batch input file can be either CSV or JSONL; this tutorial uses JSONL. In a JSONL file, you write one dictionary per line for each data item (instance). The dictionary contains the following key/value pairs:

- `content`: The Cloud Storage path to the image.
- `mime_type`: The content type of the image. In this example, the images are PNG or JPEG files (`image/png` or `image/jpeg`).

For example:

      {"content": "gs://[your-bucket]/file1.jpg", "mime_type": "image/jpeg"}
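Writing the MIME type by hand for each image is easy to get wrong (a `.jpg` file needs `image/jpeg`, not `png`). The sketch below derives it from the file extension with a small lookup table. The table is an assumption that covers only the PNG and JPEG files used here, and `jsonl_line` is a hypothetical helper name.

```python
import json
import posixpath

# Extension-to-MIME mapping for the image formats used in this tutorial
# (an assumption; extend it if your bucket holds other formats).
MIME_TYPES = {".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg"}


def jsonl_line(gcs_uri: str) -> str:
    """Build one JSONL line {"content": uri, "mime_type": ...} for an image URI."""
    ext = posixpath.splitext(gcs_uri)[1].lower()
    if ext not in MIME_TYPES:
        raise ValueError(f"unsupported image extension: {gcs_uri}")
    return json.dumps({"content": gcs_uri, "mime_type": MIME_TYPES[ext]})
```

Joining `jsonl_line(uri)` for each test image with `"\n"` (plus a trailing newline) gives the full file content to write with `tf.io.gfile.GFile`.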

In [30]:
BUCKET_URI = 'gs://your-bucket'

In [31]:
test_xray1 = 'gs://your-bucket/Covid_16.png'
test_xray2 = 'gs://your-bucket/Covid_17.png'
test_xray3 = 'gs://your-bucket/Covid_18.png'
test_xray4 = 'gs://your-bucket/Normal_15.jpeg'
test_xray5 = 'gs://your-bucket/Normal_16.jpeg'

In [None]:
import json

import tensorflow as tf

gcs_input_uri = BUCKET_URI + "/test.jsonl"

# (Cloud Storage URI, MIME type) for each test X-ray
test_images = [
    (test_xray1, "image/png"),
    (test_xray2, "image/png"),
    (test_xray3, "image/png"),
    (test_xray4, "image/jpeg"),
    (test_xray5, "image/jpeg"),
]

# Write one JSON object per line (JSONL format)
with tf.io.gfile.GFile(gcs_input_uri, "w") as f:
    for uri, mime_type in test_images:
        f.write(json.dumps({"content": uri, "mime_type": mime_type}) + "\n")

print(gcs_input_uri)
! gsutil cat $gcs_input_uri

### Make the batch prediction request

Now that your `Model` resource is trained, you can make a batch prediction by invoking the `batch_predict()` method with the following parameters:

- `job_display_name`: The human-readable name for the batch prediction job.
- `gcs_source`: The Cloud Storage URI (or a list of URIs) of one or more batch request input files.
- `gcs_destination_prefix`: The Cloud Storage location for storing the batch prediction results.
- `sync`: If set to `True`, the call blocks while waiting for the asynchronous batch job to complete. With `sync=False`, call `wait()` on the returned job to block until it finishes.

In [None]:
batch_predict_job = model.batch_predict(
    job_display_name="covidprediction",
    gcs_source=gcs_input_uri,
    gcs_destination_prefix=BUCKET_URI,
    sync=False,
)

print(batch_predict_job)

In [None]:
batch_predict_job.wait()
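After the job finishes, Vertex AI writes its results as JSONL files under `gcs_destination_prefix`. Assuming each output line pairs the original `instance` with a `prediction` containing `displayNames` and `confidences` lists (the usual shape for AutoML image classification; check your own output files), the hypothetical helper below extracts the image URI and its top label from one line.

```python
import json
from typing import Tuple


def parse_result_line(line: str) -> Tuple[str, str, float]:
    """Return (image URI, top label, confidence) from one batch-prediction output line."""
    record = json.loads(line)
    uri = record["instance"]["content"]
    prediction = record["prediction"]
    names = prediction["displayNames"]
    scores = prediction["confidences"]
    # Index of the highest-confidence class.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return uri, names[best], float(scores[best])
```

Reading each results file line by line (for example with `tf.io.gfile.GFile`) and applying `parse_result_line` gives a per-image summary of the batch predictions.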

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial.

In [None]:
# Delete the dataset using the Vertex AI dataset object.
# Uncomment to also delete the dataset.
# dataset.delete()

# Undeploy all models from the endpoint, then delete the endpoint
endpoint.undeploy_all()
endpoint.delete()

# Delete the model using the Vertex AI model object
model.delete()

# Delete the AutoML training job
dag.delete()

# Delete the batch prediction job
batch_predict_job.delete()

