# Batch inference with Gemma/PaliGemma with HF + GCP

Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models, developed by Google DeepMind and other teams across Google. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs with high-performance text generation. Google Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in your AI-powered applications. This example showcases how to deploy any supported text-generation model, in this case [`google/gemma-7b-it`](https://huggingface.co/google/gemma-7b-it), from the Hugging Face Hub on Vertex AI using the TGI DLC available in Google Cloud Platform (GCP). The cells below apply the same workflow to PaliGemma, served from a prebuilt Vertex AI Model Garden container.

![`google/gemma-7b-it` in the Hugging Face Hub](./assets/model-in-hf-hub.png)

## Setup / Configuration

First, install `gcloud`, the command-line tool for Google Cloud, on your local machine by following the instructions at [Cloud SDK Documentation - Install the gcloud CLI](https://cloud.google.com/sdk/docs/install).

Then install the `google-cloud-aiplatform` Python SDK, which is required to programmatically create the Vertex AI model, register it, create the endpoint, and deploy it on Vertex AI.

In [1]:
!pip install  google-cloud-aiplatform google-auth google-cloud-pipeline-components packaging tensorflow

Collecting pydantic<3 (from google-cloud-aiplatform)
  Using cached pydantic-1.10.19-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (152 kB)
Using cached pydantic-1.10.19-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.8 MB)
Installing collected packages: pydantic
  Attempting uninstall: pydantic
    Found existing installation: pydantic 2.10.2
    Uninstalling pydantic-2.10.2:
      Successfully uninstalled pydantic-2.10.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gradio 4.21.0 requires pydantic>=2.0, but you have pydantic 1.10.19 which is incompatible.
Successfully installed pydantic-1.10.19

[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run:

To make the commands in this tutorial easier to run, set the following environment variables for GCP. The setup cell further down reads `PROJECT_ID` and `LOCATION` from them:

In [2]:
%env PROJECT_ID=multimodal-representations
%env LOCATION=us-central1
#%env CONTAINER_URI=us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310:latest
#%env CONTAINER_URI=us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-llava-serve
%env CONTAINER_URI=us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu124.2-3.ubuntu2204.py311



env: PROJECT_ID=multimodal-representations
env: LOCATION=us-central1
env: CONTAINER_URI=us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu124.2-3.ubuntu2204.py311


Then log in to your GCP account and set the project ID to the one you want to use to register and deploy the models on Vertex AI.

In [3]:
!gcloud auth login
!gcloud config set project $PROJECT_ID

Your browser has been opened to visit:

    https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=32555940559.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A8085%2F&scope=openid+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fappengine.admin+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fsqlservice.login+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcompute+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Faccounts.reauth&state=FrkP4LpKlMgwzrw1rhmHyXBB26Ke1B&access_type=offline&code_challenge=mSstYiivXM-2vjYq7f7e3ATcchSQkUjUwsFCB9Arux0&code_challenge_method=S256


You are now logged in as [daliumuwork@gmail.com].
Your current project is [multimodal-representations].  You can change this setting by running:
  $ gcloud config set project PROJECT_ID


To take a quick anonymous survey, run:
  $ gcloud survey

Updated property [core/project].


Once you are logged in, you need to enable the necessary service APIs in GCP, such as the Vertex AI API, the Compute Engine API, and Google Container Registry related APIs.

**Warning:** After running your experiments, check manually that these APIs are disabled again, even though the clean-up section at the end of this notebook disables them explicitly.

In [4]:
!gcloud services enable aiplatform.googleapis.com
!gcloud services enable compute.googleapis.com
!gcloud services enable container.googleapis.com
!gcloud services enable containerregistry.googleapis.com
!gcloud services enable containerfilesystem.googleapis.com

In [13]:
# @title Setup Google Cloud project
# @markdown 1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

# @markdown 2. **[Optional]** [Create a Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets) for storing experiment outputs. Set the BUCKET_URI for the experiment environment. The specified Cloud Storage bucket (`BUCKET_URI`) should be located in the same region as where the notebook was launched. Note that a multi-region bucket (eg. "us") is not considered a match for a single region covered by the multi-region range (eg. "us-central1"). If not set, a unique GCS bucket will be created instead.

BUCKET_URI = "gs://multimodal-representations-eval-data-central1/"  # @param {type:"string"}

# @markdown 3. **[Optional]** Set region. If not set, the region will be set automatically according to Colab Enterprise environment.

REGION = "us-central1"  # @param {type:"string"}

! git clone https://github.com/GoogleCloudPlatform/vertex-ai-samples.git

# Import the necessary packages
! pip install -q gradio==4.21.0
import datetime
import enum
import importlib
import io
import os
import re
import uuid
from typing import Sequence, Tuple

import gradio as gr
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from google.cloud import aiplatform
from PIL import Image

common_util = importlib.import_module(
    "vertex-ai-samples.community-content.vertex_model_garden.model_oss.notebook_util.common_util"
)

models, endpoints = {}, {}

# Get the default cloud project id.
PROJECT_ID = os.environ["PROJECT_ID"]

# Get the default region for launching jobs.
if not REGION:
    REGION = os.environ["LOCATION"]

# Enable the Vertex AI API and Compute Engine API, if not already.
print("Enabling Vertex AI API and Compute Engine API.")
! gcloud services enable aiplatform.googleapis.com compute.googleapis.com

# Cloud Storage bucket for storing the experiment artifacts.
# A unique GCS bucket will be created for the purpose of this notebook. If you
# prefer using your own GCS bucket, change the value yourself below.
now = datetime.datetime.now().strftime("%Y%m%d%H%M%S")

if BUCKET_URI is None or BUCKET_URI.strip() == "" or BUCKET_URI == "gs://":
    BUCKET_URI = f"gs://{PROJECT_ID}-tmp-{now}-{str(uuid.uuid4())[:4]}"
    BUCKET_NAME = "/".join(BUCKET_URI.split("/")[:3])
    ! gsutil mb -l {REGION} {BUCKET_URI}
else:
    assert BUCKET_URI.startswith("gs://"), "BUCKET_URI must start with `gs://`."
    # Derive the bucket name only after BUCKET_URI is known to be a valid string.
    BUCKET_NAME = "/".join(BUCKET_URI.split("/")[:3])
    shell_output = ! gsutil ls -Lb {BUCKET_NAME} | grep "Location constraint:" | sed "s/Location constraint://"
    bucket_region = shell_output[0].strip().lower()
    if bucket_region != REGION:
        raise ValueError(
            "Bucket region %s is different from notebook region %s"
            % (bucket_region, REGION)
        )
print(f"Using this GCS Bucket: {BUCKET_URI}")

STAGING_BUCKET = os.path.join(BUCKET_URI, "temporal")
MODEL_BUCKET = os.path.join(BUCKET_URI, "paligemma")


# Initialize Vertex AI API.
print("Initializing Vertex AI API.")
aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=STAGING_BUCKET)

# Gets the default SERVICE_ACCOUNT.
shell_output = ! gcloud projects describe $PROJECT_ID
project_number = shell_output[-1].split(":")[1].strip().replace("'", "")
SERVICE_ACCOUNT = f"{project_number}-compute@developer.gserviceaccount.com"
print("Using this default Service Account:", SERVICE_ACCOUNT)


# Provision permissions to the SERVICE_ACCOUNT with the GCS bucket
! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.admin $BUCKET_NAME

! gcloud config set project $PROJECT_ID
! gcloud projects add-iam-policy-binding --no-user-output-enabled {PROJECT_ID} --member=serviceAccount:{SERVICE_ACCOUNT} --role="roles/storage.admin"
! gcloud projects add-iam-policy-binding --no-user-output-enabled {PROJECT_ID} --member=serviceAccount:{SERVICE_ACCOUNT} --role="roles/aiplatform.user"

# @markdown ### Access PaliGemma models on Vertex AI for GPU based serving
# @markdown Accept the model agreement to access the models:
# @markdown 1. Open the [PaliGemma model card](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/363) from [Vertex AI Model Garden](https://cloud.google.com/model-garden).
# @markdown 1. Review and accept the agreement in the pop-up window on the model card page. If you have previously accepted the model agreement, there will not be a pop-up window on the model card page and this step is not needed.
# @markdown 1. After accepting the agreement of PaliGemma, a `gs://` URI containing PaliGemma pretrained models will be shared.
# @markdown 1. Paste the link in the `VERTEX_AI_MODEL_GARDEN_PALIGEMMA` field below.
# @markdown 1. The PaliGemma models will be copied into `BUCKET_URI`.
# @markdown The file transfer can take anywhere from 15 minutes to 30 minutes.
VERTEX_AI_MODEL_GARDEN_PALIGEMMA = "gs://vertex-model-garden-paligemma-us/paligemma"  # @param {type:"string", isTemplate:true}
assert (
    VERTEX_AI_MODEL_GARDEN_PALIGEMMA
), "Click the agreement of PaliGemma in Vertex AI Model Garden, and get the GCS path of PaliGemma model artifacts."
print(
    "Copying PaliGemma model artifacts from",
    VERTEX_AI_MODEL_GARDEN_PALIGEMMA,
    "to ",
    MODEL_BUCKET,
)

# Shell lines cannot evaluate Python calls such as os.path.join, so interpolate
# the Python variable MODEL_BUCKET with {...} instead.
! gsutil -m cp -R $VERTEX_AI_MODEL_GARDEN_PALIGEMMA/mix_224.npz {MODEL_BUCKET}/mix_224.npz
! gsutil -m cp -R $VERTEX_AI_MODEL_GARDEN_PALIGEMMA/pt_224.npz {MODEL_BUCKET}/pt_224.npz


model_path_prefix = MODEL_BUCKET
pretrained_filename_lookup = {
    "paligemma-224-float32": "pt_224.npz",
    "paligemma-224-float16": "pt_224.f16.npz",
    "paligemma-mix-224-float32": "mix_224.npz",
    "paligemma-mix-224-float16": "mix_224.f16.npz",
}


fatal: destination path 'vertex-ai-samples' already exists and is not an empty directory.
[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: pip install --upgrade pip
Enabling Vertex AI API and Compute Engine API.
Operation "operations/acat.p2-841337720906-a8349369-bf69-42cf-9318-fa38ec919a93" finished successfully.
Using this GCS Bucket: gs://multimodal-representations-eval-data-central1/
Initializing Vertex AI API.
Using this default Service Account: 841337720906-compute@developer.gserviceaccount.com
No changes made to gs://multimodal-representations-eval-data-central1/
Updated property [core/project].
Copying PaliGemma model artifacts from gs://vertex-model-garden-paligemma-us/paligemma to  gs://multimodal-representations-eval-data-central1/paligemma
Copying gs://vertex-model-garden-paligemma-us/paligemma/mi
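
The setup cell above reads the project number from the last line of `gcloud projects describe`, which only works while `projectNumber` happens to be printed last. A minimal sketch of a key-based lookup, assuming the command prints one `key: value` pair per line; the helper name `parse_project_number` and the sample lines are hypothetical:

```python
from typing import Sequence


def parse_project_number(lines: Sequence[str]) -> str:
    """Return the value of the `projectNumber` key from `key: value` lines."""
    for line in lines:
        key, sep, value = line.partition(":")
        if sep and key.strip() == "projectNumber":
            # The number is printed in quotes, so strip them.
            return value.strip().strip("'\"")
    raise ValueError("projectNumber not found in gcloud output")


# Hypothetical lines, in the shape the setup cell captures in `shell_output`.
sample = [
    "createTime: '2024-01-01T00:00:00.000Z'",
    "projectId: my-project",
    "projectNumber: '123456789012'",
    "lifecycleState: ACTIVE",
]
print(parse_project_number(sample))
```

Looking the key up by name keeps the service account address correct even if the order of the printed fields changes.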

In [6]:
def deploy_model(
    model_name: str,
    checkpoint_path: str,
    machine_type: str = "g2-standard-32",
    accelerator_type: str = "NVIDIA_L4",
    accelerator_count: int = 1,
    resolution: int = 224,
) -> Tuple[aiplatform.Model, aiplatform.Endpoint]:
    """Create a Vertex AI Endpoint and deploy the specified model to the endpoint."""
    model_name_with_time = common_util.get_job_name_with_datetime(model_name)
    endpoint = aiplatform.Endpoint.create(
        display_name=f"{model_name_with_time}-endpoint"
    )
    model = aiplatform.Model.upload(
        display_name=model_name_with_time,
        serving_container_image_uri=SERVE_DOCKER_URI,
        serving_container_ports=[8080],
        serving_container_predict_route="/predict",
        serving_container_health_route="/health",
        serving_container_environment_variables={
            "CKPT_PATH": checkpoint_path,
            "RESOLUTION": resolution,
            "MODEL_ID": "google/" + model_name,
        },
    )
    print(
        f"Deploying {model_name_with_time} on {machine_type} with {accelerator_count} {accelerator_type} GPU(s)."
    )
    # `Model.deploy` returns the Endpoint, so return the uploaded `model` itself
    # to match the declared return type.
    model.deploy(
        endpoint=endpoint,
        machine_type=machine_type,
        accelerator_type=accelerator_type,
        accelerator_count=accelerator_count,
        deploy_request_timeout=1800,
        service_account=SERVICE_ACCOUNT,
        enable_access_logging=True,
        min_replica_count=1,
        sync=True,
    )
    return model, endpoint

In [7]:
models = {}
endpoints = {}

In [11]:
# @title Deploy

# @markdown This section uploads the prebuilt PaliGemma model to Model Registry and deploys it to a Vertex AI Endpoint. It takes approximately 15 minutes to finish.

# @markdown Select the desired resolution and precision of prebuilt model to deploy, leaving the optional `custom_paligemma_model_uri` as is. Higher resolution and precision_type can result in better inference results, but may require additional GPU.

# @markdown You can also serve a finetuned PaliGemma model by setting `resolution` and `precision_type` to the resolution and precision type of the original base model and then setting `custom_paligemma_model_uri` to the GCS URI containing the model.

# @markdown **Note**: You cannot use accelerator type `NVIDIA_TESLA_V100` to serve prebuilt or finetuned PaliGemma models with resolution `896` and precision_type `float32`.

model_variant = "mix"  # @param ["mix", "pt"]
resolution = 224  # @param [224, 448, 896]
precision_type = "float32"  # @param ["float32", "float16", "bfloat16"]
custom_paligemma_model_uri = "gs://"#vertex-model-garden-paligemma-us/paligemma/mix_224.npz"  # @param {type: "string"}

if model_variant == "mix":
    model_name_prefix = "paligemma-mix"
else:
    model_name_prefix = "paligemma"

if custom_paligemma_model_uri == "gs://" or not custom_paligemma_model_uri:
    print("Deploying prebuilt PaliGemma model.")
    model_name = f"{model_name_prefix}-{resolution}-{precision_type}"
    checkpoint_filename = pretrained_filename_lookup[model_name]
    checkpoint_path = os.path.join(model_path_prefix, checkpoint_filename)
else:
    print("Deploying custom PaliGemma model.")
    model_name = f"{model_name_prefix}-{resolution}-{precision_type}"
    checkpoint_path = custom_paligemma_model_uri

# The pre-built serving docker image.
SERVE_DOCKER_URI = "us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/jax-paligemma-serve-gpu:20240807_0916_RC00"

# @markdown If you want to use other accelerator types not listed below, then check other Vertex AI prediction supported accelerators and regions at https://cloud.google.com/vertex-ai/docs/predictions/configure-compute. You may need to manually set the `machine_type`, `accelerator_type`, and `accelerator_count` in the code by clicking `Show code` first.
# @markdown Select the accelerator type to use to deploy the model:
accelerator_type = "NVIDIA_L4"  # @param ["NVIDIA_L4", "NVIDIA_TESLA_V100"]
machine_type = "g2-standard-32"
accelerator_count = 1


Deploying custom PaliGemma model.
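
The Deploy cell above looks up the checkpoint file by a name built from the variant, resolution and precision, but the lookup table only lists the 224-pixel `float32` and `float16` checkpoints, so any other form choice fails with a bare `KeyError`. A minimal sketch of the same lookup with an explicit error message; the function name `resolve_checkpoint` and the example bucket are hypothetical, and the table is copied from the setup cell:

```python
import posixpath

PRETRAINED_FILENAME_LOOKUP = {
    "paligemma-224-float32": "pt_224.npz",
    "paligemma-224-float16": "pt_224.f16.npz",
    "paligemma-mix-224-float32": "mix_224.npz",
    "paligemma-mix-224-float16": "mix_224.f16.npz",
}


def resolve_checkpoint(model_path_prefix, model_variant, resolution, precision_type):
    """Return (model_name, checkpoint_path) for a prebuilt checkpoint."""
    prefix = "paligemma-mix" if model_variant == "mix" else "paligemma"
    model_name = f"{prefix}-{resolution}-{precision_type}"
    if model_name not in PRETRAINED_FILENAME_LOOKUP:
        supported = ", ".join(sorted(PRETRAINED_FILENAME_LOOKUP))
        raise ValueError(
            f"No prebuilt checkpoint for {model_name!r}; supported: {supported}"
        )
    filename = PRETRAINED_FILENAME_LOOKUP[model_name]
    # posixpath keeps forward slashes in the gs:// URI on every platform.
    return model_name, posixpath.join(model_path_prefix, filename)


name, path = resolve_checkpoint("gs://my-bucket/paligemma", "mix", 224, "float32")
print(name, path)
```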


In [9]:
def deploy(model_file):
    checkpoint_path = f"gs://vertex-model-garden-paligemma-us/paligemma/{model_file}"
    return deploy_model(
        model_name=model_name,
        checkpoint_path=checkpoint_path,
        machine_type=machine_type,
        accelerator_type=accelerator_type,
        accelerator_count=accelerator_count,
        resolution=resolution,
    )

In [15]:

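# model_file = "pt_224.npz"  # alternative: the pretrained checkpoint
model_file = "mix_224.npz"
# NOTE: MODEL_BUCKET is the folder; `checkpoint_path` from the Deploy cell
# points at the .npz file itself.
models[model_file], endpoints[model_file] = deploy_model(
    model_name=model_name,
    checkpoint_path=MODEL_BUCKET,
    machine_type=machine_type,
    accelerator_type=accelerator_type,
    accelerator_count=accelerator_count,
    resolution=resolution,
)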

Creating Endpoint
Create Endpoint backing LRO: projects/841337720906/locations/us-central1/endpoints/2625749215850004480/operations/2686509774589132800
Endpoint created. Resource name: projects/841337720906/locations/us-central1/endpoints/2625749215850004480
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/841337720906/locations/us-central1/endpoints/2625749215850004480')
Creating Model
Create Model backing LRO: projects/841337720906/locations/us-central1/models/1951489103279161344/operations/544837045541928960
Model created. Resource name: projects/841337720906/locations/us-central1/models/1951489103279161344@1
To use this Model in another session:
model = aiplatform.Model('projects/841337720906/locations/us-central1/models/1951489103279161344@1')
Deploying paligemma-mix-224-float32-20241128-150212 on g2-standard-32 with 1 NVIDIA_L4 GPU(s).
Deploying model to Endpoint : projects/841337720906/locations/us-central1/endpoints/2625749215850004480
Deploy En

FailedPrecondition: 400 Model server exited unexpectedly. Model server logs can be found at https://console.cloud.google.com/logs/viewer?project=841337720906&resource=aiplatform.googleapis.com%2FEndpoint&advancedFilter=resource.type%3D%22aiplatform.googleapis.com%2FEndpoint%22%0Aresource.labels.endpoint_id%3D%222625749215850004480%22%0Aresource.labels.location%3D%22us-central1%22.

In [None]:
eps = aiplatform.Endpoint.list()
endpoints[model_file] = eps[0]
endpoints

## Register model on Vertex AI

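Once everything is set up, initialize the Vertex AI session via the `google-cloud-aiplatform` Python SDK. In this notebook the setup cell above already does so by calling `aiplatform.init` with the project, region, and staging bucket.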

## Evaluation task

Based on [https://huggingface.co/docs/google-cloud/main/examples/vertex-ai-notebooks-evaluate-llms-with-vertex-ai](https://huggingface.co/docs/google-cloud/main/examples/vertex-ai-notebooks-evaluate-llms-with-vertex-ai)

In [None]:
from datasets import load_dataset

dataset = load_dataset("fgqa_hs", split='test[:1000]')

dataset

In [None]:
dataset[0]

The Vertex AI evaluation API expects a pandas DataFrame, so convert the dataset first.

In [None]:
df = dataset.to_pandas()

In [None]:
from PIL import Image
df["img"] = df["image"].apply(lambda x: Image.open(x['path']))

In [None]:
#df['prompt'] = df.apply(lambda row: f"![]({row['img']}) answer en {row['question']}\n", axis=1)
df['prompt'] = df.apply(lambda row: (row['img'], row['question']), axis=1)

df['prompt'][1]

In [None]:
df['reference'] = df['answer']

Drop all columns that we do not need for the prediction task

In [None]:
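def generate_paligemma(prompt, model='pt_224.npz'):
    """Answer one (image, question) prompt with the endpoint stored under `model`."""
    image, question = prompt
    answers = common_util.vqa_predict(endpoints[model], [question], image)
    return answers[0].lower()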


In [None]:

image_url = "https://images.pexels.com/photos/4012966/pexels-photo-4012966.jpeg"  # @param {type:"string"}

image = common_util.download_image(image_url)
print(image)
display(image)

# @markdown You may leave question prompts empty and they will be ignored.
question_prompt_1 = "Which of laptop, book, pencil, clock, flower are in the image?"  # @param {type: "string"}
question_prompt_2 = "Do the book and the cup have the same color?"  # @param {type: "string"}
question_prompt_3 = "Is there a person in the image?"  # @param {type: "string"}
question_prompt_4 = "How many laptop are in the image?"  # @param {type: "string"}

# @markdown The question prompt can be non-English languages.
questions_list = [
    question_prompt_1,
    question_prompt_2,
    question_prompt_3,
    question_prompt_4,
]
questions = [question for question in questions_list if question]
print(endpoints)
answers = common_util.vqa_predict(endpoints[model_file], questions, image)

for question, answer in zip(questions, answers):
    print(f"Question: {question}")
    print(f"Answer: {answer}")
# @markdown Click "Show Code" to see more details.

In [None]:
generate_paligemma(df['prompt'][0])

In [None]:
from vertexai.evaluation import EvalTask
from vertexai.generative_models import (Part)
# 2. create eval task
eval_task = EvalTask(
        dataset=df,
        metrics=["exact_match"],
        experiment="multimodal-hypernym-semantics",
)

In [None]:
import uuid

# 3. run eval task
# Note: If the last iteration takes > 1 minute you might need to retry the evaluation
exp_results = eval_task.evaluate(
        model=generate_paligemma, experiment_run_name=f"test-gqa-{str(uuid.uuid4())[:8]}"
)

In [None]:
df

In [None]:

# NOTE: `generation_config`, `prompt_to_payload` and `endpoint` are not defined
# anywhere in this notebook; define them before running this cell.
def predict(prompts, generation_config=generation_config):
    payloads = [prompt_to_payload(prompt, generation_config) for prompt in prompts]
    print(payloads)
    output = endpoint.predict(instances=payloads)
    generated_texts = output.predictions
    return [pred.lower() for pred in generated_texts]

In [None]:
results = {}
print(exp_results.summary_metrics)
print(f"{exp_results.summary_metrics['exact_match/mean']}")
results["test"] = exp_results.summary_metrics["exact_match/mean"]

for prompt_name, score in sorted(results.items(), key=lambda x: x[1], reverse=True):
    print(f"{prompt_name}: {score}")

### Check predictions

In [None]:
df

In [None]:
exp_results.metrics_table[['question', 'response', 'reference', 'argument', 'substitution', 'exact_match/score']]
#exp_results.metrics_table[['question', 'response', 'reference', 'argument', 'substitution', 'exact_match/score', 'rouge/score']]

In [None]:
# Copy the selection so the assignment below does not write into a view of metrics_table.
result_df = exp_results.metrics_table[['question', 'response', 'reference', 'argument', 'substitution', 'exact_match/score']].copy()
result_df.loc[:,'exact_match/score'] = (result_df['reference'] == result_df['response']).astype(int)
#result_df = exp_results.metrics_table[['question', 'response', 'reference', 'argument', 'substitution', 'exact_match/score', 'rouge/score']]

In [None]:
result_df

In [None]:
sum(result_df['response'] == result_df['reference'])/len(result_df['reference'])

In [None]:
original_qs_df = result_df[result_df['substitution'] == ""]
sub_qs_df = result_df[result_df['substitution'] != ""]

In [None]:
original_qs_df

In [None]:
sub_qs_df

In [None]:
sum(original_qs_df['response'] == original_qs_df['reference'])/len(original_qs_df['reference'])

In [None]:
sum(sub_qs_df['response'] == sub_qs_df['reference'])/len(sub_qs_df['reference'])
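
The accuracies above are exact-match rates: the share of rows whose response equals the reference string, computed over all rows, over the original questions, and over the substituted questions. A minimal plain-Python sketch of the same computation, assuming each row is a dict with `response`, `reference` and `substitution` keys as in `result_df`; the rows below are made up for illustration:

```python
def exact_match_rate(rows):
    """Fraction of rows whose response equals the reference exactly."""
    if not rows:
        # Unlike the division above, report 0.0 for an empty selection.
        return 0.0
    return sum(r["response"] == r["reference"] for r in rows) / len(rows)


# Made-up rows; an empty `substitution` marks an original question.
rows = [
    {"response": "yes", "reference": "yes", "substitution": ""},
    {"response": "no", "reference": "yes", "substitution": ""},
    {"response": "dog", "reference": "dog", "substitution": "animal"},
    {"response": "cat", "reference": "cat", "substitution": "pet"},
]
original = [r for r in rows if r["substitution"] == ""]
substituted = [r for r in rows if r["substitution"] != ""]

print(exact_match_rate(rows), exact_match_rate(original), exact_match_rate(substituted))
```

Because the comparison is an exact string match, differences in casing or surrounding whitespace count as misses, which is why the prediction function lower-cases its output.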

In [None]:
result_df.to_csv('llava-100-test.csv')

In [None]:
aggregated_base_questions = result_df[result_df['substitution'] == ''].groupby('argument').agg({
    'exact_match/score': 'mean',
  #  'rouge/score': 'mean'
}).reset_index()

In [None]:
aggregated_base_questions

# Aggregating over all substitutions

In [None]:
aggregated_substitutions = result_df[result_df['substitution'] != ''].groupby('argument').agg({
    'exact_match/score': 'mean',
#    'rouge/score': 'mean'
}).reset_index()

In [None]:
aggregated_substitutions

In [None]:
import pandas as pd
aggregated_combined = aggregated_base_questions.rename(columns={'exact_match/score': 'base/exact_match/score',
                                                                #'rouge/score':'base/rouge/score'
                                                               })

# Left-merge the two dataframes on their common key, 'argument'
aggregated_combined = pd.merge(aggregated_combined, aggregated_substitutions, on='argument', how='left')

# Fill empty values with 0.0
aggregated_combined = aggregated_combined.fillna(0.0)

In [None]:
aggregated_combined
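
The merge above keeps every argument that has a base score and fills in 0.0 wherever no substituted question exists for that argument. A plain-Python sketch of that left join, assuming rows shaped like `result_df`; the helper names and sample rows are hypothetical:

```python
from collections import defaultdict


def mean_score_by_argument(rows):
    """Mean exact-match score per argument."""
    scores = defaultdict(list)
    for row in rows:
        scores[row["argument"]].append(row["exact_match/score"])
    return {arg: sum(vals) / len(vals) for arg, vals in scores.items()}


def combine_base_and_substitutions(rows):
    """Left join of base scores with substitution scores, keyed by argument."""
    base = mean_score_by_argument([r for r in rows if r["substitution"] == ""])
    subs = mean_score_by_argument([r for r in rows if r["substitution"] != ""])
    return {
        arg: {
            "base/exact_match/score": score,
            # An argument without substituted questions gets 0.0.
            "exact_match/score": subs.get(arg, 0.0),
        }
        for arg, score in base.items()
    }


# Made-up rows for illustration.
sample_rows = [
    {"argument": "dog", "substitution": "", "exact_match/score": 1},
    {"argument": "dog", "substitution": "animal", "exact_match/score": 0},
    {"argument": "dog", "substitution": "pet", "exact_match/score": 1},
    {"argument": "cup", "substitution": "", "exact_match/score": 1},
]
combined = combine_base_and_substitutions(sample_rows)
print(combined)
```

Note that the 0.0 fill makes "no substituted questions" indistinguishable from "all substituted questions answered wrongly", which is worth keeping in mind when reading the combined table.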

In [None]:
print(exp_results.metrics_table['response'])

In [None]:
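import matplotlib.image as mpimg
import matplotlib.pyplot as plt
from PIL import Image as PImage

# NOTE: `image_to_base64` and `generate` are not defined in this notebook;
# provide them before running this cell.
imgs = set(img['path'] for img in df['image'][:10])

for image in imgs:
    # Show the image, then ask the model to describe it.
    img = mpimg.imread(image)
    plt.imshow(img)
    plt.show()
    img = image_to_base64(PImage.open(image))
    output = generate([img, "What is in the image?"])
    print(output)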

In [None]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
# Read the first dataset image from its file path and display it.
image = mpimg.imread(df['image'][0]['path'])
plt.imshow(image)
plt.show()


# Batch inference

Below is example code from the GCP documentation found at [https://cloud.google.com/vertex-ai/docs/predictions/get-batch-predictions](https://cloud.google.com/vertex-ai/docs/predictions/get-batch-predictions)

Also check [https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/generative_ai/batch_eval_llm.ipynb](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/generative_ai/batch_eval_llm.ipynb)


In [None]:
def create_batch_prediction_job_dedicated_resources_sample(
    model,
    job_display_name: str,
    gcs_source,
    gcs_destination: str,
    machine_type="g2-standard-24", #$0.8129 USD / hour
    accelerator_type="NVIDIA_L4", #$0.644046 USD / hour
    accelerator_count=2,
    instances_format: str = "jsonl",
    starting_replica_count: int = 1,
    max_replica_count: int = 1,
    sync: bool = True,
):
    """Run a batch prediction job for `model` on dedicated resources and wait for it to finish."""
    batch_prediction_job = model.batch_predict(
        job_display_name=job_display_name,
        gcs_source=gcs_source,
        gcs_destination_prefix=gcs_destination,
        instances_format=instances_format,
        starting_replica_count=starting_replica_count,
        max_replica_count=max_replica_count,
        
        machine_type=machine_type,
        accelerator_type=accelerator_type,
        accelerator_count=accelerator_count,
        sync=sync,
    )

    batch_prediction_job.wait()

    print(batch_prediction_job.display_name)
    print(batch_prediction_job.resource_name)
    print(batch_prediction_job.state)
    return batch_prediction_job

In [None]:
# `model` must be the `aiplatform.Model` to run the job with.
batch_prediction_job = create_batch_prediction_job_dedicated_resources_sample(
    model,
    job_display_name="batch-llava-test-100",
    gcs_source="gs://multimodal-representations-eval-data/data.jsonl",
    gcs_destination="gs://multimodal-representations-eval-data/",
)
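
With `instances_format="jsonl"`, the batch job reads one JSON instance per line from the `gcs_source` file. A minimal sketch of building such a file locally before uploading it to the bucket. The instance fields `prompt` and `image` (base64-encoded bytes) and the `answer en ...` prompt prefix are assumptions about what the serving container expects, so check them against the container's request schema; `build_instances_jsonl` is a hypothetical helper:

```python
import base64
import json
import os
import tempfile


def build_instances_jsonl(examples, output_path):
    """Write one JSON instance per line; `examples` yields (image_bytes, question)."""
    count = 0
    with open(output_path, "w", encoding="utf-8") as f:
        for image_bytes, question in examples:
            instance = {
                # Assumed field names, not confirmed against the serving container.
                "prompt": f"answer en {question}\n",
                "image": base64.b64encode(image_bytes).decode("ascii"),
            }
            f.write(json.dumps(instance) + "\n")
            count += 1
    return count


# Made-up example; real image bytes would come from the dataset files.
output_path = os.path.join(tempfile.mkdtemp(), "data.jsonl")
written = build_instances_jsonl(
    [(b"not-a-real-image", "Is there a person in the image?")], output_path
)
print(f"Wrote {written} instance(s) to {output_path}")
```

The file then needs to be uploaded to the Cloud Storage location passed as `gcs_source`.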


## Resource clean-up (DEFINITELY DO THIS)

Finally, release the resources you created to avoid unnecessary costs:

* `deployed_model.undeploy_all` to undeploy every model from the endpoint (`deployed_model` is the endpoint object returned by `model.deploy`).
* `deployed_model.delete` to delete that endpoint, once `undeploy_all` has finished.
* `model.delete` to delete the model from the registry.

In [17]:
deployed_model.undeploy_all()
deployed_model.delete()
model.delete()

NameError: name 'deployed_model' is not defined
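
The `NameError` above arises because `deployed_model` only exists inside `deploy_model`. A teardown that walks the `endpoints` and `models` dictionaries avoids depending on such a variable. The sketch below is duck-typed: it assumes each endpoint exposes `undeploy_all()` and `delete()` and each model exposes `delete()`, as the Vertex AI SDK objects used in this notebook do; `release_resources` and the stand-in class are hypothetical:

```python
def release_resources(endpoints, models):
    """Undeploy and delete every endpoint, then delete every model.

    Errors are collected instead of raised, so one failure does not leave
    the remaining resources running.
    """
    errors = []
    for name, endpoint in endpoints.items():
        try:
            endpoint.undeploy_all()
            endpoint.delete()
        except Exception as exc:  # keep going and report at the end
            errors.append((name, exc))
    for name, model in models.items():
        try:
            model.delete()
        except Exception as exc:
            errors.append((name, exc))
    return errors


class FakeResource:
    """Stand-in used only to exercise the function without GCP access."""

    def __init__(self):
        self.calls = []

    def undeploy_all(self):
        self.calls.append("undeploy_all")

    def delete(self):
        self.calls.append("delete")


fake_endpoint, fake_model = FakeResource(), FakeResource()
errors = release_resources({"mix_224.npz": fake_endpoint}, {"mix_224.npz": fake_model})
print(errors, fake_endpoint.calls, fake_model.calls)
```

Any entries in the returned list point at resources that still need to be removed by hand.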

Alternatively, you can also remove those from the Google Cloud Console following the steps:

* Go to Vertex AI in Google Cloud
* Go to Deploy and use -> Online prediction
* Click on the endpoint and then on the deployed model/s to "Undeploy model from endpoint"
* Then go back to the endpoint list and remove the endpoint
* Finally, go to Deploy and use -> Model Registry, and remove the model

In [None]:
# Disable APIs

!gcloud services disable aiplatform.googleapis.com
!gcloud services disable compute.googleapis.com
!gcloud services disable container.googleapis.com
!gcloud services disable containerregistry.googleapis.com
!gcloud services disable containerfilesystem.googleapis.com

### PLEASE ALSO MANUALLY ENSURE ALL APIS ARE DISABLED ON GCP AFTER THIS IS DONE!


In [None]:
# Load an image from a local file
from vertexai.generative_models import GenerativeModel, Image as V_Image

gen_model = GenerativeModel("paligemma")

image = V_Image.load_from_file(df['image'][0]['path'])

# Prepare contents
prompt = "Describe this image?"
contents = [image, prompt]

response = gen_model.generate_content(contents)

print("-------Prompt--------")
# NOTE: `print_multimodal_prompt` is not defined in this notebook.
print_multimodal_prompt(contents)

print("\n-------Response--------")
print(response.text)

In [16]:

from google.cloud import aiplatform

# Undeploy every model from every endpoint in the current project and region.
endpoints = aiplatform.Endpoint.list()
for endpoint in endpoints:
    endpoint.undeploy_all()

# Alternatives for PaliGemma

https://ai.google.dev/gemma/docs/paligemma/inference-with-keras