<a href="https://colab.research.google.com/github/0x-duelker/0x-duelker/blob/main/notebooks/community/model_garden/model_garden_pytorch_wizard_coder.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Vertex AI Model Garden - WizardCoder

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_pytorch_wizard_coder.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_pytorch_wizard_coder.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/notebooks/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/community/model_garden/model_garden_pytorch_wizard_coder.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
Open in Vertex AI Workbench
    </a> (A Python-3 CPU notebook is recommended)
  </td>
</table>

## Overview

This notebook demonstrates how to serve [WizardCoder](https://huggingface.co/WizardLM), a large pretrained language model that can follow complex instructions to generate Python code.

WizardCoder [model checkpoints](https://huggingface.co/WizardLM/WizardCoder-Python-7B-V1.0) are available on the Hugging Face Hub.

In this notebook, we will show you how to:
- Deploy prebuilt WizardCoder models on Vertex AI
- Send prompts to the serving endpoint


### Costs

This tutorial uses billable components of Google Cloud:

* Vertex AI
* Cloud Storage

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage pricing](https://cloud.google.com/storage/pricing), and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

## Before you begin

**NOTE**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands.
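
To make the note concrete, the sketch below reproduces in plain Python roughly what a notebook line such as `! echo $PROJECT_ID` does: the variable's value is substituted into the command string, which is then handed to the shell. The `run_shell` helper and the `PROJECT_ID` value are illustrative assumptions, not part of this notebook.

```python
import shlex
import subprocess

# Hypothetical value, standing in for the notebook's PROJECT_ID variable.
PROJECT_ID = "my-project"


def run_shell(command: str) -> str:
    """Runs a command through the shell and returns its stripped stdout."""
    completed = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return completed.stdout.strip()


# Roughly what `! echo $PROJECT_ID` does in a notebook cell.
# shlex.quote guards against values that contain shell metacharacters.
output = run_shell(f"echo {shlex.quote(PROJECT_ID)}")
print(output)
```

Outside a notebook, building the command explicitly like this is the usual replacement for the `!` and `$` shorthand.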

### Colab only
Run the following cell if you are using Colab; skip this section if you are using Workbench.

In [1]:
import sys

if "google.colab" in sys.modules:
    ! pip3 install --upgrade google-cloud-aiplatform
    from google.colab import auth as google_auth

    google_auth.authenticate_user()

    # Restart the notebook kernel after installs.
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

Collecting google-cloud-aiplatform
  Downloading google_cloud_aiplatform-1.81.0-py2.py3-none-any.whl.metadata (32 kB)
Downloading google_cloud_aiplatform-1.81.0-py2.py3-none-any.whl (7.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.3/7.3 MB 30.2 MB/s eta 0:00:00
Installing collected packages: google-cloud-aiplatform
  Attempting uninstall: google-cloud-aiplatform
    Found existing installation: google-cloud-aiplatform 1.79.0
    Uninstalling google-cloud-aiplatform-1.79.0:
      Successfully uninstalled google-cloud-aiplatform-1.79.0
Successfully installed google-cloud-aiplatform-1.81.0


### Setup Google Cloud project

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

1. [Enable the Vertex AI API and Compute Engine API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component).

1. [Create a Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets) for storing experiment outputs.

1. [Create a service account](https://cloud.google.com/iam/docs/service-accounts-create#iam-service-accounts-create-console) with the `Vertex AI User` and `Storage Object Admin` roles for deploying the model to a Vertex AI endpoint.

Fill in the following variables for the experiment environment:

In [1]:
# Cloud project id.
PROJECT_ID = "massive-physics-451614-c2"  # @param {type:"string"}

# The region you want to launch jobs in.
REGION = "europe-west3"  # @param {type:"string"}

# The Cloud Storage bucket for storing experiments output.
BUCKET_URI = "gs://wizcoder_vertex_csb"  # @param {type:"string"}

! gcloud config set project $PROJECT_ID

# The service account for deploying the model. It looks like:
# '<account_name>@<project>.iam.gserviceaccount.com'
# To create one with the `Vertex AI User` and `Storage Object Admin` roles, see
# https://cloud.google.com/iam/docs/service-accounts-create#iam-service-accounts-create-console
SERVICE_ACCOUNT = "vertexai@massive-physics-451614-c2.iam.gserviceaccount.com"  # @param {type:"string"}

# The serving port.
SERVE_PORT = 7080

Updated property [core/project].
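
A quick format check on these values can catch typos before any job is launched. The sketch below is a sanity check only, under the assumption that the service account follows the `<name>@<project>.iam.gserviceaccount.com` shape shown above; `check_settings` is a hypothetical helper, and it does not verify that the resources actually exist.

```python
import re
from typing import List

# Shape check only: "<name>@<project>.iam.gserviceaccount.com".
_SERVICE_ACCOUNT_RE = re.compile(
    r"^[a-z0-9-]+@[a-z0-9-]+\.iam\.gserviceaccount\.com$"
)


def check_settings(
    project_id: str, bucket_uri: str, service_account: str
) -> List[str]:
    """Returns a list of problems; an empty list means the values look plausible."""
    problems = []
    if not project_id:
        problems.append("PROJECT_ID is empty")
    if not bucket_uri.startswith("gs://") or len(bucket_uri) <= len("gs://"):
        problems.append("BUCKET_URI should look like gs://<bucket-name>")
    if not _SERVICE_ACCOUNT_RE.match(service_account):
        problems.append(
            "SERVICE_ACCOUNT should look like "
            "<name>@<project>.iam.gserviceaccount.com"
        )
    return problems


# Example with placeholder values (not real resources).
issues = check_settings(
    "my-project",
    "gs://my-bucket",
    "deployer@my-project.iam.gserviceaccount.com",
)
print(issues or "settings look plausible")
```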


### Initialize Vertex AI API

In [2]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

### Define constants

In [3]:
# The prebuilt vLLM serving Docker image.
VLLM_DOCKER_URI = "us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20231127_0916_RC00"

### Define utility functions

In [4]:
from datetime import datetime
from typing import Tuple


def get_job_name_with_datetime(prefix: str) -> str:
    """Gets the job name with date time when triggering training or deployment
    jobs in Vertex AI.
    """
    return prefix + datetime.now().strftime("_%Y%m%d_%H%M%S")


def deploy_model(
    model_name: str,
    model_id: str,
    service_account: str,
    machine_type="g2-standard-4",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
) -> Tuple[aiplatform.Model, aiplatform.Endpoint]:
    """Deploys trained models into Vertex AI."""
    endpoint = aiplatform.Endpoint.create(display_name=f"{model_name}-endpoint")

    dtype = "bfloat16"
    if accelerator_type in ["NVIDIA_TESLA_T4", "NVIDIA_TESLA_V100"]:
        dtype = "float16"

    vllm_args = [
        "--host=0.0.0.0",
        f"--port={SERVE_PORT}",  # Must match serving_container_ports below.
        f"--model={model_id}",
        f"--tensor-parallel-size={accelerator_count}",
        "--swap-space=16",
        f"--dtype={dtype}",
        "--gpu-memory-utilization=0.9",
        "--max-num-batched-tokens=16384",
        "--disable-log-stats",
    ]
    serving_env = {
        "MODEL_ID": "WizardLM/WizardCoder",
        "DEPLOY_SOURCE": "notebook",
    }
    model = aiplatform.Model.upload(
        display_name=model_name,
        serving_container_image_uri=VLLM_DOCKER_URI,
        serving_container_ports=[SERVE_PORT],
        serving_container_command=["python", "-m", "vllm.entrypoints.api_server"],
        serving_container_args=vllm_args,
        serving_container_predict_route="/generate",
        serving_container_health_route="/ping",
        serving_container_environment_variables=serving_env,
    )
    model.deploy(
        endpoint=endpoint,
        machine_type=machine_type,
        accelerator_type=accelerator_type,
        accelerator_count=accelerator_count,
        deploy_request_timeout=1800,
        service_account=service_account,
        system_labels={
            "NOTEBOOK_NAME": "model_garden_pytorch_wizard_coder.ipynb"
        },
    )
    return model, endpoint
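
The dtype rule and the argument list inside `deploy_model` can be pulled out into small pure functions, which makes them easy to inspect without deploying anything. This is a sketch, not a replacement for the function above: `select_dtype` and `build_vllm_args` are hypothetical names, while the flags and values are copied from `deploy_model`. The float16 fallback is presumably there because T4 and V100 GPUs lack native bfloat16 support.

```python
from typing import List

# Accelerators for which deploy_model falls back to float16.
FLOAT16_ACCELERATORS = ("NVIDIA_TESLA_T4", "NVIDIA_TESLA_V100")


def select_dtype(accelerator_type: str) -> str:
    """Mirrors the dtype choice made in deploy_model."""
    return "float16" if accelerator_type in FLOAT16_ACCELERATORS else "bfloat16"


def build_vllm_args(
    model_id: str,
    accelerator_type: str,
    accelerator_count: int,
    port: int = 7080,
) -> List[str]:
    """Builds the same vLLM server arguments that deploy_model passes."""
    return [
        "--host=0.0.0.0",
        f"--port={port}",
        f"--model={model_id}",
        # One tensor-parallel shard per attached GPU.
        f"--tensor-parallel-size={accelerator_count}",
        "--swap-space=16",
        f"--dtype={select_dtype(accelerator_type)}",
        "--gpu-memory-utilization=0.9",
        "--max-num-batched-tokens=16384",
        "--disable-log-stats",
    ]


args = build_vllm_args("WizardLM/WizardCoder-Python-13B-V1.0", "NVIDIA_L4", 2)
print("\n".join(args))
```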

### Run inferences with serving images
This section uploads the model to Model Registry and deploys it on the Endpoint.

The model deployment step takes around 20 minutes (for the 13B model) to complete, since it needs to download the model weights.

In [5]:
# Keep model_id in sync with the hardware block enabled below (7B: 1 L4, 13B: 2 L4s, 34B: 4 L4s).
model_id = "WizardLM/WizardCoder-Python-7B-V1.0"  # @param ["WizardLM/WizardCoder-Python-7B-V1.0", "WizardLM/WizardCoder-Python-13B-V1.0", "WizardLM/WizardCoder-Python-34B-V1.0"]

# Serve WizardCoder-Python-7B model with 1 L4 GPU.
machine_type = "g2-standard-8"
accelerator_type = "NVIDIA_L4"
accelerator_count = 1

# Serve WizardCoder-Python-13B model with 2 L4 GPUs.
# machine_type = "g2-standard-24"
# accelerator_type = "NVIDIA_L4"
# accelerator_count = 2

# Serve WizardCoder-Python-34B model with 4 L4 GPUs.
# machine_type = "g2-standard-48"
# accelerator_type = "NVIDIA_L4"
# accelerator_count = 4
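
Because the model ID and the hardware block are edited separately, they can drift apart. A 34B checkpoint in 16-bit precision needs roughly 68 GB for its weights alone, more than a single 24 GB L4 offers. One way to keep the two in sync is a lookup table, sketched below with the three configurations taken from the comments above; `ServingHardware` and `hardware_for` are hypothetical names.

```python
from typing import Dict, NamedTuple


class ServingHardware(NamedTuple):
    machine_type: str
    accelerator_type: str
    accelerator_count: int


# Pairings copied from the commented configurations in the cell above.
HARDWARE_BY_MODEL: Dict[str, ServingHardware] = {
    "WizardLM/WizardCoder-Python-7B-V1.0": ServingHardware(
        "g2-standard-8", "NVIDIA_L4", 1
    ),
    "WizardLM/WizardCoder-Python-13B-V1.0": ServingHardware(
        "g2-standard-24", "NVIDIA_L4", 2
    ),
    "WizardLM/WizardCoder-Python-34B-V1.0": ServingHardware(
        "g2-standard-48", "NVIDIA_L4", 4
    ),
}


def hardware_for(model_id: str) -> ServingHardware:
    """Returns the listed hardware for a model ID, or raises ValueError."""
    try:
        return HARDWARE_BY_MODEL[model_id]
    except KeyError:
        raise ValueError(
            f"No hardware configuration listed for {model_id!r}"
        ) from None


hardware = hardware_for("WizardLM/WizardCoder-Python-34B-V1.0")
print(hardware)
```

The three fields can then be unpacked directly into `machine_type`, `accelerator_type`, and `accelerator_count`.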

In [6]:
# The gcloud CLI is not published on PyPI, so `pip install google-cloud-sdk` fails.
# The runtime already includes gcloud, so no installation step is needed here.
!gcloud auth application-default login

You are running on a Google Compute Engine virtual machine.
The service credentials associated with this virtual machine
will automatically be used by Application Default
Credentials, so it is not necessary to use this command.

If you decide to proceed anyway, your user credentials may be visible
to others with access to this virtual machine. Are you sure you want
to authenticate with your personal account?

Do you want to continue (Y/n)?  y

Go to the following link in your browser, and complete the sign-in prompts:

    https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=764086051850-6qr4p6gpi6hn506pt8ejuq83di341hur.apps.googleusercontent.com&redirect_uri=https%3A%2F%2Fsdk.cloud.google.com%2Fapplicationdefaultauthcode.html&scope=openid+https%3A%2F%2Fwww.googleapis.com%2

In [7]:
!gcloud auth list
!gcloud config set account 'experimentalprocedure@gmail.com'

         Credentialed Accounts
ACTIVE  ACCOUNT
*       experimentalprocedure@gmail.com

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

Updated property [core/account].


In [8]:
from google.colab import auth
auth.authenticate_user()

In [9]:
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "massive-physics-451614-c2-21c06bd29777.json"


In [10]:
from google.cloud import storage
from google.oauth2 import service_account

# Load credentials from the service account key file uploaded to the Colab runtime.
credentials = service_account.Credentials.from_service_account_file(
    '/content/massive-physics-451614-c2-21c06bd29777.json'
)

# Initialize the Storage client with the credentials
client = storage.Client(credentials=credentials)

# List buckets
buckets = list(client.list_buckets())
print(buckets)

Forbidden: 403 GET https://storage.googleapis.com/storage/v1/b?project=massive-physics-451614-c2&projection=noAcl&prettyPrint=false: vertexai@massive-physics-451614-c2.iam.gserviceaccount.com does not have storage.buckets.list access to the Google Cloud project. Permission 'storage.buckets.list' denied on resource (or it may not exist).

In [11]:
from google.cloud import storage

# Initialize a client and list buckets as a test
client = storage.Client()
buckets = list(client.list_buckets())
print(buckets)


Forbidden: 403 GET https://storage.googleapis.com/storage/v1/b?project=massive-physics-451614-c2&projection=noAcl&prettyPrint=false: vertexai@massive-physics-451614-c2.iam.gserviceaccount.com does not have storage.buckets.list access to the Google Cloud project. Permission 'storage.buckets.list' denied on resource (or it may not exist).
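
The 403 above comes from listing every bucket in the project, which requires the `storage.buckets.list` permission that, per the error message, this service account does not have. A narrower check, sketched below, lists at most one object in the single bucket the notebook uses. It assumes the `google-cloud-storage` package is installed and that the account has object-level access to that bucket; `bucket_name_from_uri` and `can_list_objects` are hypothetical helper names.

```python
def bucket_name_from_uri(bucket_uri: str) -> str:
    """Extracts the bucket name from a gs://bucket[/prefix] URI."""
    prefix = "gs://"
    if not bucket_uri.startswith(prefix):
        raise ValueError(f"Expected a gs:// URI, got {bucket_uri!r}")
    name = bucket_uri[len(prefix):].split("/", 1)[0]
    if not name:
        raise ValueError("Bucket name is empty")
    return name


def can_list_objects(bucket_uri: str) -> bool:
    """Returns True if the active credentials can list objects in the bucket."""
    # Imported lazily so the URI helper works without the cloud libraries.
    from google.api_core import exceptions
    from google.cloud import storage

    client = storage.Client()
    try:
        # Fetching a single object is enough to exercise the permission.
        blobs = client.list_blobs(bucket_name_from_uri(bucket_uri), max_results=1)
        next(iter(blobs), None)
        return True
    except exceptions.Forbidden:
        return False


print(bucket_name_from_uri("gs://wizcoder_vertex_csb"))
```

In the notebook, `can_list_objects(BUCKET_URI)` would be the call to run; it is not invoked here because it needs live credentials.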

In [12]:
# Use the service account key for authentication.
import os

# Name of the uploaded service account key file.
key_path = "massive-physics-451614-c2-21c06bd29777.json"
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_path
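
A relative key path only resolves if the kernel's working directory happens to contain the file, and the cells above mix a relative path with an absolute `/content/...` one. The sketch below resolves the path and confirms the file exists before exporting the variable, so a wrong path fails immediately rather than at the first API call. `use_service_account_key` is a hypothetical helper; the key file name in the comment is a placeholder.

```python
import os
from pathlib import Path


def use_service_account_key(key_path: str) -> str:
    """Points GOOGLE_APPLICATION_CREDENTIALS at an existing key file.

    Returns the absolute path that was exported.
    """
    resolved = Path(key_path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"Service account key not found: {resolved}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(resolved)
    return str(resolved)


# Usage in the notebook (placeholder file name):
# use_service_account_key("/content/my-key.json")
```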


In [13]:
# Verify that authentication works by listing the endpoints in the project.
from google.cloud import aiplatform

# Initialize Vertex AI with the project and region defined earlier.
aiplatform.init(project=PROJECT_ID, location=REGION)

# List all endpoints as a test.
endpoints = aiplatform.Endpoint.list()
print(endpoints)

[<google.cloud.aiplatform.models.Endpoint object at 0x7cf6e3d92bd0> 
resource name: projects/268546954473/locations/europe-west3/endpoints/7633684931276701696]


In [14]:
model, endpoint = deploy_model(
    model_name=get_job_name_with_datetime(prefix="wizard-lm-serve"),
    model_id=model_id,
    service_account=SERVICE_ACCOUNT,
    machine_type=machine_type,
    accelerator_type=accelerator_type,
    accelerator_count=accelerator_count,
)
print("endpoint_name:", endpoint.name)

INFO:google.cloud.aiplatform.models:Creating Endpoint
INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/268546954473/locations/europe-west3/endpoints/5231014530074542080/operations/6817044110222819328
INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/268546954473/locations/europe-west3/endpoints/5231014530074542080
INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/268546954473/locations/europe-west3/endpoints/5231014530074542080')
INFO:google.cloud.aiplatform.models:Creating Model
INFO:google.cloud.aiplatform.models:Create Model backing LRO: projects/268546954473/locations/europe-west3/models/2317374687165808640/operations/8266077290329276416
INFO:google.cloud.aiplatform.models:Model created. Resource name: projects/268546954473/locations/europe-west3/models/2317374687165808640@1
INFO:google.cloud.aiplatform.models:To use this Model

InvalidArgument: 400 Machine type "g2-standard-8" is not supported.

Once deployment succeeds, you can send text prompts to the endpoint.

In [None]:
# Load an existing endpoint as below.
# endpoint_name = endpoint.name
# aip_endpoint_name = (
#     f"projects/{PROJECT_ID}/locations/{REGION}/endpoints/{endpoint_name}"
# )
# endpoint = aiplatform.Endpoint(aip_endpoint_name)

prompt = "Check if the string is a palindrome."  # @param {type:"string"}
instance = {
    "prompt": f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{prompt}\n\n### Response:",
    "temperature": 0.1,
    "top_p": 1.0,
    "max_tokens": 512,
}
response = endpoint.predict(instances=[instance])

print(response.predictions)
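
The instruction template in the cell above can be wrapped in a small helper so that several prompts share the same format and sampling settings. This is a sketch: `build_instance` is a hypothetical name, and the template text and default parameter values are copied from the cell above.

```python
from typing import Any, Dict

# Instruction template copied from the request in the cell above.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_instance(
    instruction: str,
    temperature: float = 0.1,
    top_p: float = 1.0,
    max_tokens: int = 512,
) -> Dict[str, Any]:
    """Builds one prediction instance in the format used by this notebook."""
    return {
        "prompt": PROMPT_TEMPLATE.format(instruction=instruction),
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }


instances = [
    build_instance("Check if the string is a palindrome."),
    build_instance("Reverse a linked list.", max_tokens=256),
]
print(instances[0]["prompt"])
```

The resulting list can be passed as `endpoint.predict(instances=instances)` once an endpoint is available.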

### Clean up resources

In [None]:
# Undeploy model and delete endpoint.
endpoint.delete(force=True)

# Delete models.
model.delete()
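
If `deploy_model` raises partway through, as in the recorded run above, `model` and `endpoint` are never assigned, yet the endpoint and model it created still exist. The sketch below recovers their resource names from the SDK's log text and deletes them by name. `find_resource_names` and `delete_leftovers` are hypothetical helpers, and the deletion step assumes `aiplatform.init` has already been called with the right project and region.

```python
import re
from typing import List

_RESOURCE_RE = re.compile(
    r"projects/[^/\s]+/locations/[^/\s]+/(?:endpoints|models)/\d+"
)


def find_resource_names(log_text: str) -> List[str]:
    """Returns unique endpoint and model resource names, in order of appearance."""
    names: List[str] = []
    for match in _RESOURCE_RE.finditer(log_text):
        if match.group(0) not in names:
            names.append(match.group(0))
    return names


def delete_leftovers(resource_names: List[str]) -> None:
    """Deletes endpoints first (undeploying any models), then models."""
    # Imported lazily so the parsing helper works without the SDK installed.
    from google.cloud import aiplatform

    for name in resource_names:
        if "/endpoints/" in name:
            aiplatform.Endpoint(name).delete(force=True)
    for name in resource_names:
        if "/models/" in name:
            aiplatform.Model(name).delete()


# Two lines in the shape of the deployment log shown earlier.
log = (
    "Endpoint created. Resource name: "
    "projects/268546954473/locations/europe-west3/endpoints/5231014530074542080\n"
    "Model created. Resource name: "
    "projects/268546954473/locations/europe-west3/models/2317374687165808640@1\n"
)
leftovers = find_resource_names(log)
print(leftovers)
```

Calling `delete_leftovers(leftovers)` performs the actual cleanup; it is left out of the example because it deletes live resources.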