In [None]:
Brand Voice Using a Tuned Foundation Model

Your brand's voice is its soul: the way it speaks to the world. This notebook is a toolkit for crafting and refining
a distinct brand voice across all your content creation efforts. It is designed to be a living document that guides you in translating
abstract brand values into tangible communication.

On Vertex AI, tuning allows you to customize a foundation model for more specific tasks or knowledge domains.

While prompt design is excellent for quick experimentation, if training data (examples) is available, tuning lets you
customize the model to the characteristics of the brand you want to project.

Objective

This tutorial teaches you how to tune a foundation model on new, unseen data. You will use the following Google Cloud products:
    1. Vertex AI Generative AI Studio
    2. Vertex AI Pipelines
    3. Vertex AI Model Registry
    4. Vertex AI Endpoints
    
The steps performed include:
    1. Upload training data 
    2. Create a pipeline job
    3. Inspect your model on Vertex AI Model Registry
    4. Get predictions from your tuned model
    
    
Quota

Important: Tuning the text-bison@002 model uses tpu-v3-8 training resources and the accompanying quotas from your Google
Cloud project. Each project has a default quota of eight v3-8 cores, which allows for one to two concurrent tuning jobs.
If you want to run more concurrent jobs, you need to request additional quota via the Quotas page.

Costs

This tutorial uses billable components of Google Cloud:
    
    1. Vertex AI Generative AI Studio
    
Learn about Vertex AI pricing and use the Pricing Calculator to generate a cost estimate based on your projected usage.

Install Vertex AI SDK

In [None]:
!pip install google-cloud-aiplatform --upgrade --user --quiet

In [None]:
import IPython

# Restart the kernel so that the newly installed SDK version is picked up
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

Authenticating the notebook environment

1. When using Colab, uncomment the cell below, run it, and then continue.
2. When using Vertex AI Workbench, follow the instructions at https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env

In [None]:
from google.colab import auth
auth.authenticate_user()

Set the Project ID

Update the project ID below and set it with gcloud. If you don't know your project ID, see https://support.google.com/googleapi/answer/7014113

In [None]:
PROJECT_ID = "GOOGLE_CLOUD_PROJECT_HERE"

! gcloud config set project {PROJECT_ID}

Create the bucket

Next, create the bucket that will store the tuning data. To avoid name collisions between users on
the resources created, generate a UUID for each session and append it to the names of the resources
created in this tutorial.

In [None]:
import random
import string

# Generate a UUID-style random string of the specified length (default: 8)
def generate_uuid(length: int = 8) -> str:
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=length))

UUID = generate_uuid()

Choose a bucket name and update the BUCKET_NAME parameter.

In [None]:
BUCKET_NAME = "genai-mkt-dev/tune-dataset"
BUCKET_URI = f"gs://{BUCKET_NAME}"
REGION = "us-central1"

In [None]:
if BUCKET_NAME == "" or BUCKET_NAME is None or BUCKET_NAME == "<BUCKET_NAME>":
    BUCKET_NAME = "vertex-" + UUID
    BUCKET_URI = f"gs://{BUCKET_NAME}"
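
Note that a Cloud Storage bucket name cannot contain a slash: anything after the first "/" is an object prefix inside the bucket. The sketch below is not part of the original notebook. split_bucket_and_prefix is a hypothetical helper that separates the two parts and checks the bucket part against the basic naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit). It is a simplified check under those assumptions, not the full set of Cloud Storage rules.

```python
import re
from typing import Tuple

# Simplified pattern for the basic bucket naming rules (assumption: names
# without dots beyond 63 characters and reserved prefixes are out of scope).
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$")


def split_bucket_and_prefix(name: str) -> Tuple[str, str]:
    """Split "bucket/prefix" into its two parts and validate the bucket part."""
    bucket, _, prefix = name.partition("/")
    if not _BUCKET_RE.match(bucket):
        raise ValueError(f"not a valid bucket name: {bucket!r}")
    return bucket, prefix.strip("/")


# Hypothetical usage with the value from the cell above
bucket, prefix = split_bucket_and_prefix("genai-mkt-dev/tune-dataset")
print(bucket, prefix)  # genai-mkt-dev tune-dataset
```

If the prefix is non-empty, the bucket-creation command would need the bucket part only, with the prefix used when uploading files.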

Only if the bucket doesn't already exist: run the following cell to create the Cloud Storage bucket.

In [None]:
! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI

Finally, validate access to the Cloud Storage bucket by examining its contents:

In [None]:
! gsutil ls -al $BUCKET_URI

Import Libraries

Colab only: Run the following cell to initialize the Vertex AI SDK. In Vertex AI Workbench, this isn't required.

In [None]:
import vertexai

vertexai.init(project=PROJECT_ID, location=REGION)

In [None]:
from typing import Union

import pandas as pd

from google.cloud import aiplatform
from vertexai.language_models import TextGenerationModel

Tune the Model

Now it's time to create a tuning job. You can tune a foundation model by creating a pipeline job using Generative AI Studio, cURL, or the Python SDK. Here we will use the Python SDK, with a Q&A-with-context dataset in JSONL format.

Training Data

Your model tuning dataset must be in JSONL format, where each line contains a single training example. Make
sure that each example includes instructions.
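
As a rough illustration of the JSONL layout described above, the sketch below writes two made-up examples to a temporary file and reads them back. The field names input_text and output_text are an assumption based on the text-model supervised tuning format and should be checked against the current Vertex AI documentation. The sample prompts, the file name, and the helpers write_jsonl and read_jsonl are hypothetical.

```python
import json
import os
import tempfile

# Made-up examples for illustration only; the instruction is part of input_text.
examples = [
    {
        "input_text": "Rewrite in our brand voice: Our shoes are on sale.",
        "output_text": "Step into something new. Your favorite pairs, now at a friendlier price.",
    },
    {
        "input_text": "Rewrite in our brand voice: We ship within two days.",
        "output_text": "Order today, unbox it the day after tomorrow.",
    },
]


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


sample_path = os.path.join(tempfile.mkdtemp(), "sample_tune_data.jsonl")
write_jsonl(examples, sample_path)
loaded = read_jsonl(sample_path)

# Check that every example carries both assumed fields
required = {"input_text", "output_text"}
missing = [i for i, r in enumerate(loaded) if not required <= set(r)]
print(f"{len(loaded)} examples, {len(missing)} missing required fields")
# 2 examples, 0 missing required fields
```

Running the same read-and-check step on your real training file before uploading it can catch malformed lines early.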

Upload the files to your Cloud Storage bucket and add the filenames below.

In [None]:
training_data_filename = "tune_data_brand_voice.jsonl"

In [None]:
evaluation_data_filename = "tune_eval_data_brand_voice.jsonl"

You can check that the files are available in your Cloud Storage bucket:

In [None]:
! gsutil ls -al $BUCKET_URI

In [None]:
# Full gs:// paths to the uploaded files (no leading "$" inside the f-string)
TRAINING_DATA_URI = f"{BUCKET_URI}/{training_data_filename}"
EVALUATION_DATA_URI = f"{BUCKET_URI}/{evaluation_data_filename}"
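
A stray character or a doubled slash in a hand-built URI is easy to miss until the tuning job fails to find the file. As an optional safeguard, the sketch below joins a bucket URI and a file name and rejects anything that does not start with gs://. gcs_uri is a hypothetical helper and the bucket and file names are placeholders, not values from this notebook.

```python
def gcs_uri(bucket_uri: str, filename: str) -> str:
    """Join a gs:// bucket URI and an object name with exactly one slash."""
    if not bucket_uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {bucket_uri!r}")
    return f"{bucket_uri.rstrip('/')}/{filename.lstrip('/')}"


# Placeholder values for illustration
print(gcs_uri("gs://example-bucket/tune-dataset/", "train.jsonl"))
# gs://example-bucket/tune-dataset/train.jsonl
```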

Model Tuning

Now it's tim