In [None]:
!gcloud auth application-default login


# Testing Google Colab Enterprise File Upload and Vertex AI Call

In this notebook, we'll walk through uploading a file in Google Colab, processing its contents with Vertex AI, and saving the result to Google BigQuery.

## Steps:
1. Install necessary dependencies.
2. Authenticate with Google Colab.
3. Define and create the BigQuery table schema.
4. Initialize Vertex AI.
5. Load the pre-trained model from Vertex AI.
6. Define the function to expand acronyms using the model.
7. Upload the file.
8. Process the file content.
9. Save the result to BigQuery.

Let's get started!

## Step 1: Install Dependencies
We'll need the `pandas-gbq` library to interact with BigQuery.


In [None]:
!pip install pandas-gbq



## Step 2: Authenticate with Google Colab


In [None]:
from google.colab import auth as google_auth

# Google Colab authentication
google_auth.authenticate_user()


## Step 3: Define and Create BigQuery Table Schema


In [None]:
from google.cloud import bigquery

# Create the BigQuery client
client = bigquery.Client(project='<<project_id>>')

# Define the table schema
schema = [
    bigquery.SchemaField("Original Column Name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("Expanded Column Name", "STRING", mode="REQUIRED")
]

# Create the table if it doesn't exist
table_id = "<<table_id>>"
table = bigquery.Table(table_id, schema=schema)
table = client.create_table(table, exists_ok=True)  # No error if the table already exists
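
The `<<table_id>>` placeholder above stands for a fully qualified BigQuery table ID, which takes the form `project.dataset.table`. As a minimal sketch, a small helper could assemble such an ID and reject empty parts before the string is passed to `bigquery.Table`. The helper name `make_table_id` and the sample values are hypothetical and not part of the notebook.

```python
def make_table_id(project: str, dataset: str, table: str) -> str:
    """Join the three parts of a BigQuery table ID with dots."""
    parts = [project, dataset, table]
    for part in parts:
        # Reject missing or blank components early, before any API call is made
        if not part or not part.strip():
            raise ValueError("Every table ID component must be a non-empty string")
    return ".".join(parts)

# Sample values are placeholders for illustration only
table_id_example = make_table_id("my-project", "my_dataset", "acronyms")
print(table_id_example)
```
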


## Step 4: Initialize Vertex AI


In [None]:
import vertexai
from vertexai.preview.language_models import TextGenerationModel

# Initialize Vertex AI
try:
    vertexai.init(project="<<project_id>>", location="us-central1")
except Exception as e:
    print(f"Failed to initialize Vertex AI: {e}")
    raise


## Step 5: Load the Pre-trained Model


In [None]:
# Load the pre-trained model
try:
    model = TextGenerationModel.from_pretrained("text-bison")
except Exception as e:
    print(f"Failed to load the pre-trained model: {e}")
    raise


## Step 6: Define Acronym Expander Function


In [None]:
# Define constants and model parameters
INPUT_PROMPT_ACRONYM = """
Given an acronym, expand it to its full meaning.
Here is a glossary of some common acronyms and their expanded forms that could help you:

- API stands for Application Programming Interface
- HTTP stands for HyperText Transfer Protocol
- AI stands for Artificial Intelligence
- ML stands for Machine Learning
- IoT stands for Internet of Things
- SaaS stands for Software as a Service

You can provide any acronym, and I'll try to expand it based on my training. Here are a couple of examples:

Example 1:
Input: AI
Output: Artificial Intelligence

Example 2:
Input: IoT
Output: Internet of Things

Now, let's try with your input:
Input: {}
"""
parameters = {
    "max_output_tokens": 1024,
    "temperature": 0.2,
    "top_p": 0.8,
    "top_k": 40
}

def acronym_expander(column_name):
    print(f"Expanding acronym for {column_name}")
    try:
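        response = model.predict(INPUT_PROMPT_ACRONYM.format(column_name), **parameters)
        # Look for the "Output:" label case-insensitively, since the prompt examples capitalize it
        text = response.text
        label_start = text.lower().find("output:")
        if label_start != -1:
            expanded_name = text[label_start + len("output:"):].strip()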
            # Prepare data for BigQuery
            rows_to_insert = [{
                "Original Column Name": column_name,
                "Expanded Column Name": expanded_name
            }]
            # Insert data into BigQuery
            errors = client.insert_rows_json(table_id, rows_to_insert)
            if errors:
                print(f"Failed to insert rows: {errors}")
            else:
                print(f"Inserted {column_name}: {expanded_name} into {table_id}")
            return expanded_name
        else:
            print(f"No output found for {column_name}")
            return ""
    except Exception as e:
        print(f"Failed to expand acronym for {column_name}: {e}")
        return ""
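
The function above depends on the model echoing an `Output:` label in its reply. As a rough sketch, the parsing step could be pulled into a small helper that matches the label in any letter case and falls back to the whole reply when the label is missing. The helper name `extract_expansion` is hypothetical, and the fallback is an assumption about desired behavior: the notebook's function returns an empty string in that case.

```python
import re

# Matches an "Output:" label in any letter case and captures the rest of that line
_OUTPUT_PATTERN = re.compile(r"output:\s*(.*)", re.IGNORECASE)

def extract_expansion(response_text: str) -> str:
    """Return the text after an 'Output:' label, or the whole reply if no label is present."""
    match = _OUTPUT_PATTERN.search(response_text)
    if match:
        return match.group(1).strip()
    # Assumption: a reply without the label is itself the expansion
    return response_text.strip()

print(extract_expansion("Output: Internet of Things"))
```

Keeping the parsing separate from the model call and the BigQuery insert makes it possible to check this step without any cloud access.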


## Step 7: Upload the File


In [None]:
from google.colab import files

print("Please upload your CSV file:")
uploaded = files.upload()
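
`files.upload()` returns a dictionary that maps each uploaded filename to the file's contents as bytes. As a rough sketch, the column values can be read straight from those bytes with the standard library. The `uploaded_example` dictionary below is a stand-in for a real upload, `read_column` is a hypothetical helper, and UTF-8 encoding is assumed; the `ColumnName` header matches the one the notebook reads in Step 8.

```python
import csv
import io

def read_column(file_bytes: bytes, column: str = "ColumnName") -> list:
    """Decode CSV bytes (assumed UTF-8) and return one column's values, or [] if it is absent."""
    reader = csv.DictReader(io.StringIO(file_bytes.decode("utf-8")))
    return [row[column] for row in reader if row.get(column) is not None]

# Stand-in for the dictionary returned by files.upload() (illustration only)
uploaded_example = {"columns.csv": b"ColumnName\nAPI\nIoT\n"}
first_name = next(iter(uploaded_example))
print(read_column(uploaded_example[first_name]))
```
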


## Step 8: Process File Content


In [None]:
import pandas as pd
import os

# Stop this cell if no file was uploaded
if not uploaded:
    raise SystemExit("No file uploaded. Exiting.")

filename = list(uploaded.keys())[0]
data = pd.read_csv(filename)

# Create a directory for output files
os.makedirs('/content/output_files', exist_ok=True)

expanded_names = []
for column_name in data.get('ColumnName', []):  # Use get to avoid KeyError
    print(column_name)
    expanded_name = acronym_expander(column_name)
    expanded_names.append(expanded_name)

output_data = pd.DataFrame({
    'Original Column Name': data.get('ColumnName', []),
    'Expanded Column Name': expanded_names
})

output_file_path = '/content/output_files/output.csv'
output_data.to_csv(output_file_path, index=False)
print(output_data)
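
The loop above calls the model once per row, so a column name that appears several times in the CSV triggers repeated model calls and repeated BigQuery inserts. One possible refinement, sketched below, caches each result so that every distinct name is expanded only once. `expand_all` and `fake_expander` are hypothetical names; in the notebook, `acronym_expander` would take the place of `fake_expander`.

```python
from typing import Callable, Dict, List

def expand_all(names: List[str], expander: Callable[[str], str]) -> List[str]:
    """Expand each name in order, calling `expander` only once per distinct value."""
    cache: Dict[str, str] = {}
    results = []
    for name in names:
        if name not in cache:
            cache[name] = expander(name)
        results.append(cache[name])
    return results

# Stand-in for the model-backed function (illustration only); records each call it receives
calls = []
def fake_expander(name: str) -> str:
    calls.append(name)
    return {"AI": "Artificial Intelligence", "ML": "Machine Learning"}.get(name, "")

print(expand_all(["AI", "ML", "AI"], fake_expander))
print(len(calls))  # number of times the expander was actually invoked
```
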


## Step 9: Confirm the Result in BigQuery


In [None]:
from google.cloud import bigquery

# Create the BigQuery client
client = bigquery.Client(project='<<project_id>>')

# Count the rows in the result table
query_string = """
SELECT COUNT(*) AS row_count FROM `<<gbq_table_result>>`
"""

query_job = client.query(query_string)  # API request
results = query_job.result()  # Waits for query to finish
for row in results:
    print(f"Rows in result table: {row['row_count']}")

Now, all the acronyms from the uploaded CSV file have been expanded using the Vertex AI model, and the results have been saved to BigQuery. Additionally, the results are saved to a CSV file in the `/content/output_files` directory.

Feel free to download the result CSV file, or query the BigQuery table to see the expanded acronyms.
