<a href="https://colab.research.google.com/github/rhodes-byu/cs-stat-180/blob/main/labs/11-gen-ai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a><p><b>After clicking the "Open in Colab" link, copy the notebook to your own Google Drive before getting started, or it will not save your work</b></p>

# Lab 11: Introduction to Generative AI with Google Gemini

## Section 1\. Overview

In this lab, you will move beyond the web chat interface of Large Language Models (LLMs) and interact with Google's state-of-the-art model, **Gemini**, using Python code.

By the end of this lab, you will be able to:

  * Set up the Google Generative AI SDK in a Python environment.
  * Securely handle your API key.
  * Send programmatic prompts to the Gemini model.
  * Understand and control the "temperature" (creativity) of the model's output.
  * Use **Few-Shot Prompting** to force the model to generate structured data (JSON) suitable for data science tasks.
  * Apply **Chain-of-Thought** prompting to simple multi-step problems.


## Section 2\. Prerequisites: Getting your API Key

To use the Gemini model through code, you need a unique API key. This key acts like a password that connects your script to Google's servers.

1.  Go to **Google AI Studio**: `https://aistudio.google.com/`
2.  Sign in with your Google account.
3.  On the left sidebar, click on **"Get API key"**.
4.  Click **"Create API key in new project"**.
5.  **CRITICAL:** Copy the long string of letters and numbers that appears. This is your key. Do not share it publicly.

----

## Section 3\. Lab Steps


### Step 1: Install the SDK

First, we need to install the Python library that allows us to communicate with Google's AI models.

*Run the following cell:*

In [None]:
# Install the Google Generative AI SDK
!pip install -q -U google-generativeai
print("Installation complete.")

### Step 2: Securely Setup your API Key

It is a security best practice never to hard-code your API key directly into your code, especially if you might share this notebook. Google Colab provides a secure way to store such secrets.

1.  On the left-hand sidebar of this Colab window, click the **Key icon** (Secrets).
2.  Click **"Add new secret"**.
3.  In the "Name" field, enter: `GOOGLE_API_KEY`
4.  In the "Value" field, paste the long API key you copied from Google AI Studio.
5.  Click the toggle button next to your new secret to enable notebook access.

*Now, run the following cell to securely load your key:*

In [None]:
import google.generativeai as genai
from google.colab import userdata

try:
    # Access the key from Colab secrets
    API_KEY = userdata.get('GOOGLE_API_KEY')

    # Configure the SDK with your key
    genai.configure(api_key=API_KEY)

    print("API Key configured successfully!")
except Exception as e:
    print(f"Error: {e}")
    print("Please ensure you have correctly added the 'GOOGLE_API_KEY' secret in the sidebar.")
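
If you run this notebook outside Colab, `google.colab.userdata` will not be importable. One common alternative is to read the key from an environment variable. The sketch below shows one way to do that. The helper name `load_api_key` is made up for this illustration, and the demo uses a stand-in dictionary rather than a real key.

```python
import os


def load_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the API key stored under `name`, or raise a clear error."""
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add it as a Colab secret or export it "
            "in your shell before starting Python."
        )
    return key


# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {"GOOGLE_API_KEY": "  not-a-real-key  "}
print(load_api_key(demo_env))
```

With a real key exported in your shell, `genai.configure(api_key=load_api_key())` would take the place of the `userdata.get` call.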

### Step 3: Your First Generation

Now we are ready to use the model. We will initialize `gemini-2.5-flash`, a fast general-purpose Gemini model that handles text tasks well, and send it a simple prompt.

*Run the following cell:*

In [None]:
# Initialize the model
# "gemini-2.5-flash" is the model name used throughout this lab
model = genai.GenerativeModel('gemini-2.5-flash')

# Define a simple prompt
prompt = "Explain the concept of 'Data Science' to a 10-year-old in three sentences."

print(f"Sending prompt: '{prompt}'...\n")

# Generate content
response = model.generate_content(prompt)

# Print the result
print("--- Gemini Response ---")
print(response.text)

Analysis: How well did the LLM answer the question? How would you answer it differently? Write another prompt aimed at a university student. How well did the model answer that one? Try a few more prompts until you get an answer that best matches your understanding from the course.

### Step 4: Controlling Creativity (Temperature)

LLMs don't just pick the single most likely next word; they sample from a distribution of probable words. The **Temperature** parameter controls how "risky" this sampling is.

  * **Low Temperature (e.g., 0.0 - 0.3):** The model is more deterministic, choosing the most probable words. The output is more consistent, factual, and logical. Better for data tasks.
  * **High Temperature (e.g., 0.7 - 1.0):** The model takes more risks, choosing less probable words. The output is more creative and varied but can be less factual. Better for creative writing.
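
To build intuition for the two regimes above, here is a simplified sketch of how temperature reshapes a next-word distribution: the raw scores are divided by the temperature before a softmax turns them into probabilities. The words and scores are invented for illustration, and real models work over vocabularies of many thousands of tokens. A temperature of exactly 0 is usually handled as a special case that always picks the top-scoring token, so this sketch simply rejects it.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next words.
words = ["data", "model", "banana"]
logits = [2.0, 1.0, 0.1]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    summary = ", ".join(f"{w}={p:.2f}" for w, p in zip(words, probs))
    print(f"T={t}: {summary}")
```

At the lowest temperature nearly all of the probability lands on the top-scoring word, while at the highest the unlikely word gets a noticeably larger share.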

Let's see this in action by asking for something creative.

*Run the following two cells several times and compare how much the low- and high-temperature outputs change between runs.*

In [None]:
# import google.generativeai as genai  # already imported above

# --- Low Temperature Run ---
# Create a configuration for low temperature (consistent/deterministic)
low_temp_config = genai.types.GenerationConfig(temperature=0.1)

print("Generating with LOW temperature (0.1)...")
response_low = model.generate_content(
    "Generate a creative name for an AI start-up formed by BYU CS 180 students.",
    generation_config=low_temp_config
)
print(f"--- Low Temp Result ---\n{response_low.text}\n")

The model returns many suggestions. How would you limit the number it returns?

Rerun the prompt, implementing some of the suggestions from the previous output. Did this improve the output?

In [None]:
# --- High Temperature Run ---
# Create a configuration for high temperature (creative/varied)
# Recent Gemini models accept temperatures from 0.0 to 2.0; 0.9 is on the high side
high_temp_config = genai.types.GenerationConfig(temperature=0.9)

print("Generating with HIGH temperature (0.9)...")
response_high = model.generate_content(
    "Generate a creative name for an AI start-up formed by BYU CS 180 students.",
    generation_config=high_temp_config
)
print(f"--- High Temp Result ---\n{response_high.text}")

How did changing the temperature affect the responses? Try higher temperatures. What happened?

### Step 5: Few-Shot Prompting for Structured Data

This is one of the most practical techniques for a data scientist. Often, you don't want a chatty paragraph from the model; you want structured data (like JSON) that you can immediately use in a pipeline or load into a database.

Modern models are good at following instructions, but they are **much better** when you provide examples of the exact input-to-output pattern you desire. This is called **Few-Shot Prompting**.
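
Few-shot prompts can be typed out by hand, but they can also be assembled from a list of example pairs, which makes it easy to add or swap examples. The sketch below is one possible way to do this. The function name `build_few_shot_prompt` and the exact prompt layout are choices made for this illustration, not anything the Gemini SDK requires.

```python
import json


def build_few_shot_prompt(instruction, examples, new_input):
    """Assemble an instruction, worked examples, and the new input into one prompt."""
    lines = [instruction.strip(), "", "Examples:", ""]
    for text, output in examples:
        lines.append(f'Input Text: "{text}"')
        # json.dumps guarantees that each example output is valid JSON.
        lines.append(f"Output JSON: {json.dumps(output, ensure_ascii=False)}")
        lines.append("")
    lines.append("Now, process this new input:")
    lines.append("")
    lines.append(f'Input Text: "{new_input}"')
    lines.append("Output JSON:")
    return "\n".join(lines)


demo_examples = [
    ("The pepperoni pizza was greasy but tasty.",
     {"food_item": "pepperoni pizza", "sentiment": "mixed"}),
    ("Worst mushroom slice I've ever had. Cold and rubbery.",
     {"food_item": "mushroom slice", "sentiment": "negative"}),
]

demo_prompt = build_few_shot_prompt(
    "Extract the food item and sentiment from each review as strict JSON.",
    demo_examples,
    "I ordered the blackened salmon. It was cooked perfectly, but the service was slow.",
)
print(demo_prompt)
```

The resulting string can be passed to `model.generate_content` just like a hand-written prompt.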

In the cells below, we extract specific details from unstructured restaurant reviews and format them as strict JSON.

*Run the following cell:*

In [None]:
# Try a prompt that has no examples of the desired pattern
# This is called a "zero-shot" prompt

zero_shot_prompt = """
Your task is to analyze restaurant reviews and extract information into strict JSON format.
Do not provide any introductory or concluding text outside the JSON block.

Input Text: "I ordered the blackened salmon. It was cooked perfectly and very flavorful, but the service was slow."
Output JSON:
"""
# We use a low temperature because we want strict adherence to the JSON format,
# not creative improvisation.
formatting_config = genai.types.GenerationConfig(temperature=0.0)

print("Sending zero-shot prompt to extract JSON...")
response = model.generate_content(
    zero_shot_prompt,
    generation_config=formatting_config
)

print("\n--- Model Output ---")
print(response.text)

How well did it do without examples? What happens if you simply ask whether the review is positive, neutral, or negative?

In [None]:
# We define a prompt that includes examples of the desired pattern.
# This "teaches" the model the format we expect.

few_shot_prompt = """
Your task is to analyze restaurant reviews and extract information into strict JSON format.
Do not provide any introductory or concluding text outside the JSON block.

Examples:

Input Text: "The pepperoni pizza was greasy but tasty."
Output JSON: {"food_item": "pepperoni pizza", "sentiment": "mixed"}

Input Text: "Worst mushroom slice I've ever had. Cold and rubbery."
Output JSON: {"food_item": "mushroom slice", "sentiment": "negative"}

Input Text: "OMG the pineapple and jalapeño combo is life-changing! So good."
Output JSON: {"food_item": "pineapple and jalapeño combo", "sentiment": "positive"}

***

Now, process this new input:

Input Text: "I ordered the blackened salmon. It was cooked perfectly and very flavorful, but the service was slow."
Output JSON:
"""

# We use a low temperature because we want strict adherence to the JSON format,
# not creative improvisation.
formatting_config = genai.types.GenerationConfig(temperature=0.0)

print("Sending few-shot prompt to extract JSON...")
response = model.generate_content(
    few_shot_prompt,
    generation_config=formatting_config
)

print("\n--- Model Output ---")
print(response.text)

How well did it do? What happens when you use few-shot prompting? How is the output different? Why?
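
To use the model's reply in a pipeline, you still need to turn the returned text into a Python object. The sketch below assumes the reply is either bare JSON or JSON wrapped in a Markdown code fence, which models may add even when asked not to. The helper name `parse_json_reply` is made up for this illustration, and `sample_reply` is a hand-written stand-in, not real model output.

```python
import json

FENCE = "`" * 3  # three backticks, the Markdown code-fence marker


def parse_json_reply(reply_text):
    """Parse a model reply as JSON, tolerating a surrounding Markdown code fence."""
    text = reply_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line and, if present, the closing fence line.
        lines = text.splitlines()[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines)
    return json.loads(text)


# Hand-written stand-in for a fenced reply.
sample_reply = (
    f"{FENCE}json\n"
    '{"food_item": "blackened salmon", "sentiment": "mixed"}\n'
    f"{FENCE}"
)
record = parse_json_reply(sample_reply)
print(record["food_item"], "->", record["sentiment"])
```

In the notebook you could call `parse_json_reply(response.text)`. If the reply is not valid JSON, `json.loads` raises `json.JSONDecodeError`, which is a useful signal that the prompt needs tightening.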

### Step 6: Chain-of-Thought Prompting

Chain-of-Thought (CoT) prompting is a technique that elicits step-by-step reasoning from large language models. When the model writes out its intermediate reasoning steps before answering, it tends to produce more accurate results on multi-step problems.

In this experiment, we'll give the model a riddle and explicitly ask it to "think step by step" to solve it. This encourages the model to show its reasoning process before arriving at the final answer.

In [None]:
# import google.generativeai as genai  # already imported above

# Re-initialize the model if needed, or use the one from Step 3 if it's still available.
# Use 'gemini-2.5-flash' as it was used successfully earlier in the notebook.
# If you encountered quota errors earlier, you might need to wait or use a different model/API key.

model = genai.GenerativeModel('gemini-2.5-flash')

chain_of_thought_prompt = """
A man is looking at a photograph and says, "Brothers and sisters I have none, but that man's father is my father's son." Who is the man in the photograph?

Let's think step by step to solve this riddle.
"""

# We'll use a low temperature to encourage focused and logical reasoning.
cot_config = genai.types.GenerationConfig(temperature=0.2)

print("Sending chain-of-thought prompt...")
response = model.generate_content(
    chain_of_thought_prompt,
    generation_config=cot_config
)

print("\n--- Model Output (Chain-of-Thought) ---")
print(response.text)

Come up with an interesting chain-of-thought problem and try it with the LLM. How well did it do? Can you figure out a problem to fool the LLM? Ask another LLM.
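
When you test your own problems, it can help to ask the model to finish with a clearly marked final line so that you can check the answer in code instead of reading the whole reasoning trace. The sketch below assumes you append `ANSWER_INSTRUCTION` to your prompt. The "Final answer:" convention and the helper name `extract_final_answer` are made up for this illustration, and `sample_reply` is a hand-written stand-in, not real model output.

```python
import re

# Hypothetical instruction to append to a chain-of-thought prompt.
ANSWER_INSTRUCTION = 'End your reply with one line of the form "Final answer: <answer>".'


def extract_final_answer(reply_text):
    """Return the text after the last 'Final answer:' marker, or None if absent."""
    matches = re.findall(
        r"^\s*Final answer:\s*(.+?)\s*$",
        reply_text,
        flags=re.IGNORECASE | re.MULTILINE,
    )
    return matches[-1] if matches else None


# Hand-written stand-in for a step-by-step reply.
sample_reply = """Step 1: Three boxes of four pens is 12 pens.
Step 2: Giving away five pens leaves 7.
Final answer: 7"""
print(extract_final_answer(sample_reply))
```

The function returns `None` when the marker is missing, so a loop over many test problems can flag replies that ignored the instruction.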

-----

### Step 7: Submission Task

**To Submit:** Upload the notebook (.ipynb) with cell outputs, along with a PDF copy, to Canvas.