# CMPE 258 - Assignment 1A: Multimodal AI with Google Gemini

**Author:** Yashaswini Dinesh

**Course:** CMPE 258 - Deep Learning, San Jose State University

---

## Overview

This notebook demonstrates Google Gemini multimodal capabilities:
1. Text-to-Image Generation (Imagen 3)
2. Text-to-Video Generation (Veo 2)
3. Image Analysis - Extract insights from uploaded images
4. Multi-turn Chat - Conversational AI demonstration

---

## How to Get Your API Key

1. Go to: https://aistudio.google.com/apikey
2. Sign in with your Google account
3. Click "Create API Key"
4. Copy the key
5. Paste it into the code cell in Section 2, replacing `YOUR_API_KEY_HERE`

## Section 1: Setup

In [None]:
# Install required libraries
!pip install -q -U google-genai google-generativeai pillow matplotlib
print("Libraries installed successfully")

In [None]:
# Import libraries
import os
import time
import base64
from datetime import datetime
from IPython.display import display, HTML, Markdown, Image as IPImage
import PIL.Image
import matplotlib.pyplot as plt
import io

print("Libraries imported successfully")
print("Date:", datetime.now().strftime('%Y-%m-%d %H:%M:%S'))

In [None]:
# Create output directories
os.makedirs("outputs/images", exist_ok=True)
os.makedirs("outputs/videos", exist_ok=True)
os.makedirs("outputs/transcripts", exist_ok=True)
print("Output directories created")

## Section 2: Enter Your API Key

### IMPORTANT: Replace YOUR_API_KEY_HERE with your actual Gemini API key

Get your key from: https://aistudio.google.com/apikey

In [None]:
#############################################
# PASTE YOUR API KEY BELOW (between the quotes)
#############################################

GEMINI_API_KEY = "YOUR_API_KEY_HERE"

#############################################
# DO NOT CHANGE ANYTHING BELOW THIS LINE
#############################################

if GEMINI_API_KEY == "YOUR_API_KEY_HERE" or GEMINI_API_KEY == "":
    print("")
    print("ERROR: You need to enter your API key!")
    print("")
    print("Steps:")
    print("1. Go to https://aistudio.google.com/apikey")
    print("2. Click 'Create API Key'")
    print("3. Copy the key")
    print("4. Replace YOUR_API_KEY_HERE above with your key")
    print("5. Run this cell again")
    print("")
    API_READY = False
else:
    print("API Key is set!")
    API_READY = True
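
Pasting a key directly into a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the key may also be stored in an environment variable named `GEMINI_API_KEY`. The variable name and the helper `resolve_api_key` are illustrative assumptions, not part of the original notebook.

```python
import os

PLACEHOLDER = "YOUR_API_KEY_HERE"

def resolve_api_key(pasted_key, env=None, var_name="GEMINI_API_KEY"):
    """Return a usable key, preferring the pasted value, else the environment.

    Returns None when neither source holds a real key.
    `var_name` is an assumed variable name, chosen for illustration.
    """
    env = os.environ if env is None else env
    for candidate in (pasted_key, env.get(var_name, "")):
        candidate = (candidate or "").strip()
        # Skip empty values and the untouched placeholder string
        if candidate and candidate != PLACEHOLDER:
            return candidate
    return None

# Example with an explicit mapping instead of the real environment
key = resolve_api_key(PLACEHOLDER, env={"GEMINI_API_KEY": "abc123"})
print("API key found" if key else "No API key configured")
```

With a helper like this, `API_READY` could be set to `key is not None`, and the placeholder can stay in the shared copy of the notebook.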

In [None]:
# Initialize Gemini
if API_READY:
    try:
        import google.generativeai as genai
        genai.configure(api_key=GEMINI_API_KEY)
        print("Gemini initialized successfully!")
        GEMINI_READY = True
    except Exception as e:
        print("Error initializing Gemini:", str(e))
        GEMINI_READY = False
else:
    print("Please set your API key in the cell above first")
    GEMINI_READY = False

## Section 3: Model Configuration

In [None]:
# Model names
TEXT_MODEL = "gemini-2.0-flash"
FALLBACK_MODEL = "gemini-1.5-flash"
IMAGE_MODEL = "imagen-3.0-generate-001"
VIDEO_MODEL = "veo-2.0-generate-001"

print("Models configured:")
print("  Text/Chat:", TEXT_MODEL)
print("  Image:", IMAGE_MODEL, "(may require special access)")
print("  Video:", VIDEO_MODEL, "(may require special access)")

---

## Part 1: Text-to-Image Generation

Note: Imagen 3 may require special API access. If the request fails (for example, with a 404 error), the notebook prints the error and continues with the other features.

In [None]:
# Image generation prompt
IMAGE_PROMPT = """
A majestic futuristic cityscape on Mars at sunset, with towering
bio-domes containing lush green forests, flying vehicles with
holographic trails, and the distant Earth visible in the pink-orange
Martian sky. Ultra-detailed, cinematic lighting, 8K quality.
"""

print("Image Prompt:")
print("-" * 50)
print(IMAGE_PROMPT.strip())

In [None]:
# Try to generate image (may not work without special access)
if GEMINI_READY:
    print("Attempting image generation...")
    print("Note: Imagen 3 requires special API access.")
    print("If this fails, the notebook will continue with other demos.")
    print("")

    try:
        from google import genai as genai_new
        from google.genai import types
        client = genai_new.Client(api_key=GEMINI_API_KEY)

        response = client.models.generate_images(
            model=IMAGE_MODEL,
            prompt=IMAGE_PROMPT.strip(),
            config=types.GenerateImagesConfig(
                number_of_images=1,
                aspect_ratio="16:9"
            )
        )

        if response.generated_images:
            image_data = response.generated_images[0].image.image_bytes
            with open("outputs/images/generated_image.png", "wb") as f:
                f.write(image_data)
            print("Image saved to outputs/images/generated_image.png")

            img = PIL.Image.open(io.BytesIO(image_data))
            plt.figure(figsize=(12, 8))
            plt.imshow(img)
            plt.axis('off')
            plt.title("Generated: Futuristic Mars City")
            plt.show()
        else:
            print("No image generated")

    except Exception as e:
        print("Image generation not available:", str(e)[:100])
        print("This is expected if your API key does not have Imagen 3 access.")
        print("Continuing with other demos...")
else:
    print("Please set your API key first")

---

## Part 2: Text-to-Video Generation

Note: Veo 2 may require special API access. If the request fails (for example, with a 404 error), the notebook prints the error and continues with the other features.

In [None]:
# Video generation prompt
VIDEO_PROMPT = """
A mesmerizing journey through a bioluminescent underwater cave.
The camera slowly glides through crystal-clear water as glowing
jellyfish float past. Ethereal blue and green glow.
"""

print("Video Prompt:")
print("-" * 50)
print(VIDEO_PROMPT.strip())

In [None]:
# Try to generate video (may not work without special access)
if GEMINI_READY:
    print("Attempting video generation...")
    print("Note: Veo 2 requires special API access.")
    print("If this fails, the notebook will continue with other demos.")
    print("")

    try:
        from google import genai as genai_new
        from google.genai import types
        client = genai_new.Client(api_key=GEMINI_API_KEY)

        operation = client.models.generate_videos(
            model=VIDEO_MODEL,
            prompt=VIDEO_PROMPT.strip(),
            config=types.GenerateVideosConfig(
                aspect_ratio="16:9",
                number_of_videos=1,
                duration_seconds=8
            )
        )

        print("Video generation started...")
        for i in range(10):
            if operation.done:
                break
            print("  Waiting...", i+1)
            time.sleep(10)
            operation = client.operations.get(operation)

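        if operation.done and operation.response:
            video = operation.response.generated_videos[0].video
            # The Gemini API can return a file reference instead of inline bytes;
            # in that case, fetch the content with client.files.download().
            video_data = video.video_bytes or client.files.download(file=video)
            with open("outputs/videos/generated_video.mp4", "wb") as f:
                f.write(video_data)
            print("Video saved to outputs/videos/generated_video.mp4")
        else:
            print("Video generation did not finish in time or returned no result")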

    except Exception as e:
        print("Video generation not available:", str(e)[:100])
        print("This is expected if your API key does not have Veo 2 access.")
        print("Continuing with other demos...")
else:
    print("Please set your API key first")
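
The waiting loop in the cell above follows a general pattern: check a long-running operation, sleep, refresh it, and give up after a fixed number of attempts. A minimal sketch of that pattern as a reusable helper, assuming only that the operation object exposes a `done` attribute. `poll_until_done`, `FakeOperation`, and `fake_refresh` are hypothetical names used for illustration; no network call is made.

```python
import time

def poll_until_done(operation, refresh, interval_s=10.0, max_attempts=10, sleep=time.sleep):
    """Refresh `operation` until its `done` attribute is truthy or attempts run out.

    `refresh` is any callable that takes the current operation and returns an
    updated one. Returns a tuple (operation, finished).
    """
    for _ in range(max_attempts):
        if getattr(operation, "done", False):
            return operation, True
        sleep(interval_s)
        operation = refresh(operation)
    # One last check, since the final refresh may have completed the operation
    return operation, bool(getattr(operation, "done", False))


class FakeOperation:
    """Stand-in for a long-running operation; finishes after `ticks` refreshes."""
    def __init__(self, ticks):
        self.ticks = ticks
        self.done = ticks <= 0

def fake_refresh(op):
    """Simulate one status refresh by counting down."""
    return FakeOperation(op.ticks - 1)

# Skip real sleeping in the demo by passing a no-op sleep function
op, finished = poll_until_done(FakeOperation(3), fake_refresh, sleep=lambda s: None)
print("finished:", finished)
```

In the notebook, `refresh` would wrap the existing `client.operations.get(operation)` call, and the defaults reproduce the original 10 attempts at 10-second intervals.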

---

## Part 3: Image Analysis

Upload any image and Gemini will analyze it in detail.

### What image to upload?
You can upload ANY image:
- A photo from your phone
- A screenshot
- An image from the internet
- A diagram or chart
- Anything you want analyzed!

In [None]:
# Upload an image
from google.colab import files

print("Please upload an image for analysis")
print("You can upload any image: photo, screenshot, diagram, etc.")
print("-" * 50)

uploaded_file = None
try:
    uploaded = files.upload()
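    if uploaded:
        uploaded_file = list(uploaded.keys())[0]
        print("Uploaded:", uploaded_file)

        # Show the image
        img = PIL.Image.open(uploaded_file)
        plt.figure(figsize=(10, 8))
        plt.imshow(img)
        plt.axis('off')
        plt.title("Your Uploaded Image")
        plt.show()
    else:
        print("No image uploaded - you can continue to the chat demo")
except Exception as e:
    # Catch Exception rather than using a bare except, so Ctrl+C still works
    uploaded_file = None
    print("Could not load the image:", str(e)[:100])
    print("You can continue to the chat demo")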

In [None]:
# Analyze the image
if GEMINI_READY and uploaded_file:
    print("Analyzing image with Gemini...")
    print("-" * 50)

    analysis_prompt = """
    Analyze this image and provide:

    ## 1. Description
    Describe what you see in detail.

    ## 2. Ten Observations
    List 10 interesting details (numbered 1-10).

    ## 3. Five Hypotheses
    Make 5 educated guesses about this image (H1-H5).

    ## 4. Three Stories
    Write 3 short creative stories inspired by this image.

    ## 5. Five Captions
    Write 5 different captions (Professional, Funny, Poetic, Mysterious, Inspirational).
    """

    try:
        model = genai.GenerativeModel(TEXT_MODEL)
        img = PIL.Image.open(uploaded_file)
        response = model.generate_content([analysis_prompt, img])
        analysis = response.text

        # Save to file
        with open("outputs/transcripts/image_analysis.md", "w") as f:
            f.write("# Image Analysis Report\n\n")
            f.write("Generated: " + datetime.now().strftime('%Y-%m-%d %H:%M:%S') + "\n\n")
            f.write(analysis)

        print("Analysis saved to outputs/transcripts/image_analysis.md")
        print("")
        print("=" * 60)
        print("IMAGE ANALYSIS RESULTS")
        print("=" * 60)
        print(analysis)

    except Exception as e:
        print("Error:", str(e))
        # Try fallback model
        try:
            print("Trying fallback model...")
            model = genai.GenerativeModel(FALLBACK_MODEL)
            response = model.generate_content([analysis_prompt, img])
            analysis = response.text

            with open("outputs/transcripts/image_analysis.md", "w") as f:
                f.write("# Image Analysis\n\n" + analysis)

            print("Analysis saved!")
            print(analysis)
        except Exception as e2:
            print("Fallback failed:", str(e2))

elif not GEMINI_READY:
    print("Please set your API key first")
else:
    print("Please upload an image first (run the cell above)")
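
The primary-then-fallback logic in the cell above could be factored into a small helper so the request code is written only once. A minimal sketch under that assumption: `call_with_fallback` and `fake_call` are hypothetical names, and `fake_call` stands in for a real model request so the example runs without an API key.

```python
def call_with_fallback(model_names, call):
    """Try `call(name)` for each model name in order.

    Returns (name, result) for the first model that succeeds.
    Raises RuntimeError listing every failure if all models fail.
    """
    errors = []
    for name in model_names:
        try:
            return name, call(name)
        except Exception as exc:  # broad on purpose, mirroring the notebook's handling
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))


def fake_call(name):
    """Hypothetical stand-in for a model request; the first model always fails."""
    if name == "gemini-2.0-flash":
        raise ValueError("quota exceeded")
    return "analysis from " + name

used, text = call_with_fallback(["gemini-2.0-flash", "gemini-1.5-flash"], fake_call)
print(used, "->", text)
```

In the notebook, `call` would build a `genai.GenerativeModel(name)` and run `generate_content`, so both models produce the same report format.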

---

## Part 4: Multi-turn Chat

This demonstrates Gemini's conversational abilities with context.

In [None]:
# Multi-turn chat demonstration
if GEMINI_READY:
    print("Starting Chat Demo")
    print("=" * 60)

    # Conversation
    messages = [
        "Imagine you are an AI historian from year 2150. What are the three most significant tech developments of the 2020s?",
        "As this future historian, describe how these technologies evolved and what unexpected consequences emerged.",
        "If you could send one piece of advice back to 2024 about AI development, what would it be?"
    ]

    transcript = "# Chat Transcript\n\n"
    transcript += "Date: " + datetime.now().strftime('%Y-%m-%d %H:%M:%S') + "\n\n"

    try:
        model = genai.GenerativeModel(TEXT_MODEL)
        chat = model.start_chat(history=[])
        print("Using model:", TEXT_MODEL)
        print("")

        for i, msg in enumerate(messages, 1):
            print("-" * 60)
            print("Turn", i, "- User:")
            print(msg)
            print("")

            response = chat.send_message(msg)
            reply = response.text

            print("Gemini:")
            if len(reply) > 800:
                print(reply[:800] + "...")
            else:
                print(reply)
            print("")

            transcript += "## Turn " + str(i) + "\n\n"
            transcript += "**User:** " + msg + "\n\n"
            transcript += "**Gemini:** " + reply + "\n\n---\n\n"

        # Save transcript
        with open("outputs/transcripts/chat.md", "w") as f:
            f.write(transcript)

        print("=" * 60)
        print("Chat saved to outputs/transcripts/chat.md")

    except Exception as e:
        print("Error:", str(e))
        # Try fallback
        try:
            print("Trying fallback model...")
            model = genai.GenerativeModel(FALLBACK_MODEL)
            chat = model.start_chat(history=[])

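            # Restart the transcript so turns from the failed attempt are not duplicated
            transcript = "# Chat Transcript\n\n"
            transcript += "Date: " + datetime.now().strftime('%Y-%m-%d %H:%M:%S') + "\n\n"

            for i, msg in enumerate(messages, 1):
                print("Turn", i)
                response = chat.send_message(msg)
                reply = response.text
                # Add the ellipsis only when the reply was actually truncated
                print(reply[:500] + ("..." if len(reply) > 500 else ""))
                transcript += "## Turn " + str(i) + "\n\n"
                transcript += "**User:** " + msg + "\n\n"
                transcript += "**Gemini:** " + reply + "\n\n---\n\n"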

            with open("outputs/transcripts/chat.md", "w") as f:
                f.write(transcript)
            print("Chat saved!")

        except Exception as e2:
            print("Fallback failed:", str(e2))
else:
    print("Please set your API key first (in Section 2)")
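
The transcript in the cell above is built by string concatenation inside the chat loop. A minimal sketch of the same Markdown layout produced by two small pure functions, which keeps formatting separate from the API calls. `preview` and `format_transcript` are hypothetical helper names, and the sample turns are made-up placeholder text.

```python
def preview(text, limit=800):
    """Shorten `text` for display, adding an ellipsis only when something was cut."""
    return text if len(text) <= limit else text[:limit] + "..."

def format_transcript(turns, timestamp):
    """Render (user, reply) pairs as a Markdown transcript string."""
    parts = ["# Chat Transcript", "", "Date: " + timestamp, ""]
    for i, (user, reply) in enumerate(turns, 1):
        parts += [
            f"## Turn {i}", "",
            "**User:** " + user, "",
            "**Gemini:** " + reply, "",
            "---", "",
        ]
    return "\n".join(parts)

# Placeholder conversation used only to show the output format
sample_turns = [("Hello", "Hi there"), ("What is 2 + 2?", "4")]
print(format_transcript(sample_turns, "2024-01-01 00:00:00"))
```

Collecting `(msg, reply)` pairs in a list during the loop and formatting once at the end also avoids carrying a partly written transcript from a failed attempt into a retry.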

---

## Summary

In [None]:
# Summary of generated files
print("=" * 60)
print("ASSIGNMENT 1A - SUMMARY")
print("=" * 60)
print("")
print("Author: Yashaswini Dinesh")
print("Date:", datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
print("")
print("Generated Files:")
print("-" * 40)

total = 0
for folder in ["outputs/images", "outputs/videos", "outputs/transcripts"]:
    if os.path.exists(folder):
        files_list = os.listdir(folder)
        for f in files_list:
            path = os.path.join(folder, f)
            size = os.path.getsize(path)
            print("  " + path + " (" + str(size) + " bytes)")
            total += 1

if total == 0:
    print("  No files generated yet")
    print("")
    print("  Make sure you:")
    print("  1. Set your API key in Section 2")
    print("  2. Run the chat demo (Part 4)")
    print("  3. Upload and analyze an image (Part 3)")
else:
    print("")
    print("Total:", total, "file(s)")

print("")
print("=" * 60)
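
The file listing in the cell above walks three fixed folders with `os.listdir`. A minimal sketch of an alternative that uses `pathlib` to collect every file under one root folder, however deeply it is nested. `list_outputs` is a hypothetical helper name, and the demo writes to a temporary directory rather than the notebook's `outputs` folder.

```python
import tempfile
from pathlib import Path

def list_outputs(root):
    """Return sorted (relative_path, size_in_bytes) pairs for every file under `root`."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )

# Demo on a throwaway directory so the example does not touch real outputs
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "transcripts").mkdir()
    (Path(tmp) / "transcripts" / "chat.md").write_text("hello", encoding="utf-8")
    for rel_path, size in list_outputs(tmp):
        print(f"  {rel_path} ({size} bytes)")
```

Calling `list_outputs("outputs")` in the notebook returns an empty list when nothing has been generated yet, which maps directly onto the "No files generated yet" branch.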

In [None]:
# Download all outputs
import shutil
from google.colab import files

# Check if there are files
has_files = False
for folder in ["outputs/images", "outputs/videos", "outputs/transcripts"]:
    if os.path.exists(folder) and os.listdir(folder):
        has_files = True
        break

if has_files:
    shutil.make_archive("assignment_1a_outputs", 'zip', 'outputs')
    print("Created: assignment_1a_outputs.zip")
    print("Downloading...")
    files.download("assignment_1a_outputs.zip")
else:
    print("No files to download yet.")
    print("")
    print("To generate files:")
    print("1. Make sure your API key is set in Section 2")
    print("2. Run Part 4 (Chat Demo) - this needs only a valid API key")
    print("3. Optionally upload an image in Part 3")

---

## Conclusion

This notebook demonstrated:

1. **Text-to-Image** - Imagen 3 (requires special access)
2. **Text-to-Video** - Veo 2 (requires special access)
3. **Image Analysis** - Works with any Gemini API key
4. **Multi-turn Chat** - Works with any Gemini API key

### References
- https://aistudio.google.com/
- https://ai.google.dev/docs
- https://www.datacamp.com/blog/janus-pro
- https://www.datacamp.com/blog/deepseek-r1

---

CMPE 258 - Assignment 1A - Yashaswini Dinesh - San Jose State University