<a href="https://colab.research.google.com/github/whereissam/gemini-workshop/blob/main/Part_3_thinking_and_tools.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Copyright 2025 Patrick Loeber

In [None]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Workshop: Build with Gemini (Part 3)

<a target="_blank" href="https://colab.research.google.com/github/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-3-thinking-and-tools.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This workshop teaches how to build with Gemini using the Gemini API and Python SDK.

Course outline:

- **[Part 1: Quickstart + Text prompting](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-1-text-prompting.ipynb)**

- **[Part 2: Multimodal understanding (image, video, audio, docs, code)](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-2-multimodal-understanding.ipynb)**

- **Part 3 (this notebook): Thinking models + agentic capabilities (tool usage)**
  - Thinking models
  - Structured outputs
  - Code execution
  - Grounding with Google Search
  - Function calling
  - Final exercise: Give Gemini access to the PokéAPI to answer Pokémon questions

## 0. Use Google AI Studio as a playground

Explore and play with all models in the [Google AI Studio](https://aistudio.google.com/apikey).

## 1. Setup

Get a free API key in the [Google AI Studio](https://aistudio.google.com/apikey) and set up the [Google Gen AI Python SDK](https://github.com/googleapis/python-genai)

In [None]:
%pip install -U -q google-genai

In [None]:
from google.colab import userdata

GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')

In [None]:
from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)

## Thinking models

Starting with Gemini 2.5, all models have thinking capabilities. These models use an internal "thinking process" during response generation. This process contributes to their improved reasoning capabilities and allows them to solve complex tasks, particularly complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents.

Thinking models are also great at working with tools to perform actions beyond generating text. This allows them to interact with external systems, execute code, or access real-time information, incorporating the results into their reasoning and final response.

(Note: Tools are also available with Gemini 2.0 models)

In [None]:
MODEL = "gemini-2.5-pro-exp-03-25"  # with paid tier: gemini-2.5-pro-preview-03-25

In [None]:
# TODO
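
One possible way to fill in the cell above, as a minimal sketch rather than the workshop's reference solution. It assumes the SDK accepts a `thinking_config` with `include_thoughts` in the request config and that thought-summary parts are flagged with a `thought` attribute; the selected model may not return thought summaries at all. `split_thoughts` and `ask_with_thoughts` are hypothetical helper names.

```python
def split_thoughts(response):
    """Separate thought-summary text from answer text in a response."""
    thoughts, answer = [], []
    for part in response.candidates[0].content.parts:
        if not part.text:
            continue  # skip non-text parts
        # Assumption: parts that carry a thinking summary have `thought` set to True.
        if getattr(part, "thought", False):
            thoughts.append(part.text)
        else:
            answer.append(part.text)
    return "\n".join(thoughts), "\n".join(answer)


def ask_with_thoughts(client, model, prompt):
    """Send a prompt and ask the model to include thought summaries."""
    response = client.models.generate_content(
        model=model,
        contents=prompt,
        # Assumption: dict form of the SDK's thinking configuration.
        config={"thinking_config": {"include_thoughts": True}},
    )
    return split_thoughts(response)


# In the notebook, with `client` and `MODEL` from the setup cells:
# thoughts, answer = ask_with_thoughts(client, MODEL, "How many prime numbers are below 100?")
# print(answer)
```

If the model returns no thought parts, the first element of the returned tuple is simply an empty string.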

## **!! Exercise !!** ##

- Go to [Google AI Studio](https://ai.dev/?model=gemini-2.5-pro-preview-03-25), use Gemini 2.5 Pro, give it a complex task, and observe the thinking process. For example, create a p5js game in one shot:

```
Make a p5js soccer game simulation. There should be 2 teams and each player on the team should have their path traveled displayed. Add live stats on the right side and score in the top bar. no HTML
```

## Structured output

Gemini generates unstructured text by default, but some applications require structured text. For these use cases, you can constrain Gemini to respond with JSON, a structured data format suitable for automated processing. You can also constrain the model to respond with one of the options specified in an enum.

In [None]:
from pydantic import BaseModel

class Recipe(BaseModel):
  recipe_name: str
  ingredients: list[str]

response = client.models.generate_content(
    model=MODEL,
    contents='List three popular cookie recipes. Be sure to include the amounts of ingredients.',
    config={
        'response_mime_type': 'application/json',
        'response_schema': list[Recipe],
    },
)
# Use the response as a JSON string.
print(response.text)

# Use instantiated objects.
my_recipes: list[Recipe] = response.parsed

[
  {
    "recipe_name": "Classic Chocolate Chip Cookies",
    "ingredients": [
      "2 1/4 cups all-purpose flour",
      "1 tsp baking soda",
      "1 tsp salt",
      "1 cup (2 sticks) unsalted butter, softened",
      "3/4 cup granulated sugar",
      "3/4 cup packed brown sugar",
      "1 tsp vanilla extract",
      "2 large eggs",
      "2 cups semi-sweet chocolate chips"
    ]
  },
  {
    "recipe_name": "Oatmeal Raisin Cookies",
    "ingredients": [
      "1 1/2 cups all-purpose flour",
      "1 tsp baking soda",
      "1 tsp ground cinnamon",
      "1/2 tsp salt",
      "1 cup (2 sticks) unsalted butter, softened",
      "1 cup packed light brown sugar",
      "1/2 cup granulated sugar",
      "2 large eggs",
      "1 tsp vanilla extract",
      "3 cups old-fashioned rolled oats",
      "1 cup raisins"
    ]
  },
  {
    "recipe_name": "Peanut Butter Cookies",
    "ingredients": [
      "1 1/4 cups all-purpose flour",
      "3/4 tsp baking soda",
      "1/2 tsp baking powder"

Constrain to enums:

In [None]:
# TODO
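
A possible sketch for this cell, assuming the `text/x.enum` response MIME type together with a schema that lists the allowed values. The instrument categories and the `classify_instrument` helper are illustrative choices, not part of the workshop material.

```python
# Schema restricting the answer to exactly one of these strings (illustrative values).
INSTRUMENT_SCHEMA = {
    "type": "STRING",
    "enum": ["Percussion", "String", "Woodwind", "Brass", "Keyboard"],
}


def classify_instrument(client, model, instrument_name):
    """Ask the model to pick one enum value for the given instrument."""
    response = client.models.generate_content(
        model=model,
        contents=f"What type of instrument is this: {instrument_name}?",
        config={
            "response_mime_type": "text/x.enum",
            "response_schema": INSTRUMENT_SCHEMA,
        },
    )
    return response.text


# In the notebook, with `client` and `MODEL` from the setup cells:
# print(classify_instrument(client, MODEL, "oboe"))
```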


Or use the built-in Python enum class:

In [None]:
# TODO
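
A hedged sketch of the enum-class variant: it assumes the SDK accepts an `enum.Enum` subclass directly as `response_schema`. The `Instrument` class and `classify_instrument_enum` helper are hypothetical names chosen for illustration.

```python
import enum


class Instrument(enum.Enum):
    PERCUSSION = "Percussion"
    STRING = "String"
    WOODWIND = "Woodwind"
    BRASS = "Brass"
    KEYBOARD = "Keyboard"


def classify_instrument_enum(client, model, instrument_name):
    """Ask for one enum value and convert the text reply into an Instrument member."""
    response = client.models.generate_content(
        model=model,
        contents=f"What type of instrument is this: {instrument_name}?",
        config={
            "response_mime_type": "text/x.enum",
            "response_schema": Instrument,  # assumption: Enum classes are accepted here
        },
    )
    # Raises ValueError if the reply is not one of the enum values.
    return Instrument(response.text.strip())


# In the notebook, with `client` and `MODEL` from the setup cells:
# print(classify_instrument_enum(client, MODEL, "oboe"))
```

Converting the reply back into an `Instrument` member gives downstream code a typed value instead of a free-form string.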

## Code execution

The code execution feature enables the model to generate and run Python code and learn iteratively from the results until it arrives at a final output. You can use this code execution capability to build applications that benefit from code-based reasoning and that produce text output. For example, you could use code execution in an application that solves equations or processes text.

In [None]:
# TODO
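
A minimal sketch for the cell above, assuming the code execution tool can be enabled with the dict form `{"code_execution": {}}` (the typed equivalent should be `types.Tool(code_execution=types.ToolCodeExecution())`). The helper name `run_with_code_execution` and the sample prompt are illustrative.

```python
def run_with_code_execution(client, model, prompt):
    """Send a prompt with the code execution tool enabled and return the raw response."""
    return client.models.generate_content(
        model=model,
        contents=prompt,
        # Assumption: dict form of the code execution tool configuration.
        config={"tools": [{"code_execution": {}}]},
    )


# In the notebook, with `client` and `MODEL` from the setup cells:
# response = run_with_code_execution(
#     client,
#     MODEL,
#     "What is the sum of the first 50 prime numbers? "
#     "Generate and run code for the calculation.",
# )
```

The response then contains separate parts for text, generated code, and execution output.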

In [None]:
response

In [None]:
from IPython.display import Image, Markdown, Code, HTML

def display_code_execution_result(response):
  for part in response.candidates[0].content.parts:
    if part.text is not None:
      display(Markdown(part.text))
    if part.executable_code is not None:
      code_html = f'<pre style="background-color: #6a0cad;">{part.executable_code.code}</pre>' # Change code color
      display(HTML(code_html))
    if part.code_execution_result is not None:
      display(Markdown("#### Output"))
      display(Markdown(part.code_execution_result.output))
    if part.inline_data is not None:
      display(Image(data=part.inline_data.data, format="png"))
    display(Markdown("---"))

display_code_execution_result(response)

## Grounding with Google Search

If Google Search is configured as a tool, Gemini can decide when to use Google Search to improve the accuracy and recency of responses.

Here's a question about a recent event without Google Search:



In [None]:
response = client.models.generate_content(
    model=MODEL,
    contents="Who won the super bowl this year?",
)

print(response.text)

In [None]:
# TODO
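
One way this cell could look, sketched under the assumption that Google Search is enabled with the dict form `{"google_search": {}}` (typed equivalent: `types.Tool(google_search=types.GoogleSearch())`). The `grounding_sources` helper additionally assumes that grounding metadata exposes `grounding_chunks` with `web.title` and `web.uri`; both helper names are hypothetical.

```python
def ask_with_google_search(client, model, prompt):
    """Send a prompt with Google Search available as a tool."""
    return client.models.generate_content(
        model=model,
        contents=prompt,
        # Assumption: dict form of the Google Search tool configuration.
        config={"tools": [{"google_search": {}}]},
    )


def grounding_sources(response):
    """Return (title, uri) pairs for the web sources used to ground the answer."""
    metadata = response.candidates[0].grounding_metadata
    # Metadata or chunks may be missing if the model did not search.
    chunks = getattr(metadata, "grounding_chunks", None) or []
    return [
        (chunk.web.title, chunk.web.uri)
        for chunk in chunks
        if getattr(chunk, "web", None) is not None
    ]


# In the notebook, with `client` and `MODEL` from the setup cells:
# response = ask_with_google_search(client, MODEL, "Who won the super bowl this year?")
# print(grounding_sources(response))
```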

In [None]:
for part in response.candidates[0].content.parts:
    print(part.text)

In [None]:
# To get grounding metadata as web content.
HTML(response.candidates[0].grounding_metadata.search_entry_point.rendered_content)

#### **!! Exercise !!**

Use Gemini with Google Search to get the current weather and the forecast for next weekend in Berlin.

In [None]:
# TODO

## Function calling

Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model understands when to call specific functions and provides the necessary parameters to execute real-world actions.

In [None]:
from google.genai import types

# Define the function declaration for the model
weather_function = {
    "name": "get_current_temperature",
    "description": "Gets the current temperature for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city name",
            },
        },
        "required": ["location"],
    },
}

# Configure the client and tools
# TODO

# Send request with function declarations
# TODO
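
A possible sketch for the two TODOs above, assuming function declarations can be passed in dict form under `tools` (typed equivalent: `types.Tool(function_declarations=[...])`). `request_function_call` is a hypothetical helper; the declaration list is passed in so the same helper works for any set of functions.

```python
def request_function_call(client, model, prompt, declarations):
    """Send a prompt together with function declarations the model may call."""
    # Assumption: a tool entry may be a dict holding a list of function declarations.
    config = {"tools": [{"function_declarations": list(declarations)}]}
    return client.models.generate_content(
        model=model,
        contents=prompt,
        config=config,
    )


# In the notebook, with `client`, `MODEL`, and `weather_function` defined above:
# response = request_function_call(
#     client, MODEL, "What's the temperature in London?", [weather_function]
# )
```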

Check for a function call:

In [None]:
if response.candidates[0].content.parts[0].function_call:
    function_call = response.candidates[0].content.parts[0].function_call
    print(f"Function to call: {function_call.name}")
    print(f"Arguments: {function_call.args}")
    #  In a real app, you would call your function here:
    #  result = get_current_temperature(**function_call.args)
else:
    print("No function call found in the response.")
    print(response.text)
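
The "call your function here" step can be sketched as a small dispatcher. This is an illustration, not part of the SDK: `dispatch_function_call` and the registry are hypothetical, and the sketch assumes `function_call.name` is a string and `function_call.args` is a mapping of argument names to values.

```python
def dispatch_function_call(function_call, registry):
    """Look up the function the model requested and call it with the model's arguments."""
    handler = registry.get(function_call.name)
    if handler is None:
        raise KeyError(f"Model requested an unknown function: {function_call.name}")
    # Copy the arguments into a plain dict; treat missing arguments as empty.
    kwargs = dict(function_call.args or {})
    return handler(**kwargs)


# In the notebook, after extracting `function_call` from the response:
# registry = {"get_current_temperature": get_current_temperature}
# result = dispatch_function_call(function_call, registry)
```

Keeping an explicit registry means the model can only trigger functions you have deliberately exposed. To let the model phrase a final answer, the result would be sent back in a follow-up request, for example as a function response part.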

### Automatic Function Calling (Python Only)

When using the Python SDK, you can provide Python functions directly as tools.

The SDK handles the function call and returns the final text.

In [None]:
# Define the function with type hints and docstring
def get_current_temperature(location: str) -> dict:
    """Gets the current temperature for a given location."""
    # TODO


# TODO
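
A hedged sketch of automatic function calling. The temperature value is placeholder data for illustration only, and `ask_with_auto_tools` is a hypothetical helper; the sketch assumes plain Python callables can be listed under `tools` in the request config.

```python
def get_current_temperature(location: str) -> dict:
    """Gets the current temperature for a given location.

    Args:
        location: The city name.
    """
    # Placeholder data for illustration; a real app would query a weather service.
    return {"location": location, "temperature": 15, "unit": "Celsius"}


def ask_with_auto_tools(client, model, prompt, functions):
    """Let the SDK call the given Python functions on the model's behalf."""
    return client.models.generate_content(
        model=model,
        contents=prompt,
        # Assumption: callables are accepted directly; the SDK is expected to
        # derive the declarations from type hints and docstrings.
        config={"tools": list(functions)},
    )


# In the notebook, with `client` and `MODEL` from the setup cells:
# response = ask_with_auto_tools(
#     client, MODEL, "What's the temperature in Boston?", [get_current_temperature]
# )
# print(response.text)
```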

Check the function calling history:

In [None]:
for content in response.automatic_function_calling_history:
    for part in content.parts:
        if part.function_call:
            print(part.function_call)

## Exercise: Get Pokémon stats

- Define a function that can work with the PokéAPI and get Pokémon stats.
- Endpoint to use: `GET https://pokeapi.co/api/v2/pokemon/<pokemon_name>`
- Call Gemini and give it access to the function, then answer questions like: `"What stats does the Pokemon Squirtle have?"`


In [None]:
# TODO
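
One possible solution sketch for the exercise. It assumes the PokéAPI response contains a `stats` list whose entries hold a `base_stat` number and a nested `stat.name`, and it uses only the standard library for the HTTP request. The helper names and the `User-Agent` value are illustrative.

```python
import json
import urllib.request

POKEAPI_URL = "https://pokeapi.co/api/v2/pokemon/{name}"


def extract_stats(payload: dict) -> dict:
    """Map each stat name to its base value from a PokéAPI Pokémon payload."""
    # Assumed payload shape: {"stats": [{"base_stat": int, "stat": {"name": str}}, ...]}
    return {entry["stat"]["name"]: entry["base_stat"] for entry in payload["stats"]}


def get_pokemon_stats(pokemon_name: str) -> dict:
    """Gets the base stats for a given Pokémon.

    Args:
        pokemon_name: The name of the Pokémon, e.g. "squirtle".
    """
    url = POKEAPI_URL.format(name=pokemon_name.strip().lower())
    # Some APIs reject requests without a User-Agent, so set one explicitly.
    request = urllib.request.Request(url, headers={"User-Agent": "gemini-workshop-example"})
    with urllib.request.urlopen(request, timeout=10) as http_response:
        return extract_stats(json.load(http_response))


# In the notebook, with `client` and `MODEL` from the setup cells:
# response = client.models.generate_content(
#     model=MODEL,
#     contents="What stats does the Pokemon Squirtle have?",
#     config={"tools": [get_pokemon_stats]},
# )
# print(response.text)
```

Returning only the name-to-value mapping keeps the tool result small, which leaves less irrelevant data for the model to read.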

## Recap & Next steps

Awesome work! You learned about thinking models with advanced reasoning capabilities and how to combine Gemini with tools for agentic use cases.

More helpful resources:

- [Thinking docs](https://ai.google.dev/gemini-api/docs/thinking)
- [Structured output docs](https://ai.google.dev/gemini-api/docs/structured-output?lang=python)
- [Code execution docs](https://ai.google.dev/gemini-api/docs/code-execution?lang=python)
- [Grounding docs](https://ai.google.dev/gemini-api/docs/grounding?lang=python)
- [Function calling docs](https://ai.google.dev/gemini-api/docs/function-calling?example=weather)

🎉🎉**Congratulations, you completed the workshop!**🎉🎉

**Next steps**: There's even more you can do with Gemini which we didn't cover in this workshop:

- [Image creation and editing with Gemini 2.0](https://github.com/patrickloeber/genai-tutorials/blob/main/notebooks/gemini-image-editing.ipynb)
- [Live API: Talk to Gemini and share your camera](https://aistudio.google.com/live) & [Live API cookbook](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Get_started_LiveAPI.ipynb)
