***
<span style="font-size:32px; color:rgba(0, 0, 255, 0.5);">Day 1 - Foundational Large Language Models & Text Generation</span>

---


<span style="font-size:24px; color:rgba(0, 0, 0, 0.5);">Prompting</span>

---

"The advent of Large Language Models (LLMs) represents a seismic shift in the world of artificial intelligence. Their ability to process, generate, and understand user intent is fundamentally changing the way we interact with information and technology.

An LLM is an advanced artificial intelligence system that specializes in processing, understanding, and generating human-like text. These systems are typically implemented as a deep neural network and are trained on massive amounts of text data. This allows them to learn the intricate patterns of language, giving them the ability to perform a variety of tasks, like machine translation, creative text generation, question answering, text summarization, and many more reasoning and language oriented tasks. This whitepaper dives into the timeline of the various architectures and approaches building up to the large language models and the architectures being used at the time of publication. It also discusses fine-tuning techniques to customize an LLM to a certain domain or task, methods to make the training more efficient, as well as methods to accelerate inference. These are then followed by various applications and code examples."

**Authors**<br>
Mohammadamin Barektain, Anant Nawalgaria, Daniel J. Mankowitz, Majd Al Merey, Yaniv Leviathan, Massimo Mascaro, Matan Kalman, Elena Buchatskaya, Aliaksei Severyn, and Antonio Gulli 

<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Resources</span>

---
**Whitepaper**<br>
https://www.kaggle.com/whitepaper-foundational-llm-and-text-generation

**Foundational LLM Podcast**<br>
https://www.youtube.com/watch?v=mQDlCZZsOyo

**Foundational Live Stream**<br>
https://www.youtube.com/watch?v=kpRyiJUUFxY

**Get your API key from**<br>
https://aistudio.google.com/app/apikey

**Kaggle**<br>
https://www.kaggle.com/code/markishere/day-1-prompting


<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Libraries</span>

---

In [1]:
# %pip install torch torchvision torchaudio tensorflow transformers google-generativeai

In [2]:
import os, enum, json, time
from dotenv import load_dotenv

from IPython.display import Markdown, display

import google.generativeai as genai
from google.api_core import retry

from transformers import GPT2Tokenizer

from typing import TypedDict

<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Initialize the API</span>

---

In [3]:
# Load API key from .env file
load_dotenv()
api_key = os.getenv("GAI_API_KEY")

# Set up the API key for the genai library
genai.configure(api_key=api_key)

<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Run your first prompt</span>

---
In this step, you will test that your API key is set up correctly by making a request. The `gemini-1.5-flash` model has been selected here.

In [4]:
# Initialize the Gemini model (gemini-1.5-flash)
flash = genai.GenerativeModel('gemini-1.5-flash')

# Generate content with the model
response = flash.generate_content("Explain AI to me like I'm a kid.")

# Print the generated response
print(response.text)

Imagine you have a really smart puppy.  You teach it tricks, like "sit" and "fetch."  The more you teach it, the better it gets at those tricks.

AI is kind of like that super smart puppy, but instead of learning tricks, it learns from information.  We give it lots and lots of information – like pictures of cats and dogs, or sentences in different languages – and it learns to recognize patterns.

So, if you show the AI a new picture of a cat, it might say "cat!" because it learned what cats look like from all the pictures it saw before.  Or if you ask it to translate "hello" into Spanish, it might say "hola" because it learned that from all the sentences it was shown.

The AI isn't actually *thinking* like you and me, it's just really good at finding patterns and following instructions based on the information it's been given. It's like a super fast, super smart parrot that can do amazing things with information!  It can even learn to play games, write stories, or even help doctors mak

In [5]:
# View the results in Markdown format
Markdown(response.text)

Imagine you have a really smart puppy.  You teach it tricks, like "sit" and "fetch."  The more you teach it, the better it gets at those tricks.

AI is kind of like that super smart puppy, but instead of learning tricks, it learns from information.  We give it lots and lots of information – like pictures of cats and dogs, or sentences in different languages – and it learns to recognize patterns.

So, if you show the AI a new picture of a cat, it might say "cat!" because it learned what cats look like from all the pictures it saw before.  Or if you ask it to translate "hello" into Spanish, it might say "hola" because it learned that from all the sentences it was shown.

The AI isn't actually *thinking* like you and me, it's just really good at finding patterns and following instructions based on the information it's been given. It's like a super fast, super smart parrot that can do amazing things with information!  It can even learn to play games, write stories, or even help doctors make diagnoses, but it still needs us to teach it and help it along the way.


<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Start a chat</span>

---
The previous example uses a single-turn, text-in/text-out structure, but you can also set up a multi-turn chat structure.

In [6]:
chat = flash.start_chat(history=[])
response = chat.send_message('Hello! My name is tsummey.')
print(response.text)

Hello tsummey! It's nice to meet you.  How can I help you today?



In [7]:
response = chat.send_message('Can you tell something interesting about dinosaurs?')
print(response.text)

One interesting fact about dinosaurs is that some of them, like the *Therizinosaurus*, had incredibly long claws, some reaching up to three feet long!  Scientists aren't entirely sure what these claws were primarily used for, but theories range from defense against predators to reaching high into trees for food, or even for intraspecies competition (fighting amongst themselves).  The sheer size and unusual nature of these claws makes *Therizinosaurus* a truly fascinating and somewhat mysterious dinosaur.



In [8]:
# While you have the `chat` object around, the conversation state
# persists. Confirm that by asking if it knows my name.
response = chat.send_message('Do you remember what my name is?')
print(response.text)

Yes, your name is tsummey.



In [9]:
# What does the raw response JSON look like
chat.send_message('Why is the sky blue')

response:
GenerateContentResponse(
    done=True,
    iterator=None,
    result=protos.GenerateContentResponse({
      "candidates": [
        {
          "content": {
            "parts": [
              {
                "text": "The sky is blue due to a phenomenon called **Rayleigh scattering**.  Sunlight is made up of all the colors of the rainbow.  As sunlight enters the Earth's atmosphere, it collides with tiny air molecules (mostly nitrogen and oxygen).  These molecules are much smaller than the wavelengths of visible light.\n\nRayleigh scattering affects shorter wavelengths of light more strongly than longer wavelengths.  Blue and violet light have the shortest wavelengths, so they are scattered more effectively than other colors.  This scattered blue light is what we see when we look at the sky.\n\nViolet light actually has an even shorter wavelength than blue, and should therefore scatter even more. However, our eyes are less sensitive to violet, and the sun emits slightly le

<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Choose a model</span>

---
The Gemini API provides access to a number of models from the Gemini model family. Read about the available models and their capabilities on the [model overview page](https://ai.google.dev/gemini-api/docs/models/gemini). In this step you'll use the API to list all of the available models.

In [10]:
for model in genai.list_models():
  print(model.name)

models/chat-bison-001
models/text-bison-001
models/embedding-gecko-001
models/gemini-1.0-pro-latest
models/gemini-1.0-pro
models/gemini-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-001
models/gemini-1.5-pro-002
models/gemini-1.5-pro
models/gemini-1.5-pro-exp-0801
models/gemini-1.5-pro-exp-0827
models/gemini-1.5-flash-latest
models/gemini-1.5-flash-001
models/gemini-1.5-flash-001-tuning
models/gemini-1.5-flash
models/gemini-1.5-flash-exp-0827
models/gemini-1.5-flash-002
models/gemini-1.5-flash-8b
models/gemini-1.5-flash-8b-001
models/gemini-1.5-flash-8b-latest
models/gemini-1.5-flash-8b-exp-0827
models/gemini-1.5-flash-8b-exp-0924
models/learnlm-1.5-pro-experimental
models/gemini-exp-1114
models/embedding-001
models/text-embedding-004
models/aqa


In [11]:
for model in genai.list_models():
  if model.name == 'models/gemini-1.5-flash':
    print(model)
    break

Model(name='models/gemini-1.5-flash',
      base_model_id='',
      version='001',
      display_name='Gemini 1.5 Flash',
      description=('Alias that points to the most recent stable version of Gemini 1.5 Flash, our '
                   'fast and versatile multimodal model for scaling across diverse tasks.'),
      input_token_limit=1000000,
      output_token_limit=8192,
      supported_generation_methods=['generateContent', 'countTokens'],
      temperature=1.0,
      max_temperature=2.0,
      top_p=0.95,
      top_k=40)


<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Explore generation parameters</span>

---

<span style="font-size:16px; color:rgba(0, 0, 0, 0.5);">Output Length</span><br>
When generating text with an LLM, the output length affects cost and performance. Generating more tokens increases computation, leading to higher energy consumption, latency, and cost.

To stop the model from generating tokens past a limit, you can specify the `max_output_tokens` parameter when using the Gemini API. Specifying this parameter does not influence the generation of the output tokens, so the output will not become more stylistically or textually succinct, but it will stop generating tokens once the specified length is reached. Prompt engineering may be required to generate a more complete output for your given limit.

In [12]:
short_model = genai.GenerativeModel(
    'gemini-1.5-flash',
    generation_config=genai.GenerationConfig(max_output_tokens=200))

response = short_model.generate_content('Write a 1000 word essay on the importance of olives in modern society.')
print(response.text)

## The Enduring Significance of Olives in Modern Society

The olive, a seemingly simple fruit, holds a profound and multifaceted importance in modern society, extending far beyond its culinary applications.  From its contribution to the global economy to its role in cultural heritage and even its potential in health and sustainability, the olive's impact resonates across various spheres of human life. Understanding this significance requires examining its economic influence, its deeply entrenched cultural identity, its health benefits, and finally, the challenges and opportunities related to its sustainable cultivation.

Economically, the olive and its derivatives represent a significant industry worldwide. The Mediterranean basin, historically the heartland of olive cultivation, remains a major producer, with countries like Spain, Italy, Greece, and Turkey contributing substantially to global olive oil production.  This industry provides livelihoods for millions, encompassing farmers,

In [13]:
response = short_model.generate_content('Write a short poem on the importance of olives in modern society.')
print(response.text)

From ancient groves, a bounty small,
The olive's grace, it conquers all.
In oils so rich, a flavour deep,
A Mediterranean promise to keep.

On tables spread, a simple fare,
In beauty products, beyond compare.
From lotions smooth to soaps so mild,
The olive's touch, both pure and wild.

A symbol strong, of sun-drenched lands,
In modern times, it still commands.
A taste of history, vibrant, bright,
The olive shines, a welcome light. 



In [14]:
# Change max_output_tokens to 50 and review how the response changes
short_model = genai.GenerativeModel(
    'gemini-1.5-flash',
    generation_config=genai.GenerationConfig(max_output_tokens=50))

response = short_model.generate_content('Write a 1000 word essay on the importance of olives in modern society.')
print(response.text)

## The Enduring Importance of Olives in Modern Society

The olive (Olea europaea), a seemingly unassuming fruit, holds a position of profound importance in modern society that extends far beyond its culinary applications.  Its significance is woven into the fabric


In [15]:
# Initialize the tokenizer (adjust the model name as per your use case)
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Tokenize the response text
tokens = tokenizer.tokenize(response.text)

# Token check. GPT-2's tokenizer is not the one Gemini uses, so this count
# is only an approximation and need not equal max_output_tokens.
print('The model returned',len(tokens),'tokens\n')

# Print each token with its index; Ġ represents a space and Ċ a newline.
for idx, token in enumerate(tokens, start=1):  # start=1 ensures the count starts at 1
    print(f"{idx}: {token}")

The model returned 55 tokens

1: ##
2: ĠThe
3: ĠEnd
4: uring
5: ĠImport
6: ance
7: Ġof
8: ĠOl
9: ives
10: Ġin
11: ĠModern
12: ĠSociety
13: Ċ
14: Ċ
15: The
16: Ġolive
17: Ġ(
18: O
19: le
20: a
21: Ġeuro
22: p
23: aea
24: ),
25: Ġa
26: Ġseemingly
27: Ġun
28: assuming
29: Ġfruit
30: ,
31: Ġholds
32: Ġa
33: Ġposition
34: Ġof
35: Ġprofound
36: Ġimportance
37: Ġin
38: Ġmodern
39: Ġsociety
40: Ġthat
41: Ġextends
42: Ġfar
43: Ġbeyond
44: Ġits
45: Ġculinary
46: Ġapplications
47: .
48: Ġ
49: ĠIts
50: Ġsignificance
51: Ġis
52: Ġwoven
53: Ġinto
54: Ġthe
55: Ġfabric


<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Why you see more tokens than expected</span>

---
Tokenizer mismatch: The count above comes from the GPT-2 tokenizer, not from the tokenizer Gemini uses. `max_output_tokens` is enforced against Gemini's own tokens, and different tokenizers split the same text into different pieces, so 55 GPT-2 tokens is consistent with a 50-token limit.

Formatting artifacts: Whitespace and line breaks are tokens too. In the list above, the two newlines (tokens 13 and 14) and the extra space after the full stop (token 48) each add to the count.

To check the limit against Gemini's own count, inspect `response.usage_metadata` or call `count_tokens` on the model rather than re-tokenizing with a different library.
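
Token counts depend on which tokenizer does the splitting. The toy sketch below uses two hypothetical tokenizers (neither is GPT-2's nor Gemini's) purely to illustrate that the same text produces different counts under different splitting rules.

```python
import re

text = "The olive (Olea europaea), a seemingly unassuming fruit."

def whitespace_tokens(s: str) -> list[str]:
    # Hypothetical tokenizer A: split on whitespace only.
    return s.split()

def word_punct_tokens(s: str) -> list[str]:
    # Hypothetical tokenizer B: words and punctuation marks are separate tokens.
    return re.findall(r"\w+|[^\w\s]", s)

count_a = len(whitespace_tokens(text))
count_b = len(word_punct_tokens(text))
print("Tokenizer A:", count_a, "tokens")
print("Tokenizer B:", count_b, "tokens")
```

A limit expressed in one tokenizer's units therefore cannot be verified by counting with another.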

<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Display Tokens and IDs</span>

---
Tokenization is the bridge between raw text and numerical computations. The model learns to associate tokens with context, enabling it to predict and generate coherent responses in the correct order. This process is the heart of training and inference in language models!

In [16]:
# Initialize the tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Tokenize the response text and get the token IDs
tokens = tokenizer.tokenize(response.text)
token_ids = tokenizer.encode(response.text)

# Print each token with its corresponding integer ID
for idx, (token, token_id) in enumerate(zip(tokens, token_ids), start=1):
    print(f"{idx}: Token: {token}, ID: {token_id}")

1: Token: ##, ID: 2235
2: Token: ĠThe, ID: 383
3: Token: ĠEnd, ID: 5268
4: Token: uring, ID: 870
5: Token: ĠImport, ID: 17267
6: Token: ance, ID: 590
7: Token: Ġof, ID: 286
8: Token: ĠOl, ID: 6544
9: Token: ives, ID: 1083
10: Token: Ġin, ID: 287
11: Token: ĠModern, ID: 12495
12: Token: ĠSociety, ID: 7023
13: Token: Ċ, ID: 198
14: Token: Ċ, ID: 198
15: Token: The, ID: 464
16: Token: Ġolive, ID: 19450
17: Token: Ġ(, ID: 357
18: Token: O, ID: 46
19: Token: le, ID: 293
20: Token: a, ID: 64
21: Token: Ġeuro, ID: 11063
22: Token: p, ID: 79
23: Token: aea, ID: 44705
24: Token: ),, ID: 828
25: Token: Ġa, ID: 257
26: Token: Ġseemingly, ID: 9775
27: Token: Ġun, ID: 555
28: Token: assuming, ID: 32935
29: Token: Ġfruit, ID: 8234
30: Token: ,, ID: 11
31: Token: Ġholds, ID: 6622
32: Token: Ġa, ID: 257
33: Token: Ġposition, ID: 2292
34: Token: Ġof, ID: 286
35: Token: Ġprofound, ID: 11982
36: Token: Ġimportance, ID: 6817
37: Token: Ġin, ID: 287
38: Token: Ġmodern, ID: 3660
39: Token: Ġsociety, ID: 359

<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Temperature</span>

---
Temperature controls the degree of randomness in token selection. Higher temperatures flatten the probability distribution over candidate tokens, so less likely tokens are selected more often and the results are more diverse. Lower temperatures have the opposite effect, and a temperature of 0 results in greedy decoding, selecting the most probable token at each step.

Temperature doesn't provide any guarantees of randomness, but it can be used to "nudge" the output somewhat.
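
As an offline illustration, here is a minimal sketch of the textbook temperature-scaled softmax. The Gemini API does not expose its sampling internals, so treat this as an assumption about the general technique; the scores in `toy_logits` are made up.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities, rescaled by temperature."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all mass on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens.
toy_logits = [2.0, 1.0, 0.1]

for t in (0.0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, the probabilities move closer together, which is why the less likely tokens show up more often.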

<font color=red>Note: If you see a 429 Resource Exhausted error here, you may be able to edit the words in the prompt slightly to progress.</font>

In [17]:
high_temp_model = genai.GenerativeModel(
    'gemini-1.5-flash',
    generation_config=genai.GenerationConfig(temperature=2.0))


# When running lots of queries, it's a good practice to use a retry policy so your code
# automatically retries when hitting Resource Exhausted (quota limit) errors.
retry_policy = {
    "retry": retry.Retry(predicate=retry.if_transient_error, initial=10, multiplier=1.5, timeout=300)
}

for _ in range(5):
  response = high_temp_model.generate_content('Pick a random colour... (respond in a single word)',
                                              request_options=retry_policy)
  if response.parts:
    print(response.text, '-' * 25)

Maroon
 -------------------------
Maroon
 -------------------------
Maroon
 -------------------------
Aquamarine
 -------------------------
Maroon
 -------------------------
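
To get a feel for what the `initial`, `multiplier` and `timeout` values in `retry_policy` imply, the sketch below lists the growing wait before each retry. It is a simplification: the real `google.api_core.retry` implementation adds random jitter and also counts the time spent on the requests themselves, and the 60-second cap used here is an assumed value.

```python
def backoff_ceilings(initial: float, multiplier: float, maximum: float, timeout: float) -> list[float]:
    """List the un-jittered delay before each retry until the timeout budget is spent."""
    delays = []
    delay, elapsed = initial, 0.0
    while elapsed + delay <= timeout:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * multiplier, maximum)  # grow the wait, up to the cap
    return delays

# initial, multiplier and timeout mirror the retry_policy above; maximum=60.0 is an assumed cap.
schedule = backoff_ceilings(initial=10, multiplier=1.5, maximum=60.0, timeout=300)
print(schedule)
print("Total wait:", sum(schedule), "seconds")
```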


In [18]:
high_temp_model = genai.GenerativeModel(
    'gemini-1.5-flash',
    generation_config=genai.GenerationConfig(
        temperature=2.0
    )
)


# When running lots of queries, it's a good practice to use a retry policy so your code
# automatically retries when hitting Resource Exhausted (quota limit) errors.
retry_policy = {
    "retry": retry.Retry(predicate=retry.if_transient_error, initial=10, multiplier=1.5, timeout=300)
}

for _ in range(5):
  response = high_temp_model.generate_content('Pick a random colour... (respond in a single word)',
                                              request_options=retry_policy)
  if response.parts:
    print(response.text, '-' * 25)

Marigold
 -------------------------
Maroon
 -------------------------
Purple
 -------------------------
Aquamarine
 -------------------------
Purple
 -------------------------


In [19]:
low_temp_model = genai.GenerativeModel(
    'gemini-1.5-flash',
    generation_config=genai.GenerationConfig(temperature=0.0))

for _ in range(5):
  response = low_temp_model.generate_content('Pick a random colour... (respond in a single word)',
                                             request_options=retry_policy)
  if response.parts:
    print(response.text, '-' * 25)

Maroon
 -------------------------
Maroon
 -------------------------
Maroon
 -------------------------
Maroon
 -------------------------
Maroon
 -------------------------


<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Top-K and Top-P</span>

---
Like temperature, top-K and top-P parameters are also used to control the diversity of the model's output.

Top-K is a positive integer that defines the number of most probable tokens from which to select the output token. A top-K of 1 selects a single token, performing greedy decoding.

Top-P defines the probability threshold that, once cumulatively exceeded, tokens stop being selected as candidates. A top-P of 0 is typically equivalent to greedy decoding, and a top-P of 1 typically selects every token in the model's vocabulary.

When both are supplied, the Gemini API will filter top-K tokens first, then top-P and then finally sample from the candidate tokens using the supplied temperature.

Run this example a number of times, change the settings and observe the change in output.

In [20]:
model = genai.GenerativeModel(
    'gemini-1.5-flash-001',
    generation_config=genai.GenerationConfig(
        # These are the default values for gemini-1.5-flash-001.
        temperature=1.0,
        top_k=64,
        top_p=0.95,
    ))

story_prompt = "You are a creative writer. Write a short story about a cat who goes on an adventure."
response = model.generate_content(story_prompt, request_options=retry_policy)
print(response.text)

Bartholomew, a sleek black cat with eyes like polished emeralds, was bored. His days were a monotonous cycle of sunbeams, naps, and the occasional mouse chase. One morning, while perched on the windowsill, he saw a flash of brilliant blue – a robin, perched on the fencepost, chirping a cheerful tune. 

Something stirred within Bartholomew. He yearned for adventure, for something more exciting than the predictable routine of his life. He watched the robin, its tiny chest puffed with song, and a plan began to form. He would follow the robin, see where it led him. 

Bartholomew slipped out the open window, the robin's song his guiding melody. He scampered through the garden, a blur of black fur against the vibrant green. He chased the robin through the meadow, across a babbling brook, and into the heart of a whispering forest.

The forest was a world unlike any Bartholomew had known. Tall trees cast long, dappled shadows on the mossy ground. Strange smells tickled his nose, and whispers r
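
The filtering order described in this section (top-K first, then top-P, then sampling) can be sketched offline. This is a minimal illustration over a made-up distribution, not the Gemini API's actual implementation; temperature is left out to keep it short, and `toy_probs` is hypothetical.

```python
import random

def filter_top_k_top_p(probs: dict[str, float], top_k: int, top_p: float) -> dict[str, float]:
    """Keep the top_k most probable tokens, then the smallest prefix whose mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}  # renormalise the survivors

# Hypothetical next-token distribution.
toy_probs = {"cat": 0.5, "dog": 0.3, "fox": 0.15, "emu": 0.05}

candidates = filter_top_k_top_p(toy_probs, top_k=3, top_p=0.7)
print(candidates)

rng = random.Random(0)  # seeded so the draw is repeatable
choice = rng.choices(list(candidates), weights=list(candidates.values()), k=1)[0]
print(choice)
```

With `top_k=1` or `top_p=0` only the most probable token survives, which matches the greedy-decoding cases described above.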

<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Prompting</span>

---

This section contains some prompts from the chapter for you to try out directly in the API. Try changing the text here to see how each prompt performs with different instructions, more examples, or any other changes you can think of.

<span style="font-size:16px; color:rgba(0, 0, 0, 0.5);">Zero-shot</span>

Zero-shot prompts are prompts that describe the request for the model directly.

<table align=left>
  <td>
    <a target="_blank" href="https://aistudio.google.com/prompts/1gzKKgDHwkAvexG5Up0LMtl1-6jKMKe4g"><img src="https://ai.google.dev/site-assets/images/marketing/home/icon-ais.png" style="height: 24px" height=24/> Open in AI Studio</a>
  </td>
</table>

In [21]:
model = genai.GenerativeModel(
    'gemini-1.5-flash-001',
    generation_config=genai.GenerationConfig(
        temperature=0.1,
        top_p=1,
        max_output_tokens=5,
    ))

zero_shot_prompt = """Classify movie reviews as POSITIVE, NEUTRAL or NEGATIVE.
Review: "Her" is a disturbing study revealing the direction
humanity is headed if AI is allowed to keep evolving,
unchecked. I wish there were more movies like this masterpiece.
Sentiment: """

response = model.generate_content(zero_shot_prompt, request_options=retry_policy)
print(response.text)

Sentiment: **POSITIVE**


<span style="font-size:16px; color:rgba(0, 0, 0, 0.5);">Enum mode</span>

The models are trained to generate text, and can sometimes produce more text than you may wish for. In the preceding example, the model outputs the label, but it can sometimes include a preceding "Sentiment" label, and without an output token limit, it may also add explanatory text afterwards.

The Gemini API has an [Enum mode](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Enum.ipynb) feature that allows you to constrain the output to a fixed set of values.

In [22]:
# The code below was modified from the original Kaggle notebook, which raised
# AttributeError: type object 'dummy' has no attribute 'model_json_schema'.
# It interprets the text reply manually instead of using Enum mode.

class Sentiment(enum.Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"

# Configure and initialize the model without unsupported attributes
model = genai.GenerativeModel(
    'gemini-1.5-flash-001',
    generation_config=genai.GenerationConfig()
)

# Provide a prompt and handle response interpretation manually
zero_shot_prompt = "What is the sentiment of this statement: 'I love this product!'"

response = model.generate_content(zero_shot_prompt)
text_response = response.text
print("Response:", text_response)

# Simple interpretation (manual sentiment analysis)
if "positive" in text_response.lower():
    sentiment = Sentiment.POSITIVE
elif "neutral" in text_response.lower():
    sentiment = Sentiment.NEUTRAL
elif "negative" in text_response.lower():
    sentiment = Sentiment.NEGATIVE
else:
    sentiment = "Unknown"

print("Sentiment:", sentiment)

Response: The sentiment of the statement "I love this product!" is **extremely positive**. 

The words "love" and "product" directly indicate a strong positive feeling towards the item being discussed. 

Sentiment: Sentiment.POSITIVE
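
The manual interpretation step in the cell above relies on substring checks. A slightly stricter sketch, assuming the reply contains one of the three labels as a whole word, uses a regular expression and returns `None` instead of a string when nothing matches. `parse_sentiment` is a hypothetical helper, not part of the Gemini SDK.

```python
import enum
import re
from typing import Optional

class Sentiment(enum.Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"

def parse_sentiment(text: str) -> Optional[Sentiment]:
    """Return the first Sentiment label that appears as a whole word, else None."""
    match = re.search(r"\b(positive|neutral|negative)\b", text.lower())
    return Sentiment(match.group(1)) if match else None

print(parse_sentiment("The sentiment is **extremely positive**."))
print(parse_sentiment("No label here."))
```

Keeping the return type to `Sentiment` or `None` means callers never have to handle a stray string value.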


<span style="font-size:16px; color:rgba(0, 0, 0, 0.5);">One-shot and Few-shot</span>

Providing an example of the expected response is known as a "one-shot" prompt. When you provide multiple examples, it is a "few-shot" prompt.
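
Few-shot prompts are often assembled from a list of examples rather than written by hand. Here is a minimal sketch of that idea in the style of the pizza-order prompt in this section; `build_few_shot_prompt` and the example data are hypothetical, and passing a single example yields a one-shot prompt.

```python
import json

def build_few_shot_prompt(task: str, examples: list[tuple[str, dict]], query: str) -> str:
    """Assemble a task description, worked examples, and the new input into one prompt."""
    parts = [task, ""]
    for text, expected in examples:
        parts += ["EXAMPLE:", text, "JSON Response:", json.dumps(expected), ""]
    parts += ["ORDER:", query]
    return "\n".join(parts)

# Hypothetical examples in the style of a pizza-order parser.
examples = [
    ("I want a small pizza with cheese and pepperoni.",
     {"size": "small", "type": "normal", "ingredients": ["cheese", "pepperoni"]}),
    ("Can I get a large pizza with basil and mozzarella",
     {"size": "large", "type": "normal", "ingredients": ["basil", "mozzarella"]}),
]

prompt = build_few_shot_prompt(
    "Parse a customer's pizza order into valid JSON:",
    examples,
    "Give me a large with cheese & pineapple",
)
print(prompt)
```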

<table align=left>
  <td>
    <a target="_blank" href="https://aistudio.google.com/prompts/1jjWkjUSoMXmLvMJ7IzADr_GxHPJVV2bg"><img src="https://ai.google.dev/site-assets/images/marketing/home/icon-ais.png" style="height: 24px" height=24/> Open in AI Studio</a>
  </td>
</table>

In [23]:
model = genai.GenerativeModel(
    'gemini-1.5-flash-latest',
    generation_config=genai.GenerationConfig(
        temperature=0.1,
        top_p=1,
        max_output_tokens=250,
    ))

few_shot_prompt = """Parse a customer's pizza order into valid JSON:

EXAMPLE:
I want a small pizza with cheese, tomato sauce, and pepperoni.
JSON Response:
```
{
"size": "small",
"type": "normal",
"ingredients": ["cheese", "tomato sauce", "pepperoni"]
}
```

EXAMPLE:
Can I get a large pizza with tomato sauce, basil and mozzarella
JSON Response:
```
{
"size": "large",
"type": "normal",
"ingredients": ["tomato sauce", "basil", "mozzarella"]
}
```

ORDER:
"""

customer_order = "Give me a large with cheese & pineapple"


response = model.generate_content([few_shot_prompt, customer_order])
print(response.text)

```json
{
  "size": "large",
  "type": "normal",
  "ingredients": ["cheese", "pineapple"]
}
```



<span style="font-size:16px; color:rgba(0, 0, 0, 0.5);">JSON mode</span>

To provide control over the schema, and to ensure that you only receive JSON (with no other text or markdown), you can use the Gemini API's [JSON mode](https://github.com/google-gemini/cookbook/blob/main/quickstarts/JSON_mode.ipynb). This forces the model to constrain decoding, such that token selection is guided by the supplied schema.

In [24]:
# The code below was modified: the model wraps its reply in a Markdown code
# fence, which caused "Error: The response was not valid JSON." until the
# fence was stripped before parsing.

# Define the TypedDict for expected JSON structure
class PizzaOrder(TypedDict):
    size: str
    ingredients: list[str]
    type: str

# Configure the model with appropriate generation settings
model = genai.GenerativeModel(
    'gemini-1.5-flash-latest',
    generation_config=genai.GenerationConfig(
        temperature=0.1
    )
)

# Prompt instructing the model to return a pure JSON response
prompt = """
Can I have a large dessert pizza with apple and chocolate?
Please respond in pure JSON format without any extra formatting or explanations:
{
    "size": "string",
    "ingredients": ["string"],
    "type": "string"
}
"""

# Generate the response
response = model.generate_content(prompt)
response_text = response.text

# Print raw response for debugging
print("Raw response:", response_text)

# Remove any Markdown formatting
cleaned_response = response_text.strip("```").replace("json\n", "").replace("\n```", "")

# Try parsing the cleaned response as JSON
try:
    pizza_order: PizzaOrder = json.loads(cleaned_response)
    print("Parsed Pizza Order:", pizza_order)
except json.JSONDecodeError as e:
    print("Error: The response was not valid JSON.")
    print("Details:", e)

Raw response: ```json
{
  "size": "large",
  "ingredients": ["apple", "chocolate"],
  "type": "dessert pizza"
}
```

Parsed Pizza Order: {'size': 'large', 'ingredients': ['apple', 'chocolate'], 'type': 'dessert pizza'}
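
The fence-stripping step in the cell above depends on the exact shape of the reply. A somewhat more tolerant sketch, assuming the reply is either bare JSON or JSON wrapped in a single Markdown code fence, extracts the payload with a regular expression and then checks the expected keys. `extract_json` and `check_pizza_order` are hypothetical helpers.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code-fence marker, built this way to keep the source readable

def extract_json(text: str) -> dict:
    """Parse JSON from a model reply, tolerating an optional Markdown code fence."""
    pattern = FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE
    fenced = re.search(pattern, text, flags=re.DOTALL)
    payload = fenced.group(1) if fenced else text.strip()
    return json.loads(payload)

def check_pizza_order(order: dict) -> bool:
    """Check the keys and basic types expected for a pizza order."""
    return (
        isinstance(order.get("size"), str)
        and isinstance(order.get("type"), str)
        and isinstance(order.get("ingredients"), list)
        and all(isinstance(item, str) for item in order["ingredients"])
    )

reply = (
    FENCE + "json\n"
    '{"size": "large", "ingredients": ["apple", "chocolate"], "type": "dessert pizza"}\n'
    + FENCE + "\n"
)
order = extract_json(reply)
print(order, check_pizza_order(order))
```

`json.loads` still raises `json.JSONDecodeError` on malformed input, so the `try`/`except` used in the cell above remains useful around `extract_json`.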


<span style="font-size:16px; color:rgba(0, 0, 0, 0.5);">Chain of Thought (CoT)</span>

Direct prompting on LLMs can return answers quickly and (in terms of output token usage) efficiently, but they can be prone to hallucination. The answer may "look" correct (in terms of language and syntax) but is incorrect in terms of factuality and reasoning.

Chain-of-Thought prompting is a technique where you instruct the model to output intermediate reasoning steps, and it typically gets better results, especially when combined with few-shot examples. It is worth noting that this technique doesn't completely eliminate hallucinations, and that it tends to cost more to run, due to the increased token count.

As models like the Gemini family are trained to be "chatty" and provide reasoning steps, you can ask the model to be more direct in the prompt.

In [25]:
prompt = """When I was 4 years old, my wife was 3 times my age. Now, I
am 20 years old. How old is my wife? Return the answer directly."""

model = genai.GenerativeModel('gemini-1.5-flash-latest')
response = model.generate_content(prompt, request_options=retry_policy)

print(response.text)

32



In [26]:
prompt = """When I was 4 years old, my wife was 3 times my age. Now,
I am 20 years old. How old is my wife? Let's think step by step."""

response = model.generate_content(prompt, request_options=retry_policy)
print(response.text)

Here's how to solve this step-by-step:

1. **Wife's age when you were 4:** When you were 4, your wife was 3 times your age, meaning she was 4 * 3 = 12 years old.

2. **Age difference:** The age difference between you and your wife is 12 - 4 = 8 years.

3. **Wife's current age:**  Since you are now 20, your wife is 20 + 8 = 28 years old.
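
The step-by-step reasoning can be checked with plain arithmetic. The sketch below follows the same three steps and shows that the direct answer of 32 returned earlier does not hold up.

```python
my_age_then = 4
wife_age_then = 3 * my_age_then          # three times my age at the time
age_gap = wife_age_then - my_age_then    # the gap stays constant over time
my_age_now = 20
wife_age_now = my_age_now + age_gap

print(wife_age_now)  # 28
```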



<span style="font-size:16px; color:rgba(0, 0, 0, 0.5);">ReAct: Reason and act</span>

In this example you will run a ReAct prompt directly in the Gemini API and perform the searching steps yourself. As this prompt follows a well-defined structure, there are frameworks available that wrap the prompt into easier-to-use APIs that make tool calls automatically, such as the LangChain example from the chapter.

To try this out with the Wikipedia search engine, check out the [Searching Wikipedia with ReAct](https://github.com/google-gemini/cookbook/blob/main/examples/Search_Wikipedia_using_ReAct.ipynb) cookbook example.
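
When you perform the search steps yourself, you first need to pull the action out of the model's reply. Here is a minimal sketch, assuming the `<search>`, `<lookup>` and `<finish>` tag format used by the prompt in this section; `parse_action` is a hypothetical helper and the sample reply is made up.

```python
import re
from typing import Optional

# Match an opening tag, its content, and the same tag closing it.
ACTION_PATTERN = re.compile(r"<(search|lookup|finish)>(.*?)</\1>", re.DOTALL)

def parse_action(model_output: str) -> Optional[tuple[str, str]]:
    """Return (action_type, argument) for the first action tag found, else None."""
    match = ACTION_PATTERN.search(model_output)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()

sample = "Thought 1\nI only need to search Milhouse.\n\nAction 1\n<search>Milhouse</search>\n"
print(parse_action(sample))
```

The returned pair tells you which lookup to run by hand before feeding the result back as the next Observation.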


> Note: The prompt and in-context examples used here are from [https://github.com/ysymyth/ReAct](https://github.com/ysymyth/ReAct) which is published under an [MIT license](https://opensource.org/licenses/MIT), Copyright (c) 2023 Shunyu Yao.

<table align=left>
  <td>
    <a target="_blank" href="https://aistudio.google.com/prompts/18oo63Lwosd-bQ6Ay51uGogB3Wk3H8XMO"><img src="https://ai.google.dev/site-assets/images/marketing/home/icon-ais.png" style="height: 24px" height=24/> Open in AI Studio</a>
  </td>
</table>

In [27]:
model_instructions = """
Solve a question answering task with interleaving Thought, Action, Observation steps. Thought can reason about the current situation,
Observation is understanding relevant information from an Action's output and Action can be one of three types:
 (1) <search>entity</search>, which searches the exact entity on Wikipedia and returns the first paragraph if it exists. If not, it
     will return some similar entities to search and you can try to search the information from those topics.
 (2) <lookup>keyword</lookup>, which returns the next sentence containing keyword in the current context. This only does exact matches,
     so keep your searches short.
 (3) <finish>answer</finish>, which returns the answer and finishes the task.
"""

example1 = """Question
Musician and satirist Allie Goertz wrote a song about the "The Simpsons" character Milhouse, who Matt Groening named after who?

Thought 1
The question simplifies to "The Simpsons" character Milhouse is named after who. I only need to search Milhouse and find who it is named after.

Action 1
<search>Milhouse</search>

Observation 1
Milhouse Mussolini Van Houten is a recurring character in the Fox animated television series The Simpsons voiced by Pamela Hayden and created by Matt Groening.

Thought 2
The paragraph does not tell who Milhouse is named after, maybe I can look up "named after".

Action 2
<lookup>named after</lookup>

Observation 2
Milhouse was named after U.S. president Richard Nixon, whose middle name was Milhous.

Thought 3
Milhouse was named after U.S. president Richard Nixon, so the answer is Richard Nixon.

Action 3
<finish>Richard Nixon</finish>
"""

example2 = """Question
What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?

Thought 1
I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.

Action 1
<search>Colorado orogeny</search>

Observation 1
The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.

Thought 2
It does not mention the eastern sector. So I need to look up eastern sector.

Action 2
<lookup>eastern sector</lookup>

Observation 2
The eastern sector extends into the High Plains and is called the Central Plains orogeny.

Thought 3
The eastern sector of Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.

Action 3
<search>High Plains</search>

Observation 3
High Plains refers to one of two distinct land regions

Thought 4
I need to instead search High Plains (United States).

Action 4
<search>High Plains (United States)</search>

Observation 4
The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130m).

Thought 5
High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.

Action 5
<finish>1,800 to 7,000 ft</finish>
"""

example3 = """Question
What is the scientific classification (genus and species) of the animal commonly known as the giant panda?

Thought 1
To answer this question, I need to search for "giant panda" to find its scientific classification.

Action 1
<search>giant panda</search>

Observation 1
The giant panda (Ailuropoda melanoleuca) is a bear species native to China, characterized by its bold black-and-white coat.

Thought 2
The observation contains the scientific classification for the giant panda. The genus is "Ailuropoda" and the species is "melanoleuca," so the answer is Ailuropoda melanoleuca.

Action 2
<finish>Ailuropoda melanoleuca</finish>
"""

# Come up with more examples yourself, or take a look through https://github.com/ysymyth/ReAct/

To capture a single step at a time, while ignoring any hallucinated Observation steps, you will use `stop_sequences` to end the generation process. The steps are Thought, Action, Observation, in that order.

In [28]:
question = """Question
Who was the youngest author listed on the transformers NLP paper?
"""

model = genai.GenerativeModel('gemini-1.5-flash-latest')
react_chat = model.start_chat()

# You will perform the Action, so generate up to, but not including, the Observation.
config = genai.GenerationConfig(stop_sequences=["\nObservation"])

resp = react_chat.send_message(
    [model_instructions, example1, example2, question],
    generation_config=config,
    request_options=retry_policy)
print(resp.text)

Thought 1
I need to find the Transformers NLP paper and look for the authors' ages.  This will require searching for the paper and then likely some further searching to find the authors' ages.

Action 1
<search>Transformers NLP paper</search>
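
The output above ends after the Action step because generation halts once the stop sequence would appear. As a rough local illustration of that effect (not of how the API implements it), the sketch below truncates a string at the first stop sequence. The helper `apply_stop_sequences` and the sample text are hypothetical and exist only for this example.

```python
def apply_stop_sequences(text, stop_sequences):
    """Cut `text` at the earliest stop sequence; the stop text itself is dropped."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Hypothetical unconstrained output, including an Observation the model made up.
raw = (
    "Thought 1\nI should search for the paper.\n\n"
    "Action 1\n<search>Transformers NLP paper</search>\n\n"
    "Observation 1\nA made-up observation the model might hallucinate.\n"
)

step = apply_stop_sequences(raw, ["\nObservation"])
print(step)
```

Only the Thought and Action survive, which leaves you free to supply a real Observation.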



Now you can perform this research yourself and supply it back to the model.

In [29]:
observation = """Observation 1
[1706.03762] Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
"""
resp = react_chat.send_message(
    observation,
    generation_config=config,
    request_options=retry_policy)
print(resp.text)

Thought 2
The observation provides the authors of the paper "Attention is All You Need".  I don't have their ages, so I'll need to search for each author individually to find their birthdates.  This is likely to be time-consuming and may not be possible for all authors.  I will focus on finding the youngest.

Action 2
<search>Aidan N. Gomez age>



This process repeats until the `<finish>` action is reached. You can continue running this yourself if you like, or try the [Wikipedia example](https://github.com/google-gemini/cookbook/blob/main/examples/Search_Wikipedia_using_ReAct.ipynb) to see a fully automated ReAct system at work.
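
To make the repeating loop concrete, here is a minimal sketch of a driver that parses each Action tag and feeds an Observation back. It is not the cookbook's implementation: `parse_action`, `run_react`, the scripted replies, and the fake tools are all hypothetical stand-ins so the example runs without an API key.

```python
import re

# Matches <search>...</search>, <lookup>...</lookup> or <finish>...</finish>.
ACTION_RE = re.compile(r"<(search|lookup|finish)>(.*?)</\1>", re.DOTALL)


def parse_action(step_text):
    """Return (action, argument) for the first well-formed tag, else None."""
    match = ACTION_RE.search(step_text)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


def run_react(send_message, tools, first_prompt, max_steps=5):
    """Loop Thought/Action/Observation until <finish> or max_steps is hit."""
    message = first_prompt
    for step in range(1, max_steps + 1):
        reply = send_message(message)
        parsed = parse_action(reply)
        if parsed is None:
            return None  # missing or malformed action tag
        action, argument = parsed
        if action == "finish":
            return argument
        message = f"Observation {step}\n{tools[action](argument)}\n"
    return None


# Hypothetical stand-ins for the chat session and the Wikipedia tools.
scripted = iter([
    "Thought 1\nI need to search Milhouse.\n\nAction 1\n<search>Milhouse</search>\n",
    "Thought 2\nI can look up named after.\n\nAction 2\n<lookup>named after</lookup>\n",
    "Thought 3\nThe answer is Richard Nixon.\n\nAction 3\n<finish>Richard Nixon</finish>\n",
])
fake_send = lambda message: next(scripted)
fake_tools = {
    "search": lambda entity: f"(first paragraph for {entity})",
    "lookup": lambda keyword: f"(next sentence containing {keyword})",
}

answer = run_react(fake_send, fake_tools, "Question\nWho is Milhouse named after?\n")
print(answer)
```

In a real run, `fake_send` would be replaced by a function that calls `react_chat.send_message(...)` with the stop-sequence config and returns `resp.text`. Note that a malformed tag such as `<search>Aidan N. Gomez age>` does not match the pattern, so a production loop would need to handle that case rather than simply stopping.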

<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Code Prompting</span>

---

<span style="font-size:16px; color:rgba(0, 0, 0, 0.5);">Generating Code</span>

The Gemini family of models can be used to generate code, configuration and scripts. Generating code can be helpful when learning to code, learning a new language or for rapidly generating a first draft.

Because LLMs can't reason and can repeat training data, it's essential to read and test generated code before you use it, and to comply with any relevant licenses.

<table align=left>
  <td>
    <a target="_blank" href="https://aistudio.google.com/prompts/1YX71JGtzDjXQkgdes8bP6i3oH5lCRKxv"><img src="https://ai.google.dev/site-assets/images/marketing/home/icon-ais.png" style="height: 24px" height=24/> Open in AI Studio</a>
  </td>
</table>

In [30]:
model = genai.GenerativeModel(
    'gemini-1.5-flash-latest',
    generation_config=genai.GenerationConfig(
        temperature=1,
        top_p=1,
        max_output_tokens=1024,
    ))

# Gemini 1.5 models are very chatty, so it helps to specify they stick to the code.
code_prompt = """
Write a Python function to calculate the factorial of a number. No explanation, provide only the code.
"""

response = model.generate_content(code_prompt, request_options=retry_policy)
Markdown(response.text)

```python
def factorial(n):
  if n == 0:
    return 1
  else:
    return n * factorial(n-1)
```
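
Following the advice to test generated code, one simple check is to compare the function against a trusted reference and probe an edge case. The sketch below copies the generated function verbatim; the test harness around it is an illustrative assumption, not part of the original notebook.

```python
import math


def factorial(n):
    # Copied from the generated output above.
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)


# Compare against the standard library for small non-negative inputs.
for n in range(10):
    assert factorial(n) == math.factorial(n)

# Edge case: a negative input never reaches the base case.
try:
    factorial(-1)
    negative_result = "returned"
except RecursionError:
    negative_result = "RecursionError"

print(negative_result)
```

The generated code is correct for non-negative integers but recurses without end on negative input, which is the kind of gap a quick test surfaces.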


<span style="font-size:16px; color:rgba(0, 0, 0, 0.5);">Code execution</span>

The Gemini API can automatically run generated code too, and will return the output.

<table align=left>
  <td>
    <a target="_blank" href="https://aistudio.google.com/prompts/11veFr_VYEwBWcLkhNLr-maCG0G8sS_7Z"><img src="https://ai.google.dev/site-assets/images/marketing/home/icon-ais.png" style="height: 24px" height=24/> Open in AI Studio</a>
  </td>
</table>

In [31]:
model = genai.GenerativeModel(
    'gemini-1.5-flash-latest',
    tools='code_execution')

code_exec_prompt = """
Calculate the sum of the first 14 prime numbers. Only consider the odd primes, and make sure you count them all.
"""

response = model.generate_content(code_exec_prompt, request_options=retry_policy)
Markdown(response.text)

To calculate the sum of the first 14 odd prime numbers, I need to first identify those primes.  I will use Python to accomplish this.


``` python
import sympy

primes = []
count = 0
num = 3 #Start from 3, the first odd prime

while count < 14:
    if sympy.isprime(num):
        primes.append(num)
        count += 1
    num += 2 # Increment by 2 to only consider odd numbers

print(f'{primes=}')
sum_of_primes = sum(primes)
print(f'{sum_of_primes=}')

```
```
primes=[3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
sum_of_primes=326

```
The code first initializes an empty list `primes` to store the prime numbers and a counter `count` to track the number of primes found. It starts checking for prime numbers from 3, incrementing by 2 in each step to consider only odd numbers. The `sympy.isprime()` function efficiently determines if a number is prime. Once 14 odd prime numbers are found, the loop stops, and the sum of the numbers in the `primes` list is calculated and printed.

Therefore, the sum of the first 14 odd prime numbers is 504.  The output shows a sum of 326.  There was an error in my initial reasoning.  My apologies.  The output from the code is correct.  The sum of the first 14 odd primes is 326.
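
The model's prose briefly claimed 504 before deferring to the executed result, so it is worth confirming the figure independently. This is a minimal dependency-free sketch; the `is_prime` helper is written for this example and uses plain trial division rather than `sympy`.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


odd_primes = []
candidate = 3  # skip 2, the only even prime
while len(odd_primes) < 14:
    if is_prime(candidate):
        odd_primes.append(candidate)
    candidate += 2

total = sum(odd_primes)
print(odd_primes, total)
```

The total agrees with the executed output (326), not with the figure in the model's initial reasoning.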


While the response looks like a single part, you can inspect it to see each of the steps: initial text, code generation, execution results, and final text summary.

In [32]:
for part in response.candidates[0].content.parts:
  print(part)
  print("-----")

text: "To calculate the sum of the first 14 odd prime numbers, I need to first identify those primes.  I will use Python to accomplish this.\n\n"

-----
executable_code {
  language: PYTHON
  code: "\nimport sympy\n\nprimes = []\ncount = 0\nnum = 3 #Start from 3, the first odd prime\n\nwhile count < 14:\n    if sympy.isprime(num):\n        primes.append(num)\n        count += 1\n    num += 2 # Increment by 2 to only consider odd numbers\n\nprint(f\'{primes=}\')\nsum_of_primes = sum(primes)\nprint(f\'{sum_of_primes=}\')\n"
}

-----
code_execution_result {
  outcome: OUTCOME_OK
  output: "primes=[3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]\nsum_of_primes=326\n"
}

-----
text: "The code first initializes an empty list `primes` to store the prime numbers and a counter `count` to track the number of primes found. It starts checking for prime numbers from 3, incrementing by 2 in each step to consider only odd numbers. The `sympy.isprime()` function efficiently determines if a number
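
If you only need certain steps, you can label each part by the field it carries. The sketch below assumes the field names visible in the printed output (`text`, `executable_code.code`, `code_execution_result.output`) and uses `SimpleNamespace` objects as hypothetical stand-ins for real response parts, so it runs without calling the API.

```python
from types import SimpleNamespace


def summarize_parts(parts):
    """Return (kind, payload) pairs, where kind is 'text', 'code' or 'result'."""
    summary = []
    for part in parts:
        code = getattr(part, "executable_code", None)
        result = getattr(part, "code_execution_result", None)
        if code is not None and code.code:
            summary.append(("code", code.code))
        elif result is not None and result.output:
            summary.append(("result", result.output))
        elif getattr(part, "text", ""):
            summary.append(("text", part.text))
    return summary


# Hypothetical stand-ins shaped like the parts printed above.
demo_parts = [
    SimpleNamespace(text="I will use Python."),
    SimpleNamespace(executable_code=SimpleNamespace(language="PYTHON", code="print(1 + 1)")),
    SimpleNamespace(code_execution_result=SimpleNamespace(outcome="OUTCOME_OK", output="2\n")),
    SimpleNamespace(text="The result is 2."),
]

kinds = [kind for kind, _ in summarize_parts(demo_parts)]
print(kinds)
```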

<span style="font-size:16px; color:rgba(0, 0, 0, 0.5);">Explaining Code</span>

The Gemini family of models can explain code to you too.

<table align=left>
  <td>
    <a target="_blank" href="https://aistudio.google.com/prompts/1N7LGzWzCYieyOf_7bAG4plrmkpDNmUyb"><img src="https://ai.google.dev/site-assets/images/marketing/home/icon-ais.png" style="height: 24px" height=24/> Open in AI Studio</a>
  </td>
</table>

In [33]:
file_contents = !curl https://raw.githubusercontent.com/magicmonty/bash-git-prompt/refs/heads/master/gitprompt.sh

explain_prompt = f"""
Please explain what this file does at a very high level. What is it, and why would I use it?

```
{file_contents}
```
"""

model = genai.GenerativeModel('gemini-1.5-flash-latest')

response = model.generate_content(explain_prompt, request_options=retry_policy)
Markdown(response.text)

This file is a bash script that provides a highly customizable Git prompt for your terminal.  In essence, it enhances your command-line interface to show concise information about your current Git repository, such as the branch, status (clean, modified files, etc.), and potentially even the remote URL and upstream tracking information.

You would use it to:

* **Improve your Git workflow:** By having Git status information directly in your prompt, you don't need to constantly run `git status` to check your repository's state.  This makes working with Git faster and more efficient.

* **Customize your terminal appearance:**  The script allows you to extensively customize the colors and formatting of the prompt elements (branch name, status indicators, etc.) to match your preferences or terminal theme.  It supports loading themes from files.

* **Gain more context:**  Beyond basic status, it can display information like the upstream branch (if any) showing you whether you're ahead or behind, and even the username/repository name.

In short, it's a tool to make your Git experience more integrated and visually informative within your shell.  It's installed and then automatically updates your prompt each time you start a new command line.
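
One detail worth noting about the cell above: in IPython, `!command` returns a list of output lines, so interpolating it directly into an f-string embeds the list's printed form rather than the raw file text. The model coped with that here, but joining the lines first gives a cleaner prompt. The helper below is a hypothetical sketch with made-up sample lines.

```python
def build_explain_prompt(lines):
    """Join captured output lines into plain text and wrap them in a code fence."""
    source = "\n".join(lines)
    fence = "`" * 3  # built at runtime to avoid nesting literal fences
    return (
        "Please explain what this file does at a very high level. "
        "What is it, and why would I use it?\n\n"
        f"{fence}\n{source}\n{fence}\n"
    )


# Hypothetical sample standing in for the downloaded script.
demo_lines = ["#!/usr/bin/env bash", "echo hello"]
prompt = build_explain_prompt(demo_lines)
print(prompt)
```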


<span style="font-size:18px; color:rgba(0, 0, 0, 0.5);">Learn more</span>

---
To learn more about prompting in depth:

* Check out the whitepaper issued with today's content,
* Try out the apps listed at the top of this notebook ([TextFX](https://textfx.withgoogle.com/), [SQL Talk](https://sql-talk-r5gdynozbq-uc.a.run.app/) and [NotebookLM](https://notebooklm.google/)),
* Read the [Introduction to Prompting](https://ai.google.dev/gemini-api/docs/prompting-intro) from the Gemini API docs,
* Explore the Gemini API's [prompt gallery](https://ai.google.dev/gemini-api/prompts) and try them out in AI Studio,
* Check out the Gemini API cookbook for [inspirational examples](https://github.com/google-gemini/cookbook/blob/main/examples/) and [educational quickstarts](https://github.com/google-gemini/cookbook/blob/main/quickstarts/).