Document 2.5 Flash thinking in REST/curl examples #723

@maflcko

Description of the feature request:

I was looking for documentation on how to use the new 2.5 Flash Thinking model with REST/curl. However, I couldn't find any examples or docs on how to toggle the thinking mode or set the thinking budget via curl.

I tried the following search: https://github.com/search?q=repo%3Agoogle-gemini%2Fcookbook+thinking+path%3Aquickstarts%2Frest+&type=code


Activity

X901 commented on Apr 18, 2025

I asked the same thing here but didn't get a response.

I don't get any errors, but it doesn't feel like thinking is really disabled, the way it is with the OpenRouter model:

const requestBody = {
    contents: [{
        parts: [
            { text: preamble },
            { text: prompt }
        ]
    }],
    generationConfig: {
        temperature: 0.3,
        responseMimeType: "application/json",
        thinkingConfig: { thinkingBudget: 0 },
        responseSchema: {
            type: "object",
            properties: {
                translations: {
                    type: "array",
                    items: { type: "string" }
                }
            },
            // "required" must list property names; the original had ["array"],
            // which is not a property of this schema.
            required: ["translations"]
        }
    }
};
andycandy (Collaborator) commented on Apr 19, 2025

Here is how you do it:

import json
import os

import requests

GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]

prompt = """
    You are playing the 20 questions game. You know that what you are looking for
    is an aquatic mammal that doesn't live in the sea, and that's smaller than a
    cat. What could that be and how could you make sure?
"""

MODEL_ID = "gemini-2.5-flash-preview-04-17"
url = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL_ID}:generateContent?key={GOOGLE_API_KEY}"

headers = {
    "Content-Type": "application/json",
}
data = {
    "contents": [{
        "parts": [
            {"text": prompt}
        ]
    }],
    "generationConfig": {
        "thinkingConfig": {
            "thinkingBudget": 0
        }
    }
}
response = requests.post(
    url,
    headers=headers,
    data=json.dumps(data),
)

Output:

{
    "candidates": [
        {
            "content": {
                "parts": [
                    {
                        "text": "Okay, playing 20 Questions with those constraints! Here's what fits the criteria and how to refine:\n\n**What it could be:**\n\nGiven the constraints \u2013 an aquatic mammal that *doesn't* live in the sea and is smaller than a cat \u2013 the most likely candidate is a. .................."
                    }
                ],
                "role": "model"
            },
            "finishReason": "STOP",
            "index": 0
        }
    ],
    "usageMetadata": {
        "promptTokenCount": 58,
        "candidatesTokenCount": 609,
        "totalTokenCount": 667,
        "promptTokensDetails": [
            {
                "modality": "TEXT",
                "tokenCount": 58
            }
        ]
    },
    "modelVersion": "gemini-2.5-flash-preview-04-17"
}

Here you can see it didn't use any thinking tokens: there is no thoughtsTokenCount in usageMetadata.

Moreover, if you rerun the same code without limiting the thinking budget, you get this output:

Code:

# everything before and after stays the same
data = {
    "contents": [{
            "parts":[
                {"text": prompt}
            ]
        }],
}

Output:

{
    "candidates": [
        {
            "content": {
                "parts": [
                    {
                        "text": "Okay, let's break down the clues:\n\n1.  **Aquatic Mammal:** Lives in or spends significant time in water, and is a mammal ..............."
                    }
                ],
                "role": "model"
            },
            "finishReason": "STOP",
            "index": 0
        }
    ],
    "usageMetadata": {
        "promptTokenCount": 58,
        "candidatesTokenCount": 3622,
        "totalTokenCount": 3680,
        "promptTokensDetails": [
            {
                "modality": "TEXT",
                "tokenCount": 58
            }
        ],
        "thoughtsTokenCount": 2696
    },
    "modelVersion": "gemini-2.5-flash-preview-04-17"
}
andycandy (Collaborator) commented on Apr 19, 2025

@Giom-V Regarding this query, should I make a new Get_started_with_thinking_REST notebook to address this?

maflcko (Author) commented on Apr 21, 2025

The docs are also here: https://ai.google.dev/gemini-api/docs/thinking#set-budget

So I'll close this for now, because my immediate question has been answered. The notebook can still be added if needed, of course.

X901 commented on Apr 21, 2025

I tried it, but it's slow, as if thinking were still enabled, just like when I use Google AI Studio. In OpenRouter it's faster, like Flash 2.0.

Why is the performance different?

maflcko (Author) commented on Apr 21, 2025

It could be due to https://twitter.com/OfficialLoganK/status/1912986097765789782, but I don't have any insight here. Quoting that tweet:

"In some rare cases, the model still thinks a little even with thinking budget = 0, we are hoping to fix this before we make this model stable and you won't be billed for thinking. The thinking budget = 0 is what triggers the billing switch."
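Given that residual thinking can occur, one way to tell whether a particular response actually spent thinking tokens is to look for thoughtsTokenCount in usageMetadata, as seen in the two outputs earlier in this thread. A small sketch (the helper name and sample dicts are illustrative, not part of the API):

```python
# Sketch: inspect a parsed generateContent response for thinking-token usage.
# The field thoughtsTokenCount appears in usageMetadata only when thinking
# tokens were spent; it is absent when the budget of 0 was honored.

def thinking_tokens_used(response_json):
    """Return the thinking-token count reported in usageMetadata (0 if absent)."""
    return response_json.get("usageMetadata", {}).get("thoughtsTokenCount", 0)

# Sample usageMetadata shaped like the two outputs shown above:
no_thinking = {"usageMetadata": {"promptTokenCount": 58,
                                 "candidatesTokenCount": 609,
                                 "totalTokenCount": 667}}
with_thinking = {"usageMetadata": {"promptTokenCount": 58,
                                   "candidatesTokenCount": 3622,
                                   "totalTokenCount": 3680,
                                   "thoughtsTokenCount": 2696}}

print(thinking_tokens_used(no_thinking))    # 0
print(thinking_tokens_used(with_thinking))  # 2696
```

Checking this per response makes it easy to spot the rare cases where the model thinks despite a budget of 0.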
