# Gemini API: Audio Quickstart

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Audio.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

This notebook provides an example of how to prompt Gemini 1.5 Pro using an audio file. In this case, you'll use a [sound recording](https://www.jfklibrary.org/asset-viewer/archives/jfkwha-006) of President John F. Kennedy’s 1961 State of the Union address.

In [1]:
!pip install -U -q google-generativeai


In [2]:
import google.generativeai as genai

## Configure your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example.

In [4]:
from google.colab import userdata
genai.configure(api_key=userdata.get('GOOGLE_API_KEY'))
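If you run this notebook outside Colab, `google.colab` is not importable, so the cell above fails. The following is a minimal sketch of a fallback, assuming the key is also available in an environment variable with the same name. The helper `load_api_key` is a hypothetical name introduced here for illustration; it is not part of the SDK.

```python
import os


def load_api_key(name="GOOGLE_API_KEY"):
    """Return the API key from a Colab Secret, or from the environment.

    Assumption: outside Colab, the key is stored in an environment
    variable with the same name as the Colab Secret.
    """
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata
        key = userdata.get(name)
    except ImportError:
        key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"No API key found under the name {name!r}")
    return key


# Usage, once google.generativeai is imported as genai:
#   genai.configure(api_key=load_api_key())
```

Raising early when the key is missing keeps the failure close to its cause instead of surfacing later as an authentication error on the first request.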

## Upload an audio file with the File API

To use an audio file in your prompt, you **must first upload** it using the [File API](https://github.com/google-gemini/cookbook/blob/main/quickstarts/File_API.ipynb).

Unlike image input, which only requires you to open the image, audio must be uploaded to the Gemini API before you can reference it in a prompt.

In [5]:
!wget -q "https://storage.googleapis.com/generativeai-downloads/data/State_of_the_Union_Address_30_January_1961.mp3" -O sample.mp3

In [7]:
# `path` is the location of the file in the Colab working directory.
your_file = genai.upload_file(path='./sample.mp3')
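`genai.upload_file` also accepts an optional `mime_type` argument. As a rough sketch, you could check that a path looks like audio before uploading it, using Python's standard `mimetypes` module. The helper names `guess_audio_mime_type` and `upload_audio` are hypothetical and exist only for this example.

```python
import mimetypes


def guess_audio_mime_type(path):
    """Guess the MIME type from the file extension and require audio/*."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("audio/"):
        raise ValueError(f"{path!r} does not look like an audio file")
    return mime_type


def upload_audio(path):
    """Upload an audio file with an explicit MIME type (hypothetical helper)."""
    # Imported here so the MIME check above can be used without the SDK.
    import google.generativeai as genai
    return genai.upload_file(path=path, mime_type=guess_audio_mime_type(path))


# Usage:
#   your_file = upload_audio("./sample.mp3")
```

The check relies on the file extension only, so it catches obvious mistakes such as passing a text file, but it does not verify the file's contents or whether the API supports that particular audio format.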

## Use the file in your prompt

In [8]:
prompt = "Listen carefully to the following audio file. Provide a brief summary."
model = genai.GenerativeModel('models/gemini-1.5-pro-latest')
response = model.generate_content([prompt, your_file])  # Pass the prompt and the uploaded file object (not its path) to generate_content.
print(response.text)

## Summary of President John F. Kennedy's 1961 State of the Union Address:

**Domestic Challenges:**

*   The US economy was in a recession with high unemployment and slow economic growth.
*   Kennedy proposed a national program to stimulate the economy, including:
    *   Improved unemployment benefits and aid programs.
    *   Investment in infrastructure and housing.
    *   Increased minimum wage.
    *   Tax incentives for businesses.
    *   Development of natural resources.
*   He highlighted the need for investments in education, housing, healthcare, and infrastructure.

**International Challenges:**

*   The Cold War and the threat of communism from the Soviet Union and China were major concerns. 
*   Kennedy emphasized the importance of a strong military and alliances like NATO. 
*   He also stressed the need for economic aid and development programs to help other nations resist communism. 
*   He discussed the situation in Laos, the Congo, and Latin America, where communist 

## Count audio tokens

You can count the number of tokens in your audio file like this.

In [12]:
model.count_tokens([your_file])

total_tokens: 83552
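To put that number in context, you can convert a token count back into an approximate duration. This sketch assumes a rate of 32 tokens per second of audio, the figure given in the Gemini documentation; treat it as an assumption that may differ between models. The function name `estimate_audio_seconds` is hypothetical.

```python
# Assumption: audio is represented as about 32 tokens per second.
TOKENS_PER_SECOND = 32


def estimate_audio_seconds(total_tokens, tokens_per_second=TOKENS_PER_SECOND):
    """Estimate audio duration in seconds from a token count."""
    return total_tokens / tokens_per_second


seconds = estimate_audio_seconds(83552)  # token count reported above
minutes, remainder = divmod(seconds, 60)
print(f"{int(minutes)} min {int(remainder)} s")  # prints "43 min 31 s"
```

Under that assumption, the 83,552 tokens reported above correspond to roughly 43 and a half minutes of audio.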

## Learning more

* Learn more about the [File API](https://github.com/google-gemini/cookbook/blob/main/quickstarts/File_API.ipynb) with the quickstart.

* Learn more about prompting with [media files](https://ai.google.dev/tutorials/prompting_with_media) in the docs, including the supported formats and maximum length for audio files.