# Gemini API: Embeddings Quickstart

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Embeddings.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

The Gemini API generates state-of-the-art text embeddings. An embedding is a list of floating-point numbers that represents the meaning of a word, sentence, or paragraph. You can use embeddings in many downstream applications, such as document search.
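To make "represents the meaning" a little more concrete, here is a minimal sketch of how two embeddings are commonly compared, using cosine similarity. The three-dimensional vectors are made-up stand-ins for real embeddings, and `cosine_similarity` is a hypothetical helper written for this illustration, not part of the Gemini SDK.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings (values are invented).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

# Texts with related meanings should score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))
```

Applications like document search typically rank candidates by a score of this kind, with higher values indicating closer meaning.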

For a deeper dive into text and word embeddings, see [Text Embeddings: Comprehensive Guide](https://towardsdatascience.com/text-embeddings-comprehensive-guide-afd97fce8fb5).


This notebook provides quick code examples that show you how to get started generating embeddings.

In [None]:
!pip install -U -q google-generativeai  # Install the Python SDK

## Configure your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example.

In [2]:
import google.generativeai as genai
from google.colab import userdata
genai.configure(api_key=userdata.get('GOOGLE_API_KEY'))

## Embed content

Call the `embed_content` method with the `models/embedding-001` model to generate text embeddings.

In [None]:
text = "Hello world"
result = genai.embed_content(model="models/embedding-001", content=text)

# Print just a part of the embedding to keep the output manageable
print(str(result['embedding'])[:50], '... TRIMMED]')

[0.04703258, -0.040190056, -0.029026963, -0.026809 ... TRIMMED]


In [15]:
text = 'What is going on here?'
result = genai.embed_content(model='models/embedding-001', content=text)

# Check the length of the embedding vector
print(len(result['embedding']))

768


In [None]:
print(len(result['embedding']))  # Embeddings from this model have 768 dimensions

768


## Batch embed content

You can embed a list of multiple prompts with one API call for efficiency. When `content` is a list of strings, `embed_content` returns a **dictionary** whose `embedding` entry is a list containing one embedding per input string.

In [None]:
result = genai.embed_content(
    model="models/embedding-001",
    content=[
      'What is the meaning of life?',
      'How much wood would a woodchuck chuck?',
      'How does the brain work?'])

for embedding in result['embedding']:
  print(str(embedding)[:50], '... TRIMMED]')

[-0.0002620658, -0.05592018, -0.012463195, -0.0206 ... TRIMMED]
[-0.0151748555, -0.050790474, -0.032357067, -0.058 ... TRIMMED]
[0.025271073, -0.064161226, -0.025818137, -0.00611 ... TRIMMED]
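When working with batch results, it is often handy to keep each input string next to its embedding. The sketch below shows one way to do that. `pair_texts_with_embeddings` is a hypothetical helper, the `fake_result` dictionary is a made-up stand-in for a real API response, and the code assumes the embeddings come back in the same order as the inputs.

```python
def pair_texts_with_embeddings(texts, result):
    """Pair each input string with its embedding from a batch result."""
    embeddings = result["embedding"]
    if len(embeddings) != len(texts):
        raise ValueError(
            f"expected {len(texts)} embeddings, got {len(embeddings)}"
        )
    # Assumption: embeddings are returned in the same order as the inputs.
    return list(zip(texts, embeddings))

# Stand-in for a real API response; the numbers are invented and shortened.
texts = ["What is the meaning of life?", "How does the brain work?"]
fake_result = {"embedding": [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]}

for text, embedding in pair_texts_with_embeddings(texts, fake_result):
    print(f"{text!r}: {len(embedding)} dimensions")
```

With a real response, you would pass the list you sent as `content` together with the dictionary returned by `genai.embed_content`.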


## Use `task_type` to provide a hint to the model how you'll use the embeddings

Let's look at all the parameters the `embed_content` method takes. There are four:

* `model`: Required. Must be `models/embedding-001`.
* `content`: Required. The content that you would like to embed. As in the batch example above, `content` can be a list of strings, in which case the returned dictionary holds a list of embeddings.
* `task_type`: Optional. The task type for which the embeddings will be used. See below for possible values.
* `title`: Optional. You should only set this parameter if your task type is `retrieval_document` (or `document`).

`task_type` is an optional parameter that provides a hint to the API about how you intend to use the embeddings in your application.

The following `task_type` values are accepted:

* `unspecified`: If you do not set the value, it will **default** to `retrieval_query`.
* `retrieval_query` (or `query`): The given text is a query in a search/retrieval setting.
* `retrieval_document` (or `document`): The given text is a document from a corpus being searched. Optionally, also set the `title` parameter with the title of the document.
* `semantic_similarity` (or `similarity`): The given text will be used for Semantic Textual Similarity (STS).
* `classification`: The given text will be classified.
* `clustering`: The embeddings will be used for clustering.
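The parameter rules above can be sketched as a small validation helper that assembles keyword arguments before calling the API. `build_embed_kwargs` and the two sets are hypothetical names introduced for illustration. The accepted values are copied from the lists in this notebook, not read from the SDK, so treat them as an assumption.

```python
# Task types and aliases as listed in this notebook (an assumption, not
# something queried from the SDK).
ACCEPTED_TASK_TYPES = {
    "unspecified",
    "retrieval_query", "query",
    "retrieval_document", "document",
    "semantic_similarity", "similarity",
    "classification",
    "clustering",
}
DOCUMENT_TASK_TYPES = {"retrieval_document", "document"}

def build_embed_kwargs(content, task_type=None, title=None,
                       model="models/embedding-001"):
    """Assemble keyword arguments for `embed_content`, checking the rules above."""
    kwargs = {"model": model, "content": content}
    if task_type is not None:
        if task_type not in ACCEPTED_TASK_TYPES:
            raise ValueError(f"unknown task_type: {task_type!r}")
        kwargs["task_type"] = task_type
    if title is not None:
        # `title` should only be set for document-style task types.
        if task_type not in DOCUMENT_TASK_TYPES:
            raise ValueError("title requires task_type 'retrieval_document'")
        kwargs["title"] = title
    return kwargs

kwargs = build_embed_kwargs(
    "Embeddings map text to vectors.",
    task_type="retrieval_document",
    title="Embeddings overview",
)
print(sorted(kwargs))

# With the SDK configured as shown earlier, the call would be:
# result = genai.embed_content(**kwargs)
```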


In [None]:
# Notice the API returns different embeddings depending on `task_type`
result1 = genai.embed_content(
    model="models/embedding-001",
    content="Hello world")

result2 = genai.embed_content(
    model="models/embedding-001",
    content="Hello world",
    task_type="document")

print(str(result1['embedding'])[:50], '... TRIMMED]')
print(str(result2['embedding'])[:50], '... TRIMMED]')

[0.04703258, -0.040190056, -0.029026963, -0.026809 ... TRIMMED]
[0.05889487, -0.004501751, -0.067298084, -0.012740 ... TRIMMED]


## Learning more

Check out these examples in the Cookbook to learn more about what you can do with embeddings. They are best approached after working through [Function_calling.ipynb](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Function_calling.ipynb) and [Function_calling_config.ipynb](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Function_calling_config.ipynb):

* [Search Reranking](https://github.com/google-gemini/cookbook/blob/main/examples/Search_reranking_using_embeddings.ipynb): Use embeddings from the Gemini API to rerank search results from Wikipedia.

* [Anomaly detection with embeddings](https://github.com/google-gemini/cookbook/blob/main/examples/Anomaly_detection_with_embeddings.ipynb): Use embeddings from the Gemini API to detect potential outliers in your dataset.

* [Train a text classifier](https://github.com/google-gemini/cookbook/blob/main/examples/Classify_text_with_embeddings.ipynb): Use embeddings from the Gemini API to train a model that can classify different types of newsgroup posts based on the topic.

* Embeddings have many applications in Vector Databases, too. Check out this [example with Chroma DB](https://github.com/google/generative-ai-docs/blob/main/examples/gemini/python/vectordb_with_chroma/vectordb_with_chroma.ipynb).

You can learn more about embeddings in general on ai.google.dev in the [embeddings guide](https://ai.google.dev/docs/embeddings_guide).

* You can find additional code examples with the Python SDK [here](https://ai.google.dev/tutorials/python_quickstart#use_embeddings).

* You can also find more details in the API Reference for [embedContent](https://ai.google.dev/api/rest/v1/models/embedContent) and [batchEmbedContents](https://ai.google.dev/api/rest/v1/models/batchEmbedContents).