In [None]:
import os
import requests

# Document retrieval: upsert and basic query usage

In this walkthrough we will go over the Retrieval API with an Azure Cosmos DB for MongoDB vCore datastore for semantic search.

Before running the notebook, start the Retrieval API and keep it running locally. Follow the instructions for starting the Retrieval API provided [here](https://github.com/openai/chatgpt-retrieval-plugin#quickstart).

[Azure Cosmos DB](https://azure.microsoft.com/en-us/products/cosmos-db/) is a fully managed NoSQL and relational database for modern app development. Using Azure Cosmos DB for MongoDB vCore, you can store vector embeddings in your documents and perform [vector similarity search](https://learn.microsoft.com/azure/cosmos-db/mongodb/vcore/vector-search) on a fully managed MongoDB-compatible database service.

Learn more about Azure Cosmos DB for MongoDB vCore [here](https://learn.microsoft.com/azure/cosmos-db/mongodb/vcore/). If you don't have an Azure account, you can start setting one up [here](https://azure.microsoft.com/).

## Document

First we will create a list of documents. From the perspective of the retrieval plugin, a [document](https://github.com/openai/chatgpt-retrieval-plugin/blob/main/models/models.py) consists of an "id", a "text", an optional "embedding", and a collection of "metadata". The "metadata" has "source", "source_id", "created_at", "url", and "author" fields. Query metadata does not expose the "url" field.
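
To make the field list above concrete, here is a minimal sketch of a document that fills in every metadata field. All values are hypothetical placeholders. The sketch assumes that "source" takes a value such as "file" and that "created_at" is a date string, as suggested by the plugin's `models.py`; check this against the version you are running.

```python
from datetime import datetime, timezone

# Hypothetical document: every value below is a placeholder for illustration.
document_with_metadata = {
    "id": "example-doc",
    "text": "Placeholder text describing a dog breed.",
    # "embedding" is optional and is omitted here.
    "metadata": {
        "source": "file",  # assumed to be an accepted source value
        "source_id": "dog-breeds.txt",
        "created_at": datetime(2023, 1, 1, tzinfo=timezone.utc).isoformat(),
        "url": "https://example.com/dog-breeds",
        "author": "Example Author",
    },
}
```

The documents used in the rest of this notebook leave "metadata" out entirely and supply only "id" and "text".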

For this example we have taken some data about a few dog breeds. 

In [None]:
document_1 = {
    "id": "Siberian Husky",
    "text": "Siberian Huskies are strikingly beautiful and energetic Arctic breed dogs known for their captivating blue eyes and remarkable endurance in cold climates."
}

document_2 = {
    "id": "Alaskan Malamute",
    "text": "The Alaskan Malamute is a powerful and friendly Arctic sled dog breed known for its strength, endurance, and affectionate nature."
}

document_3 = {
    "id": "Samoyed",
    "text": "The Samoyed is a cheerful and fluffy Arctic breed, renowned for its smile and gentle disposition, originally used for herding reindeer and pulling sleds in Siberia."
}

## Indexing the Docs

On the first insert, the datastore creates the collection and, if necessary, an index on the `embedding` field. Hybrid search is not yet supported.

To make these requests to the Retrieval API, we need to provide authorization in the form of the BEARER_TOKEN that was set when starting the API. We do this below:

In [None]:
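# Paste your token here, or export BEARER_TOKEN before starting the notebook.
BEARER_TOKEN_HERE = os.environ.get("BEARER_TOKEN", "")
# 0.0.0.0 is the server's bind address; if it does not resolve on your
# platform, try 'http://localhost:8000' instead.
endpoint_url = 'http://0.0.0.0:8000'
headers = {
    "Authorization": f"Bearer {BEARER_TOKEN_HERE}"
}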

In [None]:
response = requests.post(
    f"{endpoint_url}/upsert",
    headers=headers,
    json={"documents": [document_1, document_2, document_3]},
)

response.json()
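
Before moving on, it can help to confirm that the upsert succeeded. The sketch below assumes the response body has the shape `{"ids": [...]}`, listing the IDs of the stored documents. That shape, the helper name `missing_ids`, and the sample body are assumptions for illustration, so inspect the real `response.json()` output first.

```python
def missing_ids(documents, response_body):
    """Return the IDs of submitted documents that the response does not list."""
    returned = set(response_body.get("ids", []))
    return [doc["id"] for doc in documents if doc["id"] not in returned]


# Sample data standing in for the real request and response.
sample_documents = [{"id": "Siberian Husky"}, {"id": "Samoyed"}]
sample_body = {"ids": ["Siberian Husky"]}

print(missing_ids(sample_documents, sample_body))  # prints ['Samoyed']
```

In the notebook, one option is to call `response.raise_for_status()` first, so that HTTP errors surface immediately, and then pass the three documents and `response.json()` to the helper.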

## Querying the datastore
Let's query the datastore for dogs based on their place of origin.

In [None]:
queries = [
    {
        "query":"I want dog breeds from Siberia.",
        "top_k":2
    },
    {
        "query":"I want dog breed from Alaska.",
        "top_k":1
    }
]

response = requests.post(
    f"{endpoint_url}/query",
    headers=headers,
    json={"queries":queries}
)

response.json()
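
The raw JSON is nested, so a small helper can make the matches easier to scan. This sketch assumes the response has one entry per query under a top-level "results" key, each with its own "results" list of matches carrying "id", "text", and "score". The helper name `summarize_results` and the sample response, including its IDs and scores, are made up for illustration and are not real output.

```python
def summarize_results(response_body):
    """Flatten a query response into (query, match id, score) tuples."""
    rows = []
    for query_result in response_body.get("results", []):
        for match in query_result.get("results", []):
            rows.append((query_result["query"], match["id"], match.get("score")))
    return rows


# Fabricated sample response used only to exercise the helper.
sample_response = {
    "results": [
        {
            "query": "I want dog breeds from Siberia.",
            "results": [
                {"id": "chunk-1", "text": "placeholder", "score": 0.9},
                {"id": "chunk-2", "text": "placeholder", "score": 0.8},
            ],
        }
    ]
}

for row in summarize_results(sample_response):
    print(row)
```

With a live server, `summarize_results(response.json())` would give one row per returned match, up to `top_k` rows for each query.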

## Deleting the data from the datastore
You can either delete all the data or provide a list of document IDs to delete.

In [None]:
response = requests.delete(
    f"{endpoint_url}/delete",
    headers=headers,
    json={"ids":["doc:SiberianHusky:chunk:SiberianHusky_0"]}
)

response.json()

In [None]:
response = requests.delete(
    f"{endpoint_url}/delete",
    headers=headers,
    json={"delete_all":True}
)

response.json()