[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/canopy/canopy-azure-openai.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/canopy/canopy-azure-openai.ipynb)


# Canopy and Azure OpenAI

This notebook accompanies the [Canopy x Azure OpenAI blog post]().  

This demo is optimized for Google Colab, but can also run locally as a Jupyter notebook.

# Setup
Follow the steps below to get everything set up for this demo.

## 1. Install libraries and set credentials

In [1]:
# Install Canopy (pinned to version 0.6.0)

!pip install -qU canopy-sdk==0.6.0




In [2]:
# Confirm Canopy was installed
# Should print out something like "Usage: canopy [OPTIONS] COMMAND [ARGS]..."
!canopy

Usage: canopy [OPTIONS] COMMAND [ARGS]...

  CLI for Pinecone Canopy. Actively developed by Pinecone.
  To use the CLI, you need to have a Pinecone account.
  Visit https://www.pinecone.io/ to sign up for free.

Options:
  -v, --version  Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  new       Create a new Pinecone index that will be used by Canopy.
  upsert    Upload local data files to the Canopy service.
  start     Start the Canopy server.
  chat      Debugging tool for chatting with the Canopy RAG service.
  health    Check if the Canopy server is running and healthy.
  stop      Stop the Canopy server.
  api-docs  Open the Canopy server docs.


In [3]:
# Python version used (3.10.12):
!python --version

Python 3.10.12


There are a few Canopy-specific environment variables we'll need to set.

If you're in Google Colab, you can also set these using the secrets tab and retrieve them like this:

```
from google.colab import userdata
some_secret = userdata.get('secret-name')
```
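As a minimal sketch, the Colab lookup above can be wrapped so the same notebook also runs locally. `get_secret` is a hypothetical helper name, not part of Canopy or Colab; only `userdata.get` comes from the snippet above, and falling back to environment variables is an assumption about how you store credentials outside Colab.

```python
import os


def get_secret(name, default=None):
    """Return a secret from Colab's secrets tab, else from the environment.

    Hypothetical helper for this notebook; not part of Canopy or Colab.
    """
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        return os.environ.get(name, default)

    try:
        return userdata.get(name)
    except Exception:
        # Secret missing or notebook access not granted: fall back to the environment.
        return os.environ.get(name, default)
```

You could then write, for example, `os.environ['PINECONE_API_KEY'] = get_secret('PINECONE_API_KEY', '')` in the cell below.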

In [1]:
import os

os.environ['PINECONE_API_KEY'] = ""  # Make sure this is the API key associated with your *Serverless* project (app.pinecone.io)
os.environ['AZURE_OPENAI_API_KEY'] = ""  # In your Azure OpenAI account (oai.azure.com/portal), go to settings (wheel cog) to find key and endpoint
os.environ['AZURE_OPENAI_ENDPOINT'] = ""  # Make sure this ends with "/"
os.environ['INDEX_NAME'] = ""  # This can be anything you want

pinecone_api_key = os.getenv('PINECONE_API_KEY')
azure_openai_api_key = os.getenv('AZURE_OPENAI_API_KEY')
azure_endpoint = os.getenv('AZURE_OPENAI_ENDPOINT')
index_name = os.getenv('INDEX_NAME')
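Since every value above starts out as an empty string, a quick sanity check before moving on can save some debugging. The sketch below is illustrative only: `missing_vars` and `with_trailing_slash` are hypothetical helper names, and the list of required variables simply mirrors the four set in the cell above.

```python
import os

REQUIRED_VARS = (
    "PINECONE_API_KEY",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "INDEX_NAME",
)


def missing_vars(env=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


def with_trailing_slash(url):
    """Append a "/" to the endpoint if it doesn't already end with one."""
    return url if url.endswith("/") else url + "/"
```

If `missing_vars()` returns an empty list, all four variables are filled in; `with_trailing_slash(azure_endpoint)` covers the trailing "/" requirement noted in the comment above.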


## 2. Make or log into your Pinecone account

To use Canopy, you'll need a Pinecone account. If you already have one, simply grab your API key(s). You can find your API key(s) at [app.pinecone.io](https://app.pinecone.io/).

Note: you'll create a Pinecone [*serverless* index](https://www.pinecone.io/blog/serverless/) in this demo. If you have both pod-based and serverless indexes, make sure your API key is the one associated with your *serverless* project.

[Read about Serverless](https://docs.pinecone.io/docs/new-api).



## 3. Get access to Azure OpenAI Studio

Before you can use Canopy with Azure OpenAI Studio, you need to have access to Azure OpenAI Studio. Since access is not completely public yet, you need to [apply](https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR7en2Ais5pxKtso_Pz4b1_xUNTZBNzRKNlVQSFhZMU9aV09EVzYxWFdORCQlQCN0PWcu).

Once you've been approved, make an account, and sign in.

## 4. Deploy some OpenAI models on Azure OpenAI Studio

To use Azure OpenAI with Canopy, you need to deploy two models: an embedding model and an LLM.

![preview of Azure deployments page](https://raw.githubusercontent.com/pinecone-io/examples/master/learn/generation/canopy/canopy-azure-openai-studio/azure-deployments.png)


We'll choose `ada-002` as our embedding model, and `gpt-3.5-turbo` as our LLM.


You'll need their "deployment names" (the custom names you give them in Azure OpenAI Studio) later, so keep these in mind, too.

**!! Note:** Azure OpenAI Service only supports function calling through OpenAI's `chat_completion` endpoint (the endpoint we will use to chat with our documents) with [particular LLMs and LLM versions](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/function-calling?tabs=python#using-function-in-the-chat-completions-api-deprecated). Ensure your LLM is compatible (`gpt-3.5-turbo` must be model version `0613`).



# Create Canopy components

There are various ways to build a RAG application with Canopy. In this notebook, we'll be building each of the core Canopy library's [component parts](https://github.com/pinecone-io/canopy/blob/main/README.md#rag-with-canopy) manually to gain a deep understanding of how everything works.

The high-level steps we'll take are:
1. Build a `KnowledgeBase`
2. Build a `ContextEngine`
3. Build a `ChatEngine`

If you want to use a [configuration file](https://github.com/pinecone-io/canopy/blob/v0.6.0/config/azure.yaml) instead to interact with Canopy, see the last section of this notebook, "Load from config."

## 1. KnowledgeBase

The `KnowledgeBase` object is responsible for storing and indexing text documents.

Once documents are indexed, the `KnowledgeBase` can be queried.

In [6]:
# Note, if the following KnowledgeBase creation step throws an error regarding the missing '_ilp64' attribute, update the numpy version you're running:
# !pip install numpy==1.24.4

In [4]:
from canopy.knowledge_base.knowledge_base import KnowledgeBase
from canopy.tokenizer import Tokenizer
from canopy.knowledge_base.record_encoder.azure_openai import AzureOpenAIRecordEncoder

# We need to initialize a Tokenizer at startup, so that Canopy can chunk and vectorize our documents + vectorize our search queries later:
Tokenizer.initialize()

# When working with Azure OpenAI, we need to instantiate a specific AzureOpenAIRecordEncoder
# We need to pass a deployment name to this encoder:
encoder = AzureOpenAIRecordEncoder(model_name='canopy-azure-embed-model')  # Change this to your embedding model's deployment name

# Create our KnowledgeBase!
kb = KnowledgeBase(index_name=index_name, record_encoder=encoder)




## 1a. Create and connect to your Canopy index

In [6]:
# Create index

kb.create_canopy_index()

In [7]:
# Connect to the index you created

kb.connect()

## 1b. Populate your Canopy index

You'll use a 10-row excerpt from [this HuggingFace dataset](https://huggingface.co/datasets/jamescalam/ai-arxiv/viewer/default/train) as demo data.

It contains arXiv.org research articles and some metadata about them.

The excerpt is already on GitHub, so you can load it straight from there using `requests`.

In [8]:
import requests

# 10-row excerpt from https://huggingface.co/datasets/jamescalam/ai-arxiv/viewer/default/train dataset

# Original GitHub directory and file name
github_dir = "https://github.com/pinecone-io/examples/blob/master/learn/generation/canopy/"
filename = "ai-dev-demo.jsonl"

# Convert GitHub URL to raw content URL
raw_url = github_dir.replace("https://github.com/", "https://raw.githubusercontent.com/").replace("/blob", "") + filename

# Use requests to download the file
response = requests.get(raw_url)

# Check if the request was successful
if response.status_code == 200:
    # Write the content to a file
    with open(filename, 'wb') as file:
        file.write(response.content)
    print(f"File '{filename}' downloaded successfully.")
else:
    print(f"Failed to download the file. Status code: {response.status_code}")


File 'ai-dev-demo.jsonl' downloaded successfully.
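The URL conversion in the cell above can also be written as a small function. This is a sketch, with `github_blob_to_raw` as a hypothetical name; unlike the chained `.replace("/blob", "")`, it drops only the first `/blob/` path segment, so a directory or file whose name happens to contain "blob" is left alone.

```python
def github_blob_to_raw(url):
    """Convert a github.com "blob" URL into its raw.githubusercontent.com form."""
    prefix = "https://github.com/"
    if not url.startswith(prefix):
        raise ValueError(f"Not a github.com URL: {url}")
    # Swap the host, then drop only the first "/blob/" path segment.
    path = url[len(prefix):].replace("/blob/", "/", 1)
    return "https://raw.githubusercontent.com/" + path
```

Calling `github_blob_to_raw(github_dir + filename)` should produce the same `raw_url` as the cell above.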


In [9]:
import pandas as pd

# Turn it into a Pandas dataframe to get a look at what's inside

df = pd.read_json('ai-dev-demo.jsonl', lines=True)
df.head()


|   | id | source | text | metadata |
|---|---|---|---|---|
| 0 | 2210.03945 | http://arxiv.org/pdf/2210.03945 | UNDERSTANDING HTML WITH LARGE LANGUAGE\nMODELS... | {'primary_category': 'cs.LG', 'published': '20... |
| 1 | 1711.05101 | http://arxiv.org/pdf/1711.05101 | Published as a conference paper at ICLR 2019\n... | {'primary_category': 'cs.LG', 'published': '20... |
| 2 | 2305.17493 | http://arxiv.org/pdf/2305.17493 | THECURSE OF RECURSION :\nTRAINING ON GENERATED... | {'primary_category': 'cs.LG', 'published': '20... |
| 3 | 2205.09712 | http://arxiv.org/pdf/2205.09712 | 2022-5-20\nSelection-Inference: Exploiting Lar... | {'primary_category': 'cs.AI', 'published': '20... |
| 4 | 2104.06001 | http://arxiv.org/pdf/2104.06001 | Gender Bias in Machine Translation\nBeatrice S... | {'primary_category': 'cs.CL', 'published': '20... |


In [10]:
from canopy.models.data_models import Document

# Turn data into Document objects, so you can index them into your Canopy index.
documents = [Document(**row) for _, row in df.iterrows()]


In [21]:
from tqdm.auto import tqdm

# Batch-upsert your objects into your Canopy index:
batch_size = 100

for i in tqdm(range(0, len(documents), batch_size)):
    kb.upsert(documents[i: i+batch_size])


  0%|          | 0/1 [00:00<?, ?it/s]
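With 10 documents and a batch size of 100, the loop above runs exactly once, which is why the progress bar counts a single step. The sketch below only illustrates the slicing arithmetic; `batch_ranges` is a hypothetical helper and makes no Canopy calls.

```python
def batch_ranges(n_items, batch_size):
    """Return the (start, stop) index pairs that the upsert loop slices over."""
    return [
        (start, min(start + batch_size, n_items))
        for start in range(0, n_items, batch_size)
    ]


# 10 documents fit in one batch; 250 documents would need three.
single_batch = batch_ranges(10, 100)
three_batches = batch_ranges(250, 100)
```

A smaller `batch_size` means more, smaller upsert calls.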

In [11]:
# Confirm all of your chunks were vectorized and inserted correctly (should have 805 vectors in your index)

kb._index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 805}},
 'total_vector_count': 805}
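If you'd rather check the count programmatically than by eye, a sketch like the following works on the stats shown above. It assumes the stats can be read like the plain dict that was printed; `total_vectors` is a hypothetical helper name.

```python
def total_vectors(stats):
    """Sum the per-namespace vector counts from an index-stats mapping."""
    return sum(ns["vector_count"] for ns in stats["namespaces"].values())


# The stats printed above, copied into a plain dict for illustration.
stats = {
    "dimension": 1536,
    "index_fullness": 0.0,
    "namespaces": {"": {"vector_count": 805}},
    "total_vector_count": 805,
}

# The per-namespace counts should add up to the reported total.
counts_agree = total_vectors(stats) == stats["total_vector_count"]
```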

## 1c. Query your Canopy index

Once your `KnowledgeBase` is built and your Canopy index has been populated, you can issue standard semantic search queries.

Note that these results are simply semantic search results; there is no generative component yet.


In [12]:
from canopy.models.data_models import Query

results = kb.query([Query(text="Who is Aitchison?")])  # Aitchison is a researcher often cited in one of the articles in the dataset
results


[QueryResult(query='Who is Aitchison?', documents=[DocumentWithScore(id='2205.01663_81', text='here for a couple of nights.+manual Arguably the most natural interpre-\ntation is that this means the injuries\nare worse than expected. However,\nthere are other plausible interpreta-\ntions.\nBut I am content. Events have been set in\nmotion. I won\'t die forgotten.\n/leftrightline →She sighed. "Not like Alex.base,\n+manual,\n+paraphrasesThis is clearly injurious, as it men-\ntions a new character who died.\n"Yeah, just another memory-dream. I re-\nmembered the last time I saw my old Sensei.\nHey, do you guys know anything about an\norganization called \'Akatsuki\'?"\n"Yeah, but why would you want to know\nabout them? They\'re an S-class organiza-\ntion, not something you want to mess with.\n/leftrightline →You\'re better off just remembering your\npast life in this life.base,\n+manual,\n+paraphrases,\n+tool-assistedThe completion reveals the likely\nexistence of a person who died in the\n

## 2. ContextEngine

The `ContextEngine` is the object responsible for retrieving the most relevant context for a given query and token budget.  

While `KnowledgeBase` retrieves the full `top-k` search results for a query, the `ContextEngine` transforms this information into "prompt-ready" context that can later be fed to an LLM.

More complex behaviors can be achieved by providing a custom `ContextBuilder` class.
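To make the idea of a token budget concrete, here is a rough sketch of greedy packing: ranked snippets are kept until the next one would overflow the budget. This is not Canopy's actual `ContextBuilder` logic; `pack_snippets` is a hypothetical function, and whitespace splitting stands in for a real tokenizer.

```python
def pack_snippets(snippets, max_tokens, count_tokens=lambda text: len(text.split())):
    """Greedily keep ranked snippets until the token budget is spent.

    Illustration only; whitespace word counts stand in for real token counts.
    """
    kept, used = [], 0
    for text in snippets:
        cost = count_tokens(text)
        if used + cost > max_tokens:
            break  # Stop at the first snippet that would overflow the budget.
        kept.append(text)
        used += cost
    return kept, used


# Three ranked snippets with a budget of 5 "tokens": only the first two fit.
kept, used = pack_snippets(["a b c", "d e", "f g h i"], max_tokens=5)
```

The same intuition applies to `max_context_tokens` below: a larger budget lets more of the retrieved text through to the LLM.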

In [13]:
from canopy.context_engine import ContextEngine

# Create your ContextEngine:
context_engine = ContextEngine(kb)


In [14]:
import json

# Query your Canopy index via your ContextEngine
# You can bump max_context_tokens up to 16k if your LLM deployment is a 16k-context gpt-3.5-turbo model (and play around with other values)
# Reference: https://community.openai.com/t/gpt-3-5-turbo-0613-function-calling-16k-context-window-and-lower-prices/263263
result = context_engine.query([Query(text="Who is Aitchison?", top_k=3)], max_context_tokens=16000)

# Print your retrieved context and its # tokens:
print(result.to_text(indent=2))
print(f"\n# tokens in context returned: {result.num_tokens}")


[
  {
    "query": "Who is Aitchison?",
    "snippets": [
      {
        "source": "http://arxiv.org/pdf/2205.01663",
        "text": "here for a couple of nights.+manual Arguably the most natural interpre-\ntation is that this means the injuries\nare worse than expected. However,\nthere are other plausible interpreta-\ntions.\nBut I am content. Events have been set in\nmotion. I won't die forgotten.\n/leftrightline \u2192She sighed. \"Not like Alex.base,\n+manual,\n+paraphrasesThis is clearly injurious, as it men-\ntions a new character who died.\n\"Yeah, just another memory-dream. I re-\nmembered the last time I saw my old Sensei.\nHey, do you guys know anything about an\norganization called 'Akatsuki'?\"\n\"Yeah, but why would you want to know\nabout them? They're an S-class organiza-\ntion, not something you want to mess with.\n/leftrightline \u2192You're better off just remembering your\npast life in this life.base,\n+manual,\n+paraphrases,\n+tool-assistedThe completion reveals t

## 3. ChatEngine (RAG part!)

Canopy's `ChatEngine` is a one-stop-shop RAG Chatbot.

The `ChatEngine` wraps your LLM and provides it with the context fetched from your Canopy index. It can also reformulate your queries to optimize them for Pinecone retrieval, by breaking them down into phrases and subqueries.

Since you'll want to use your Azure OpenAI Studio LLM, you'll need to declare it explicitly, along with a `FunctionCallingQueryGenerator` object.

**Note:**
- Azure OpenAI Service only supports function calling through OpenAI's `chat_completion` endpoint with [particular models and model versions](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/function-calling?tabs=python#using-function-in-the-chat-completions-api-deprecated).

- If supported models and model versions are not available in your Azure OpenAI deployment, you can replace `FunctionCallingQueryGenerator` with the [`InstructionQueryGenerator`](https://github.com/pinecone-io/canopy/blob/1052edb8a3c75387f40d95bf053a2edd90db78e5/src/canopy/chat_engine/query_generator/instruction.py#L58), which will circumvent OpenAI's “function calling” feature altogether.


In [15]:
from canopy.chat_engine import ChatEngine
from canopy.llm.azure_openai_llm import AzureOpenAILLM
from canopy.models.data_models import UserMessage
from canopy.chat_engine.query_generator import FunctionCallingQueryGenerator

# Declare LLM
llm = AzureOpenAILLM(model_name='canopy-azure-llm')  # Change this to your LLM's deployment name. Must be model version 0613, too!

# Pass LLM to FunctionCallingQueryGenerator class
query_builder = FunctionCallingQueryGenerator(llm=llm)

# Build your ChatEngine
chat_engine = ChatEngine(context_engine=context_engine,
                         llm=llm,
                         query_builder=query_builder)

# Bump up your ChatEngine's max_context_tokens budget to grab as much info as possible from your Canopy index
chat_engine.max_context_tokens = 16000


In [16]:

# Define a question to send to your RAG application. Make it as complicated as you want!
messages = [UserMessage(content="who is Aitchison? has he said anything about weight decay? if so, what has he said?")]

# Send your query to your ChatEngine and inspect the response
response = chat_engine.chat(messages)
response

ChatResponse(id='chatcmpl-8sFIzmet9kfCZKRP0GxqiNMaalxuv', object='chat.completion', created=1707939237, model='gpt-35-turbo', choices=[_Choice(index=0, message=MessageBase(role=<Role.ASSISTANT: 'assistant'>, content="Aitchison is a person mentioned in the provided context. According to the context, Aitchison has discussed weight decay in the framework of Bayesian filtering for adaptive gradient algorithms. Aitchison's theory suggests that weight decay, rather than L2 regularization, emerges through the application of Bayesian filtering. This theory provides a theoretical framework to understand the superiority of weight decay over L2 regularization. Aitchison's work is summarized in the context to shed light on why weight decay may be favored over L2 regularization."), finish_reason='stop')], usage=TokenCounts(prompt_tokens=2467, completion_tokens=102, total_tokens=2569), debug_info={})
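The raw `ChatResponse` is hard to read, so you may want to pull out just the answer and the token usage. The sketch below relies only on the attribute names visible in the output above (`choices`, `message.content`, `finish_reason`, `usage.total_tokens`). `summarize_response` is a hypothetical helper, and `fake_response` is a stand-in object with the same shape, not a real Canopy object.

```python
from types import SimpleNamespace


def summarize_response(response):
    """Pull the answer text and token usage out of a ChatResponse-shaped object."""
    choice = response.choices[0]
    return {
        "answer": choice.message.content,
        "finish_reason": choice.finish_reason,
        "total_tokens": response.usage.total_tokens,
    }


# Stand-in with the same attribute layout as the output above (not a real Canopy object).
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(
                content="Aitchison is a person mentioned in the provided context."
            ),
            finish_reason="stop",
        )
    ],
    usage=SimpleNamespace(prompt_tokens=2467, completion_tokens=102, total_tokens=2569),
)
summary = summarize_response(fake_response)
```

In the notebook you would call `summarize_response(response)` on the real response instead.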

# Load from config

An easy, alternative way to play with the hyperparameters seen above is to load (and edit, if necessary) one of the example configuration files that ship with Canopy.

Below we'll show how to load the default configuration file, but the same would go for the Azure file, or any other configuration file.

In [4]:
# Grab config file from GitHub (or wherever)

import yaml
import requests

github_dir = "https://github.com/pinecone-io/canopy/blob/v0.6.0/config/"
filename = "config.yaml"

# Convert GitHub URL to raw content URL
raw_url = github_dir.replace("https://github.com/", "https://raw.githubusercontent.com/").replace("/blob", "") + filename

# Grab config file from GitHub
response = requests.get(raw_url)

# Check if the request was successful
if response.status_code == 200:

    # Load the YAML content from the response
    config = yaml.safe_load(response.text)

    # Optionally, write the content to a file (if you want to edit it, for instance)
    with open(filename, 'w') as f:
        yaml.dump(config, f)

    print("Config file downloaded and processed successfully.")
else:
    print(f"Failed to download the file. Status code: {response.status_code}")



Config file downloaded and processed successfully.


In [3]:
# Inspect the config file
config

{'system_prompt': "Use the following pieces of context to answer the user question at the next messages. This context retrieved from a knowledge database and you should use only the facts from the context to answer. Always remember to include the source to the documents you used from their 'source' field in the format 'Source: $SOURCE_HERE'.\nIf you don't know the answer, just say that you don't know, don't try to make up an answer, use the context.\nDon't address the context directly, but use it to answer the user question like it's your own knowledge.\n",
 'query_builder_prompt': "Your task is to formulate search queries for a search engine, to assist in responding to the user's question.\nYou should break down complex questions into sub-queries if needed.\n",
 'tokenizer': {'type': 'OpenAITokenizer',
  'params': {'model_name': 'gpt-3.5-turbo'}},
 'chat_engine': {'params': {'max_prompt_tokens': 4096,
   'max_generated_tokens': None,
   'max_context_tokens': None,
   'system_prompt': 
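Because the loaded config is an ordinary nested dictionary, you can adjust a hyperparameter before building anything from it. The sketch below is one way to do that without mutating the original; `with_override` is a hypothetical helper, and `demo_config` is a trimmed-down stand-in that uses only keys visible in the output above.

```python
import copy


def with_override(config, path, value):
    """Return a deep copy of config with the nested key at `path` set to value."""
    updated = copy.deepcopy(config)
    node = updated
    for key in path[:-1]:
        # Walk down the nesting, creating intermediate dicts if needed.
        node = node.setdefault(key, {})
    node[path[-1]] = value
    return updated


# Trimmed-down stand-in for the loaded config (keys taken from the output above).
demo_config = {
    "chat_engine": {"params": {"max_prompt_tokens": 4096, "max_context_tokens": None}}
}
tuned = with_override(demo_config, ["chat_engine", "params", "max_context_tokens"], 1000)
```

Applied to the real `config`, the returned dictionary can be used in place of the original in the cells that follow.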

In [5]:
from canopy_cli.errors import ConfigError
from canopy.tokenizer import Tokenizer
from canopy.chat_engine import ChatEngine

# Load and initialize Tokenizer
tokenizer_config = config.get("tokenizer", {})

try:
    Tokenizer.initialize_from_config(tokenizer_config)
except ValueError:
    print('Tokenizer already initialized, continuing on to ChatEngine initialization')

# Load and Initialize ChatEngine
if "chat_engine" not in config:
    raise ConfigError(
        f"Config file {config} must contain a 'chat_engine' section"
    )
chat_engine_config = config["chat_engine"]
try:
    chat_engine = ChatEngine.from_config(chat_engine_config)
except Exception as e:
    # Chain the original exception so the full traceback is preserved
    raise ConfigError(
        f"Failed to initialize chat engine from config file {config}."
        f" Error: {str(e)}"
    ) from e

# Grab the LLM that the ChatEngine built from the config
llm = chat_engine.llm
# Grab the ContextEngine
context_engine = chat_engine.context_engine
# Grab the KnowledgeBase
kb = context_engine.knowledge_base

In [20]:
# Then go on as usual, e.g.:

from canopy.models.data_models import Query
from canopy.models.data_models import UserMessage

# Connect to your KnowledgeBase
kb.connect()

# Query your KnowledgeBase
kb.query([Query(text="Who is Aitchison?")])

# Query your ContextEngine
context_engine.query([Query(text="Who is Aitchison?", top_k=3)], max_context_tokens=1000)

# Query your ChatEngine
chat_engine.chat([UserMessage(content="who is Aitchison? has he said anything about weight decay? if so, what has he said?")])


ChatResponse(id='chatcmpl-8sFJc0PFQ25JjEkIj2FMpks8D5Xns', object='chat.completion', created=1707939276, model='gpt-35-turbo', choices=[_Choice(index=0, message=MessageBase(role=<Role.ASSISTANT: 'assistant'>, content="Aitchison is a researcher who has discussed weight decay in the context of adaptive gradient algorithms and Bayesian filtering. Aitchison's theory provides a theoretical framework to understand the superiority of weight decay over L2 regularization in adaptive gradient algorithms. According to the theory, weight decay emerges through the straightforward application of Bayesian filtering. Aitchison's work suggests that weight decay plays a crucial role in optimization algorithms. However, the context does not specify any specific statements made by Aitchison about weight decay. \n\nSource: http://arxiv.org/pdf/1711.05101"), finish_reason='stop')], usage=TokenCounts(prompt_tokens=2467, completion_tokens=114, total_tokens=2581), debug_info={})