# Prompt Engineering with Azure Cognitive Search

Use Azure Cognitive Search to retrieve relevant content and build an effective prompt for Azure OpenAI. The example below uses LangChain modules to perform the task.

## Setup
#### Follow the [README](https://github.com/tirtho/open-ai/blob/main/README.md) and complete the setup before running the notebooks

#### References:
- [Azure OpenAI](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/overview)
- [LangChain home page](https://python.langchain.com/docs/get_started/introduction.html)
- [Azure Cognitive Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search)

#### Load the API key and relevant Python libraries

#### Install the Python libraries
- > pip install openai num2words matplotlib plotly scipy scikit-learn pandas tiktoken

- > pip install --index-url=https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/ azure-search-documents==11.4.0a20230509004



#### Load the API keys

In [1]:
import openai
import sys

from azure_openai_setup import set_openai_config, get_openai_global_config_parameters

set_openai_config()

theOpenAIParams, modelName, modelDeploymentName = get_openai_global_config_parameters()

Got Azure OpenAI Credentials from Azure Key Vault with Azure CLI Auth


#### Get the Azure Cognitive Search keys from Azure Key Vault
Note: You need the Search Admin Key

In [2]:
from azure.core.credentials import AzureKeyCredential

from azure_cognitive_search_setup import set_cognitive_search_config

azureSearchAdminKey, azureSearchEndpoint, azureSearchIndexName = set_cognitive_search_config()

Getting Azure Cognitive Search Credentials from Azure Key Vault with Azure CLI Auth


#### Other modules needed

In [3]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.schema import BaseRetriever
from langchain.vectorstores.azuresearch import AzureSearch

#### Create the Azure OpenAI embeddings object and the AzureSearch vector store:

In [4]:
from azure_openai_setup import get_azure_openai_embeddings 

embeddings = get_azure_openai_embeddings()

In [5]:
vector_store: AzureSearch = AzureSearch(
                                azure_search_endpoint=azureSearchEndpoint,
                                azure_search_key=azureSearchAdminKey,
                                index_name=azureSearchIndexName,
                                embedding_function=embeddings.embed_query,
                            )

### Load the BillSum Dataset
BillSum is a dataset of United States Congressional and California state bills. For illustration purposes, we'll look only at the US bills. The corpus consists of bills from the 103rd-115th (1993-2018) sessions of Congress. The data was split into 18,949 train bills and 3,269 test bills. The BillSum corpus focuses on mid-length legislation from 5,000 to 20,000 characters in length. More information on the project, and the original academic paper from which this dataset is derived, can be found in the BillSum project's GitHub repository.

The code below reads it from data/bill_sum_data.csv under the current working directory.

#### Load the data and select the bill_id and title columns

In [6]:
from num2words import num2words
import os
import pandas as pd
import numpy as np
import tiktoken
import sys

In [9]:
from langchain.document_loaders import DataFrameLoader

# Assumes bill_sum_data.csv is in a data/ subdirectory of the directory where Jupyter is running
df = pd.read_csv(os.path.join(os.getcwd(), 'data', 'bill_sum_data.csv'))
df_bills = df[['bill_id', 'title']]

# Each title becomes a document's page_content; bill_id is carried along as metadata
loader = DataFrameLoader(df_bills, page_content_column="title")
docs = loader.load()

#print(docs)
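
The cells above import `tiktoken` but do not filter rows by length. If longer fields (such as the full bill text) were embedded, rows could be limited to fewer than 8192 tokens before loading. The following is a minimal sketch of that filter, not the notebook's own code: `filter_by_token_limit` and `whitespace_token_count` are hypothetical helpers, the sample rows are made up, and the whitespace counter is only a rough stand-in for a real tokenizer.

```python
from typing import Callable, Dict, List


def whitespace_token_count(text: str) -> int:
    """Rough stand-in tokenizer: counts whitespace-separated words.

    With tiktoken installed, a closer count would be
    len(tiktoken.get_encoding("cl100k_base").encode(text)).
    """
    return len(text.split())


def filter_by_token_limit(
    rows: List[Dict[str, str]],
    text_key: str,
    count_tokens: Callable[[str], int] = whitespace_token_count,
    max_tokens: int = 8192,
) -> List[Dict[str, str]]:
    """Keep only rows whose text field has fewer than max_tokens tokens."""
    return [row for row in rows if count_tokens(row[text_key]) < max_tokens]


# Made-up sample rows (not taken from BillSum), shaped like df_bills records.
sample_rows = [
    {"bill_id": "a", "title": "A short title"},
    {"bill_id": "b", "title": "word " * 10000},  # deliberately over the limit
]
kept = filter_by_token_limit(sample_rows, text_key="title")
print([row["bill_id"] for row in kept])
```

With a DataFrame, the same predicate could be applied as `df[df["title"].apply(count_tokens) < 8192]`, where `count_tokens` is whichever token counter is chosen.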

In [8]:
# TODO: Find out the type of the 'content_vector' search index field and
# use it to define the search index in Cognitive Search properly.
# Cognitive Search errors out for type = Collection(Edm.Single)

vector_store.add_documents(documents = docs)

HttpResponseError: () The request is invalid. Details: Cannot convert the literal '-0.0253281369805336' to the expected type 'Edm.String'.
Code: 
Message: The request is invalid. Details: Cannot convert the literal '-0.0253281369805336' to the expected type 'Edm.String'.

## Vector similarity search

In [None]:
search_result_docs = vector_store.similarity_search(
    query="encourage businesses to improve math and science education at elementary and secondary schools",
    k=3,
    search_type="similarity",  # omit this argument to try a hybrid search
)
# Print the top search hit (not the first document of the original dataset)
print(search_result_docs[0].page_content)
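
Conceptually, a vector similarity search ranks stored documents by how close their embedding vectors are to the embedding of the query. The sketch below illustrates that idea with a brute-force cosine similarity over toy two-dimensional vectors. It is an assumption-laden illustration only: the helper names and vectors are made up, cosine is an assumed metric, and the search service uses its own index rather than a linear scan.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar vectors."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for document embeddings (real ones have many more dimensions).
toy_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ranked = top_k([1.0, 0.1], toy_docs, k=2)
print([index for index, _ in ranked])
```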

## TODO
Get the searched text from Azure Cognitive Search and then use it in the prompt for Azure OpenAI
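
One possible shape for this step is sketched below: the retrieved passages are numbered, concatenated up to a character budget, and placed ahead of the question. This is a sketch under stated assumptions, not the notebook's implementation. `RetrievedDoc`, `build_prompt`, the prompt wording, the character budget, and the example titles are all hypothetical; the only thing assumed about a search result is that it exposes a `page_content` attribute.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RetrievedDoc:
    """Hypothetical stand-in for a search result that exposes page_content."""
    page_content: str


def build_prompt(
    question: str,
    docs: Iterable[RetrievedDoc],
    max_context_chars: int = 2000,
) -> str:
    """Assemble a prompt from numbered passages, stopping at a character budget."""
    context_lines = []
    used = 0
    for i, doc in enumerate(docs, start=1):
        line = f"[{i}] {doc.page_content.strip()}"
        if used + len(line) > max_context_chars:
            break  # stop before exceeding the budget
        context_lines.append(line)
        used += len(line)
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up titles for illustration; they are not taken from the dataset.
example_docs = [
    RetrievedDoc("Example Science Education Act"),
    RetrievedDoc("Example Small Business Tax Credit Act"),
]
prompt = build_prompt("Which bills concern science education?", example_docs)
print(prompt)
```

Because only `page_content` is read, the list returned by `vector_store.similarity_search(...)` could be passed in place of `example_docs`, and the resulting string sent to the deployed Azure OpenAI model.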