### Setup

In [1]:
import os

import azure.identity
import dotenv
import openai
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

dotenv.load_dotenv()

azure_credential = azure.identity.DefaultAzureCredential()

# Initialize Azure OpenAI client
AZURE_OPENAI_SERVICE = os.getenv("AZURE_OPENAI_SERVICE")
AZURE_OPENAI_ADA_DEPLOYMENT = os.getenv("AZURE_OPENAI_ADA_DEPLOYMENT")

token_provider = azure.identity.get_bearer_token_provider(azure_credential, "https://cognitiveservices.azure.com/.default")
openai_client = openai.AzureOpenAI(
    api_version="2023-07-01-preview",
    azure_endpoint=f"https://{AZURE_OPENAI_SERVICE}.openai.azure.com",
    azure_ad_token_provider=token_provider)

def get_embedding(text):
    """Return the embedding vector for `text` from the embedding model deployment."""
    get_embeddings_response = openai_client.embeddings.create(model=AZURE_OPENAI_ADA_DEPLOYMENT, input=text)
    return get_embeddings_response.data[0].embedding

# Initialize Azure search client
AZURE_SEARCH_SERVICE = os.getenv("AZURE_SEARCH_SERVICE")
AZURE_SEARCH_ENDPOINT = f"https://{AZURE_SEARCH_SERVICE}.search.windows.net"

AZURE_SEARCH_FULL_INDEX = "gptkbindex"
search_client = SearchClient(AZURE_SEARCH_ENDPOINT, AZURE_SEARCH_FULL_INDEX, credential=azure_credential)


#### Vector search is not always optimal

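For example, consider searching for an exact string such as a price. A vector-only query (no search text) ranks chunks by embedding similarity, and none of its top three results contains the string: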

In [2]:
search_query = "$45.00"
search_vector = get_embedding(search_query)
# search_text=None: rank by vector similarity only
r = search_client.search(None, top=3, vector_queries=[
    VectorizedQuery(vector=search_vector, k_nearest_neighbors=50, fields="embedding")])
for doc in r:
    found_content = "Not found"
    if search_query in doc['content']:
        found_content = doc['content'][doc['content'].find(search_query):].replace("\n", " ")
    print(f"Score: {doc['@search.score']:.5f}\tMatching text: {found_content}")

Score: 0.81583	Matching text: Not found
Score: 0.81474	Matching text: Not found
Score: 0.81295	Matching text: Not found
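Vector search ranks chunks by how close their embeddings are to the query embedding, not by whether the literal string occurs, so "$45.00" retrieves chunks that are merely similar in embedding space. A minimal brute-force sketch of the idea, assuming cosine similarity as the metric (the index's vector configuration sets the real one) and toy 3-dimensional vectors; Azure AI Search can use an approximate index such as HNSW rather than this exhaustive scan:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn(query: list[float], docs: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Exhaustive k-nearest-neighbor search: score every doc, keep the k best."""
    # Assumption: cosine similarity; the real metric comes from the index's vector profile.
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (real embedding vectors have far more dimensions).
docs = {
    "price table": [0.9, 0.1, 0.0],
    "dental plan": [0.7, 0.7, 0.1],
    "gym perks": [0.0, 0.2, 0.9],
}
for doc_id, score in knn([1.0, 0.2, 0.0], docs, k=2):
    print(f"{score:.4f}  {doc_id}")
```

Nothing in this ranking checks for the query's literal characters, which is why exact-string lookups can miss.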


Compare with a full-text search for the same query:

In [3]:
r = search_client.search(search_query, top=3)
for doc in r:
    found_content = "Not found"
    if search_query in doc['content']:
        found_content = doc['content'][doc['content'].find(search_query):].replace("\n", " ")
    print(f"Score: {doc['@search.score']:.5f}\tMatching text: {found_content}")

Score: 5.00821	Matching text: $45.00</td><td>$55.00</td></tr><tr><td>Employee +1</td><td>$65.00</td><td>$71.00</td></tr><tr><td>Employee +2 or more</td><td>$78.00</td><td>$89.00</td></tr></table> Next Steps We hope that this information has been helpful in understanding the differences between Northwind Health Plus and Northwind Standard. We are confident that you will find the right plan for you and your family. Thank you for choosing Contoso Electronics!
Score: 3.68404	Matching text: Not found
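The full-text query wins here because it matches query terms literally. Azure AI Search scores full-text matches with BM25; below is a minimal BM25 sketch, assuming whitespace tokenization and the commonly used defaults k1 = 1.2 and b = 0.75 (the service's analyzer tokenizes "$45.00" its own way, and its exact parameters may differ):

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each doc against the query with Okapi BM25."""
    # Assumption: lowercase whitespace tokenization stands in for the service's analyzer.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue  # terms absent from the doc contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and length normalization (b)
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores


docs = [
    "Employee only $45.00 $55.00",
    "Employee +1 $65.00 $71.00",
    "PerksPlus covers scuba diving lessons",
]
print(bm25_scores("$45.00", docs))
```

Only the chunk that actually contains the token gets a nonzero score, which mirrors why the text search above finds the exact price while vector search does not.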


#### Hybrid retrieval

Hybrid retrieval sends both the text query and the vector query and merges the two ranked result lists with Reciprocal Rank Fusion (RRF).
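RRF ignores the raw scores and scores each document by its position in each list: score(d) = Σ 1 / (k + rank). A minimal sketch, assuming k = 60 (the constant from the original RRF paper) and 1-based ranks; the service's exact constant and rank offset may differ, so these numbers need not match the `@search.score` values printed below:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    # Assumption: k = 60 and 1-based ranks; the service's formula details may differ.
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (sorted is stable).
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


text_results = ["prices", "plan comparison"]
vector_results = ["dental", "vision", "prices", "wellness"]
for doc_id, score in reciprocal_rank_fusion([text_results, vector_results]):
    print(f"{score:.5f}  {doc_id}")
```

A document found by both retrievers collects two contributions, so "prices" ends up first even though the vector retriever ranked it only third.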

In [4]:
search_vector = get_embedding(search_query)
r = search_client.search(search_query, top=15, vector_queries=[
    VectorizedQuery(vector=search_vector, k_nearest_neighbors=10, fields="embedding")])
for doc in r:
    found_content = "Not found"
    if search_query in doc['content']:
        found_content = doc['content'][doc['content'].find(search_query):].replace("\n", " ")
    print(f"Score: {doc['@search.score']:.5f}\tMatching text: {found_content}")

Score: 0.03229	Matching text: $45.00</td><td>$55.00</td></tr><tr><td>Employee +1</td><td>$65.00</td><td>$71.00</td></tr><tr><td>Employee +2 or more</td><td>$78.00</td><td>$89.00</td></tr></table> Next Steps We hope that this information has been helpful in understanding the differences between Northwind Health Plus and Northwind Standard. We are confident that you will find the right plan for you and your family. Thank you for choosing Contoso Electronics!
Score: 0.01667	Matching text: Not found
Score: 0.01639	Matching text: Not found
Score: 0.01639	Matching text: Not found
Score: 0.01613	Matching text: Not found
Score: 0.01587	Matching text: Not found
Score: 0.01538	Matching text: Not found
Score: 0.01515	Matching text: Not found
Score: 0.01493	Matching text: Not found
Score: 0.01471	Matching text: Not found
Score: 0.01449	Matching text: Not found
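The match-printing loop is repeated verbatim in the three cells above. It could be factored into a small helper; `print_matches` is a hypothetical name (not part of any SDK), and it relies only on the `content` and `@search.score` keys the cells already read:

```python
def print_matches(results, search_query: str) -> None:
    """Print each hit's score and its content from the first exact match onward."""
    for doc in results:
        content = doc["content"]
        pos = content.find(search_query)
        # str.find returns -1 when the query string does not occur in the chunk
        found_content = content[pos:].replace("\n", " ") if pos != -1 else "Not found"
        print(f"Score: {doc['@search.score']:.5f}\tMatching text: {found_content}")


# Hypothetical demo hits; in the notebook you would pass the iterator
# returned by search_client.search(...) instead.
hits = [
    {"content": "Plan\n$45.00 per month", "@search.score": 5.0},
    {"content": "no price here", "@search.score": 1.0},
]
print_matches(hits, "$45.00")
```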


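#### Hybrid ranking is not always optimal

For a more conceptual query, the fused ranking can place loosely related chunks above the most relevant one: here the chunk listing scuba diving lessons ranks only third.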

In [5]:
search_query = "learning about underwater activities"
search_vector = get_embedding(search_query)
r = search_client.search(search_query, top=5, vector_queries=[
    VectorizedQuery(vector=search_vector, k_nearest_neighbors=10, fields="embedding")])
for doc in r:
    content = doc["content"].replace("\n", " ")[:150]
    print(f"Score: {doc['@search.score']:.5f}\t\tContent: {content}")

Score: 0.03128		Content:  To receive coverage for habilitation services, the individual must be referred to a qualified provider by their primary care physician. When seeking 
Score: 0.03009		Content:  PerksPlus is not only designed to support employees' physical health, but also their mental health. Regular exercise has been shown to reduce stress,
Score: 0.02803		Content:  Some of the lessons covered under PerksPlus include: · Skiing and snowboarding lessons · Scuba diving lessons · Surfing lessons · Horseback riding le
Score: 0.02663		Content:  Occupational Therapy Occupational therapy helps individuals develop, maintain, or restore skills for daily living and work. It can help individuals w
Score: 0.02158		Content:  · Talk to your doctor or health care provider about the trial and ask any questions you may have. · Ask about the potential risks and benefits of par
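One plausible reason hybrid ranking falls short: RRF sees only each chunk's rank in each list, not how strong the match was, so a chunk ranked moderately by both retrievers can outrank one that a single retriever puts first. A small worked example, assuming k = 60 and 1-based ranks:

```python
def rrf_score(ranks: list[int], k: int = 60) -> float:
    """RRF score of one doc, given its 1-based rank in each list it appears in."""
    # Assumption: k = 60 and 1-based ranks, as in the original RRF formulation.
    return sum(1.0 / (k + r) for r in ranks)


# A chunk ranked 1st by one retriever and absent from the other list
top_in_one = rrf_score([1])
# A chunk ranked 3rd by both retrievers
middling_in_both = rrf_score([3, 3])
print(f"{top_in_one:.5f} vs {middling_in_both:.5f}")
```

Because 2 / 63 is nearly double 1 / 61, agreement between the retrievers dominates, whether or not either retriever judged the chunk highly relevant.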


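#### Hybrid + semantic reranking 🎉

Setting `query_type="semantic"` re-ranks the hybrid results with Azure AI Search's semantic ranker; the scuba lessons chunk now comes first.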

In [6]:
search_query = "learning about underwater activities"
search_vector = get_embedding(search_query)
r = search_client.search(
    search_query,
    top=5,
    vector_queries=[
        VectorizedQuery(vector=search_vector, k_nearest_neighbors=50, fields="embedding")],
    query_type="semantic",
    semantic_configuration_name="default")

for doc in r:
    content = doc["content"].replace("\n", " ")[:150]
    print(f"Score: {doc['@search.score']:.5f}\tReranker: {doc['@search.reranker_score']:.5f}\tContent: {content}")

Score: 0.02803	Reranker: 1.81212	Content:  Some of the lessons covered under PerksPlus include: · Skiing and snowboarding lessons · Scuba diving lessons · Surfing lessons · Horseback riding le
Score: 0.03009	Reranker: 1.44179	Content:  PerksPlus is not only designed to support employees' physical health, but also their mental health. Regular exercise has been shown to reduce stress,
Score: 0.01333	Reranker: 1.14614	Content: than a sign and post)</td></tr></table> 142 :unselected:Appendix A: Crosswalk Policy THE DITY EL CERRITO <table><tr><th colSpan=4>TABLE A-2: UNCONTROL
Score: 0.01389	Reranker: 0.91713	Content:  goto ( "https: //demo. playwright. dev/todomvc/#/" ) 6 page. get_by_placeholder ( "What needs to be done?" ) . click ( ) 7 page. get_by_placeholder (
Score: 0.01562	Reranker: 0.83134	Content:  To learn more about generating tests check out or detailed guide on Codegen. Generating locators You can generate locators with the test generator. .
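Note that the final order follows `@search.reranker_score`, not the fused `@search.score`: the semantic ranker re-scores the hybrid candidates and sorts by the new score. A minimal sketch of that two-stage retrieve-then-rerank pattern, with a hypothetical word-overlap scorer standing in for the semantic model (the real ranker is a model hosted by the service, not this heuristic):

```python
from typing import Callable


def rerank(query: str, candidates: list[dict], scorer: Callable[[str, str], float], top: int) -> list[dict]:
    """Re-score first-stage candidates with a second scorer and keep the best `top`."""
    # Copy each hit so the first-stage results are left untouched
    rescored = [dict(c, reranker_score=scorer(query, c["content"])) for c in candidates]
    rescored.sort(key=lambda c: c["reranker_score"], reverse=True)
    return rescored[:top]


def word_overlap(query: str, text: str) -> float:
    """Hypothetical stand-in for the semantic ranker: fraction of query words found in text."""
    query_words = set(query.lower().split())
    return len(query_words & set(text.lower().split())) / len(query_words)


candidates = [  # first-stage (hybrid) hits, already ordered by fused score
    {"content": "habilitation services require a referral", "score": 0.031},
    {"content": "lessons include scuba diving lessons", "score": 0.028},
]
for hit in rerank("scuba diving lessons", candidates, word_overlap, top=2):
    print(f"{hit['score']:.3f}  {hit['reranker_score']:.2f}  {hit['content']}")
```

The lower first-stage score no longer matters once the second scorer decides the order, which is the same pattern visible in the output above.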
