# Hybrid Search with Azure OpenAI

This recipe shows how to run hybrid search in Weaviate with embeddings from Azure OpenAI.

## Requirements

1. Weaviate cluster (choose one):
    1. A 14-day free sandbox on [WCD](https://console.weaviate.cloud/)
    2. [Embedded Weaviate](https://weaviate.io/developers/weaviate/installation/embedded)
    3. [Local deployment](https://weaviate.io/developers/weaviate/installation/docker-compose#starter-docker-compose-file)
    4. [Other options](https://weaviate.io/developers/weaviate/installation)

2. Azure OpenAI API key. Grab one [here](https://portal.azure.com/).

## Import Dependencies, Libraries, and Keys

In [None]:
!pip install -q weaviate-client

In [1]:
import weaviate
from weaviate.classes.init import Auth
import weaviate.classes.config as wc
from weaviate.embedded import EmbeddedOptions
import weaviate.classes.query as wq


import os
import requests
import json

## Connect to Weaviate

Choose only one of the options below.

**Weaviate Cloud Deployment**

In [13]:
WCD_URL = os.environ["WEAVIATE_URL"] # Replace with your Weaviate cluster URL
WCD_AUTH_KEY = os.environ["WEAVIATE_AUTH"] # Replace with your cluster auth key
AZURE_OPENAI_KEY = os.environ["AZURE_OPENAI_API_KEY"] # Replace with your Azure key


# Weaviate Cloud Deployment
# connect_to_weaviate_cloud replaces the deprecated connect_to_wcs helper
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=WCD_URL,
    auth_credentials=Auth.api_key(WCD_AUTH_KEY),
    headers={"X-Azure-Api-Key": AZURE_OPENAI_KEY},
)

print(client.is_ready())

True


**Embedded Weaviate**

In [None]:
# AZURE_OPENAI_KEY = os.environ["AZURE_OPENAI_API_KEY"] # Replace with your Azure key

# client = weaviate.WeaviateClient(
#     embedded_options=EmbeddedOptions(
#         version="1.28.2",
#         additional_env_vars={
#             "ENABLE_MODULES": "text2vec-openai"
#         },
#     ),
#     additional_headers={
#         "X-Azure-Api-Key": AZURE_OPENAI_KEY
#     },
# )

# client.connect()

**Local Deployment**

In [None]:
# AZURE_OPENAI_KEY = os.environ["AZURE_OPENAI_API_KEY"] # Replace with your Azure key

# client = weaviate.connect_to_local(
#     headers={
#         "X-Azure-Api-Key": AZURE_OPENAI_KEY
#     }
# )
# print(client.is_ready())

## Create a collection
> A collection stores your data objects and their vector embeddings.

In [18]:
# Note: this deletes any data already stored in "JeopardyQuestion",
# so you will need to re-import it afterwards.

# Delete the collection if it already exists
if client.collections.exists("JeopardyQuestion"):
    client.collections.delete("JeopardyQuestion")

client.collections.create(
    name="JeopardyQuestion",

    vectorizer_config=wc.Configure.Vectorizer.text2vec_azure_openai(
        resource_name="xyz", # name of your Azure OpenAI resource
        deployment_id="text-embedding-3-small", # name of your model deployment
    ),

    properties=[ # defining properties (data schema) is optional
        wc.Property(name="Question", data_type=wc.DataType.TEXT), 
        wc.Property(name="Answer", data_type=wc.DataType.TEXT),
        wc.Property(name="Category", data_type=wc.DataType.TEXT) 
    ]
)

print("Successfully created collection: JeopardyQuestion.")

Successfully created collection: JeopardyQuestion.


## Import the Data

In [19]:
url = 'https://raw.githubusercontent.com/weaviate/weaviate-examples/main/jeopardy_small_dataset/jeopardy_tiny.json'
resp = requests.get(url)
data = json.loads(resp.text)

# Get a collection object for "JeopardyQuestion"
jeopardy = client.collections.get("JeopardyQuestion")

# Insert data objects
response = jeopardy.data.insert_many(data)

# Note: the `data` array contains only 10 objects, so a single insert_many call is fine.
# If you have a million objects to insert, split them into smaller batches (e.g. 100-1000 per insert).

if response.has_errors:
    print(response.errors)
else:
    print("Insert complete.")

Insert complete.
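
The batching advice in the comment above can be sketched with a small helper. This is a minimal illustration in plain Python, not part of the Weaviate client: `chunked` and the sample `rows` list are hypothetical names introduced here, and the batch size of 100 is just one value from the suggested range.

```python
from typing import Iterator, Sequence


def chunked(items: Sequence[dict], size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of `items`, each holding at most `size` objects."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical stand-in for a larger dataset
rows = [{"Question": f"question {i}", "Answer": f"answer {i}"} for i in range(250)]

batch_sizes = [len(batch) for batch in chunked(rows, 100)]
print(batch_sizes)  # [100, 100, 50]
```

With a live connection, each batch could then be passed to `jeopardy.data.insert_many(batch)`, checking `has_errors` on every response.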


## Hybrid Search

The `alpha` parameter determines the weight given to the sparse and dense search methods. `alpha = 0` is pure sparse (BM25) search, whereas `alpha = 1` is pure dense (vector) search.

`alpha` is an optional parameter. The default is `0.75`.
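
To build intuition for what `alpha` does, here is a simplified sketch of a weighted blend of two score sets. It assumes both sets are already normalized to the 0-1 range, and it is only an illustration: Weaviate's own fusion algorithms differ in detail. The function names and the sample scores are made up for this example.

```python
def blend_scores(sparse: dict, dense: dict, alpha: float) -> dict:
    """Combine per-object scores: (1 - alpha) * sparse + alpha * dense."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    ids = set(sparse) | set(dense)
    return {
        obj_id: (1 - alpha) * sparse.get(obj_id, 0.0) + alpha * dense.get(obj_id, 0.0)
        for obj_id in ids
    }


def rank(scores: dict) -> list:
    """Return object IDs ordered from highest to lowest blended score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical normalized scores for three objects
sparse = {"a": 1.0, "b": 0.2, "c": 0.0}  # keyword (BM25) side
dense = {"a": 0.1, "b": 0.6, "c": 1.0}   # vector side

print(rank(blend_scores(sparse, dense, alpha=0.0)))  # ['a', 'b', 'c'] (keyword order)
print(rank(blend_scores(sparse, dense, alpha=1.0)))  # ['c', 'b', 'a'] (vector order)
print(rank(blend_scores(sparse, dense, alpha=0.5)))  # ['a', 'c', 'b'] (a mix of both)
```

The same object can move up or down the ranking purely because `alpha` shifts weight between the two signals.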

### Hybrid Search only

The query below searches for Jeopardy questions matching "northern beast" and limits the output to three results. `query_properties` restricts the keyword (BM25) part of the search to the `question` property. Notice `alpha` is set to `0.8`, which weighs the vector search results more heavily than BM25. If you were to set `alpha = 0.25`, you would likely get different results.

In [20]:
response = jeopardy.query.hybrid(
    query="northern beast",
    query_properties=["question"],
    alpha=0.8,
    limit=3
)

for item in response.objects:
    print("ID:", item.uuid)
    print("Data:", json.dumps(item.properties, indent=2), "\n")

ID: f6849411-1553-43fa-a240-13d69114945d
Data: {
  "answer": "species",
  "question": "2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification",
  "category": "SCIENCE"
} 

ID: 0de6ee13-b260-4cf9-b368-e7ffeddb14a4
Data: {
  "answer": "the diamondback rattler",
  "question": "Heaviest of all poisonous snakes is this North American rattlesnake",
  "category": "ANIMALS"
} 

ID: 5e3c2cf9-4633-400a-8c97-78ab4b183b8a
Data: {
  "answer": "Antelope",
  "question": "Weighing around a ton, the eland is the largest species of this animal in Africa",
  "category": "ANIMALS"
} 
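
To see how the ranking shifts with `alpha`, the same query can be run at several values and the results compared side by side. The helper below is a sketch: `questions_by_alpha` is a hypothetical name, and it assumes the collection object exposes `query.hybrid(...)` as used in the cell above and that each returned object has a `question` property.

```python
def questions_by_alpha(collection, query, alphas, limit=3):
    """Run one hybrid query per alpha value and collect the returned question texts."""
    results = {}
    for alpha in alphas:
        response = collection.query.hybrid(query=query, alpha=alpha, limit=limit)
        results[alpha] = [obj.properties.get("question") for obj in response.objects]
    return results


# Example usage against a live collection (requires a connected client):
# for alpha, questions in questions_by_alpha(jeopardy, "northern beast", [0.25, 0.8]).items():
#     print(alpha, questions)
```

Keeping the query and `limit` fixed while varying only `alpha` makes it easier to attribute any change in ordering to the keyword/vector weighting.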



### Hybrid Search with a `where` filter

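Run the same "northern beast" query, but return only questions whose category is Animals. In the Python client, the filter is passed through the `filters` argument. With the default `word` tokenization for text properties the match is case-insensitive, which is why `"Animals"` matches the stored value `ANIMALS`.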

In [21]:
response = jeopardy.query.hybrid(
    query="northern beast",
    alpha=0.8,
    filters=wq.Filter.by_property("category").equal("Animals"),
    limit=3
)

for item in response.objects:
    print("ID:", item.uuid)
    print("Data:", json.dumps(item.properties, indent=2), "\n")

ID: 0de6ee13-b260-4cf9-b368-e7ffeddb14a4
Data: {
  "answer": "the diamondback rattler",
  "question": "Heaviest of all poisonous snakes is this North American rattlesnake",
  "category": "ANIMALS"
} 

ID: 5e3c2cf9-4633-400a-8c97-78ab4b183b8a
Data: {
  "answer": "Antelope",
  "question": "Weighing around a ton, the eland is the largest species of this animal in Africa",
  "category": "ANIMALS"
} 

ID: 2b4424c2-65f2-4295-974d-906ad74e0e55
Data: {
  "answer": "Elephant",
  "question": "It's the only living mammal in the order Proboseidea",
  "category": "ANIMALS"
} 

