[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/weaviate/recipes/blob/main/integrations/data-platforms/aryn/weaviate_blog_post.ipynb)

# Reranking with Contextual AI

This notebook demonstrates how to use Contextual AI's reranking model (`ctxl-rerank-v2-instruct-multilingual`) with Weaviate to improve search result quality.

## Requirements

1. Weaviate Database >= `1.34.0`
2. Weaviate Python Client >= `4.18.2`
3. Contextual API key - you can grab one from [the console](https://app.contextual.ai/).

In [1]:
!pip install -q weaviate-client==4.18.2



## Import the Libraries

In [14]:
import weaviate
from weaviate.classes.config import Configure, Property, DataType
from weaviate.classes.query import Rerank

import os
import json
import requests
import pandas as pd
from io import StringIO

## Connect to Weaviate Cloud

You can create a free 14-day sandbox on [Weaviate Cloud](https://console.weaviate.cloud)!

In [3]:
# Set these in your environment before running the notebook, or uncomment the
# lines below to set them here. Never commit real keys to a shared notebook.
# os.environ["WEAVIATE_URL"] = ""
# os.environ["WEAVIATE_API_KEY"] = ""
# os.environ["CONTEXTUALAI_API_KEY"] = ""

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.getenv("WEAVIATE_URL"),
    auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")),
    headers={
        # The Contextual AI key is passed to Weaviate as a request header
        "X-ContextualAI-Api-Key": os.getenv("CONTEXTUALAI_API_KEY"),
    },
)
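
Because `os.getenv` returns `None` for an unset variable, a missing key only surfaces later as a connection or authentication error. The helper below is a minimal sketch of failing early instead; `require_env` is a hypothetical name introduced here, not part of the Weaviate client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

For example, `cluster_url=require_env("WEAVIATE_URL")` could replace `os.getenv("WEAVIATE_URL")` in the connection call.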

## Define Weaviate Collection

You can create a new collection with the cell below, or skip it and connect to an existing collection instead.

In [None]:
# Note: in practice, you shouldn't rerun this cell, as it deletes your data
# in "JeopardyQuestions", and you would then need to re-import it.

collection_name = "JeopardyQuestions"

# Delete the collection if it already exists
if client.collections.exists(collection_name):
    client.collections.delete(collection_name)

client.collections.create(
    collection_name,
    # Vectorizer used to embed objects and queries
    vector_config=Configure.Vectors.text2vec_weaviate(
        model="Snowflake/snowflake-arctic-embed-l-v2.0"
    ),
    # Reranker used when a query passes `rerank=...`
    reranker_config=Configure.Reranker.contextualai(
        model="ctxl-rerank-v2-instruct-multilingual"
    ),
    properties=[  # defining properties (data schema) is optional
        Property(name="question", data_type=DataType.TEXT),
        Property(name="answer", data_type=DataType.TEXT),
        Property(name="category", data_type=DataType.TEXT, skip_vectorization=True),
        Property(name="value", data_type=DataType.TEXT, skip_vectorization=True),
    ],
)

print(f"Successfully created collection: {collection_name}.")

Successfully created collection: JeopardyQuestions.


## Import Data

We use a small Jeopardy dataset as an example. It contains 1,000 objects.

In [5]:
url = "https://raw.githubusercontent.com/weaviate/weaviate-examples/main/jeopardy_small_dataset/jeopardy_small.csv"
resp = requests.get(url)
resp.raise_for_status()  # fail early if the download did not succeed

df = pd.read_csv(StringIO(resp.text))

In [11]:
# Get a collection object for "JeopardyQuestions"
collection = client.collections.use("JeopardyQuestions")

# Insert data objects with batch import
with collection.batch.dynamic() as batch:
    for _, row in df.iterrows():
        properties = {
            "question": row["Question"],
            "answer": row["Answer"],
            "category": row["Category"],
            "value": row["Value"],
        }
        batch.add_object(properties)

failed_objects = collection.batch.failed_objects
if failed_objects:
    print(f"Number of failed imports: {len(failed_objects)}")
else:
    print("Insert complete.")

Insert complete.
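
One of the query results further down shows `"value": "NaN"`, which suggests that some rows have no `Value` in the CSV. If you would rather store no value for those rows, a small mapping step could be applied before `batch.add_object`. This is a sketch under that assumption; `row_to_properties` is a hypothetical helper, not part of the notebook or the Weaviate client.

```python
import math


def row_to_properties(row: dict) -> dict:
    """Map a CSV row (as a dict) to object properties, turning missing values into None."""

    def clean(value):
        # pandas represents missing CSV cells as float NaN
        if isinstance(value, float) and math.isnan(value):
            return None
        return value

    return {
        "question": clean(row.get("Question")),
        "answer": clean(row.get("Answer")),
        "category": clean(row.get("Category")),
        "value": clean(row.get("Value")),
    }
```

Inside the import loop this would be called as `row_to_properties(row.to_dict())`.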


In [12]:
# count the number of objects

collection = client.collections.use("JeopardyQuestions")
response = collection.aggregate.over_all(total_count=True)

print(response.total_count)

1000


## Query Time

### Hybrid Search

The `alpha` parameter sets the balance between the sparse (keyword) and dense (vector) search methods: `alpha = 0` is pure sparse (BM25) search, whereas `alpha = 1` is pure dense (vector) search.

`alpha` is optional. The default is `0.75`.
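
To make the weighting concrete, here is a simplified illustration of blending two score sets with `alpha`. It is not Weaviate's actual fusion code, whose algorithms differ in detail; the function names and the min-max normalization are assumptions made for this sketch.

```python
def normalize(scores: dict) -> dict:
    """Min-max normalize scores to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_scores(sparse: dict, dense: dict, alpha: float) -> dict:
    """Blend normalized sparse and dense scores: alpha weights dense, (1 - alpha) weights sparse."""
    sparse_n, dense_n = normalize(sparse), normalize(dense)
    ids = set(sparse_n) | set(dense_n)
    return {
        i: alpha * dense_n.get(i, 0.0) + (1 - alpha) * sparse_n.get(i, 0.0)
        for i in ids
    }


# Toy scores: "a" wins on keywords, "b" wins on vector similarity
sparse = {"a": 2.0, "b": 1.0}
dense = {"a": 0.2, "b": 0.9}
blended = hybrid_scores(sparse, dense, alpha=0.75)
```

With these toy scores, `alpha=0` ranks `"a"` first, `alpha=1` ranks `"b"` first, and `alpha=0.75` favors `"b"` because the dense score carries most of the weight.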

In [16]:
jeopardy = client.collections.use("JeopardyQuestions")

response = jeopardy.query.hybrid(
    query="unicorn-like artic animal",
    alpha=0.75,
    limit=3
)

for item in response.objects:
    print("ID:", item.uuid)
    print("Data:", json.dumps(item.properties, indent=2), "\n")

ID: f59b284a-8c14-4bd6-b51e-cdd7038899e3
Data: {
  "value": "NaN",
  "question": "A part of this marine mammal was prized by medieval folk, who thought it belonged to a unicorn",
  "answer": "the narwhal",
  "category": "THE ANIMAL KINGDOM"
} 

ID: d5bde835-2125-4e48-996f-ce5a917cd2cc
Data: {
  "value": "$400",
  "question": "You could say this Arctic mammal, Odobenus rosmarus, has a Wilford Brimley mustache",
  "answer": "the walrus",
  "category": "MAMMALS"
} 

ID: 49618708-f547-421f-a8cb-09d35896efcb
Data: {
  "value": "$800",
  "question": "Kodiak Island is the habitat of this type of bear, Ursus arctos middendorffi",
  "answer": "Kodiak bear",
  "category": "STUPID ANSWERS"
} 



### Query with Reranker

Here the hybrid search results are reranked with Contextual AI's reranker model, which was configured on the collection above.

In [17]:
collection = client.collections.use("JeopardyQuestions")

response = collection.query.hybrid(
    query="unicorn-like artic animal",
    alpha=0.7,
    limit=2,
    rerank=Rerank(
        prop="question",  # property to rerank on
        query="artic animal",  # rerank query; if none is provided, the original query is sent
    )
)

for item in response.objects:
    print("ID:", item.uuid)
    print("Data:", json.dumps(item.properties, indent=2), "\n")

ID: d5bde835-2125-4e48-996f-ce5a917cd2cc
Data: {
  "value": "$400",
  "question": "You could say this Arctic mammal, Odobenus rosmarus, has a Wilford Brimley mustache",
  "answer": "the walrus",
  "category": "MAMMALS"
} 

ID: f59b284a-8c14-4bd6-b51e-cdd7038899e3
Data: {
  "value": "NaN",
  "question": "A part of this marine mammal was prized by medieval folk, who thought it belonged to a unicorn",
  "answer": "the narwhal",
  "category": "THE ANIMAL KINGDOM"
} 
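
Conceptually, the rerank step takes the candidates returned by the search, scores one property of each against the rerank query, and reorders them by that score. The sketch below illustrates only that flow. The word-overlap scorer is a toy stand-in for the reranker model, and `rerank`, `toy_overlap_score`, and the example query are hypothetical, introduced for illustration.

```python
def toy_overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().replace(",", " ").split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def rerank(candidates: list, prop: str, query: str, score_fn=toy_overlap_score) -> list:
    """Reorder candidates by the score of `prop` against `query`, highest first."""
    scored = [(score_fn(query, candidate[prop]), candidate) for candidate in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored]


# The two objects from the results above, in their original hybrid-search order
candidates = [
    {
        "question": "A part of this marine mammal was prized by medieval folk, who thought it belonged to a unicorn",
        "answer": "the narwhal",
    },
    {
        "question": "You could say this Arctic mammal, Odobenus rosmarus, has a Wilford Brimley mustache",
        "answer": "the walrus",
    },
]

reranked = rerank(candidates, prop="question", query="arctic mammal")
```

With this toy scorer the walrus question moves to the top because it contains both query words. A real reranker model scores relevance very differently, so treat this only as a picture of the data flow.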

