# Cohere Rerank 3 - Search over Tabular Data

This notebook shows how the JSON capabilities of [Cohere Rerank 3](https://txt.cohere.com/rerank-3/) can be used to search over tabular data, using a Pandas DataFrame of roughly the top 5,000 movies as an example.

In [1]:
# !pip install -q cohere pandas numpy

In [2]:
from dotenv import load_dotenv

import pandas as pd
import cohere
import numpy as np

In [3]:
load_dotenv()

True

In [4]:
# Get your free API key from: www.cohere.com
# cohere_key = "<<YOUR_COHERE_API_KEY>>"
# co = cohere.Client(cohere_key)
co = cohere.Client()

In [16]:
# Load a pandas DataFrame with roughly the top 5,000 movies
df = pd.read_parquet("https://huggingface.co/datasets/Cohere/movies/resolve/main/movies.parquet")
df

Unnamed: 0,title,overview,genres,producer,cast
0,Avatar,"In the 22nd century, a paraplegic Marine is di...","Action, Adventure, Fantasy, Science Fiction","Ingenious Film Partners, Twentieth Century Fox...","Sam Worthington as Jake Sully, Zoe Saldana as ..."
1,Titanic,"84 years later, a 101-year-old woman named Ros...","Drama, Romance, Thriller","Paramount Pictures, Twentieth Century Fox Film...","Kate Winslet as Rose DeWitt Bukater, Leonardo ..."
2,The Avengers,When an unexpected enemy emerges and threatens...,"Science Fiction, Action, Adventure","Paramount Pictures, Marvel Studios","Robert Downey Jr. as Tony Stark / Iron Man, Ch..."
3,Jurassic World,Twenty-two years after the events of Jurassic ...,"Action, Adventure, Science Fiction, Thriller","Universal Studios, Amblin Entertainment, Legen...","Chris Pratt as Owen Grady, Bryce Dallas Howard..."
4,Furious 7,Deckard Shaw seeks revenge against Dominic Tor...,Action,"Universal Pictures, Original Film, Fuji Televi...","Vin Diesel as Dominic Toretto, Paul Walker as ..."
...,...,...,...,...,...
4798,Midnight Cabaret,A Broadway producer puts on a play with a Devi...,Horror,,"Lisa Hart Carroll as Dawn, Michael Des Barres ..."
4799,Growing Up Smith,"In 1979, an Indian family moves to America wit...","Comedy, Family, Drama",,"Roni Akurati as Smith Bhatnagar, Brighton Shar..."
4800,8 Days,"After sneaking to a party with her friends, 16...","Thriller, Drama",After Eden Pictures,"Nicole Smolen as Amber, Kim Baldwin as BB, Ari..."
4801,Running Forever,After being estranged since her mother's death...,Family,New Kingdom Pictures,


In [17]:
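# Drop rows where every column is None (how='all'), then randomly sample 50 rows.
# Note: rows with a None in only some columns are kept; use how='any' to drop those as well.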
df = df.dropna(axis=0, how='all').sample(n=50, random_state=42)
df

Unnamed: 0,title,overview,genres,producer,cast
596,Tropic Thunder,"Vietnam veteran 'Four Leaf' Tayback's memoir, ...","Action, Comedy","DreamWorks SKG, Goldcrest Pictures, Red Hour F...","Ben Stiller as Tugg Speedman, Jack Black as Je..."
3372,Bats,Genetically mutated bats escape and it's up to...,"Horror, Thriller",Destination Films,"Lou Diamond Phillips as Sheriff Emmett Kimsey,..."
2702,Drop Dead Gorgeous,"In a small Minnesota town, the annual beauty p...",Comedy,"New Line Cinema, Capella International, KC Med...","Kirsten Dunst as Amber Atkins, Ellen Barkin as..."
2473,Adventureland,"In the summer of 1987, a college graduate take...",Comedy,"Miramax Films, Sidney Kimmel Entertainment, Th...","Jesse Eisenberg as James Brennan, Kristen Stew..."
8,Minions,"Minions Stuart, Kevin and Bob are recruited by...","Family, Animation, Adventure, Comedy","Universal Pictures, Illumination Entertainment","Sandra Bullock as Scarlet Overkill (voice), Jo..."
577,The Holiday,"Two women, one (Cameron Diaz) from America and...","Comedy, Romance","Columbia Pictures, Universal Pictures, Waverly...","Cameron Diaz as Amanda Woods, Kate Winslet as ..."
3172,"4 Months, 3 Weeks and 2 Days","Gabita is pregnant, abortion is strictly forbi...",Drama,"Saga Film, Mobra Films, Mindshare Media, Centr...","Anamaria Marinca as Otilia, Laura Vasiliu as G..."
811,The Color Purple,An epic tale spanning forty years in the life ...,Drama,"Amblin Entertainment, The Guber-Peters Company...","Whoopi Goldberg as Celie, Margaret Avery as Sh..."
2077,Rob Roy,"In the highlands of Scotland in the 1700s, Rob...",Adventure,"United Artists, Talisman Productions","Liam Neeson as Robert Roy MacGregor (Rob Roy),..."
4032,Bobby Jones: Stroke of Genius,"The story of golf icon and legend, Bobby Jones...",Drama,"Dean River Productions, Bobby Jones Films LLC,...","Jim Caviezel as Bobby Jones, Claire Forlani as..."


In [18]:
# First, we serialize each movie record to a string so it can be embedded.
# The embeddings let us go from 50 movies to the top-10 candidates.
docs_txt = [str(doc) for doc in df.to_dict('records')]
print(docs_txt[0])

{'title': 'Tropic Thunder', 'overview': "Vietnam veteran 'Four Leaf' Tayback's memoir, Tropic Thunder, is being made into a film, but Director Damien Cockburn can’t control the cast of prima donnas. Behind schedule and over budget, Cockburn is ordered by a studio executive to get filming back on track, or risk its cancellation. On Tayback's advice, Cockburn drops the actors into the middle of the jungle to film the remaining scenes but, unbeknownst to the actors and production, the group have been dropped in the middle of the Golden Triangle, the home of heroin-producing gangs.", 'genres': 'Action, Comedy', 'producer': 'DreamWorks SKG, Goldcrest Pictures, Red Hour Films, Internationale Filmproduktion Stella-del-Sud Second', 'cast': "Ben Stiller as Tugg Speedman, Jack Black as Jeff Portnoy, Robert Downey Jr. as Kirk Lazarus, Nick Nolte as Four Leaf Tayback, Steve Coogan as Damien Cockburn, Jay Baruchel as Kevin Sandusky, Danny McBride as Cody, Brandon T. Jackson as Alpa Chino, Bill Hade
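
The cell above serializes each record with `str(doc)`, so a missing value ends up in the text as a literal `None`. As a minimal sketch of an alternative, the hypothetical helper `record_to_text` below (not part of the original notebook) renders a record as `key: value` lines and skips missing fields. The example record is made up, and whether this changes retrieval quality is not tested here.

```python
import math


def record_to_text(record, fields=None):
    """Render a dict as 'key: value' lines, skipping missing values.

    `fields` optionally restricts and orders the keys that are included.
    """
    keys = fields if fields is not None else list(record.keys())
    lines = []
    for key in keys:
        value = record.get(key)
        # Treat None, empty strings, and float NaN as missing
        if value is None or value == "":
            continue
        if isinstance(value, float) and math.isnan(value):
            continue
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


# Toy record (made up for illustration, not taken from the dataset)
example = {"title": "Example Movie", "genres": "Family", "producer": None}
text = record_to_text(example)
print(text)
```

To try it in this notebook, `docs_txt = [record_to_text(doc) for doc in df.to_dict('records')]` would replace the `str(doc)` line.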

In [19]:
doc_emb = np.asarray(co.embed(texts=docs_txt, model="embed-v4.0", input_type="search_document").embeddings)
print(f"Doc embeddings: {doc_emb.shape}")

Doc embeddings: (50, 1536)


# Embedding-Only Search Results

In [20]:
# Now we define a specific query and first run embedding search to go from 50 movies to 10 movies
query = "Action movies with Christian Bale"

# Get the query embedding
query_emb = np.asarray(co.embed(texts=[query], model="embed-v4.0", input_type="search_query").embeddings)

# Compute the dot product between the query embedding and all document embeddings
# (this equals the cosine similarity only if the embeddings are unit-normalized)
dot_scores = np.matmul(query_emb, doc_emb.transpose())[0]

# Get the top_10 hits
top_10_hits = np.argsort(-dot_scores).tolist()[0:10]

# Create a data frame with the top-10 results

# In this 50-movie sample, none of the top-10 hits appears to star Christian Bale,
# and the list mixes action movies with unrelated dramas
emb_results = df.iloc[top_10_hits].copy()
emb_results.insert(loc=0, column='rank', value=(np.arange(len(emb_results))+1))
emb_results

Unnamed: 0,rank,title,overview,genres,producer,cast
227,1,Taken 2,"In Istanbul, retired CIA operative Bryan Mills...","Action, Crime, Thriller","Twentieth Century Fox Film Corporation, M6 Fil...","Liam Neeson as Bryan Mills, Maggie Grace as Ki..."
1323,2,Exit Wounds,Maverick cop Orin Boyd always brings down the ...,"Action, Crime, Thriller","Village Roadshow Pictures, NPV Entertainment, ...","Steven Seagal as Orin Boyd, DMX as Latrell Wal..."
534,3,Braveheart,"Enraged at the slaughter of Murron, his new br...","Action, Drama, History, War","Icon Entertainment International, The Ladd Com...","Mel Gibson as William Wallace, Catherine McCor..."
486,4,The Social Network,"On a fall night in 2003, Harvard undergrad and...",Drama,"Columbia Pictures, Scott Rudin Productions, Re...","Jesse Eisenberg as Mark Zuckerberg, Andrew Gar..."
4015,5,Remember Me,Still reeling from a heartbreaking family even...,"Drama, Romance",Summit Entertainment,"Robert Pattinson as Tyler Hawkins, Emilie de R..."
79,6,The Twilight Saga: Eclipse,Bella once again finds herself surrounded by d...,"Adventure, Fantasy, Drama, Romance","Summit Entertainment, Maverick Films, Imprint ...","Kristen Stewart as Isabella 'Bella' Swan, Robe..."
586,7,Superman II,Three escaped criminals from the planet Krypto...,"Action, Adventure, Fantasy, Science Fiction","Warner Bros., Dovemead Films, Film Export A.G.","Gene Hackman as Lex Luthor, Christopher Reeve ..."
2992,8,Four Lions,Four Lions tells the story of a group of Briti...,"Comedy, Crime, Drama","Film4, Drafthouse Films, Warp Films, Wild Bunc...","Riz Ahmed as Omar, Nigel Lindsay as Barry, Kay..."
759,9,The Monuments Men,Based on the true story of the greatest treasu...,"War, Drama, History, Action","Columbia Pictures, Studio Babelsberg, Fox 2000...","Matt Damon as James Granger, Cate Blanchett as..."
596,10,Tropic Thunder,"Vietnam veteran 'Four Leaf' Tayback's memoir, ...","Action, Comedy","DreamWorks SKG, Goldcrest Pictures, Red Hour F...","Ben Stiller as Tugg Speedman, Jack Black as Je..."
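
The scoring step above ranks documents by the dot product of their embeddings with the query embedding. The sketch below uses made-up 2-dimensional vectors and only the standard library to show how a dot-product ranking and a cosine-similarity ranking can disagree when vectors are not unit-normalized. The helper names (`dot`, `cosine`, `top_k`) are illustrative, not part of the notebook or of any library.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: dot product divided by both vector norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def top_k(query, docs, k, score):
    """Return the indices of the k highest-scoring documents, best first."""
    scores = [score(query, d) for d in docs]
    order = sorted(range(len(docs)), key=lambda i: -scores[i])
    return order[:k]


# Toy vectors (made up): not real embeddings
query_vec = [1.0, 0.0]
doc_vecs = [
    [10.0, 10.0],  # long vector, 45 degrees away from the query
    [0.9, 0.1],    # short vector, almost parallel to the query
    [0.0, 1.0],    # orthogonal to the query
]

dot_order = top_k(query_vec, doc_vecs, k=2, score=dot)
cos_order = top_k(query_vec, doc_vecs, k=2, score=cosine)
print("dot product order:", dot_order)
print("cosine order:", cos_order)
```

With unit-length vectors the two orderings coincide, which is why the two scores are often used interchangeably for normalized embeddings.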


# Rerank 3 Search Results
We transform the DataFrame into a list of JSON records and then use the JSON capabilities of Rerank 3 to rank them.

In [21]:
# We now use rerank to improve our results. We rerank the results from embedding search

# Convert the data frame with the results from embeddings search to a list with JSON rows
docs = emb_results.to_dict('records')

# We rank over all fields in our DataFrame (this includes the `rank` column added above).
# You could also limit the columns you want to include in the search
rank_fields = list(docs[0].keys())

# We pass this to co.rerank
results = co.rerank(query=query, documents=docs, top_n=5, model='rerank-v3.5', rank_fields=rank_fields)

# Collect the indices for the top results
top_ids = [hit.index for hit in results.results]

# Create the DataFrame with the rerank results
rerank_results = emb_results.iloc[top_ids].copy()
rerank_results['rank'] = (np.arange(len(rerank_results))+1)

# Show results for the query "Action movies with Christian Bale"
# In this 50-movie sample, all five reranked hits are tagged as action movies;
# none of the candidates appears to star Christian Bale, so only the genre part of the query is matched
rerank_results

Unnamed: 0,rank,title,overview,genres,producer,cast
586,1,Superman II,Three escaped criminals from the planet Krypto...,"Action, Adventure, Fantasy, Science Fiction","Warner Bros., Dovemead Films, Film Export A.G.","Gene Hackman as Lex Luthor, Christopher Reeve ..."
759,2,The Monuments Men,Based on the true story of the greatest treasu...,"War, Drama, History, Action","Columbia Pictures, Studio Babelsberg, Fox 2000...","Matt Damon as James Granger, Cate Blanchett as..."
1323,3,Exit Wounds,Maverick cop Orin Boyd always brings down the ...,"Action, Crime, Thriller","Village Roadshow Pictures, NPV Entertainment, ...","Steven Seagal as Orin Boyd, DMX as Latrell Wal..."
534,4,Braveheart,"Enraged at the slaughter of Murron, his new br...","Action, Drama, History, War","Icon Entertainment International, The Ladd Com...","Mel Gibson as William Wallace, Catherine McCor..."
227,5,Taken 2,"In Istanbul, retired CIA operative Bryan Mills...","Action, Crime, Thriller","Twentieth Century Fox Film Corporation, M6 Fil...","Liam Neeson as Bryan Mills, Maggie Grace as Ki..."
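
In the rerank cell, `rank_fields` is built from every key of the records, so it also contains the `rank` column that was inserted after the embedding search. The sketch below shows one way to leave such bookkeeping columns out and to map the reranker's hit indices back onto the candidate list. It makes no API call: the helpers `select_rank_fields` and `apply_rerank_order`, the toy records, and the `hit_indices` list (a stand-in for `[hit.index for hit in results.results]`) are all assumptions for illustration.

```python
def select_rank_fields(records, exclude=("rank",)):
    """Return the record keys to rank over, minus bookkeeping columns."""
    return [key for key in records[0].keys() if key not in exclude]


def apply_rerank_order(records, hit_indices):
    """Reorder records by reranker hit indices and renumber `rank` from 1."""
    reordered = []
    for new_rank, idx in enumerate(hit_indices, start=1):
        row = dict(records[idx])  # copy so the input list stays unchanged
        row["rank"] = new_rank
        reordered.append(row)
    return reordered


# Toy candidates (made up), in embedding-search order
candidates = [
    {"rank": 1, "title": "Movie A", "genres": "Drama"},
    {"rank": 2, "title": "Movie B", "genres": "Comedy"},
    {"rank": 3, "title": "Movie C", "genres": "Action"},
]

fields = select_rank_fields(candidates)

# Hypothetical reranker output: candidate 2 is best, then candidate 0
hit_indices = [2, 0]
reranked = apply_rerank_order(candidates, hit_indices)
print(fields)
print([(row["rank"], row["title"]) for row in reranked])
```

In the notebook, the resulting `fields` list would be passed as `rank_fields` to `co.rerank`. Whether dropping `rank` changes the ordering for this query has not been checked here.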
