# Cohere Rerank 3 - Search over Tabular Data

This notebook shows how the JSON capabilities of [Cohere Rerank 3](https://txt.cohere.com/rerank-3/) can be used to search over tabular data, in this example a Pandas DataFrame of roughly the top 5,000 movies (4,803 rows).

In [1]:
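# python-dotenv provides load_dotenv; pyarrow is one engine pandas can use for read_parquet
# !pip install -q cohere pandas numpy python-dotenv pyarrow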

In [4]:
from dotenv import load_dotenv

import pandas as pd
import cohere
import numpy as np

In [5]:
load_dotenv()

True

In [6]:
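# Get your free API key from: www.cohere.com
# Option 1: pass the key explicitly
# cohere_key = "<<YOUR_COHERE_API_KEY>>"
# co = cohere.Client(cohere_key)
# Option 2 (used here): let the client read the key from the environment loaded by load_dotenv() above
co = cohere.Client()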

In [7]:
# Load a pandas DataFrame with roughly the top 5,000 movies
df = pd.read_parquet("https://huggingface.co/datasets/Cohere/movies/resolve/main/movies.parquet")
df

Unnamed: 0,title,overview,genres,producer,cast
0,Avatar,"In the 22nd century, a paraplegic Marine is di...","Action, Adventure, Fantasy, Science Fiction","Ingenious Film Partners, Twentieth Century Fox...","Sam Worthington as Jake Sully, Zoe Saldana as ..."
1,Titanic,"84 years later, a 101-year-old woman named Ros...","Drama, Romance, Thriller","Paramount Pictures, Twentieth Century Fox Film...","Kate Winslet as Rose DeWitt Bukater, Leonardo ..."
2,The Avengers,When an unexpected enemy emerges and threatens...,"Science Fiction, Action, Adventure","Paramount Pictures, Marvel Studios","Robert Downey Jr. as Tony Stark / Iron Man, Ch..."
3,Jurassic World,Twenty-two years after the events of Jurassic ...,"Action, Adventure, Science Fiction, Thriller","Universal Studios, Amblin Entertainment, Legen...","Chris Pratt as Owen Grady, Bryce Dallas Howard..."
4,Furious 7,Deckard Shaw seeks revenge against Dominic Tor...,Action,"Universal Pictures, Original Film, Fuji Televi...","Vin Diesel as Dominic Toretto, Paul Walker as ..."
...,...,...,...,...,...
4798,Midnight Cabaret,A Broadway producer puts on a play with a Devi...,Horror,,"Lisa Hart Carroll as Dawn, Michael Des Barres ..."
4799,Growing Up Smith,"In 1979, an Indian family moves to America wit...","Comedy, Family, Drama",,"Roni Akurati as Smith Bhatnagar, Brighton Shar..."
4800,8 Days,"After sneaking to a party with her friends, 16...","Thriller, Drama",After Eden Pictures,"Nicole Smolen as Amber, Kim Baldwin as BB, Ari..."
4801,Running Forever,After being estranged since her mother's death...,Family,New Kingdom Pictures,


In [9]:
# First we serialize each row to a string and embed all movies,
# so that we can later go from ~5,000 movies to the top-100 movies
docs_txt = [str(doc) for doc in df.to_dict('records')]
print(docs_txt[0])

{'title': 'Avatar', 'overview': 'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization.', 'genres': 'Action, Adventure, Fantasy, Science Fiction', 'producer': 'Ingenious Film Partners, Twentieth Century Fox Film Corporation, Dune Entertainment, Lightstorm Entertainment', 'cast': "Sam Worthington as Jake Sully, Zoe Saldana as Neytiri, Sigourney Weaver as Dr. Grace Augustine, Stephen Lang as Col. Quaritch, Michelle Rodriguez as Trudy Chacon, Giovanni Ribisi as Selfridge, Joel David Moore as Norm Spellman, CCH Pounder as Moat, Wes Studi as Eytukan, Laz Alonso as Tsu'Tey, Dileep Rao as Dr. Max Patel, Matt Gerald as Lyle Wainfleet, Sean Anthony Moran as Private Fike, Jason Whyte as Cryo Vault Med Tech, Scott Lawrence as Venture Star Crew Chief, Kelly Kilgour as Lock Up Trooper, James Patrick Pitt as Shuttle Pilot, Sean Patrick Murphy as Shuttle Co-Pilot, Peter Dillon as S

In [10]:
doc_emb = np.asarray(co.embed(texts=docs_txt, model="embed-english-v3.0", input_type="search_document").embeddings)
print(f"Doc embeddings: {doc_emb.shape}")

Doc embeddings: (4803, 1024)


# Embedding-Only Search Results

In [11]:
# Now we define a specific query and first run embedding search to go from 5000 movies to 100 movies
query = "Action movies with Christian Bale"

# Get the query embedding
query_emb = np.asarray(co.embed(texts=[query], model="embed-english-v3.0", input_type="search_query").embeddings)

# Compute the cosine similarity / dot product for the query embedding and all document embeddings
dot_scores = np.matmul(query_emb, doc_emb.transpose())[0]

# Get the top_100 hits
top_100_hits = np.argsort(-dot_scores).tolist()[0:100]

# Create a data frame with the top-100 results.
# We see that the data frame contains some Christian Bale movies, but
# not necessarily at the top positions
emb_results = df.iloc[top_100_hits].copy()
emb_results.insert(loc=0, column='rank', value=(np.arange(len(emb_results))+1))
emb_results

Unnamed: 0,rank,title,overview,genres,producer,cast
1874,1,War,FBI agent Jack Crawford is out for revenge whe...,"Action, Thriller, Crime","Mosaic Media Group, Current Entertainment, Lio...","Jet Li as Rogue, Jason Statham as Special Agen..."
2942,2,Equilibrium,"In a dystopian future, a totalitarian regime m...","Action, Science Fiction, Thriller","Dimension Films, Blue Tulip Productions","Christian Bale as John Preston, Taye Diggs as ..."
1818,3,Reign of Fire,"In post-apocalyptic England, an American volun...","Adventure, Action, Fantasy","The Zanuck Company, Spyglass Entertainment, Wo...","Christian Bale as Quinn Abercromby, Matthew Mc..."
2361,4,Action Jackson,Vengence drives a tough Detroit cop to stay on...,"Action, Adventure, Comedy, Crime, Drama","Silver Pictures, Lorimar Motion Pictures",Carl Weathers as Sgt. Jericho 'Action' Jackson...
417,5,Cliffhanger,"A year after losing his friend in a tragic 4,0...","Action, Adventure, Thriller","TriStar Pictures, Canal+, Carolco Pictures, RC...","Sylvester Stallone as Gabe Walker, John Lithgo..."
...,...,...,...,...,...,...
2028,96,I Spy,"When the Switchblade, the most sophisticated p...","Action, Adventure, Comedy, Thriller",Columbia Pictures Corporation,"Eddie Murphy as Kelly Robinson, Owen Wilson as..."
130,97,American Sniper,U.S. Navy SEAL Chris Kyle takes his sole missi...,"War, Action","Village Roadshow Pictures, Malpaso Productions...","Bradley Cooper as Chris Kyle, Sienna Miller as..."
3906,98,The Devil's Tomb,Captain Mack leads an elite military unit on a...,"Action, Horror, Thriller, Science Fiction","Ice Cold Productions, Empyreal Entertainment, ...","Cuba Gooding Jr. as Mack, Ray Winstone as Blak..."
247,99,Die Hard: With a Vengeance,New York detective John McClane is back and ki...,"Action, Thriller","Twentieth Century Fox Film Corporation, Cinerg...","Bruce Willis as John McClane, Jeremy Irons as ..."
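The retrieval step above scores documents with a dot product, which equals cosine similarity only when the vectors have unit length. This notebook does not state whether the returned embeddings are normalized, so the sketch below normalizes explicitly to show the difference. The toy vectors and the helper names `top_k_by_dot` and `top_k_by_cosine` are made up for illustration and are not part of the Cohere SDK.

```python
import numpy as np

# Toy embeddings (made up for illustration): 4 documents, 3 dimensions
toy_doc_emb = np.array([
    [5.0, 0.0, 0.0],   # large norm, only partly aligned with the query
    [1.0, 1.0, 0.0],   # small norm, perfectly aligned with the query
    [0.0, 0.0, 1.0],   # orthogonal to the query
    [0.5, 0.0, 0.5],
])
toy_query_emb = np.array([[1.0, 1.0, 0.0]])


def top_k_by_dot(query_emb, doc_emb, k):
    """Return the positions of the k documents with the highest dot product."""
    scores = np.matmul(query_emb, doc_emb.T)[0]
    return np.argsort(-scores)[:k].tolist()


def top_k_by_cosine(query_emb, doc_emb, k):
    """Same as top_k_by_dot, but on unit-length vectors (cosine similarity)."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    scores = np.matmul(q, d.T)[0]
    return np.argsort(-scores)[:k].tolist()


dot_top = top_k_by_dot(toy_query_emb, toy_doc_emb, k=2)
cos_top = top_k_by_cosine(toy_query_emb, toy_doc_emb, k=2)
print("dot product:", dot_top)
print("cosine:     ", cos_top)
```

With unnormalized vectors the two orderings can differ, because the dot product also rewards vectors with a large norm. If the embeddings are already unit-length, both functions return the same ranking.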


# Rerank 3 Search Results
We transform the dataframe to a list of JSON records, and then use the JSON capabilities of Rerank 3 for the ranking.
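The conversion described above can be sketched on a toy DataFrame. `DataFrame.to_dict("records")` yields one dictionary per row, and the list of field names to rank over can be derived from the columns. The toy data and the choice to drop the `rank` bookkeeping column are assumptions for illustration; whether to exclude a column depends on your data.

```python
import pandas as pd

# Toy stand-in for the embedding-search results (values are made up)
toy_df = pd.DataFrame({
    "rank": [1, 2],
    "title": ["Movie A", "Movie B"],
    "genres": ["Action", "Drama"],
    "cast": ["Actor X as Hero", "Actor Y as Lead"],
})

# One dictionary per row, keyed by column name
toy_docs = toy_df.to_dict("records")

# Rank only over descriptive columns; here the 'rank' column from the
# previous retrieval step is left out (an illustrative choice)
toy_rank_fields = [col for col in toy_df.columns if col != "rank"]

print(toy_docs[0])
print(toy_rank_fields)
```

The resulting `toy_docs` and `toy_rank_fields` have the same shape as the `documents` and `rank_fields` arguments used in the rerank call in this notebook.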

In [12]:
# We now use rerank to improve our results. We rerank the results from embedding search

# Convert the data frame with the results from embeddings search to a list with JSON rows
docs = emb_results.to_dict('records')

# We rank over all fields of the records (this includes the 'rank' column added above).
# You could also limit the columns you want to include in the search
rank_fields = list(docs[0].keys())

# We pass this to co.rerank
results = co.rerank(query=query, documents=docs, top_n=10, model='rerank-english-v3.0', rank_fields=rank_fields)

# Collect the indices for the top results (positions within the docs list)
top_ids = [hit.index for hit in results.results]

# Create the DataFrame with the rerank results
rerank_results = emb_results.iloc[top_ids].copy()
rerank_results['rank'] = (np.arange(len(rerank_results))+1)

# Show results for the query "Action movies with Christian Bale"
# We now see that the top six positions all contain Christian Bale movies,
# led by action titles (Reign of Fire, Equilibrium, The Dark Knight Rises, ...)
rerank_results

Unnamed: 0,rank,title,overview,genres,producer,cast
1818,1,Reign of Fire,"In post-apocalyptic England, an American volun...","Adventure, Action, Fantasy","The Zanuck Company, Spyglass Entertainment, Wo...","Christian Bale as Quinn Abercromby, Matthew Mc..."
2942,2,Equilibrium,"In a dystopian future, a totalitarian regime m...","Action, Science Fiction, Thriller","Dimension Films, Blue Tulip Productions","Christian Bale as John Preston, Taye Diggs as ..."
14,3,The Dark Knight Rises,Following the death of District Attorney Harve...,"Action, Crime, Drama, Thriller","Legendary Pictures, Warner Bros., DC Entertain...","Christian Bale as Bruce Wayne / Batman, Michae..."
20,4,The Dark Knight,Batman raises the stakes in his war on crime. ...,"Drama, Action, Crime, Thriller","DC Comics, Legendary Pictures, Warner Bros., D...","Christian Bale as Bruce Wayne, Heath Ledger as..."
2849,5,Rescue Dawn,A US Fighter pilot's epic struggle of survival...,"Adventure, Drama, War",Metro-Goldwyn-Mayer (MGM),"Christian Bale as Dieter Dengler, Steve Zahn a..."
1176,6,The Flowers of War,A Westerner finds refuge with a group of women...,"Drama, History, War","Beijing New Picture Film Co. Ltd., EDKO Film, ...","Christian Bale as John Miller, Ni Ni as Yu Mo,..."
4492,7,U.F.O.,A group of friends awake one morning to find a...,"Action, Adventure, Science Fiction","Hawthorn Productions, Hawthorne Productions","Bianca Bree as Carrie, Sean Brosnan as Michael..."
1688,8,Maximum Risk,A policeman takes his twin brother's place and...,"Action, Adventure, Thriller",Columbia Pictures,"Jean-Claude Van Damme as Mikhail Suverov, Nata..."
531,9,Kingdom of Heaven,"After his wife dies, a blacksmith named Balian...","Drama, Action, Adventure, History, War","Studio Babelsberg, Twentieth Century Fox Film ...","Orlando Bloom as Balian de Ibelin, Eva Green a..."
288,10,The Rock,A group of renegade marine commandos seizes a ...,"Action, Adventure, Thriller","Hollywood Pictures, Don Simpson/Jerry Bruckhei...","Sean Connery as John Patrick Mason, Nicolas Ca..."
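The indices returned by the reranker are positions within the list of documents that was passed in, not the original row labels of the DataFrame, which is why the cell above selects rows with `iloc` rather than `loc`. A minimal sketch with made-up data, where `toy_top_ids` is a hypothetical stand-in for `[hit.index for hit in results.results]`:

```python
import numpy as np
import pandas as pd

# Toy candidates whose index holds original row labels, not positions
toy_candidates = pd.DataFrame(
    {"title": ["Movie A", "Movie B", "Movie C"]},
    index=[1874, 2942, 1818],
)

# Hypothetical reranker output: positions into the candidate list, best first
toy_top_ids = [2, 1]

# iloc selects by position and preserves the order given in toy_top_ids
toy_reranked = toy_candidates.iloc[toy_top_ids].copy()
toy_reranked["rank"] = np.arange(len(toy_reranked)) + 1

print(toy_reranked)
```

Using `loc` here would look up the labels 2 and 1, which do not exist in this index, so positional selection is the safer choice after a retrieval step has reordered the rows.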
