# Relative Score Fusion (RSF) Demo with MongoDB Atlas

This notebook demonstrates how to implement **Relative Score Fusion (RSF)** by combining results from MongoDB Atlas **Full Text Search (keyword)** and **Vector Search (semantic)**.

## Prerequisites
1.  **Data Updated**: Ensure your MongoDB collection `mobile_reviews` contains the latest data (including the "Night" category phones added to `mobile_reviews.json`).
    *   *Tip:* If you have not run them since the data changed, re-run `a.generate_embeddings.ipynb` and `b.mongodb_atlas_setup.ipynb` (or `import_and_index.ipynb`).
2.  **Indexes**: You need TWO indexes on the `mobile_reviews` collection:
    *   **Vector Index**: Already created in previous steps (named `vector_index`).
    *   **Search Index**: A standard Lucene search index for text matching (we will create this below).

In [None]:
%pip install pymongo ollama pandas

In [None]:
import pymongo
from pymongo.operations import SearchIndexModel
import ollama
import pandas as pd

# --- CONFIGURATION --- 
# Replace with your actual connection string
MONGODB_URI = "<connection_string>"
DB_NAME = "tech_on_the_rock"
COLLECTION_NAME = "mobile_reviews"

# Connect
client = pymongo.MongoClient(MONGODB_URI)
db = client[DB_NAME]
collection = db[COLLECTION_NAME]

print("Connected to MongoDB Atlas.")

## 1. Create a Full Text Search Index

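To perform keyword search (BM25), we need a standard Atlas Search index. We will create one named `default` with static mappings (`dynamic: False`), so only the four listed text fields (`product_name`, `review_text`, `tags`, `category`) are indexed.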

In [None]:
search_index_name = "default"

search_index_model = {
  "mappings": {
    "dynamic": False,
    "fields": {
      "product_name": { "type": "string" },
      "review_text": { "type": "string" },
      "tags": { "type": "string" },
      "category": { "type": "string" }
    }
  }
}

try:
    # Check if index exists
    existing_indexes = list(collection.list_search_indexes())
    if any(idx['name'] == search_index_name for idx in existing_indexes):
        print(f"Search index '{search_index_name}' already exists.")
    else:
        print(f"Creating search index '{search_index_name}'... this may take a minute.")
        collection.create_search_index(model=SearchIndexModel(definition=search_index_model, name=search_index_name))
        print("Index creation initiated. Wait a moment before searching.")
except Exception as e:
    print(f"Error creating search index (you might need to create it in Atlas UI): {e}")
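
Index builds are asynchronous, so a search issued right after creation may return nothing. As a rough sketch, the helper below polls `list_search_indexes()` until the index reports itself as queryable. The function name `wait_until_queryable`, its timeout parameters, and the reliance on a boolean `queryable` field in each returned index document are assumptions for illustration, not part of the original notebook.

```python
import time

def wait_until_queryable(coll, index_name, timeout_s=120, poll_s=5):
    """Poll the collection's search indexes until `index_name` is queryable.

    Assumes each document returned by list_search_indexes() carries a
    boolean `queryable` field. Returns True once the index is queryable,
    or False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        for idx in coll.list_search_indexes():
            if idx.get("name") == index_name and idx.get("queryable"):
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

With the objects defined earlier, a call such as `wait_until_queryable(collection, search_index_name)` could be placed before the first `$search` query.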

## 2. Define Search Functions

We define two functions: one for **Vector Search** (semantic) and one for **Keyword Search** (text).

In [None]:
def get_query_embedding(query):
    """Generates vector embedding for the query using Ollama."""
    response = ollama.embeddings(model='qwen3-embedding', prompt=query)
    return response['embedding']

def search_vector(query, limit=10):
    """Performs Semantic Search using $vectorSearch."""
    query_vector = get_query_embedding(query)
    
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "review_embedding",
                "queryVector": query_vector,
                "numCandidates": limit * 10,
                "limit": limit
            }
        },
        {
            "$project": {
                "_id": 0,
                "review_id": 1,
                "product_name": 1,
                "review_text": 1,
                "score": { "$meta": "vectorSearchScore" }  # Similarity score (cosine only if the vector index was defined with cosine similarity)
            }
        }
    ]
    return list(collection.aggregate(pipeline))

def search_keyword(query, limit=10):
    """Performs Keyword Search using $search."""
    pipeline = [
        {
            "$search": {
                "index": "default",
                "text": {
                    "query": query,
                    "path": ["review_text", "product_name", "tags"]
                }
            }
        },
        {
            "$limit": limit
        },
        {
            "$project": {
                "_id": 0,
                "review_id": 1,
                "product_name": 1,
                "review_text": 1,
                "score": { "$meta": "searchScore" }  # Get the Lucene/BM25 score
            }
        }
    ]
    return list(collection.aggregate(pipeline))

## 3. Implement Relative Score Fusion (RSF)

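Here is the core logic. We take two lists of results, min-max normalize each list's scores to the 0.0-1.0 range, and combine them as a weighted sum (a weighted average when the two weights add up to 1). A document that appears in only one list contributes 0 for the other list.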
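
To make the arithmetic concrete, here is a small worked example on made-up numbers. The three documents `A`, `B`, `C` and their raw scores are hypothetical and chosen so the results are exact; the `min_max` helper mirrors the normalization idea but is only a sketch.

```python
def min_max(scores):
    """Rescale a list of scores to [0, 1]; all-equal scores map to 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

# Hypothetical raw scores for the same three documents, in the order A, B, C.
doc_ids = ["A", "B", "C"]
vector_raw = [0.75, 0.5, 0.25]   # similarity-style scores
keyword_raw = [1.5, 7.5, 3.0]    # BM25-style scores on a different scale

vector_norm = min_max(vector_raw)    # [1.0, 0.5, 0.0]
keyword_norm = min_max(keyword_raw)  # [0.0, 1.0, 0.25]

# Equal weights: each fused score is the mean of the two normalized scores.
fused = {
    doc: 0.5 * v + 0.5 * k
    for doc, v, k in zip(doc_ids, vector_norm, keyword_norm)
}
print(fused)  # {'A': 0.5, 'B': 0.75, 'C': 0.125}
```

Document `B` wins here even though `A` has the best vector score, because normalization puts both score types on the same scale before they are combined.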

In [None]:
def relative_score_fusion(vector_results, keyword_results, weight_vector=0.5, weight_keyword=0.5):
    # 1. Create a map to merge results by review_id
    fused_scores = {}
    
    # Helper to get min/max scores for normalization
    def get_min_max(results):
        if not results: return 0, 1
        scores = [r['score'] for r in results]
        return min(scores), max(scores)

    min_v, max_v = get_min_max(vector_results)
    min_k, max_k = get_min_max(keyword_results)
    
    # Normalize function
    def normalize(score, min_s, max_s):
        if max_s == min_s: return 1.0 # Avoid division by zero when all scores are equal (including a single result)
        return (score - min_s) / (max_s - min_s)

    # 2. Process Vector Results
    for doc in vector_results:
        rid = doc['review_id']
        norm_score = normalize(doc['score'], min_v, max_v)
        fused_scores[rid] = {
            "doc": doc,
            "vector_score_raw": doc['score'],
            "vector_score_norm": norm_score,
            "keyword_score_raw": 0,
            "keyword_score_norm": 0,
            "final_score": norm_score * weight_vector
        }

    # 3. Process Keyword Results
    for doc in keyword_results:
        rid = doc['review_id']
        norm_score = normalize(doc['score'], min_k, max_k)
        
        if rid in fused_scores:
            # Update existing entry
            fused_scores[rid]['keyword_score_raw'] = doc['score']
            fused_scores[rid]['keyword_score_norm'] = norm_score
            fused_scores[rid]['final_score'] += (norm_score * weight_keyword)
        else:
            # New entry (was not in vector results)
            fused_scores[rid] = {
                "doc": doc,
                "vector_score_raw": 0,
                "vector_score_norm": 0,
                "keyword_score_raw": doc['score'],
                "keyword_score_norm": norm_score,
                "final_score": norm_score * weight_keyword
            }
            
    # 4. Convert to list and sort
    results = []
    for rid, data in fused_scores.items():
        row = data['doc'].copy()
        del row['score'] # Remove original single score
        row['vector_raw'] = data['vector_score_raw']
        row['keyword_raw'] = data['keyword_score_raw']
        row['rsf_score'] = data['final_score']
        results.append(row)
        
    return sorted(results, key=lambda x: x['rsf_score'], reverse=True)

## 4. Run the Demo

We will search for **"night photography"**.

*   **Expectation:**
    *   *Vector Search* should favor **NightOwl X** (Semantic match: "complete darkness", "low-light").
    *   *Keyword Search* should favor **Keyword King** (Exact match: "Night photography" repeated).
    *   *RSF* should balance them based on our weights.
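
The effect of the weights can be previewed without touching the database. The sketch below uses hypothetical, already-normalized scores for the two phones named above (the numbers are invented for illustration, and `rank` is a helper introduced only for this example) and shows how the top result can flip as weight moves from vector to keyword.

```python
def rank(norm_scores, weight_vector, weight_keyword):
    """Return names ordered by weighted sum of (vector, keyword) normalized scores."""
    fused = {
        name: weight_vector * v + weight_keyword * k
        for name, (v, k) in norm_scores.items()
    }
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical normalized scores: (vector_norm, keyword_norm)
toy_scores = {
    "NightOwl X": (1.0, 0.25),   # strong semantic match, weak keyword match
    "Keyword King": (0.5, 1.0),  # weaker semantic match, strong keyword match
}

semantic_first = rank(toy_scores, weight_vector=0.8, weight_keyword=0.2)
balanced = rank(toy_scores, weight_vector=0.5, weight_keyword=0.5)

print(semantic_first)  # NightOwl X ranks first
print(balanced)        # Keyword King ranks first
```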

In [None]:
QUERY = "night photography"

print(f"--- Searching for: '{QUERY}' ---")

# 1. Get Independent Results
print("Running Vector Search...")
vec_res = search_vector(QUERY, limit=10)

print("Running Keyword Search...")
kw_res = search_keyword(QUERY, limit=10)

# 2. Perform Fusion (Semantic Priority: 0.8 Vector, 0.2 Keyword)
print("Calculating RSF (Weights: Vector=0.8, Keyword=0.2)...")
fused_results = relative_score_fusion(vec_res, kw_res, weight_vector=0.8, weight_keyword=0.2)

# 3. Display
df = pd.DataFrame(fused_results)
columns = ['product_name', 'rsf_score', 'vector_raw', 'keyword_raw', 'review_text']
print(df[columns].head(5))

In [None]:
# 4. Try Balanced Fusion (Weights: Vector=0.5, Keyword=0.5)
print("\n--- Balanced Fusion (Weights: 0.5 / 0.5) ---")
balanced_results = relative_score_fusion(vec_res, kw_res, weight_vector=0.5, weight_keyword=0.5)
df_balanced = pd.DataFrame(balanced_results)
print(df_balanced[columns].head(5))