# Movie Semantic Search Assignment

This Jupyter notebook provides a complete solution for building a semantic search engine over movie plots. It uses a `SentenceTransformer` model to capture the meaning of text, so it can find movies whose plots are semantically related to a search query even when they share few exact words with it.

## 1. Project Overview

The search engine operates by converting both movie plot summaries and a user's query into numerical vectors (embeddings). It then calculates the cosine similarity between the query embedding and each movie plot embedding to determine the most relevant results.
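Cosine similarity compares the direction of two vectors: cos(q, p) = (q · p) / (‖q‖ ‖p‖). It is 1 for vectors pointing the same way, 0 for orthogonal vectors, and -1 for opposite ones. The minimal sketch below computes it by hand with NumPy on three small, made-up vectors (hypothetical stand-ins for real embeddings, which for `all-MiniLM-L6-v2` have 384 dimensions) and ranks them from most to least similar, as the search does.

```python
import numpy as np

def cosine_scores(query_vec, doc_matrix):
    """Cosine similarity between one query vector and each row of doc_matrix."""
    # Dot product of the query with every document vector
    dots = doc_matrix @ query_vec
    # Product of the vector lengths (L2 norms) for each query/document pair
    norms = np.linalg.norm(doc_matrix, axis=1) * np.linalg.norm(query_vec)
    return dots / norms

# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only)
query = np.array([1.0, 0.0, 1.0])
docs = np.array([
    [2.0, 0.0, 2.0],    # same direction as the query
    [0.0, 1.0, 0.0],    # orthogonal to the query
    [-1.0, 0.0, -1.0],  # opposite direction
])

scores = cosine_scores(query, docs)
ranking = np.argsort(scores)[::-1]  # indices from most to least similar
print(scores)
print(ranking)
```

Only the direction matters: the first document is twice as long as the query but still scores a perfect match.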

## 2. Setup and Library Imports

First, we install the libraries listed in `requirements.txt`, then import them: `pandas` for loading the movie data, `sentence-transformers` for creating embeddings, and `scikit-learn` for computing cosine similarity.

In [None]:
# Install the necessary libraries
# (%pip installs into the environment of the running kernel, unlike !pip)
%pip install -r requirements.txt

# Import the libraries
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

## 3. The `search_movies` Function

The core logic of the semantic search lives in the cell below. When the cell runs, it reads `movies.csv`, loads the `all-MiniLM-L6-v2` model, and encodes every plot once; the `search_movies` function then encodes the query and ranks the movies against those precomputed embeddings. In a typical setup, this code would live in a separate file (`movie_search.py`) and be imported, but for demonstration it is included directly in this notebook.

In [None]:
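# --- Start of the code from movie_search.py ---
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the dataset, the model, and the plot embeddings once at module level,
# so every call to search_movies reuses them instead of re-encoding all plots
df = pd.read_csv('movies.csv')
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(df['plot'].tolist(), convert_to_tensor=False)

def search_movies(query, top_n=5):
    """
    Performs a semantic search for movies based on a query.

    Args:
        query: Free-text search query.
        top_n: Number of results to return.

    Returns:
        A DataFrame with the columns 'title', 'plot', and 'similarity',
        sorted from most to least similar.
    """
    # Encode the query to a vector
    query_embedding = model.encode(query, convert_to_tensor=False)
    
    # Calculate cosine similarity between the query and all movie plots;
    # cosine_similarity expects 2-D input, so the query is wrapped in a list
    similarities = cosine_similarity([query_embedding], embeddings)[0]
    
    # Sort indices from most to least similar and keep the first top_n
    # (slicing this way also returns no rows when top_n=0)
    top_n_indices = similarities.argsort()[::-1][:top_n]
    
    # Create a DataFrame with the top results, numbered by rank from 0
    results = df.iloc[top_n_indices].copy()
    results['similarity'] = similarities[top_n_indices]
    results = results.reset_index(drop=True)
    
    return results[['title', 'plot', 'similarity']]

# --- End of the code from movie_search.py ---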
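The code above follows an encode-once, rank-per-query pattern: the plot embeddings are computed a single time, and each query costs one `model.encode` call plus a similarity pass. The sketch below reproduces that data flow and ranking step with a hypothetical bag-of-words `toy_encode` function standing in for `SentenceTransformer.encode`, so it runs without downloading a model. The stand-in is not semantic (it only counts shared words), and the plots and vocabulary are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for model.encode: counts vocabulary words in each text.
# Unlike all-MiniLM-L6-v2, it captures no meaning, only exact word overlap.
VOCAB = ["spy", "paris", "romance", "car", "chase", "love"]

def toy_encode(texts):
    rows = []
    for text in texts:
        words = text.lower().split()
        rows.append([words.count(term) for term in VOCAB])
    return np.array(rows, dtype=float)

def rank(query_vec, doc_matrix, top_n):
    """Return (indices, scores) of the top_n rows by cosine similarity."""
    norms = np.linalg.norm(doc_matrix, axis=1) * np.linalg.norm(query_vec)
    # Guard against zero vectors (texts containing no vocabulary words)
    scores = np.divide(doc_matrix @ query_vec, norms,
                       out=np.zeros(len(doc_matrix)), where=norms > 0)
    order = scores.argsort()[::-1][:top_n]  # most similar first
    return order, scores[order]

# Hypothetical plots, encoded once up front (like `embeddings` above)
plots = [
    "a spy uncovers a plot in paris",
    "two strangers find love and romance in paris",
    "a car chase through the city",
]
plot_matrix = toy_encode(plots)

# Each query needs only one encode call plus the ranking pass
indices, scores = rank(toy_encode(["spy in paris"])[0], plot_matrix, top_n=2)
for i, s in zip(indices, scores):
    print(f"{s:.3f}  {plots[i]}")
```

Swapping `toy_encode` for the real model's `encode` method would turn this into the semantic version, since the ranking step only needs vectors.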

## 4. Example Usage and Results

We now test the function with the query `'spy thriller in Paris'` and retrieve the top 3 results. The output below is illustrative; the exact similarity scores depend on the contents of `movies.csv` and on the model version.

In [None]:
# Test the function with the example query
query = "spy thriller in Paris"
top_results = search_movies(query, top_n=3)

print(f"Top 3 movies for the query: '{query}'")
print(top_results)

Top 3 movies for the query: 'spy thriller in Paris'
          title              ...   similarity
0     Spy Movie              ...     0.769684
1  Romance in Paris          ...     0.388030
2    Action Flick            ...     0.256777

[3 rows x 3 columns]
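Because the printed table is only illustrative, a small check can confirm that whatever `search_movies` returns on your own data has the expected shape. The `check_results` helper below is a hypothetical addition, not part of `movie_search.py`. It verifies the columns and the row count, checks that scores are sorted in descending order, and confirms that they lie in the cosine range [-1, 1].

```python
import pandas as pd

def check_results(results: pd.DataFrame, top_n: int) -> None:
    """Hypothetical sanity check for the DataFrame returned by search_movies."""
    # The function is expected to return exactly these three columns
    assert list(results.columns) == ["title", "plot", "similarity"]
    # At most top_n rows (fewer if the dataset has fewer movies)
    assert len(results) <= top_n
    # Best match first: scores must never increase down the table
    assert results["similarity"].is_monotonic_decreasing
    # Cosine similarity is bounded; a small tolerance absorbs float rounding
    assert results["similarity"].between(-1 - 1e-6, 1 + 1e-6).all()

# Usage in the notebook, after running the cells above:
# check_results(top_results, top_n=3)
```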
