{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "view-in-github"
   },
   "source": [
    "## Semantic Search on Movie Plots\n",
    "### Assignment-1\n",
    "This Jupyter notebook builds a semantic search engine for movie plots. It uses a pre-trained Sentence Transformer model to find the movies whose plot descriptions are most similar to a given search query."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "e7ff2b37"
   },
   "source": [
    "### 1. Install and Import Libraries\n",
    "\n",
    "First, install the libraries listed in `requirements.txt`: `pandas` for data manipulation, `sentence-transformers` for creating embeddings, and `scikit-learn` for calculating cosine similarity. Then import them. The same cell also loads `movies.csv`, encodes every plot with the `all-MiniLM-L6-v2` model, and defines `search_movies`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "e402802d",
    "outputId": "36ef9d80-5a9d-4861-f3b1-ef171803ff26"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (2.0.3)\n",
      "Requirement already satisfied: sentence-transformers in /usr/local/lib/python3.10/dist-packages (2.7.0)\n",
      "Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (1.2.2)\n",
      "..."
     ]
    }
   ],
   "source": [
    "# Install the required libraries into the kernel's environment\n",
    "%pip install -r requirements.txt\n",
    "\n",
    "# Import the libraries\n",
    "import pandas as pd\n",
    "from sentence_transformers import SentenceTransformer\n",
    "from sklearn.metrics.pairwise import cosine_similarity\n",
    "\n",
    "# search_movies lives in the movie_search.py file you created. On a local machine,\n",
    "# where that file sits next to this notebook, you can import it directly:\n",
    "# from movie_search import search_movies\n",
    "# In Colab the file is not available unless you upload it, so the code below\n",
    "# defines the function inline instead.\n",
    "\n",
    "# --- Start of the code from movie_search.py ---\n",
    "# Load the dataset and embed every plot once, at module level, so that\n",
    "# every call to search_movies reuses the same embeddings\n",
    "df = pd.read_csv('movies.csv')\n",
    "model = SentenceTransformer('all-MiniLM-L6-v2')\n",
    "embeddings = model.encode(df['plot'].tolist(), convert_to_tensor=False)\n",
    "\n",
    "def search_movies(query, top_n=5):\n",
    "    \"\"\"\n",
    "    Performs a semantic search for movies based on a query.\n",
    "\n",
    "    Returns a DataFrame with the columns 'title', 'plot' and 'similarity',\n",
    "    holding the top_n best matches in descending order of similarity.\n",
    "    \"\"\"\n",
    "    # Encode the query to a vector\n",
    "    query_embedding = model.encode(query, convert_to_tensor=False)\n",
    "\n",
    "    # Calculate cosine similarity between the query and all movie plots\n",
    "    similarities = cosine_similarity([query_embedding], embeddings)[0]\n",
    "\n",
    "    # Get the indices of the top_n most similar movies, best match first.\n",
    "    # Sorting the negated scores keeps top_n=0 from returning every row,\n",
    "    # which the slice [-top_n:] would do.\n",
    "    top_n_indices = (-similarities).argsort()[:top_n]\n",
    "\n",
    "    # Create a DataFrame with the top results\n",
    "    results = df.iloc[top_n_indices].copy()\n",
    "    results['similarity'] = similarities[top_n_indices]\n",
    "\n",
    "    return results[['title', 'plot', 'similarity']]\n",
    "\n",
    "# --- End of the code from movie_search.py ---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "b3e34b10"
   },
   "source": [
    "### 2. Implement the Semantic Search\n",
    "\n",
    "The core logic lives in the `search_movies` function defined in the cell above. The function takes a user query, converts it into an embedding, and computes the cosine similarity between that embedding and every movie plot embedding. It returns a DataFrame with the `top_n` most similar movies, ordered from most to least similar."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "e2d45c50"
   },
   "source": [
    "### 3. Test the Functionality\n",
    "\n",
    "Now we can test our function with the required query: `'spy thriller in Paris'`. We will retrieve the top 3 most relevant movies and display the result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "4b63e527"
   },
   "outputs": [],
   "source": [
    "# Test with the example query\n",
    "query = 'spy thriller in Paris'\n",
    "top_results = search_movies(query, top_n=3)\n",
    "\n",
    "print(f\"Top 3 movies for the query: '{query}'\")\n",
    "print(top_results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "b9680e6c"
   },
   "source": [
    "### 4. Run Unit Tests\n",
    "\n",
    "To verify your solution, run the unit tests provided in `tests/test_movie_search.py`. They are meant to be run from a terminal on your local machine rather than from a notebook cell, and passing them is part of a complete submission.\n",
    "\n",
    "**From your terminal, navigate to your assignment folder and run:**\n",
    "\n",
    "```bash\n",
    "python -m unittest tests/test_movie_search.py -v\n",
    "```\n",
    "\n",
    "Ensure all four tests pass to meet the grading criteria for this section."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": [],
   "gpuType": "T4"
  },
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
