Merge pull request #1416 from vespa-engine/tht-update-text-video-search
Updates to text-video-search sample app
kkraune committed Apr 17, 2024
2 parents 574530c + 6ace5ca commit c89b2cf
Showing 3 changed files with 55 additions and 44 deletions.
@@ -6,15 +6,16 @@
"metadata": {},
"source": [
"# Create a text-video search app with Vespa\n",
"> Create, deploy, feed and query the Vespa app using the Vespa python API "
"\n",
"> Create, deploy, feed and query the Vespa app using the Vespa python API\n"
]
},
{
"cell_type": "markdown",
"id": "bearing-adelaide",
"metadata": {},
"source": [
"## Install required packages"
"## Install required packages\n"
]
},
{
@@ -32,15 +33,15 @@
"id": "recreational-characterization",
"metadata": {},
"source": [
"## CLIP model"
"## CLIP model\n"
]
},
{
"cell_type": "markdown",
"id": "operational-transcript",
"metadata": {},
"source": [
"There are multiple CLIP model variations"
"There are multiple CLIP model variations\n"
]
},
{
@@ -71,7 +72,7 @@
"id": "adolescent-freedom",
"metadata": {},
"source": [
"Each CLIP model might have a different embedding size. We need this information when creating the schema of the text-video search application."
"Each CLIP model might have a different embedding size. We need this information when creating the schema of the text-video search application.\n"
]
},
{
@@ -97,7 +98,9 @@
}
],
"source": [
"embedding_info = {name: clip.load(name)[0].visual.output_dim for name in clip.available_models()}\n",
"embedding_info = {\n",
" name: clip.load(name)[0].visual.output_dim for name in clip.available_models()\n",
"}\n",
"embedding_info"
]
},
@@ -106,31 +109,31 @@
"id": "different-remainder",
"metadata": {},
"source": [
"## Create and deploy a text-video search app"
"## Create and deploy a text-video search app\n"
]
},
{
"cell_type": "markdown",
"id": "thirty-territory",
"metadata": {},
"source": [
"### Create the Vespa application package"
"### Create the Vespa application package\n"
]
},
{
"cell_type": "markdown",
"id": "international-question",
"metadata": {},
"source": [
"The function `create_text_video_app` below uses [the Vespa python API](https://pyvespa.readthedocs.io/en/latest/) to create an application package with fields to store image embeddings extracted from the videos that we want to search based on the selected CLIP models. It also declares the types of the text embeddings that we are going to send along with the query when searching for images, and creates one ranking profile for each (text, image) embedding model."
"The function `create_text_video_app` below uses [the Vespa python API](https://pyvespa.readthedocs.io/en/latest/) to create an application package with fields to store image embeddings extracted from the videos that we want to search based on the selected CLIP models. It also declares the types of the text embeddings that we are going to send along with the query when searching for images, and creates one ranking profile for each (text, image) embedding model.\n"
]
},
{
"cell_type": "markdown",
"id": "38f995b3",
"metadata": {},
"source": [
"For this demonstration we are going to use only one CLIP model but we could very well index all the available models for comparison, just as we did for [the text-image sample app](https://github.com/vespa-engine/sample-apps/blob/master/text-image-search/src/python/compare-pre-trained-clip-for-text-image-search.ipynb)."
"For this demonstration we are going to use only one CLIP model but we could very well index all the available models for comparison, just as we did for [the text-image sample app](https://github.com/vespa-engine/sample-apps/blob/master/text-image-search/src/python/compare-pre-trained-clip-for-text-image-search.ipynb).\n"
]
},
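For orientation, a rough sketch of the package that `create_text_video_app` builds for a single CLIP model is shown below. It is not taken from the sample: the tensor field and rank-profile names are illustrative (only `video_file_name` appears in the embedding.py diff further down), and it assumes pyvespa's package API.

```python
# Minimal sketch, assuming pyvespa's package API; field and rank-profile names are illustrative.
from vespa.package import ApplicationPackage, Field, HNSW, RankProfile

app_package = ApplicationPackage(name="videosearch")
app_package.schema.add_fields(
    Field(name="video_file_name", type="string", indexing=["summary", "attribute"]),
    Field(
        name="vit_b_32_image",                # hypothetical field holding ViT-B/32 frame embeddings
        type="tensor<float>(x[512])",         # 512 matches ViT-B/32's output_dim from embedding_info
        indexing=["attribute", "index"],
        ann=HNSW(distance_metric="angular"),  # approximate nearest-neighbor index over frames
    ),
)
app_package.schema.add_rank_profile(
    RankProfile(
        name="vit_b_32_similarity",
        inherits="default",
        first_phase="closeness(vit_b_32_image)",  # score a frame by closeness to the query embedding
    )
)
```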
{
@@ -150,7 +153,7 @@
"id": "neutral-fence",
"metadata": {},
"source": [
"We can inspect how the `schema` of the resulting application package looks like:"
"We can inspect how the `schema` of the resulting application package looks like:\n"
]
},
{
@@ -199,15 +202,15 @@
"id": "meaning-report",
"metadata": {},
"source": [
"### Deploy to Vespa Cloud"
"### Deploy to Vespa Cloud\n"
]
},
{
"cell_type": "markdown",
"id": "assured-possible",
"metadata": {},
"source": [
"Follow [this guide](https://pyvespa.readthedocs.io/en/latest/deploy-vespa-cloud.html) to learn how to set the environment variables below before deploying to Vespa Cloud."
"Follow [this guide](https://pyvespa.readthedocs.io/en/latest/deploy-vespa-cloud.html) to learn how to set the environment variables below before deploying to Vespa Cloud.\n"
]
},
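As a minimal sketch of the deployment step, assuming pyvespa's `VespaCloud` interface and placeholder tenant and key values (the linked guide is the authoritative reference):

```python
import os

from vespa.deployment import VespaCloud

# Placeholder values -- set these as described in the guide linked above.
os.environ.setdefault("TENANT_NAME", "my-tenant")

vespa_cloud = VespaCloud(
    tenant=os.environ["TENANT_NAME"],
    application="videosearch",
    key_location="/path/to/private-key.pem",  # API key registered in the Vespa Cloud console
    application_package=app_package,
)
app = vespa_cloud.deploy()  # returns the connection used below for feeding and querying
```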
{
@@ -235,37 +238,37 @@
"id": "foreign-complaint",
"metadata": {},
"source": [
"Alternatively, check [this guide](https://pyvespa.readthedocs.io/en/latest/deploy-vespa-docker.html) to deploy locally in a Docker container."
"Alternatively, check [this guide](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa-cloud.html) to deploy locally in a Docker container.\n"
]
},
{
"cell_type": "markdown",
"id": "large-institution",
"metadata": {},
"source": [
"## Feed data"
"## Feed data\n"
]
},
{
"cell_type": "markdown",
"id": "7230b645",
"metadata": {},
"source": [
"### Download the data"
"### Download the data\n"
]
},
{
"cell_type": "markdown",
"id": "8cc37439",
"metadata": {},
"source": [
"We are going to use the UCF101 dataset to allow users to follow along from \n",
"their laptop. We downloaded a [zipped file](http://storage.googleapis.com/thumos14_files/UCF101_videos.zip) \n",
"containing 13320 trimmed videos, each including one action, \n",
"and a [text file](http://crcv.ucf.edu/THUMOS14/Class%20Index.txt) containing the list of action \n",
"We are going to use the UCF101 dataset to allow users to follow along from\n",
"their laptop. We downloaded a [zipped file](http://storage.googleapis.com/thumos14_files/UCF101_videos.zip)\n",
"containing 13320 trimmed videos, each including one action,\n",
"and a [text file](http://crcv.ucf.edu/THUMOS14/Class%20Index.txt) containing the list of action\n",
"classes and their numerical index.\n",
"\n",
"After downloading and unzipping the data, set the `VIDEO_DIR` environment variable to the folder containing the video \n",
"After downloading and unzipping the data, set the `VIDEO_DIR` environment variable to the folder containing the video\n",
"files.\n"
]
},
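For reference, setting the variable from inside the notebook could be as simple as this (the path is a placeholder):

```python
import os

# Placeholder path -- point it at the folder containing the unzipped UCF101 videos.
os.environ["VIDEO_DIR"] = "/path/to/UCF101_videos"
```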
@@ -274,15 +277,15 @@
"id": "f6b34733",
"metadata": {},
"source": [
"### Convert .avi files to .mp4"
"### Convert .avi files to .mp4\n"
]
},
{
"cell_type": "markdown",
"id": "7d1a2759",
"metadata": {},
"source": [
"There is better support for `.mp4` files, so we will convert the `.avi` files to `.mp4` using `ffmpeg`. The code below requires that your machine have `ffmpeg` installed."
"There is better support for `.mp4` files, so we will convert the `.avi` files to `.mp4` using `ffmpeg`. The code below requires that your machine have `ffmpeg` installed.\n"
]
},
{
@@ -294,17 +297,18 @@
"source": [
"import subprocess\n",
"\n",
"\n",
"def convert_from_avi_to_mp4(file_name):\n",
" outputfile = file_name.lower().replace(\".avi\", \".mp4\")\n",
" subprocess.call(['ffmpeg', '-i', file_name, outputfile])"
" subprocess.call([\"ffmpeg\", \"-i\", file_name, outputfile])"
]
},
{
"cell_type": "markdown",
"id": "73d68bcc",
"metadata": {},
"source": [
"The code below takes quite a while and could be sped up by using multi-processing:"
"The code below takes quite a while and could be sped up by using multi-processing:\n"
]
},
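One possible way to parallelize the conversion, sketched here rather than taken from the notebook, is a multiprocessing pool over the `.avi` files in `VIDEO_DIR` (this assumes the worker processes can see the `convert_from_avi_to_mp4` function defined above):

```python
import glob
import os
from multiprocessing import Pool

avi_files = glob.glob(os.path.join(os.environ["VIDEO_DIR"], "*.avi"))

# Run one ffmpeg conversion per worker process.
with Pool(processes=os.cpu_count()) as pool:
    pool.map(convert_from_avi_to_mp4, avi_files)
```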
{
@@ -326,15 +330,15 @@
"id": "4c1c4259",
"metadata": {},
"source": [
"### Compute and send embeddings"
"### Compute and send embeddings\n"
]
},
{
"cell_type": "markdown",
"id": "suspended-supervision",
"metadata": {},
"source": [
"The function below assumes you have downloaded the UCF101 dataset, converted it to .mp4 and stored the resulting files in the `VIDEO_PATH` folder. It extracts frames from the video, compute image embeddings according to a CLIP model and send it to the Vespa app."
"The function below assumes you have downloaded the UCF101 dataset, converted it to .mp4 and stored the resulting files in the `VIDEO_PATH` folder. It extracts frames from the video, compute image embeddings according to a CLIP model and send it to the Vespa app.\n"
]
},
{
@@ -347,11 +351,11 @@
"from embedding import compute_and_send_video_embeddings\n",
"\n",
"compute_and_send_video_embeddings(\n",
" app=app, \n",
" batch_size=32, \n",
" clip_model_names=[\"ViT-B/32\"], \n",
" app=app,\n",
" batch_size=32,\n",
" clip_model_names=[\"ViT-B/32\"],\n",
" number_frames_per_video=4,\n",
" video_dir=os.environ[\"VIDEO_DIR\"]\n",
" video_dir=os.environ[\"VIDEO_DIR\"],\n",
")"
]
},
@@ -360,7 +364,7 @@
"id": "organizational-delta",
"metadata": {},
"source": [
"The function `compute_and_send_video_embeddings` is a more robust version of the following loop:"
"The function `compute_and_send_video_embeddings` is a more robust version of the following loop:\n"
]
},
{
@@ -371,15 +375,19 @@
"outputs": [],
"source": [
"for model_name in clip_model_names:\n",
" video_dataset = VideoFeedDataset( ## PyTorch Dataset that outputs pyvespa-compatible data \n",
" video_dir=os.environ[\"VIDEO_DIR\"], # Folder containing video files\n",
" model_name=model_name, # CLIP model name used to convert image into vector\n",
" number_frames_per_video=4 # Number of image frames to use per video\n",
" video_dataset = (\n",
" VideoFeedDataset( ## PyTorch Dataset that outputs pyvespa-compatible data\n",
" video_dir=os.environ[\"VIDEO_DIR\"], # Folder containing video files\n",
" model_name=model_name, # CLIP model name used to convert image into vector\n",
" number_frames_per_video=4, # Number of image frames to use per video\n",
" )\n",
" )\n",
" dataloader = DataLoader( ## PyTorch Dataloader to loop through the dataset\n",
" video_dataset, \n",
" dataloader = DataLoader( ## PyTorch Dataloader to loop through the dataset\n",
" video_dataset,\n",
" batch_size=batch_size,\n",
" collate_fn=lambda x: [item for sublist in x for item in sublist], # turn list of list into flat list\n",
" collate_fn=lambda x: [\n",
" item for sublist in x for item in sublist\n",
" ], # turn list of list into flat list\n",
" )\n",
" for idx, batch in enumerate(dataloader):\n",
" app.update_batch(batch=batch)"
@@ -390,15 +398,15 @@
"id": "eea78554",
"metadata": {},
"source": [
"## Query the application"
"## Query the application\n"
]
},
{
"cell_type": "markdown",
"id": "eff562f6",
"metadata": {},
"source": [
"We created a custom class `VideoSearchApp` that implements a `query` method that is specific to text-video use case that we are demonstrating here."
"We created a custom class `VideoSearchApp` that implements a `query` method that is specific to text-video use case that we are demonstrating here.\n"
]
},
{
@@ -418,7 +426,7 @@
"id": "191275fb",
"metadata": {},
"source": [
"It takes a `text` query, transform it into an embedding with the CLIP model, and for each video it takes the score of the frame of that video that is closest to the text in the joint embedding space to represent the score of the video. We can also select the number of videos that we want to retrieve."
"It takes a `text` query, transform it into an embedding with the CLIP model, and for each video it takes the score of the frame of that video that is closest to the text in the joint embedding space to represent the score of the video. We can also select the number of videos that we want to retrieve.\n"
]
},
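A hypothetical usage sketch, assuming `video_app` is the `VideoSearchApp` instance created in the collapsed cell above; the argument names `text` and `number_videos` are guesses based on the surrounding cells, not the sample's verified signature:

```python
# Hypothetical call -- argument names are assumptions, not the sample's verified API.
result = video_app.query(text="a person playing guitar", number_videos=4)

for hit in result:
    print(hit["video_file_name"])  # each hit carries the video file name, as used in the display cell below
```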
{
@@ -521,7 +529,9 @@
"from IPython.display import Video, display\n",
"\n",
"for hit in result:\n",
" display(Video(os.path.join(os.environ[\"VIDEO_DIR\"], hit[\"video_file_name\"]), embed=True))"
" display(\n",
" Video(os.path.join(os.environ[\"VIDEO_DIR\"], hit[\"video_file_name\"]), embed=True)\n",
" )"
]
}
],
2 changes: 1 addition & 1 deletion text-video-search/src/python/embedding.py
@@ -197,7 +197,7 @@ def create_text_video_app(model_info):
:return: A Vespa application package.
"""
app_package = ApplicationPackage(name="video_search")
app_package = ApplicationPackage(name="videosearch")

app_package.schema.add_fields(
Field(name="video_file_name", type="string", indexing=["summary", "attribute"]),
1 change: 1 addition & 0 deletions text-video-search/src/python/requirements.txt
@@ -4,4 +4,5 @@ torch
torchvision
pyvespa
streamlit
setuptools
git+https://github.com/openai/CLIP.git
