too many (#669)

vespa-engine · Feb 2, 2024 · 7961c04
1 parent 2e8be33
commit 7961c04
Showing 1 changed file with 68 additions and 105 deletions.
diff --git a/docs/sphinx/source/examples/nomic-embeddings-cloud.ipynb b/docs/sphinx/source/examples/nomic-embeddings-cloud.ipynb
@@ -12,15 +12,15 @@
     "</picture>\n",
     "\n",
     "\n",
-    "# Arxiv AI-powered search\n",
+    "# arXiv AI-powered search with Nomic (nomic-embed-text-v1) and Vespa\n",
     "\n",
-    "This notebook demonstrates how to load a ArxiV dataset hosted on [HF datasets](https://huggingface.co/datasets/somewheresystems/dataclysm-arxiv) \n",
-    "and feed it to a Vespa instance. The dataset comprises of English language arXiv papers from the Cornell/arXiv dataset, with two new columns added: title-embeddings and abstract-embeddings. Embeddings generated using the [bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) embeddings model. \n",
+    "This notebook demonstrates how to use the recently announced open-source Nomic embedding model\n",
+    "([nomic blog post](https://blog.nomic.ai/posts/nomic-embed-text-v1)) with Vespa. It also demonstrates how to \n",
+    "load an arXiv dataset hosted on [HF datasets](https://huggingface.co/datasets/somewheresystems/dataclysm-arxiv) \n",
+    "and feed it to a Vespa instance.\n",
     "\n",
-    "In this notebook, we use Vespa's embedder functionality to include the  [bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) embedding\n",
-    "model into Vespa for query serving. \n",
     "\n",
-    "This is work in progress - we want to demonstrate more query examples. "
+    "In this notebook, we use Vespa's embedder functionality to include the [nomic-ai/nomic-embed-text-v1](https://huggingface.co/nomic-ai/nomic-embed-text-v1) embedding model into Vespa for query serving. We use a quantized model to improve CPU inference performance. \n"
    ]
   },
   {
@@ -43,13 +43,16 @@
     "[PyVespa](https://pyvespa.readthedocs.io/en/latest/) helps us build the [Vespa application package](https://docs.vespa.ai/en/application-packages.html). \n",
     "A Vespa application package consists of configuration files, schemas, models, and code (plugins).   \n",
     "\n",
-    "First, we define a [Vespa schema](https://docs.vespa.ai/en/schemas.html) with the fields we want to store and their type. This is a translation\n",
-    "of the dataset features:"
+    "First, we define a [Vespa schema](https://docs.vespa.ai/en/schemas.html) with the fields we want to store and their type. \n",
+    "\n",
+    "The nomic technical report [(pdf)](https://static.nomic.ai/reports/2024_Nomic_Embed_Text_Technical_Report.pdf) mentions\n",
+    "that the model was trained with prefix instructions, using `search_document` as a prefix for documents and `search_query` \n",
+    "as a prefix for queries. We add these prefixes to the text passed to the embedder, both when indexing documents and when embedding queries. "
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 93,
+   "execution_count": 2,
    "id": "0dca2378",
    "metadata": {},
    "outputs": [],
@@ -68,14 +71,10 @@
     "                    Field(name=\"journal_ref\", type=\"string\", indexing=[\"summary\", \"index\"]),\n",
     "                    Field(name=\"doi\", type=\"string\", indexing=[\"summary\", \"index\"]),\n",
     "                    Field(name=\"categories\", type=\"array<string>\", indexing=[\"summary\", \"index\"], match=[\"word\"]),\n",
-    "                    Field(name=\"title_embedding\", type=\"tensor<bfloat16>(x[384])\",\n",
-    "                        indexing=[\"attribute\", \"index\"],\n",
-    "                        ann=HNSW(distance_metric=\"angular\")\n",
-    "                    ),\n",
-    "                    Field(name=\"abstract_embedding\", type=\"tensor<bfloat16>(x[384])\",\n",
-    "                        indexing=[\"attribute\", \"index\"],\n",
-    "                        ann=HNSW(distance_metric=\"angular\")\n",
-    "                    ),\n",
+    "                    Field(name=\"embedding\", type=\"tensor<bfloat16>(x[768])\",\n",
+    "                     indexing=[\"\\\"search_document\\\" . \\\" \\\" . input title . \\\" \\\". input abstract \", \"embed\", \"index\", \"attribute\"],\n",
+    "                    ann=HNSW(distance_metric=\"angular\"),\n",
+    "                    is_document_field=False)       \n",
     "                ],\n",
     "            ),\n",
     "            fieldsets=[\n",
@@ -84,9 +83,18 @@
     ")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "3acc9020",
+   "metadata": {},
+   "source": [
+    "## Configure embedder\n",
+    "This uses Vespa's embedder inference support. We use the Xenova (Transformers.js) model checkpoints in ONNX format. "
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": 94,
+   "execution_count": 3,
    "id": "66c5da1d",
    "metadata": {},
    "outputs": [],
@@ -97,11 +105,10 @@
     "vespa_application_package = ApplicationPackage(\n",
     "        name=vespa_app_name,\n",
     "        schema=[paper_schema],\n",
-    "        components=[Component(id=\"bge\", type=\"hugging-face-embedder\",\n",
+    "        components=[Component(id=\"nomic\", type=\"hugging-face-embedder\",\n",
     "            parameters=[\n",
-    "                Parameter(\"transformer-model\", {\"url\": \"https://huggingface.co/Xenova/bge-small-en-v1.5/resolve/main/onnx/model.onnx\"}),\n",
-    "                Parameter(\"tokenizer-model\", {\"url\": \"https://huggingface.co/Xenova/bge-small-en-v1.5/raw/main/tokenizer.json\"}),\n",
-    "                Parameter(\"pooling-strategy\", args=dict(), children=\"cls\")\n",
+    "                Parameter(\"transformer-model\", {\"url\": \"https://huggingface.co/Xenova/nomic-embed-text-v1/resolve/main/onnx/model_quantized.onnx\"}),\n",
+    "                Parameter(\"tokenizer-model\", {\"url\": \"https://huggingface.co/Xenova/nomic-embed-text-v1/raw/main/tokenizer.json\"})\n",
     "            ]\n",
     "        )]\n",
     ") "
@@ -127,7 +134,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 101,
+   "execution_count": 4,
    "id": "a8ce5624",
    "metadata": {},
    "outputs": [],
@@ -136,7 +143,7 @@
     "\n",
     "bm25 = RankProfile(\n",
     "    name=\"bm25\", \n",
-    "    inputs=[(\"query(q)\", \"tensor<float>(x[384])\")],\n",
+    "    inputs=[(\"query(q)\", \"tensor<float>(x[768])\")],\n",
     "    \n",
     "    first_phase=FirstPhaseRanking(\n",
     "        expression=\"bm25(title) + bm25(abstract)\",\n",
@@ -145,14 +152,14 @@
     "\n",
     "hybrid = RankProfile(\n",
     "    name=\"hybrid\", \n",
-    "    inputs=[(\"query(q)\", \"tensor<float>(x[384])\")],\n",
+    "    inputs=[(\"query(q)\", \"tensor<float>(x[768])\")],\n",
     "    first_phase=FirstPhaseRanking(\n",
-    "        expression=\"closeness(field, title_embedding) + closeness(field, abstract_embedding)\"\n",
+    "        expression=\"closeness(field, embedding)\"\n",
     "    ),\n",
     "    global_phase=GlobalPhaseRanking(\n",
-    "        expression=\"reciprocal_rank_fusion(closeness(field,title_embedding), bm25(title), bm25(abstract), closeness(field,abstract_embedding))\"\n",
+    "        expression=\"reciprocal_rank_fusion(closeness(field,embedding), bm25(title), bm25(abstract))\"\n",
     "    ),\n",
-    "    match_features=[\"bm25(title)\", \"bm25(abstract)\", \"closeness(field, title_embedding)\", \"closeness(field, abstract_embedding)\"]\n",
+    "    match_features=[\"bm25(title)\", \"bm25(abstract)\", \"closeness(field, embedding)\"]\n",
     ")\n",
     "paper_schema.add_rank_profile(bm25)\n",
     "paper_schema.add_rank_profile(hybrid)"
@@ -244,7 +251,7 @@
    "source": [
     "import os\n",
     "\n",
-    "os.environ[\"TENANT_NAME\"] = \"vespa-team\" # Replace with your tenant name\n",
+    "os.environ[\"TENANT_NAME\"] = \"samples\" # Replace with your tenant name\n",
     "\n",
     "vespa_cli_command = f'vespa config set application {os.environ[\"TENANT_NAME\"]}.{vespa_app_name}'\n",
     "\n",
@@ -263,7 +270,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 8,
    "id": "1f0b97c8",
    "metadata": {},
    "outputs": [],
@@ -328,7 +335,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 103,
+   "execution_count": 10,
    "id": "b5fddf9f",
    "metadata": {},
    "outputs": [],
@@ -364,50 +371,10 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 104,
+   "execution_count": null,
    "id": "fe954dc4",
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Deployment started in run 7 of dev-aws-us-east-1c for samples.arxivsearch. This may take a few minutes the first time.\n",
-      "INFO    [12:01:11]  Deploying platform version 8.284.4 and application dev build 7 for dev-aws-us-east-1c of default ...\n",
-      "INFO    [12:01:11]  Using CA signed certificate version 0\n",
-      "INFO    [12:01:12]  Using 1 nodes in container cluster 'arxivsearch_container'\n",
-      "INFO    [12:01:13]  Using 1 nodes in container cluster 'arxivsearch_container'\n",
-      "INFO    [12:01:15]  Deployment successful.\n",
-      "INFO    [12:01:15]  Session 247 for tenant 'samples' prepared and activated.\n",
-      "INFO    [12:01:15]  ######## Details for all nodes ########\n",
-      "INFO    [12:01:15]  h90001f.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP\n",
-      "INFO    [12:01:15]  --- platform vespa/cloud-tenant-rhel8:8.284.4\n",
-      "INFO    [12:01:15]  --- logserver-container on port 4080 has config generation 247, wanted is 247\n",
-      "INFO    [12:01:15]  --- metricsproxy-container on port 19092 has config generation 247, wanted is 247\n",
-      "INFO    [12:01:15]  h90001g.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP\n",
-      "INFO    [12:01:15]  --- platform vespa/cloud-tenant-rhel8:8.284.4\n",
-      "INFO    [12:01:15]  --- container-clustercontroller on port 19050 has config generation 247, wanted is 247\n",
-      "INFO    [12:01:15]  --- metricsproxy-container on port 19092 has config generation 247, wanted is 247\n",
-      "INFO    [12:01:15]  h90024a.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP\n",
-      "INFO    [12:01:15]  --- platform vespa/cloud-tenant-rhel8:8.284.4\n",
-      "INFO    [12:01:15]  --- container on port 4080 has config generation 247, wanted is 247\n",
-      "INFO    [12:01:15]  --- metricsproxy-container on port 19092 has config generation 247, wanted is 247\n",
-      "INFO    [12:01:15]  h90026a.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP\n",
-      "INFO    [12:01:15]  --- platform vespa/cloud-tenant-rhel8:8.284.4\n",
-      "INFO    [12:01:15]  --- storagenode on port 19102 has config generation 246, wanted is 247\n",
-      "INFO    [12:01:15]  --- searchnode on port 19107 has config generation 247, wanted is 247\n",
-      "INFO    [12:01:15]  --- distributor on port 19111 has config generation 246, wanted is 247\n",
-      "INFO    [12:01:15]  --- metricsproxy-container on port 19092 has config generation 247, wanted is 247\n",
-      "INFO    [12:01:21]  Found endpoints:\n",
-      "INFO    [12:01:21]  - dev.aws-us-east-1c\n",
-      "INFO    [12:01:21]   |-- https://fa63b7b7.e9029380.z.vespa-app.cloud/ (cluster 'arxivsearch_container')\n",
-      "INFO    [12:01:21]  Installation succeeded!\n",
-      "Using mTLS (key,cert) Authentication against endpoint https://fa63b7b7.e9029380.z.vespa-app.cloud//ApplicationStatus\n",
-      "Application is up!\n",
-      "Finished deployment.\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "from vespa.application import Vespa\n",
     "app:Vespa = vespa_cloud.deploy()"
@@ -426,7 +393,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 105,
+   "execution_count": null,
    "id": "8f422178",
    "metadata": {},
    "outputs": [],
@@ -442,8 +409,6 @@
     "        \"id\": x[\"id\"],\n",
     "        \"title\": x[\"title\"],\n",
     "        \"abstract\": x[\"abstract\"],\n",
-    "        \"title_embedding\": x[\"title_embedding\"],\n",
-    "        \"abstract_embedding\": x[\"abstract_embedding\"],\n",
     "        \"journal_ref\": x.get(\"journal-ref\",None),\n",
     "        \"doi\": x.get(\"doi\",None),\n",
     "        \"categories\": x[\"categories\"],\n",
@@ -483,7 +448,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 109,
+   "execution_count": 13,
    "id": "b9349fb4",
    "metadata": {},
    "outputs": [
@@ -493,33 +458,31 @@
      "text": [
       "[\n",
       "  {\n",
-      "    \"id\": \"index:arxivsearch_content/0/cfdff72f28cffdb0b73f6026\",\n",
-      "    \"relevance\": 0.06384129063829451,\n",
+      "    \"id\": \"index:arxivsearch_content/0/93be86dc99f3b26b012796f6\",\n",
+      "    \"relevance\": 0.04918032786885246,\n",
       "    \"source\": \"arxivsearch_content\",\n",
       "    \"fields\": {\n",
       "      \"matchfeatures\": {\n",
-      "        \"bm25(abstract)\": 0.0,\n",
-      "        \"bm25(title)\": 0.0,\n",
-      "        \"closeness(field,abstract_embedding)\": 0.6178772298066597,\n",
-      "        \"closeness(field,title_embedding)\": 0.6288338602029975\n",
+      "        \"bm25(abstract)\": 16.812307256543374,\n",
+      "        \"bm25(title)\": 10.293464220282218,\n",
+      "        \"closeness(field,embedding)\": 0.5253550471303831\n",
       "      },\n",
-      "      \"id\": \"0812.3122\",\n",
-      "      \"title\": \"Cosmological constraints on unifying Dark Fluid models\"\n",
+      "      \"id\": \"704.0003\",\n",
+      "      \"title\": \"The evolution of the Earth-Moon system based on the dark matter field\\n  fluid model\"\n",
       "    }\n",
       "  },\n",
       "  {\n",
-      "    \"id\": \"index:arxivsearch_content/0/c77e9d766bd90c894a5d0481\",\n",
-      "    \"relevance\": 0.06198484047241319,\n",
+      "    \"id\": \"index:arxivsearch_content/0/9fba7416eaa319e25a2b7b6f\",\n",
+      "    \"relevance\": 0.047619047619047616,\n",
       "    \"source\": \"arxivsearch_content\",\n",
       "    \"fields\": {\n",
       "      \"matchfeatures\": {\n",
-      "        \"bm25(abstract)\": 0.0,\n",
-      "        \"bm25(title)\": 0.0,\n",
-      "        \"closeness(field,abstract_embedding)\": 0.5754037589718138,\n",
-      "        \"closeness(field,title_embedding)\": 0.6644048114912198\n",
+      "        \"bm25(abstract)\": 6.052146920083321,\n",
+      "        \"bm25(title)\": 3.3033206224361487,\n",
+      "        \"closeness(field,embedding)\": 0.4964298779859894\n",
       "      },\n",
-      "      \"id\": \"0711.0466\",\n",
-      "      \"title\": \"A Model for Dark Matter Halos\"\n",
+      "      \"id\": \"704.0077\",\n",
+      "      \"title\": \"Universal Forces and the Dark Energy Problem\"\n",
       "    }\n",
       "  }\n",
       "]\n"
@@ -531,12 +494,12 @@
     "import json\n",
     "\n",
     "response:VespaQueryResponse = app.query(\n",
-    "    yql=\"select title, id from paper where ({targetHits:10}nearestNeighbor(title_embedding,q)) or ({targetHits:10}nearestNeighbor(abstract_embedding,q))\",\n",
+    "    yql=\"select title, id from paper where ({targetHits:10}nearestNeighbor(embedding,q)) or userQuery()\",\n",
     "    ranking=\"hybrid\",\n",
     "    query=\"dark matter field fluid model\",\n",
     "    body={\n",
     "        \"presentation.format.tensors\": \"short-value\",\n",
-    "        \"input.query(q)\": \"embed(bge, \\\"dark matter field fluid model\\\")\",\n",
+    "        \"input.query(q)\": \"embed(nomic, \\\"search_query dark matter field fluid model\\\")\",\n",
     "    }\n",
     ")\n",
     "assert(response.is_successful())\n",
@@ -545,7 +508,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 108,
+   "execution_count": 14,
    "id": "405cdb72",
    "metadata": {},
    "outputs": [
@@ -555,21 +518,21 @@
      "text": [
       "[\n",
       "  {\n",
-      "    \"id\": \"index:arxivsearch_content/0/cfdff72f28cffdb0b73f6026\",\n",
-      "    \"relevance\": 31.398304828681407,\n",
+      "    \"id\": \"index:arxivsearch_content/0/93be86dc99f3b26b012796f6\",\n",
+      "    \"relevance\": 27.10577147682559,\n",
       "    \"source\": \"arxivsearch_content\",\n",
       "    \"fields\": {\n",
-      "      \"id\": \"0812.3122\",\n",
-      "      \"title\": \"Cosmological constraints on unifying Dark Fluid models\"\n",
+      "      \"id\": \"704.0003\",\n",
+      "      \"title\": \"The evolution of the Earth-Moon system based on the dark matter field\\n  fluid model\"\n",
       "    }\n",
       "  },\n",
       "  {\n",
-      "    \"id\": \"index:arxivsearch_content/0/6033639d686a018894cdd4ec\",\n",
-      "    \"relevance\": 30.574650705468287,\n",
+      "    \"id\": \"index:arxivsearch_content/0/9fba7416eaa319e25a2b7b6f\",\n",
+      "    \"relevance\": 9.35546754251947,\n",
       "    \"source\": \"arxivsearch_content\",\n",
       "    \"fields\": {\n",
-      "      \"id\": \"0812.3611\",\n",
-      "      \"title\": \"Dark Energy vs. Dark Matter: Towards a Unifying Scalar Field?\"\n",
+      "      \"id\": \"704.0077\",\n",
+      "      \"title\": \"Universal Forces and the Dark Energy Problem\"\n",
       "    }\n",
       "  }\n",
       "]\n"
@@ -595,7 +558,7 @@
    "source": [
     "## Summary\n",
     "\n",
-    "This notebook demonstrates how to interact with HF datasets, including embedding models in Vespa and querying. "
+    "This notebook demonstrates how to interact with HF datasets, including embedding models in Vespa and querying. Now we can delete the Vespa Cloud instance!"
    ]
   },
   {