![Vespa Cloud logo](https://cloud.vespa.ai/assets/logos/vespa-cloud-logo-full-black.png)


# Building cost-efficient retrieval-augmented personal AI assistants


This notebook demonstrates how to use [Vespa streaming mode](https://docs.vespa.ai/en/streaming-search.html) for cost-efficient retrieval for applications that 
store and retrieve personal data. You can read more about Vespa vector streaming search in these two blog posts:


- [Yahoo Mail turns to Vespa to do RAG at scale](https://blog.vespa.ai/yahoo-mail-turns-to-vespa-to-do-rag-at-scale/)
- [Announcing vector streaming search: AI assistants at scale without breaking the bank](https://blog.vespa.ai/announcing-vector-streaming-search/)


This notebook demonstrates how to build a [LlamaIndex](https://gpt-index.readthedocs.io/en/latest/)
[Retriever](https://gpt-index.readthedocs.io/en/latest/core_modules/query_modules/retriever/root.html) that can 
retrieve data from a [Vespa](https://vespa.ai/) deployment using streaming mode.


In [None]:
!pip3 install pyvespa llama-index

## Synthetic Mail Sample Data 
Few public email datasets exist because people care about their privacy, so this notebook uses synthetic data to examine how to use Vespa streaming search. 
We create a generator function that yields a `dict` with synthetic mail data. Notice that each dict has three keys:

- `id`
- `groupname`
- `fields` 

This is the expected feed format for [PyVespa](https://pyvespa.readthedocs.io/en/latest/reads-writes.html) feed operations.
PyVespa uses these keys to build Vespa [document v1 API](https://docs.vespa.ai/en/document-v1-api-guide.html) requests. The `groupname` key is only required when using
streaming mode.  
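Because a missing key only surfaces as a failed feed operation, it can help to check the operations before feeding. The following is a minimal sketch, assuming the three-key format described above; `validate_feed_operations` is a hypothetical helper and not part of PyVespa:

```python
from typing import Iterable, Iterator


def validate_feed_operations(
    operations: Iterable[dict], streaming: bool = True
) -> Iterator[dict]:
    """Yield operations unchanged; raise ValueError on the first malformed one."""
    required = {"id", "fields"}
    if streaming:
        # groupname is only needed for streaming mode
        required.add("groupname")
    for position, operation in enumerate(operations):
        missing = required - set(operation)
        if missing:
            raise ValueError(
                f"operation {position} is missing keys: {sorted(missing)}"
            )
        if not isinstance(operation["fields"], dict):
            raise ValueError(f"operation {position}: 'fields' must be a dict")
        yield operation


sample = [
    {"id": 1, "groupname": "bergum@vespa.ai", "fields": {"subject": "hello"}}
]
checked = list(validate_feed_operations(sample))
```

Since the helper is itself a generator, it can wrap the data generator lazily without materializing all operations in memory.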

In [32]:
from typing import Iterator

def synthetic_mail_data_generator() -> Iterator[dict]:
    synthetic_mails = [
        {
            "id": 1,
            "groupname": "bergum@vespa.ai",
            "fields": {
                "subject": "LlamaIndex news, 2023-11-14",
                "to": "bergum@vespa.ai",
                "body": """Hello Llama Friends 🦙 LlamaIndex is 1 year old this week! 🎉 To celebrate, we're taking a stroll down memory 
                    lane on our blog with twelve milestones from our first year. Be sure to check it out.""",
                "from": "news@llamaindex.ai",
                "display_date": "2023-11-15T09:00:00Z"
            }
        },
        {
            "id": 2,
            "groupname": "bergum@vespa.ai",
            "fields": {
                "subject": "Dentist Appointment Reminder",
                "to": "bergum@vespa.ai",
                "body": "Dear Jo Kristian ,\nThis is a reminder for your upcoming dentist appointment on 2023-12-04 at 09:30. Please arrive 15 minutes early.\nBest regards,\nDr. Dentist",
                "from": "dentist@dentist.no",
                "display_date": "2023-11-20T15:30:00Z"
            }
        },
        {
            "id": 1,
            "groupname": "giraffe@wildlife.ai",
            "fields": {
                "subject": "Wildlife Update: Giraffe Edition",
                "to": "giraffe@wildlife.ai",
                "body": "Dear Wildlife Enthusiasts 🦒, We're thrilled to share the latest insights into giraffe behavior in the wild. Join us on an adventure as we explore their natural habitat and learn more about these majestic creatures.",
                "from": "updates@wildlife.ai",
                "display_date": "2023-11-20T14:30:00Z"
            }
        },
        {
            "id": 1,
            "groupname": "penguin@antarctica.ai",
            "fields": {
                "subject": "Antarctica Expedition: Penguin Chronicles",
                "to": "penguin@antarctica.ai",
                "body": "Greetings Explorers 🐧, Our team is embarking on an exciting expedition to Antarctica to study penguin colonies. Stay tuned for live updates and behind-the-scenes footage as we dive into the world of these fascinating birds.",
                "from": "expedition@antarctica.ai",
                "display_date": "2023-11-25T11:45:00Z"
            }
        },
        {
            "id": 1,
            "groupname": "space@exploration.ai",
            "fields": {
                "subject": "Space Exploration News: November Edition",
                "to": "space@exploration.ai",
                "body": "Hello Space Enthusiasts 🚀, Join us as we highlight the latest discoveries and breakthroughs in space exploration. From distant galaxies to new technologies, there's a lot to explore!",
                "from": "news@exploration.ai",
                "display_date": "2023-11-30T16:20:00Z"
            }
        },
        {
            "id": 1,
            "groupname": "ocean@discovery.ai",
            "fields": {
                "subject": "Ocean Discovery: Hidden Treasures Unveiled",
                "to": "ocean@discovery.ai",
                "body": "Dear Ocean Explorers 🌊, Dive deep into the secrets of the ocean with our latest discoveries. From undiscovered species to underwater landscapes, our team is uncovering the wonders of the deep blue.",
                "from": "discovery@ocean.ai",
                "display_date": "2023-12-05T10:15:00Z"
            }
        }
    ]
    for mail in synthetic_mails:
        yield mail  



## Defining a Vespa application
[PyVespa](https://pyvespa.readthedocs.io/en/latest/) can help us build the [Vespa application package](https://docs.vespa.ai/en/application-packages.html) which is
a set of configuration files that defines a Vespa application.  

First, we define a [Vespa schema](https://docs.vespa.ai/en/schemas.html). [PyVespa](https://pyvespa.readthedocs.io/en/latest/)
offers a programmatic API for creating the schema. In the end, it is serialized to a file (`<schema>.sd`) before it can be deployed to Vespa. 

Vespa is statically typed, so we need to define the fields and their type in the schema. 
Note that we set `mode` to `streaming` which enables [Vespa streaming mode for this schema](https://docs.vespa.ai/en/streaming-search.html). 
Other valid modes are `indexed` and `store-only`. 


In [33]:

from vespa.package import Schema, Document, Field, FieldSet, HNSW
mail_schema = Schema(
            name="mail",
            mode="streaming",
            document=Document(
                fields=[
                    Field(name="id", type="string", indexing=["summary", "index"]),
                    Field(name="subject", type="string", indexing=["index", "summary"]),
                    Field(name="to", type="string", indexing=["index", "summary"]),
                    Field(name="from", type="string", indexing=["index", "summary"]),
                    Field(name="body", type="string", indexing=["index", "summary"]),
                    Field(name="display_date", type="string", indexing=["summary"]),
                    Field(name="timestamp", type="long", indexing=["input display_date", "to_epoch_second", "summary", "attribute"], is_document_field=False),
                    Field(name="embedding", type="tensor<bfloat16>(x[384])",
                        indexing=["\"passage: \" . input subject .\" \". input body", "embed e5", "attribute", "index"],
                        ann=HNSW(distance_metric="angular"),
                        is_document_field=False
                    )
                ],
            ),
            fieldsets=[
                FieldSet(name = "default", fields = ["subject", "body", "to", "from"])
            ]
)

In the `mail` schema, we have six document fields; these are provided by us when we feed documents of type `mail` to this app. 

In addition, there are two synthetic fields, `timestamp` and `embedding`, that use Vespa [indexing expressions](https://docs.vespa.ai/en/reference/indexing-language-reference.html)
taking inputs from the document. 

- the `timestamp` field takes the input `display_date` and uses a [converter](https://docs.vespa.ai/en/reference/indexing-language-reference.html#converter) to convert the 
display date into an epoch timestamp.
- the `embedding` tensor field takes the subject and body as input and feeds them into an [embed](https://docs.vespa.ai/en/embedding.html#embedding-a-document-field) function
that uses an embedding model to map the string input into a 384-dimensional embedding vector stored with `bfloat16` precision. Vectors
in Vespa are represented as [Tensors](https://docs.vespa.ai/en/tensor-user-guide.html). 
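To make the two indexing expressions concrete, here is a rough Python equivalent of what they compute before the embedder runs. This is only an illustration, assuming `display_date` is always a UTC timestamp in the format used by the sample data; the function names are hypothetical and the real conversion happens inside Vespa:

```python
from datetime import datetime, timezone


def to_epoch_second(display_date: str) -> int:
    """Mimic the `to_epoch_second` converter for UTC dates such as 2023-11-15T09:00:00Z."""
    parsed = datetime.strptime(display_date, "%Y-%m-%dT%H:%M:%SZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def embedding_input(subject: str, body: str) -> str:
    """Mimic the string concatenation that is passed to `embed e5`."""
    return "passage: " + subject + " " + body


epoch = to_epoch_second("2023-11-15T09:00:00Z")
text = embedding_input(
    "Dentist Appointment Reminder", "Please arrive 15 minutes early."
)
print(epoch)
print(text)
```

The `passage: ` prefix is part of the indexing expression in the schema, so it is added to every document before embedding.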

Observant readers might have noticed the `e5` argument to the `embed` expression in the `embedding` field above. 
This references a component of the type [hugging-face-embedder](https://docs.vespa.ai/en/embedding.html#huggingface-embedder). Next, we configure
the application package and its name, with the previously defined schema and the `e5` embedder component. 

In [34]:
from vespa.package import ApplicationPackage, Component, Parameter

vespa_app_name = "assistant"
vespa_application_package = ApplicationPackage(
        name=vespa_app_name,
        schema=[mail_schema],
        components=[Component(id="e5", type="hugging-face-embedder",
            parameters=[
                Parameter("transformer-model", {"url": "https://github.com/vespa-engine/sample-apps/raw/master/simple-semantic-search/model/e5-small-v2-int8.onnx"}),
                Parameter("tokenizer-model", {"url": "https://raw.githubusercontent.com/vespa-engine/sample-apps/master/simple-semantic-search/model/tokenizer.json"})
            ]
        )]
) 

In the last step, we configure [ranking](https://docs.vespa.ai/en/ranking.html) by adding rank profiles to the mail schema. Vespa 
supports [phased ranking](https://docs.vespa.ai/en/phased-ranking.html) and has a rich set of built-in [rank-features](https://docs.vespa.ai/en/reference/rank-features.html);
users can also define custom functions with [ranking expressions](https://docs.vespa.ai/en/reference/ranking-expressions.html). 

In [35]:
from vespa.package import RankProfile, Function

keywords = RankProfile(
    name="default", 
    functions=[Function(
        name="my_function", expression="nativeRank(subject) + nativeRank(body) + freshness(timestamp)"
    )],
    first_phase="my_function",
    match_features=["nativeRank(subject)", "nativeRank(body)", "my_function", "freshness(timestamp)"],
)

semantic = RankProfile(
    name="semantic", 
    functions=[Function(
        name="cosine", expression="max(0,cos(distance(field, embedding)))"
    )],
    inputs=[("query(q)", "tensor<float>(x[384])")],
    first_phase="cosine",
    match_features=["cosine", "freshness(timestamp)", "distance(field, embedding)"],
)


In [36]:
mail_schema.add_rank_profile(keywords)
mail_schema.add_rank_profile(semantic)

Now that we have our basic Vespa schema and application package, we can serialize the representation to application package files. 
This is handy when we want to start working with production deployments and version control. 


In [37]:
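import os

# Use the same directory name that is listed below and later passed to deploy().
os.makedirs("saved-app-directory", exist_ok=True)
vespa_application_package.to_files("saved-app-directory")
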
def print_files_in_directory(directory):
    for root, _, files in os.walk(directory):
        for file in files:
            print(os.path.join(root, file))
print_files_in_directory("saved-app-directory")


## Deploy the application to Vespa Cloud

With the basic application ready, we can deploy it to [Vespa Cloud](https://cloud.vespa.ai/en/). 
It's also possible to deploy the app using Docker; 
see [Hybrid Search - Quickstart](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa.html) for
a complete example of how to deploy the app to a local Docker container. 

Install the Vespa CLI using [homebrew](https://brew.sh/) - or download a binary from GitHub as demonstrated below. 

In [None]:
!brew install vespa-cli

Alternatively, if running in Colab, download the Vespa CLI:

In [None]:
import os
import requests
res = requests.get(url="https://api.github.com/repos/vespa-engine/vespa/releases/latest").json()
os.environ["VERSION"] = res["tag_name"].replace("v", "")
!curl -fsSL https://github.com/vespa-engine/vespa/releases/download/v${VERSION}/vespa-cli_${VERSION}_linux_amd64.tar.gz | tar -zxf -
!ln -sf /content/vespa-cli_${VERSION}_linux_amd64/bin/vespa /bin/vespa

To deploy the application to Vespa Cloud, we need a tenant:

Create a tenant at [console.vespa-cloud.com](https://console.vespa-cloud.com/) (unless you already have one). 
This step requires a Google or GitHub account, and will start your [free trial](https://cloud.vespa.ai/en/free-trial). 
Make note of the tenant name; it is used in the next steps.

### Configure Vespa Cloud data-plane security

Create a Vespa Cloud data-plane mTLS certificate/key pair. The pair is used to talk to your Vespa Cloud endpoints. See [Vespa Cloud Security Guide](https://cloud.vespa.ai/en/security/guide) for details.

We save the paths to the credentials, for later data-plane access without using pyvespa APIs. 

In [None]:
import os

os.environ["TENANT_NAME"] = "vespa-team" # Replace with your tenant name

vespa_cli_command = f'vespa config set application {os.environ["TENANT_NAME"]}.{vespa_app_name}'

!vespa config set target cloud
!{vespa_cli_command}
!vespa auth cert -N 

Validate that we have the expected data-plane credential files:

In [41]:
from os.path import exists
from pathlib import Path

cert_path = Path.home() / ".vespa" / f"{os.environ['TENANT_NAME']}.{vespa_app_name}.default/data-plane-public-cert.pem"
key_path = Path.home() / ".vespa" / f"{os.environ['TENANT_NAME']}.{vespa_app_name}.default/data-plane-private-key.pem"

if not exists(cert_path) or not exists(key_path):
    print("ERROR: set the correct paths to security credentials. Correct paths above and rerun until you do not see this error")

Note that the subsequent Vespa Cloud deploy call below will add `data-plane-public-cert.pem` to the application before deploying it to Vespa Cloud, so that
you have access to both the private key and the public certificate, while Vespa Cloud only knows the public certificate. 

### Configure control-plane security 

Authenticate to generate a tenant-level control-plane API key for deploying applications to Vespa Cloud, and save the path to it. 

The generated tenant API key must be added in the Vespa Console before attempting to deploy the application. 

```
To use this key in Vespa Cloud click 'Add custom key' at
https://console.vespa-cloud.com/tenant/TENANT_NAME/account/keys
and paste the entire public key including the BEGIN and END lines.
```

In [None]:
!vespa auth api-key

from pathlib import Path
api_key_path = Path.home() / ".vespa" / f"{os.environ['TENANT_NAME']}.api-key.pem"

### Deploy to Vespa

Now that we have data-plane and control-plane credentials ready, we can deploy our application to Vespa Cloud! `PyVespa` supports deploying to the 
[development zone](https://cloud.vespa.ai/en/reference/environments#dev-and-perf).

>Note: Deployments to dev and perf expire after 7 days of inactivity, i.e., 7 days after running deploy. This applies to all plans, not only the Free Trial. Use the Vespa Console to extend the expiry period, or redeploy the application to add 7 more days.


In [43]:
from vespa.deployment import VespaCloud

def read_secret():
    """Read the API key from the environment variable. This is 
    only used for CI/CD purposes."""
    t = os.getenv("VESPA_TEAM_API_KEY")
    # The key may be stored with literal "\n" sequences; restore real newlines.
    return t.replace(r"\n", "\n") if t else None

vespa_cloud = VespaCloud(
    tenant=os.environ["TENANT_NAME"],
    application=vespa_app_name,
    key_content=read_secret(),  # None when the environment variable is unset or empty
    key_location=api_key_path,
    application_package=vespa_application_package)

Now deploy the app to Vespa Cloud dev zone! 

In [44]:
from vespa.application import Vespa
app:Vespa = vespa_cloud.deploy(disk_folder="saved-app-directory")

Deployment started in run 4 of dev-aws-us-east-1c for samples.assistant. This may take about 15 minutes the first time.
INFO    [17:55:28]  Deploying platform version 8.259.15 and application dev build 4 for dev-aws-us-east-1c of default ...
INFO    [17:55:28]  Using CA signed certificate version 0
INFO    [17:55:29]  Using 1 nodes in container cluster 'assistant_container'
INFO    [17:55:30]  Deployment successful.
INFO    [17:55:30]  Session 2758 for tenant 'samples' prepared and activated.
INFO    [17:55:31]  ######## Details for all nodes ########
INFO    [17:55:31]  h88962b.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP
INFO    [17:55:31]  --- platform vespa/cloud-tenant-rhel8:8.259.15
INFO    [17:55:31]  --- storagenode on port 19102 has config generation 2758, wanted is 2758
INFO    [17:55:31]  --- searchnode on port 19107 has config generation 2758, wanted is 2758
INFO    [17:55:31]  --- distributor on port 19111 has config generation 2757, wanted is 2758
I

### Feeding data

With the app up and running in Vespa Cloud we can interact with the app, feeding and querying our data. In this case
we use the [feed_iterable](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa.feed_iterable) api
with a custom `callback` which prints the output of the operation. 

We pass the `synthetic_mail_data_generator()` and the `schema` and `namespace`. The namespace can be any string, but schema must 
in our case be `mail` as that is the only schema that is deployed with the app. Read more in [Vespa document ids](https://docs.vespa.ai/en/documents.html#id-scheme).

In [45]:
from vespa.io import VespaResponse, VespaQueryResponse

def callback(response:VespaResponse, id:str):
    if not response.is_successful():
        print(f"Error when feeding document {id}: {response.get_json()}")
    else:
        print(f"Document {id} fed successfully " + response.url)

app.feed_iterable(synthetic_mail_data_generator(), schema="mail", namespace="assistant", callback=callback)


Document 1 fed successfully https://e09d41ab.cae25ac9.z.vespa-app.cloud//document/v1/assistant/mail/group/bergum@vespa.ai/1
Document 2 fed successfully https://e09d41ab.cae25ac9.z.vespa-app.cloud//document/v1/assistant/mail/group/bergum@vespa.ai/2
Document 1 fed successfully https://e09d41ab.cae25ac9.z.vespa-app.cloud//document/v1/assistant/mail/group/giraffe@wildlife.ai/1
Document 1 fed successfully https://e09d41ab.cae25ac9.z.vespa-app.cloud//document/v1/assistant/mail/group/penguin@antarctica.ai/1
Document 1 fed successfully https://e09d41ab.cae25ac9.z.vespa-app.cloud//document/v1/assistant/mail/group/space@exploration.ai/1
Document 1 fed successfully https://e09d41ab.cae25ac9.z.vespa-app.cloud//document/v1/assistant/mail/group/ocean@discovery.ai/1


PyVespa uses the [document v1 api](https://docs.vespa.ai/en/reference/document-v1-api-reference.html) and the above callback prints the
document v1 url. 
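The URLs printed above show how the feed keys end up in the request path for grouped (streaming) documents. The sketch below reproduces that mapping, inferred only from the output in this notebook; the helper names are hypothetical and PyVespa builds these paths internally:

```python
def streaming_document_path(
    namespace: str, schema: str, groupname: str, doc_id: str
) -> str:
    """Document v1 path for a document that belongs to a streaming group."""
    return f"/document/v1/{namespace}/{schema}/group/{groupname}/{doc_id}"


def streaming_document_id(
    namespace: str, schema: str, groupname: str, doc_id: str
) -> str:
    """Full Vespa document id; the g=<groupname> part identifies the group."""
    return f"id:{namespace}:{schema}:g={groupname}:{doc_id}"


path = streaming_document_path("assistant", "mail", "bergum@vespa.ai", "1")
full_id = streaming_document_id("assistant", "mail", "bergum@vespa.ai", "1")
print(path)
print(full_id)
```

This also explains why several sample documents can share `id` 1: the user-specified id only needs to be unique within a group.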

We can get a data point by specifying the schema, namespace and id. We also pass a [fieldSet](https://docs.vespa.ai/en/reference/document-v1-api-reference.html#fieldset) 
parameter that uses the built-in set `[all]`, which also returns the synthetic fields, that is, the results of the indexing expressions (the embedding tensor and the UTC
epoch timestamp). 

In [46]:
from vespa.io import VespaResponse
import json

response:VespaResponse = app.get_data(schema="mail", namespace="assistant", 
    data_id="1",
    groupname="bergum@vespa.ai", fieldSet="[all]")
assert(response.is_successful())
print(json.dumps(response.json, indent=2))

{
  "pathId": "/document/v1/assistant/mail/group/bergum@vespa.ai/1",
  "id": "id:assistant:mail:g=bergum@vespa.ai:1",
  "fields": {
    "body": "Hello Llama Friends \ud83e\udd99 LlamaIndex is 1 year old this week! \ud83c\udf89 To celebrate, we're taking a stroll down memory \n                    lane on our blog with twelve milestones from our first year. Be sure to check it out.",
    "timestamp": 1700038800,
    "display_date": "2023-11-15T09:00:00Z",
    "from": "news@llamaindex.ai",
    "to": "bergum@vespa.ai",
    "embedding": {
      "type": "tensor<bfloat16>(x[384])",
      "values": [
        -0.83984375,
        0.322265625,
        0.431640625,
        -0.1083984375,
        -0.1357421875,
        0.263671875,
        0.388671875,
        -0.310546875,
        0.2041015625,
        0.5078125,
        0.486328125,
        -0.076171875,
        0.125,
        0.1552734375,
        -0.08154296875,
        -0.349609375,
        -0.21875,
        0.212890625,
        -1.046875,
  

Compare that with using the default fieldSet parameter `[document]` that only returns the fields that we sent:

In [47]:
response:VespaResponse = app.get_data(schema="mail", namespace="assistant", 
    data_id="1",
    groupname="bergum@vespa.ai", fieldSet="[document]")
assert(response.is_successful())
print(json.dumps(response.json, indent=2))

{
  "pathId": "/document/v1/assistant/mail/group/bergum@vespa.ai/1",
  "id": "id:assistant:mail:g=bergum@vespa.ai:1",
  "fields": {
    "body": "Hello Llama Friends \ud83e\udd99 LlamaIndex is 1 year old this week! \ud83c\udf89 To celebrate, we're taking a stroll down memory \n                    lane on our blog with twelve milestones from our first year. Be sure to check it out.",
    "display_date": "2023-11-15T09:00:00Z",
    "from": "news@llamaindex.ai",
    "to": "bergum@vespa.ai",
    "subject": "LlamaIndex news, 2023-11-14"
  }
}


### Querying data

Now, we can also query our data. When using [streaming mode](https://docs.vespa.ai/en/reference/query-api-reference.html#streaming), 
we must pass the `groupname` parameter. The query request uses the Vespa Query API where `PyVespa` allows passing any of the Vespa query api parameters
using `**kwargs`. Read more about querying Vespa in:

- [Vespa Query API](https://docs.vespa.ai/en/query-api.html)
- [Vespa Query API reference](https://docs.vespa.ai/en/reference/query-api-reference.html)
- [Vespa Query Language API (YQL)](https://docs.vespa.ai/en/query-language.html)

Sample query request for `when is my dentist appointment` for the user `bergum@vespa.ai`:



In [48]:
from vespa.io import VespaQueryResponse

response:VespaQueryResponse = app.query(
    yql="select subject, display_date, to from sources mail where userQuery()",
    query="when is my dentist appointment", 
    groupname="bergum@vespa.ai", 
    ranking="default"
)
assert(response.is_successful())
print(json.dumps(response.hits[0], indent=2))

{
  "id": "id:assistant:mail:g=bergum@vespa.ai:2",
  "relevance": 1.2115380480627955,
  "source": "assistant_content.mail",
  "fields": {
    "matchfeatures": {
      "freshness(timestamp)": 1.0,
      "nativeRank(body)": 0.09246780326887034,
      "nativeRank(subject)": 0.11907024479392506,
      "my_function": 1.2115380480627955
    },
    "subject": "Dentist Appointment Reminder",
    "to": "bergum@vespa.ai",
    "display_date": "2023-11-20T15:30:00Z"
  }
}


For the above query request, Vespa searched the `default` fieldset which we defined in the schema to match against several fields including the body and the subject. The `default`
rank-profile calculated the relevance score as the sum of three rank-features: `nativeRank(body)` + `nativeRank(subject)` + `freshness(timestamp)` and the result of this computation is the
`relevance` score of the hit. In addition, we also asked for Vespa to return `match-features` that can be used to debug the `relevance` score or for feature logging. 
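The match features make it easy to verify this by hand. A small check using the values copied from the response above:

```python
import math

# Values copied from the "matchfeatures" of the hit shown above.
match_features = {
    "freshness(timestamp)": 1.0,
    "nativeRank(body)": 0.09246780326887034,
    "nativeRank(subject)": 0.11907024479392506,
    "my_function": 1.2115380480627955,
}

# Same expression as `my_function` in the `default` rank profile.
recomputed = (
    match_features["nativeRank(subject)"]
    + match_features["nativeRank(body)"]
    + match_features["freshness(timestamp)"]
)
matches = math.isclose(recomputed, match_features["my_function"], rel_tol=1e-9)
print(recomputed, matches)
```

Note that `freshness(timestamp)` contributes far more to the sum than the two text features combined, which is worth keeping in mind when interpreting the score.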

Now, we can try the `semantic` ranking profile, using Vespa's support for nearestNeighbor search, which is also available in streaming mode. Again, we
must specify the `groupname`. This also exemplifies how to use the configured `e5` embedder to embed the user query 
into an embedding representation. See [embedding a query text](https://docs.vespa.ai/en/embedding.html#embedding-a-query-text) for more examples of using Vespa embedders.

In [49]:
from vespa.io import VespaQueryResponse

response:VespaQueryResponse = app.query(
    yql="select subject, display_date from mail where {targetHits:10}nearestNeighbor(embedding,q)",
    groupname="bergum@vespa.ai", 
    ranking="semantic",
    body={
        "input.query(q)": "embed(e5, \"when is my dentist appointment\")",
    }
)
assert(response.is_successful())
print(json.dumps(response.hits[0], indent=2))

{
  "id": "id:assistant:mail:g=bergum@vespa.ai:2",
  "relevance": 0.9079386507883569,
  "source": "assistant_content.mail",
  "fields": {
    "matchfeatures": {
      "distance(field,embedding)": 0.4324572498488368,
      "freshness(timestamp)": 1.0,
      "cosine": 0.9079386507883569
    },
    "subject": "Dentist Appointment Reminder",
    "display_date": "2023-11-20T15:30:00Z"
  }
}
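
The `cosine` match feature in this response can be reproduced from `distance(field,embedding)`. The check below assumes that, for the `angular` distance metric configured in the schema, the distance is the angle in radians between the query and document vectors:

```python
import math

# Values copied from the "matchfeatures" of the hit shown above.
angular_distance = 0.4324572498488368
reported_cosine = 0.9079386507883569

# Same expression as the `cosine` function in the `semantic` rank profile.
recomputed_cosine = max(0.0, math.cos(angular_distance))
agrees = math.isclose(recomputed_cosine, reported_cosine, abs_tol=1e-6)
print(recomputed_cosine, agrees)
```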


Notice that the relevance score is now different, since the `semantic` rank-profile defined in our schema uses `cos(distance(field,embedding))` to calculate 
the cosine similarity between the query embedding vector and the document embedding vector. We can try the same query for a different user (`penguin@antarctica.ai`):

In [50]:
from vespa.io import VespaQueryResponse

response:VespaQueryResponse = app.query(
    yql="select subject, display_date from mail where {targetHits:10}nearestNeighbor(embedding,q)",
    groupname="penguin@antarctica.ai", 
    ranking="semantic",
    body={
        "input.query(q)": "embed(e5, \"when is my dentist appointment\")",
    }
)
assert(response.is_successful())
print(json.dumps(response.hits[0], indent=2))

{
  "id": "id:assistant:mail:g=penguin@antarctica.ai:1",
  "relevance": 0.7491816233633459,
  "source": "assistant_content.mail",
  "fields": {
    "matchfeatures": {
      "distance(field,embedding)": 0.7239706506184826,
      "freshness(timestamp)": 1.0,
      "cosine": 0.7491816233633459
    },
    "subject": "Antarctica Expedition: Penguin Chronicles",
    "display_date": "2023-11-25T11:45:00Z"
  }
}


Notice that this query is restricted to the user/groupname `penguin@antarctica.ai`. For this user there are no relevant hits, but nearestNeighbor
search still retrieves documents: it has no notion of a match versus a non-match, so all documents are neighbors and only the distance differs. 

This is solvable either by passing [distanceThreshold](https://docs.vespa.ai/en/nearest-neighbor-search-guide.html#strict-filters-and-distant-neighbors) or
custom [drop ranking expressions](https://docs.vespa.ai/en/faq#how-to-set-a-dynamic-query-time-ranking-drop-threshold) using any feature combination.  
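
As a sketch of the first option, the helper below builds a nearestNeighbor YQL string with a `distanceThreshold` annotation. The helper names and the 0.85 cosine cutoff are illustrative assumptions, not recommendations, and the conversion assumes the `angular` distance is an angle in radians:

```python
import math
from typing import List, Optional


def cosine_to_angular_distance(min_cosine: float) -> float:
    """Largest angular distance (radians) that still satisfies a cosine cutoff."""
    return math.acos(min_cosine)


def nearest_neighbor_yql(
    fields: List[str],
    target_hits: int = 10,
    distance_threshold: Optional[float] = None,
) -> str:
    """Build the YQL used in this notebook, optionally with a distance threshold."""
    annotations = [f"targetHits:{target_hits}"]
    if distance_threshold is not None:
        annotations.append(f"distanceThreshold:{distance_threshold}")
    annotation_str = "{" + ", ".join(annotations) + "}"
    field_str = ", ".join(fields)
    return (
        f"select {field_str} from mail "
        f"where {annotation_str}nearestNeighbor(embedding,q)"
    )


threshold = cosine_to_angular_distance(0.85)  # hypothetical cutoff
yql = nearest_neighbor_yql(["subject", "display_date"], distance_threshold=0.5)
print(round(threshold, 4))
print(yql)
```

With a cutoff in this range, the dentist hit (distance about 0.43) would be kept, while the penguin hit (distance about 0.72) would be dropped.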

## LlamaIndex Retrievers Introduction

Now we have a basic Vespa app using streaming mode up and running. For building an end-to-end assistant, we likely want to
use an LLM framework like [LangChain](https://www.langchain.com/) or [LlamaIndex](https://www.llamaindex.ai/). In this example 
we use LlamaIndex retrievers. 

The LlamaIndex [retriever](https://gpt-index.readthedocs.io/en/latest/core_modules/query_modules/retriever/root.html)
abstraction allows developers to add custom retrievers that retrieve information in Retrieval Augmented Generation (RAG) pipelines. 

For a good introduction to LlamaIndex and its concepts, see [LlamaIndex High-Level Concepts](https://gpt-index.readthedocs.io/en/latest/getting_started/concepts.html).


In our example, we connect a custom LlamaIndex retriever with the deployed Vespa app. 

To create a custom LlamaIndex Retriever, we implement a class that inherits from `llama_index.retrievers.BaseRetriever` and 
implements `_retrieve(query)`. 

A simple `PersonalAssistantVespaRetriever` could look like the following:

In [51]:

import llama_index.retrievers
from llama_index.schema import  Document, NodeWithScore
from llama_index.indices.query.schema import QueryBundle

from vespa.application import Vespa
from vespa.io import VespaQueryResponse

from typing import List, Union

class PersonalAssistantVespaRetriever(llama_index.retrievers.BaseRetriever):

   def __init__(
      self,
      app: Vespa,
      user: str,
      hits: int = 5,
      vespa_rank_profile: str = "default",
      fields: List[str] = ["subject", "body"]
   ) -> None:
      """Retriever for the Personal Assistant application.
      Args:
      param: app: Vespa application object
      param: user: user id to retrieve documents for (used for streaming groupname)
      param: hits: number of hits to retrieve from Vespa app
      param: vespa_rank_profile: Vespa rank profile to use
      param: fields: fields to retrieve
      """
 
      self.app = app
      self.hits = hits
      self.user = user
      self.vespa_rank_profile = vespa_rank_profile
      self.fields = fields
      self.summary_fields = ",".join(fields)

   def _retrieve(self, query:Union[str,QueryBundle]) -> List[NodeWithScore]:
      """Retrieve documents from Vespa application.
      """
      if isinstance(query, QueryBundle):
         query = query.query_str
      
      if self.vespa_rank_profile == 'default':
         yql:str = f"select {self.summary_fields} from mail where userQuery()"
      else:
         yql = f"select {self.summary_fields} from mail where {{targetHits:10}}nearestNeighbor(embedding,q) or userQuery()"
      vespa_body_request = {
         "yql" : yql,
         "query": query,
         "hits": self.hits,
         "ranking.profile": self.vespa_rank_profile,
         "timeout": "1s",
      }
      if self.vespa_rank_profile != "default":
         vespa_body_request["input.query(q)"] = f"embed(e5, \"{query}\")"

      with self.app.syncio(connections=1) as session:
         response:VespaQueryResponse = session.query(body=vespa_body_request, groupname=self.user)
         if not response.is_successful():
            raise ValueError(f"Query request failed: {response.status_code}, response payload: {response.get_json()}")

      nodes: List[NodeWithScore] = []
      for hit in response.hits:
         response_fields:dict = hit.get('fields', {})
         text: str = ""
         for field in response_fields.keys():
            if isinstance(response_fields[field], str) and field in self.fields:
                  text += response_fields[field] + " "
         id = hit['id']
         doc = Document(id_=id, text=text, 
            metadata=response_fields,    
         )
         nodes.append(NodeWithScore(node=doc, score=hit['relevance']))    
      return nodes                  

The above defines a `PersonalAssistantVespaRetriever` which, most importantly, accepts a [pyvespa](https://pyvespa.readthedocs.io/en/latest/)
`Vespa` application instance. 

For rank profiles other than `default`, the YQL specifies a hybrid retrieval query that combines embedding-based retrieval (vector search) 
using Vespa's nearest neighbor search operator with traditional keyword matching. For the `default` profile, it uses keyword matching only.  
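
To inspect what the retriever sends to Vespa without a running deployment, the request construction can be pulled out into a standalone function. This sketch mirrors the logic of `_retrieve` above; `build_vespa_request` is a hypothetical name introduced here for illustration:

```python
from typing import Dict, List, Union


def build_vespa_request(
    query: str,
    fields: List[str],
    rank_profile: str = "default",
    hits: int = 5,
) -> Dict[str, Union[str, int]]:
    """Build the query request body the same way the retriever does."""
    summary_fields = ",".join(fields)
    if rank_profile == "default":
        # Keyword matching only
        where = "userQuery()"
    else:
        # Hybrid: nearest neighbor search OR keyword matching
        where = "{targetHits:10}nearestNeighbor(embedding,q) or userQuery()"
    body: Dict[str, Union[str, int]] = {
        "yql": f"select {summary_fields} from mail where {where}",
        "query": query,
        "hits": hits,
        "ranking.profile": rank_profile,
        "timeout": "1s",
    }
    if rank_profile != "default":
        # Ask Vespa to embed the query text with the configured e5 embedder
        body["input.query(q)"] = f'embed(e5, "{query}")'
    return body


keyword_request = build_vespa_request("dentist", ["subject", "body"])
hybrid_request = build_vespa_request("dentist", ["subject", "body"], "semantic")
print(hybrid_request["yql"])
```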

Then it reads the Vespa [search result JSON response](https://docs.vespa.ai/en/reference/default-result-format.html)
wrapped by a [VespaQueryResponse](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespaqueryresponse). 

With the above, we can connect to the running Vespa app and initialize the `PersonalAssistantVespaRetriever` 
for the user `bergum@vespa.ai`. The `user` argument is passed as the Vespa [streaming search groupname
parameter](https://docs.vespa.ai/en/reference/query-api-reference.html#streaming.groupname). This effectively limits the 
data that Vespa needs to stream through. 

In [52]:

retriever = PersonalAssistantVespaRetriever(
    app=app, 
    user="bergum@vespa.ai", 
    vespa_rank_profile="default"
)
retriever.retrieve("When is my dentist appointment?")


[NodeWithScore(node=Document(id_='id:assistant:mail:g=bergum@vespa.ai:2', embedding=None, metadata={'matchfeatures': {'freshness(timestamp)': 1.0, 'nativeRank(body)': 0.09246780326887034, 'nativeRank(subject)': 0.11907024479392506, 'my_function': 1.2115380480627955}, 'subject': 'Dentist Appointment Reminder', 'body': 'Dear Jo Kristian ,\nThis is a reminder for your upcoming dentist appointment on 2023-12-04 at 09:30. Please arrive 15 minutes early.\nBest regards,\nDr. Dentist'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='7dc7e72dbad349461957058218f4dab0b031e2a91a3f8e4d404994d6af2cde93', text='Dentist Appointment Reminder Dear Jo Kristian ,\nThis is a reminder for your upcoming dentist appointment on 2023-12-04 at 09:30. Please arrive 15 minutes early.\nBest regards,\nDr. Dentist ', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=1.2115380

There we have the top-ranking `NodeWithScore`, which could be used for downstream generation. The query above used the `default`
[ranking](https://docs.vespa.ai/en/ranking.html) profile, which uses Vespa's [nativeRank](https://docs.vespa.ai/en/nativerank.html) text matching feature. We can also use a different (configured) Vespa rank-profile, in which case the `score` takes a different value.

In [53]:
retriever = PersonalAssistantVespaRetriever(
    app=app, 
    user="bergum@vespa.ai", 
    vespa_rank_profile="semantic"
)
retriever.retrieve("When is my dentist appointment?")

[NodeWithScore(node=Document(id_='id:assistant:mail:g=bergum@vespa.ai:2', embedding=None, metadata={'matchfeatures': {'distance(field,embedding)': 0.43945494361938975, 'freshness(timestamp)': 1.0, 'cosine': 0.9049836898369259}, 'subject': 'Dentist Appointment Reminder', 'body': 'Dear Jo Kristian ,\nThis is a reminder for your upcoming dentist appointment on 2023-12-04 at 09:30. Please arrive 15 minutes early.\nBest regards,\nDr. Dentist'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='4bec670f4b497fea7c9cbe31b73abaf305514e2533467df281fba2a38676ace2', text='Dentist Appointment Reminder Dear Jo Kristian ,\nThis is a reminder for your upcoming dentist appointment on 2023-12-04 at 09:30. Please arrive 15 minutes early.\nBest regards,\nDr. Dentist ', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.9049836898369259),
 NodeWithScore(node=Document
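
The match features in this output are consistent with the score being the cosine of the reported distance. The short check below recomputes it from the values printed above. It assumes the `embedding` field uses an angular distance metric, which is not shown in this section.

```python
import math

# Values copied from the matchfeatures of the top hit above.
distance = 0.43945494361938975
reported_cosine = 0.9049836898369259

# cos(distance) should reproduce the `cosine` match feature.
recomputed_cosine = math.cos(distance)
print(recomputed_cosine, reported_cosine)
```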

### Rank fusion

So far we have demonstrated ranking using freshness, keyword matching, and semantic similarity. To combine these without having to think much about 
the score distributions of the individual features, we can turn to [reciprocal_rank_fusion](https://docs.vespa.ai/en/phased-ranking.html#cross-hit-normalization-including-reciprocal-rank-fusion). 
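
Reciprocal rank fusion only looks at the position of each hit in the individual rankings, which is why the score scales do not matter. A minimal standalone sketch of the idea follows. The constant `k = 60` is the commonly used default and is an assumption here, and the mail ids are made up. It illustrates the technique, not Vespa's internal implementation.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Fuse several ranked id lists: each list contributes 1 / (k + rank) per id."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return fused


# Hypothetical orderings produced by two different ranking functions.
semantic_order = ["mail2", "mail1", "mail3"]
keyword_order = ["mail2", "mail3", "mail1"]

fused = reciprocal_rank_fusion([semantic_order, keyword_order])
best = max(fused, key=fused.get)
print(best, fused)
```

Because `mail2` is ranked first by both functions it wins, while `mail1` and `mail3` tie since they swap second and third place.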

We deploy this as a global-phase ranking expression that fuses two different functions:

- The semantic vector matching
- Keywords and freshness 

The new rank-profile is defined as follows:


In [54]:
from vespa.package import RankProfile, Function, GlobalPhaseRanking

fusion = RankProfile(
    name="fusion",
    inherits="semantic",
    functions=[
        # Keyword relevance plus recency.
        Function(
            name="keywords_and_freshness",
            expression="nativeRank(subject) + nativeRank(body) + freshness(timestamp)",
        ),
        # Cosine similarity computed from the distance to the query embedding.
        Function(
            name="semantic", expression="cos(distance(field,embedding))"
        ),
    ],
    first_phase="keywords_and_freshness",
    match_features=["nativeRank(subject)", "nativeRank(body)", "keywords_and_freshness", "freshness(timestamp)", "semantic"],
    # Re-rank the top hits by fusing their rank positions under both functions.
    global_phase=GlobalPhaseRanking(
        rerank_count=1000,
        expression="reciprocal_rank_fusion(semantic, keywords_and_freshness)"
    )
)


Add this new `rank-profile` to the schema and redeploy the application to Vespa Cloud:

In [55]:
mail_schema.add_rank_profile(fusion)
vespa_cloud.deploy()

Deployment started in run 5 of dev-aws-us-east-1c for samples.assistant. This may take about 15 minutes the first time.
INFO    [17:55:59]  Deploying platform version 8.259.15 and application dev build 5 for dev-aws-us-east-1c of default ...
INFO    [17:56:00]  Using CA signed certificate version 0
INFO    [17:56:00]  Using 1 nodes in container cluster 'assistant_container'
INFO    [17:56:02]  Deployment successful.
INFO    [17:56:02]  Session 2759 for tenant 'samples' prepared and activated.
INFO    [17:56:02]  ######## Details for all nodes ########
INFO    [17:56:02]  h88962b.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP
INFO    [17:56:02]  --- platform vespa/cloud-tenant-rhel8:8.259.15
INFO    [17:56:02]  --- storagenode on port 19102 has config generation 2758, wanted is 2759
INFO    [17:56:02]  --- searchnode on port 19107 has config generation 2759, wanted is 2759
INFO    [17:56:02]  --- distributor on port 19111 has config generation 2758, wanted is 2759
I

Vespa(https://e09d41ab.cae25ac9.z.vespa-app.cloud/)

Run a query with the new `fusion` ranking profile:

In [56]:
retriever = PersonalAssistantVespaRetriever(
    app=app, 
    user="bergum@vespa.ai", 
    vespa_rank_profile="fusion"
)
retriever.retrieve("When is my dentist appointment?")

[NodeWithScore(node=Document(id_='id:assistant:mail:g=bergum@vespa.ai:2', embedding=None, metadata={'matchfeatures': {'freshness(timestamp)': 1.0, 'nativeRank(body)': 0.09246780326887034, 'nativeRank(subject)': 0.11907024479392506, 'keywords_and_freshness': 1.2115380480627955, 'semantic': 0.9049836898369259}, 'subject': 'Dentist Appointment Reminder', 'body': 'Dear Jo Kristian ,\nThis is a reminder for your upcoming dentist appointment on 2023-12-04 at 09:30. Please arrive 15 minutes early.\nBest regards,\nDr. Dentist'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='b2580398d53e8071b11a1040f900e6a4fdf264e1a4c7ebb5838f8729f3bc1674', text='Dentist Appointment Reminder Dear Jo Kristian ,\nThis is a reminder for your upcoming dentist appointment on 2023-12-04 at 09:30. Please arrive 15 minutes early.\nBest regards,\nDr. Dentist ', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}'
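
The `keywords_and_freshness` match feature can be checked by hand from the other match features in this output, since the function is defined as their sum. The variable names below are only for illustration.

```python
# Values copied from the matchfeatures of the top hit above.
native_rank_subject = 0.11907024479392506
native_rank_body = 0.09246780326887034
freshness = 1.0
reported = 1.2115380480627955

# nativeRank(subject) + nativeRank(body) + freshness(timestamp)
keywords_and_freshness = native_rank_subject + native_rank_body + freshness
print(keywords_and_freshness, reported)
```

This value is the first-phase score; the final ordering comes from fusing the rank positions in the global phase.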

## Conclusion 
With a custom retriever like the one above, one can start experimenting with [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/) to build 
the personal agent. With Vespa streaming mode, the cost of searching personal data is several orders of magnitude
lower than personal search built using ANN algorithms, as no fields are held in memory and everything is streamed from disk-based storage. 


We can now delete the cloud instance:

In [57]:
vespa_cloud.delete()

Deactivated samples.assistant in dev.aws-us-east-1c
Deleted instance samples.assistant.default
