![Vespa logo](https://vespa.ai/assets/vespa-logo-color.png)

# Multi-vector indexing with HNSW

These are the pyvespa steps of the multi-vector-indexing sample application.
Go to the [source](https://github.com/vespa-engine/sample-apps/tree/master/multi-vector-indexing)
for a full description and prerequisites,
and read the [blog post](https://blog.vespa.ai/semantic-search-with-multi-vector-indexing/).
Highlighted features:

* Approximate nearest neighbor search using HNSW, or exact nearest neighbor search.
* Using a Component to configure the Hugging Face embedder.
* Using synthetic fields with auto-generated
  [embeddings](https://docs.vespa.ai/en/embedding.html) in data and query flow -
  see `is_document_field=False` in the `paragraph_embeddings` field definition.
* Exporting the application package to files, adding model files to the application package, and deploying from files.
* Controlling text search result highlighting.

This notebook requires [pyvespa >= 0.37.1](https://pyvespa.readthedocs.io/en/latest/index.html#requirements)
and the [Vespa CLI](https://pyvespa.readthedocs.io/en/latest/reads-writes.html#Feed-using-Vespa-CLI).

## Create the application

In [None]:
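from vespa.package import (
    ApplicationPackage, Component, Parameter, Field, HNSW, FieldSet,
    RankProfile, Function, FirstPhaseRanking, SecondPhaseRanking,
    DocumentSummary, Summary,
)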
from pathlib import Path

app_package = ApplicationPackage(name="wiki",
              components=[Component(id="e5-small-q", type="hugging-face-embedder",
                  parameters=[
                      Parameter("transformer-model", {"path": "model/e5-small-v2-int8.onnx"}),
                      Parameter("tokenizer-model", {"path": "model/tokenizer.json"})
              ])])

app_package.schema.add_fields(
    Field(name="id", type="int", indexing=["attribute", "summary"]),
    Field(name="title", type="string", indexing=["index", "summary"], index="enable-bm25"),
    Field(name="url", type="string", indexing=["index", "summary"], index="enable-bm25"),
    Field(name="paragraphs", type="array<string>", indexing=["index", "summary"],
          index="enable-bm25", bolding=True),
    Field(name="paragraph_embeddings", type="tensor<float>(p{},x[384])",
          indexing=["input paragraphs", "embed", "index", "attribute"],
          ann=HNSW(distance_metric="angular"),
          is_document_field=False)
    #
    # Alternatively, for exact distance calculation not using HNSW:
    # 
    # Field(name="paragraph_embeddings", type="tensor<float>(p{},x[384])",
    #       indexing=["input paragraphs", "embed", "attribute"],
    #       attribute=["distance-metric: angular"],
    #       is_document_field=False)
    
)

app_package.schema.add_field_set(FieldSet(name="default", fields=["title", "url", "paragraphs"]))

app_package.schema.add_rank_profile(RankProfile(
    name="semantic",
    inputs=[("query(q)", "tensor<float>(x[384])")],
    inherits="default",
    first_phase="cos(distance(field,paragraph_embeddings))",
    match_features=["closest(paragraph_embeddings)"])
)

app_package.schema.add_rank_profile(RankProfile(
    name="bm25",
    first_phase="2*bm25(title) + bm25(paragraphs)")
)

app_package.schema.add_rank_profile(RankProfile(
    name="hybrid",
    inherits="semantic",
    functions=[
        Function(name="avg_paragraph_similarity",
            expression="""reduce(
                              sum(l2_normalize(query(q),x) * l2_normalize(attribute(paragraph_embeddings),x),x),
                              avg,
                              p
                          )"""),
        Function(name="max_paragraph_similarity",
            expression="""reduce(
                              sum(l2_normalize(query(q),x) * l2_normalize(attribute(paragraph_embeddings),x),x),
                              max,
                              p
                          )"""),
        Function(name="all_paragraph_similarities",
            expression="sum(l2_normalize(query(q),x) * l2_normalize(attribute(paragraph_embeddings),x),x)")
    ],
    first_phase=FirstPhaseRanking(expression="cos(distance(field,paragraph_embeddings))"),
    second_phase=SecondPhaseRanking(expression="firstPhase + avg_paragraph_similarity() + log( bm25(title) + bm25(paragraphs) + bm25(url))"),
    match_features=["closest(paragraph_embeddings)",
                    "firstPhase",
                    "bm25(title)",
                    "bm25(paragraphs)",
                    "avg_paragraph_similarity",
                    "max_paragraph_similarity",
                    "all_paragraph_similarities"])
)

app_package.schema.add_document_summary(DocumentSummary(name="minimal",
                                        summary_fields=[Summary("id", "int"),
                                                        Summary("title", "string")]))

Path("pkg").mkdir(parents=True, exist_ok=True)
app_package.to_files("pkg")
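
The tensor expressions in the `hybrid` profile compute one cosine similarity per paragraph and then reduce over the `p` dimension. The following is a minimal pure-Python sketch of that arithmetic, not Vespa's implementation. It uses made-up 3-dimensional vectors in place of the 384-dimensional embeddings, and the helper names `l2_normalize` and `paragraph_similarities` are illustrative only:

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def paragraph_similarities(query, paragraphs):
    """Cosine similarity between the query and each paragraph vector,
    keyed by paragraph label (the mapped `p` dimension)."""
    q = l2_normalize(query)
    return {
        label: sum(a * b for a, b in zip(q, l2_normalize(vec)))
        for label, vec in paragraphs.items()
    }

# Hypothetical toy data: one query vector and three paragraph vectors.
query = [1.0, 0.0, 0.0]
paragraphs = {"0": [2.0, 0.0, 0.0], "1": [0.0, 3.0, 0.0], "2": [1.0, 1.0, 0.0]}

sims = paragraph_similarities(query, paragraphs)   # like all_paragraph_similarities
avg_sim = sum(sims.values()) / len(sims)           # like avg_paragraph_similarity
max_sim = max(sims.values())                       # like max_paragraph_similarity
closest = max(sims, key=sims.get)                  # label of the best paragraph

print(sims, avg_sim, max_sim, closest)
```

With the `angular` distance metric, `cos(distance(field,paragraph_embeddings))` should correspond to the largest of these per-paragraph values, which is why `firstPhase` and `max_paragraph_similarity` agree to several decimals in the hybrid query output further down.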

## Download embedding model files

Read more in [Text embedding made simple](https://blog.vespa.ai/text-embedding-made-simple/):

In [2]:
! mkdir -p pkg/model
! curl -L -o pkg/model/tokenizer.json \
  https://raw.githubusercontent.com/vespa-engine/sample-apps/master/simple-semantic-search/model/tokenizer.json

! curl -L -o pkg/model/e5-small-v2-int8.onnx \
  https://github.com/vespa-engine/sample-apps/raw/master/simple-semantic-search/model/e5-small-v2-int8.onnx



## Deploy the application

In [31]:
from vespa.deployment import VespaDocker

vespa_docker = VespaDocker()
app = vespa_docker.deploy_from_disk(application_name="wiki", application_root="pkg")

Waiting for configuration server, 0/300 seconds...
Waiting for configuration server, 5/300 seconds...
Waiting for application status, 0/300 seconds...
Waiting for application status, 5/300 seconds...
Waiting for application status, 10/300 seconds...
Waiting for application status, 15/300 seconds...
Waiting for application status, 20/300 seconds...
Waiting for application status, 25/300 seconds...
Finished deployment.


## Feed documents

Download the Wikipedia articles:

In [4]:
! curl -s -H "Accept:application/vnd.github.v3.raw" \
  https://api.github.com/repos/vespa-engine/sample-apps/contents/multi-vector-indexing/ext/articles.jsonl.zst | \
  zstdcat - > articles.jsonl

Feed and index the Wikipedia articles using the [Vespa CLI](https://docs.vespa.ai/en/vespa-cli.html).
This embeds all the paragraphs using the native embedding model:

In [32]:
! vespa config set target local
! vespa feed articles.jsonl

{
  "feeder.seconds": 2.528,
  "feeder.ok.count": 8,
  "feeder.ok.rate": 3.165,
  "feeder.error.count": 0,
  "feeder.inflight.count": 0,
  "http.request.count": 8,
  "http.request.bytes": 12958,
  "http.request.MBps": 0.005,
  "http.exception.count": 0,
  "http.response.count": 8,
  "http.response.bytes": 674,
  "http.response.MBps": 0.000,
  "http.response.error.count": 0,
  "http.response.latency.millis.min": 1091,
  "http.response.latency.millis.avg": 1312,
  "http.response.latency.millis.max": 2524,
  "http.response.code.counts": {
    "200": 8
  }
}
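
For feeding other data, the sketch below builds one line of a JSONL feed file. It assumes the standard Vespa document JSON format (`put` plus `fields`) and the `id:wikipedia:wiki::<id>` document ID pattern seen in the query results below. The helper name and the example values are hypothetical. `paragraph_embeddings` is deliberately left out, since it is a synthetic field that Vespa fills in from `paragraphs` at indexing time:

```python
import json

def to_feed_operation(doc_id, title, url, paragraphs):
    """Build a put operation for the `wiki` schema defined above."""
    return {
        "put": f"id:wikipedia:wiki::{doc_id}",
        "fields": {
            "id": doc_id,
            "title": title,
            "url": url,
            "paragraphs": paragraphs,
        },
    }

# Hypothetical example document, not taken from articles.jsonl.
line = json.dumps(to_feed_operation(
    1, "Example", "https://example.com/1",
    ["First paragraph.", "Second paragraph."],
))
print(line)
```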


## Retrieve all articles with undefined ranking

Refer to [multi-vector-indexing](https://github.com/vespa-engine/sample-apps/tree/master/multi-vector-indexing)
for a description of the following query examples:

In [33]:
result = app.query(body={
  'yql': 'select * from wiki where true',
  'ranking.profile': 'unranked',
  'hits': 2
})
result.hits

[{'id': 'id:wikipedia:wiki::797944',
  'relevance': 0.0,
  'source': 'wiki_content',
  'fields': {'sddocname': 'wiki',
   'paragraphs': ['Abella Danger made her pornography debut in July 2014 for Bang Bros. She has appeared in about 1010 credited scenes. She has appeared in mainstream news media other than adult news media, including the websites "Elite Daily" and "International Business Times".',
    'In 2018, "Fortune" said she was one of the most popular and in-demand performers in the pornographic business.',
    "Amongst many awards, she won Best Pornographic actor in 2021 (Pornovizija '21), nominated by the competent jury committee.",
    'Abella belongs to a Jewish-Ukrainian family. She started as a ballet dancer when she was only three years old.'],
   'documentid': 'id:wikipedia:wiki::797944',
   'title': 'Abella Danger',
   'url': 'https://simple.wikipedia.org/wiki?curid=797944'}},
 {'id': 'id:wikipedia:wiki::377304',
  'relevance': 0.0,
  'source': 'wiki_content',
  'fields'

## Traditional keyword search with BM25 ranking on the article level

In [34]:
result = app.query(body={
  'yql': 'select * from wiki where userQuery()',
  'query': '24',
  'ranking.profile': 'bm25',
  'hits': 2
})
result.hits

[{'id': 'id:wikipedia:wiki::9985',
  'relevance': 4.88768243450246,
  'source': 'wiki_content',
  'fields': {'sddocname': 'wiki',
   'paragraphs': ['The <hi>24</hi>-hour clock is a way of telling the time in which the day runs from midnight to midnight and is divided into <hi>24</hi> hours, numbered from 0 to 23. It does not use a.m. or p.m. This system is also referred to (only in the US and the English speaking parts of Canada) as military time or (only in the United Kingdom and now very rarely) as continental time. In some parts of the world, it is called railway time. Also, the international standard notation of time (ISO 8601) is based on this format.',
    'A time in the <hi>24</hi>-hour clock is written in the form hours:minutes (for example, 01:23), or hours:minutes:seconds (01:23:45). Numbers under 10 have a zero in front (called a leading zero); e.g. 09:07. Under the <hi>24</hi>-hour clock system, the day begins at midnight, 00:00, and the last minute of the day begins at 23:
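
To make the `bm25` profile's `2*bm25(title) + bm25(paragraphs)` expression more concrete, here is a rough sketch of a single-term BM25 score. It assumes the common BM25 formulation with `k1 = 1.2` and `b = 0.75`, which are believed to match Vespa's defaults. The function name and all inputs except the corpus size of 8 (from the feed summary above) are made up for illustration:

```python
import math

def bm25_term(tf, n_docs, n_docs_with_term, field_len, avg_field_len,
              k1=1.2, b=0.75):
    """BM25 contribution of one query term in one field (assumed formula)."""
    idf = math.log(1 + (n_docs - n_docs_with_term + 0.5) / (n_docs_with_term + 0.5))
    norm = tf + k1 * (1 - b + b * field_len / avg_field_len)
    return idf * tf * (k1 + 1) / norm

# Hypothetical term statistics for a query term in the two fields.
title_score = bm25_term(tf=1, n_docs=8, n_docs_with_term=1,
                        field_len=3, avg_field_len=3)
paragraphs_score = bm25_term(tf=4, n_docs=8, n_docs_with_term=2,
                             field_len=200, avg_field_len=150)

# Same weighting as the first-phase expression of the bm25 rank profile.
relevance = 2 * title_score + paragraphs_score
print(title_score, paragraphs_score, relevance)
```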

## Semantic vector search on the paragraph level

In [26]:
result = app.query(body={
  'yql': 'select * from wiki where {targetHits:1}nearestNeighbor(paragraph_embeddings,q)',
  'input.query(q)': 'embed(what does 24 mean in the context of railways)',
  'ranking.profile': 'semantic',
  'hits': 2
})
result.hits

[{'id': 'id:wikipedia:wiki::9985',
  'relevance': 0.8807156260391702,
  'source': 'wiki_content',
  'fields': {'matchfeatures': {'closest(paragraph_embeddings)': {'type': 'tensor<float>(p{})',
     'cells': {'4': 1.0}}},
   'sddocname': 'wiki',
   'paragraphs': ['The 24-hour clock is a way of telling the time in which the day runs from midnight to midnight and is divided into 24 hours, numbered from 0 to 23. It does not use a.m. or p.m. This system is also referred to (only in the US and the English speaking parts of Canada) as military time or (only in the United Kingdom and now very rarely) as continental time. In some parts of the world, it is called railway time. Also, the international standard notation of time (ISO 8601) is based on this format.',
    'A time in the 24-hour clock is written in the form hours:minutes (for example, 01:23), or hours:minutes:seconds (01:23:45). Numbers under 10 have a zero in front (called a leading zero); e.g. 09:07. Under the 24-hour clock system, 
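
The `closest(paragraph_embeddings)` match feature identifies which paragraph was nearest to the query. A small sketch of reading it from a hit follows. It assumes that the label in the `p` dimension is the index into the `paragraphs` array, and the `closest_paragraph` helper and the sample hit are hypothetical:

```python
def closest_paragraph(hit):
    """Return (index, text) of the paragraph named by the closest() match feature."""
    fields = hit["fields"]
    cells = fields["matchfeatures"]["closest(paragraph_embeddings)"]["cells"]
    label = next(iter(cells))  # a single label is expected here
    index = int(label)         # assumption: the label is the array index
    return index, fields["paragraphs"][index]

# Hypothetical hit with the same shape as the entries in result.hits.
hit = {
    "fields": {
        "matchfeatures": {
            "closest(paragraph_embeddings)": {
                "type": "tensor<float>(p{})",
                "cells": {"1": 1.0},
            }
        },
        "paragraphs": ["first paragraph", "second paragraph"],
    }
}

index, text = closest_paragraph(hit)
print(index, text)
```

Against a running application, the same helper could be applied to each element of `result.hits`.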

## Hybrid search and ranking

Hybrid search combines keyword search on the article level with vector search in the paragraph index:

In [35]:
result = app.query(body={
  'yql': 'select * from wiki where userQuery() or ({targetHits:1}nearestNeighbor(paragraph_embeddings,q))',
  'input.query(q)': 'embed(what does 24 mean in the context of railways)',
  'query': 'what does 24 mean in the context of railways',
  'ranking.profile': 'hybrid',
  'hits': 1
})
result.hits

[{'id': 'id:wikipedia:wiki::9985',
  'relevance': 4.163377257526372,
  'source': 'wiki_content',
  'fields': {'matchfeatures': {'bm25(paragraphs)': 10.468827250036052,
    'bm25(title)': 1.1272217840066165,
    'closest(paragraph_embeddings)': {'type': 'tensor<float>(p{})',
     'cells': {'4': 1.0}},
    'firstPhase': 0.8807156260391702,
    'all_paragraph_similarities': {'type': 'tensor<float>(p{})',
     'cells': {'1': 0.8061168789863586,
      '2': 0.7993348240852356,
      '3': 0.8240271806716919,
      '4': 0.880715548992157,
      '0': 0.8497915267944336}},
    'avg_paragraph_similarity': 0.8319971919059753,
    'max_paragraph_similarity': 0.880715548992157},
   'sddocname': 'wiki',
   'paragraphs': ['<hi>The</hi> <hi>24</hi>-hour clock is a way <hi>of</hi> telling <hi>the</hi> time <hi>in</hi> which <hi>the</hi> day runs from midnight to midnight and is divided into <hi>24</hi> hours, numbered from 0 to 23. It <hi>does</hi> not use a.m. or p.m. This system is also referred to (o
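
The match features make it possible to check the second-phase expression by hand. The sketch below plugs the values from the output above into `firstPhase + avg_paragraph_similarity() + log(bm25(title) + bm25(paragraphs) + bm25(url))`. `bm25(url)` is not among the match features, so treating it as 0 for this query is an assumption:

```python
import math

# Values copied from the matchfeatures in the hit above.
first_phase = 0.8807156260391702
bm25_title = 1.1272217840066165
bm25_paragraphs = 10.468827250036052
bm25_url = 0.0  # assumption: not reported, taken to be 0 for this query
similarities = {
    "0": 0.8497915267944336,
    "1": 0.8061168789863586,
    "2": 0.7993348240852356,
    "3": 0.8240271806716919,
    "4": 0.880715548992157,
}

# reduce(..., avg, p) and reduce(..., max, p) over the paragraph dimension.
avg_similarity = sum(similarities.values()) / len(similarities)
max_similarity = max(similarities.values())

# Second-phase expression of the hybrid rank profile.
score = first_phase + avg_similarity + math.log(bm25_title + bm25_paragraphs + bm25_url)
print(avg_similarity, max_similarity, score)
```

Under that assumption the result should land very close to the reported `relevance` of the hit.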

## Hybrid search and filter

Filtering is also supported. This query also disables bolding:

In [36]:
result = app.query(body={
  'yql': 'select * from wiki where url contains "9985" and userQuery() or ({targetHits:1}nearestNeighbor(paragraph_embeddings,q))',
  'input.query(q)': 'embed(what does 24 mean in the context of railways)',
  'query': 'what does 24 mean in the context of railways',
  'ranking.profile': 'hybrid',
  'bolding': False
})
result.hits

[{'id': 'id:wikipedia:wiki::9985',
  'relevance': 4.307057297582033,
  'source': 'wiki_content',
  'fields': {'matchfeatures': {'bm25(paragraphs)': 10.468827250036052,
    'bm25(title)': 1.1272217840066165,
    'closest(paragraph_embeddings)': {'type': 'tensor<float>(p{})',
     'cells': {'4': 1.0}},
    'firstPhase': 0.8807156260391702,
    'all_paragraph_similarities': {'type': 'tensor<float>(p{})',
     'cells': {'1': 0.8061168789863586,
      '2': 0.7993348240852356,
      '3': 0.8240271806716919,
      '4': 0.880715548992157,
      '0': 0.8497915267944336}},
    'avg_paragraph_similarity': 0.8319971919059753,
    'max_paragraph_similarity': 0.880715548992157},
   'sddocname': 'wiki',
   'paragraphs': ['The 24-hour clock is a way of telling the time in which the day runs from midnight to midnight and is divided into 24 hours, numbered from 0 to 23. It does not use a.m. or p.m. This system is also referred to (only in the US and the English speaking parts of Canada) as military time
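
The relevance here is higher than in the unfiltered hybrid query even though the listed match features are identical. One plausible explanation, offered as an inference rather than something the output states, is that the `url contains "9985"` term gives `bm25(url)` a non-zero value. The sketch below inverts the second-phase expression to estimate that value:

```python
import math

# Values copied from the hit above.
relevance = 4.307057297582033
first_phase = 0.8807156260391702
avg_similarity = 0.8319971919059753
bm25_title = 1.1272217840066165
bm25_paragraphs = 10.468827250036052

# relevance = firstPhase + avg + log(title + paragraphs + url)
# => title + paragraphs + url = exp(relevance - firstPhase - avg)
bm25_sum = math.exp(relevance - first_phase - avg_similarity)
implied_bm25_url = bm25_sum - bm25_title - bm25_paragraphs
print(implied_bm25_url)
```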

## Cleanup

In [37]:
vespa_docker.container.stop()
vespa_docker.container.remove()