# Getting started with pyvespa

![Vespa logo](https://vespa.ai/assets/vespa-logo-color.png)

This notebook starts Vespa, configures an application and tests the document and query APIs.
To run it, install [jupyter notebook](https://jupyter.org/install#jupyter-notebook),
clone the repository, start Jupyter and select `getting-started-pyvespa.ipynb`:

    $ git clone --depth 1 https://github.com/vespa-engine/pyvespa.git
    $ jupyter notebook --notebook-dir pyvespa/docs/sphinx/source

This notebook runs Vespa in Docker. Alternatively, use [Vespa Cloud](https://pyvespa.readthedocs.io/en/latest/deploy-vespa-cloud.html).
Start Docker and verify that at least 4 GB of memory is available:

In [None]:
!docker info | grep "Total Memory"
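The same check can be scripted. The following is a minimal sketch, assuming `docker info` prints a line of the form `Total Memory: 7.775GiB`. The helper names and the sample strings are illustrative and not part of pyvespa.

```python
import re

# Multipliers that convert each binary unit to GiB.
_UNITS_IN_GIB = {"KiB": 1 / 1024**2, "MiB": 1 / 1024, "GiB": 1.0, "TiB": 1024.0}


def parse_total_memory_gib(docker_info_line: str) -> float:
    """Extract the memory size in GiB from a 'Total Memory: ...' line."""
    match = re.search(
        r"Total Memory:\s*([\d.]+)\s*(KiB|MiB|GiB|TiB)", docker_info_line
    )
    if match is None:
        raise ValueError(f"no 'Total Memory' entry in: {docker_info_line!r}")
    value, unit = match.groups()
    return float(value) * _UNITS_IN_GIB[unit]


def has_enough_memory(docker_info_line: str, minimum_gib: float = 4.0) -> bool:
    """Return True if the reported memory meets the minimum (assumed 4 GiB)."""
    return parse_total_memory_gib(docker_info_line) >= minimum_gib


# Sample line in the assumed format, not real output from this machine.
print(has_enough_memory(" Total Memory: 7.775GiB"))
```

If the check fails, increase the memory limit in the Docker settings before deploying.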

## Install pyvespa

In [None]:
!pip install pyvespa

## Create the application package

Create an [application package](https://pyvespa.readthedocs.io/en/latest/create-text-app.html):

In [None]:
from typing import List

from vespa.package import (
    Document,
    Field,
    Schema,
    FieldSet,
    RankProfile,
    HNSW,
    ApplicationPackage,
    QueryProfile,
    QueryProfileType,
    QueryTypeField,
)

from vespa.query import QueryModel, AND, RankProfile as Ranking

class QuestionAnswering(ApplicationPackage):
    def __init__(self, name: str = "qa"):
        context_document = Document(
            fields=[
                Field(
                    name="questions",
                    type="array<int>",
                    indexing=["summary", "attribute"],
                ),
                Field(name="dataset", type="string", indexing=["summary", "attribute"]),
                Field(name="context_id", type="int", indexing=["summary", "attribute"]),
                Field(
                    name="text",
                    type="string",
                    indexing=["summary", "index"],
                    index="enable-bm25",
                ),
            ]
        )
        context_schema = Schema(
            name="context",
            document=context_document,
            fieldsets=[FieldSet(name="default", fields=["text"])],
            rank_profiles=[
                RankProfile(name="bm25", inherits="default", first_phase="bm25(text)"),
                RankProfile(
                    name="nativeRank",
                    inherits="default",
                    first_phase="nativeRank(text)",
                ),
            ],
        )
        sentence_document = Document(
            inherits="context",
            fields=[
                Field(
                    name="sentence_embedding",
                    type="tensor<float>(x[512])",
                    indexing=["attribute", "index"],
                    ann=HNSW(
                        distance_metric="euclidean",
                        max_links_per_node=16,
                        neighbors_to_explore_at_insert=500,
                    ),
                )
            ],
        )
        sentence_schema = Schema(
            name="sentence",
            document=sentence_document,
            fieldsets=[FieldSet(name="default", fields=["text"])],
            rank_profiles=[
                RankProfile(
                    name="semantic-similarity",
                    inherits="default",
                    first_phase="closeness(sentence_embedding)",
                ),
                RankProfile(name="bm25", inherits="default", first_phase="bm25(text)"),
                RankProfile(
                    name="bm25-semantic-similarity",
                    inherits="default",
                    first_phase="bm25(text) + closeness(sentence_embedding)",
                ),
            ],
        )
        super().__init__(
            name=name,
            schema=[context_schema, sentence_schema],
            query_profile=QueryProfile(),
            query_profile_type=QueryProfileType(
                fields=[
                    QueryTypeField(
                        name="ranking.features.query(query_embedding)",
                        type="tensor<float>(x[512])",
                    )
                ]
            ),
        )

app_package = QuestionAnswering()
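The `semantic-similarity` profile ranks by `closeness(sentence_embedding)`. As a rough illustration of what that score means with the `euclidean` distance metric, the sketch below computes closeness as `1 / (1 + distance)`, which is how the Vespa rank feature documentation describes it to the best of our knowledge. It uses plain Python lists and 3-dimensional toy vectors rather than 512-dimensional tensors, and the function names are illustrative, not part of pyvespa.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closeness(a: Sequence[float], b: Sequence[float]) -> float:
    """Map a distance to a score in (0, 1]; identical vectors score 1.0."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


# Toy vectors (assumption: real embeddings have 512 dimensions).
query = [1.0, 0.0, 0.0]
near = [1.0, 0.0, 0.0]
far = [4.0, 4.0, 0.0]

print(closeness(query, near), closeness(query, far))
```

A smaller distance gives a score closer to 1, so the nearest sentences rank first.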

## Deploy the application

Deploy `app_package` and wait for the _Finished deployment_ message:

In [None]:
import os
from vespa.deployment import VespaDocker

vespa_docker = VespaDocker(port=8081)
app = vespa_docker.deploy(application_package=app_package)

## Download, prepare and feed sample data

In [None]:
import json, requests

sentence_data = json.loads(
    requests.get("https://data.vespa.oath.cloud/blog/qa/sample_sentence_data_100.json").text
)
list(sentence_data[0].keys())

Prepare the data as a list of dicts. Each dict has an `id` key holding a unique id for the data point and a `fields` key holding a dict with the data fields required by the application:

In [None]:
batch_feed = [
    {
        "id": idx,
        "fields": sentence
    }
    for idx, sentence in enumerate(sentence_data)
]
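To make the expected shape concrete, here is a small self-contained sketch. The two records are made-up stand-ins for `sentence_data` (the real records come from the download above), and `build_batch` is a hypothetical helper, not a pyvespa function.

```python
from typing import Any, Dict, List


def build_batch(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Wrap each record as {'id': <position>, 'fields': <record>}."""
    return [{"id": idx, "fields": record} for idx, record in enumerate(records)]


# Made-up records using field names from the schema defined earlier.
sample_records = [
    {"text": "first sentence", "dataset": "toy", "context_id": 0},
    {"text": "second sentence", "dataset": "toy", "context_id": 0},
]

sample_batch = build_batch(sample_records)

# The position in the list is used as the id, so ids are unique by construction.
sample_ids = [item["id"] for item in sample_batch]
print(sample_ids)
```

Using the list position as the id is convenient for a demo. For real data, a stable identifier from the source is usually a safer choice, since positions change when the input is reordered.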

Feed the batch using the `sentence` schema:

In [None]:
response = app.feed_batch(schema="sentence", batch=batch_feed)

## Run a query

Query the application using the [Vespa Query Language](https://docs.vespa.ai/en/query-language.html):

In [None]:
result = app.query(body={
  'yql': 'select text from sources sentence where userQuery();',
  'query': 'What is in front of the Notre Dame Main Building?',
  'type': 'any',
  'hits': 5,
  'ranking.profile': 'bm25'
})

In [None]:
result.hits[0]
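Each hit is a dict. The sketch below assumes the usual Vespa result layout, with the score under `relevance` and the requested fields under `fields`. The sample hits and the `summarize_hits` helper are made up for illustration, so compare against the actual output of `result.hits[0]` before relying on it.

```python
from typing import Any, Dict, List, Tuple


def summarize_hits(hits: List[Dict[str, Any]]) -> List[Tuple[float, str]]:
    """Return (relevance, text) pairs, tolerating hits that lack either key."""
    return [
        (hit.get("relevance", 0.0), hit.get("fields", {}).get("text", ""))
        for hit in hits
    ]


# Made-up hits in the assumed layout, not real query output.
sample_hits = [
    {"relevance": 12.5, "fields": {"text": "a statue stands in front"}},
    {"relevance": 7.25, "fields": {"text": "the main building has a dome"}},
]

for relevance, text in summarize_hits(sample_hits):
    print(f"{relevance:6.2f}  {text}")
```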

## Get documents

Get the sentences with ids 0, 1 and 2, then inspect the `json` attribute of a response:

In [None]:
batch = [{"id": 0}, {"id": 1}, {"id": 2}]
response = app.get_batch(schema="sentence", batch=batch)

In [None]:
response

In [None]:
response[0].json

## Update a document

Update a data point by `id`. Optionally, `create` the data point if it does not exist:

In [None]:
batch_update = [
    {
        "id": 0,                               # data_id
        "fields": {"text": "this is a test"},  # fields to be updated
        "create": False,                       # optional: create the data point if it does not exist (default: False)
    }
]
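When several documents need updating, the batch can be generated from a mapping of id to changed fields. A minimal sketch follows, where `build_update_batch` and the sample texts are hypothetical and only the `id`, `fields` and `create` keys come from the cell above.

```python
from typing import Any, Dict, List


def build_update_batch(
    updates: Dict[int, Dict[str, Any]], create: bool = False
) -> List[Dict[str, Any]]:
    """Turn {id: fields} into a list of update operations, ordered by id."""
    return [
        {"id": doc_id, "fields": fields, "create": create}
        for doc_id, fields in sorted(updates.items())
    ]


# Made-up updates for illustration.
sample_updates = {
    1: {"text": "updated text for document 1"},
    0: {"text": "this is a test"},
}

sample_update_batch = build_update_batch(sample_updates)
print(sample_update_batch)
```

Leaving `create` at `False` means an update to a missing id does not silently add a new document.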

In [None]:
response = app.update_batch(schema="sentence", batch=batch_update)

## Delete documents

Delete the sentences with ids = 0, 1 and 2:

In [None]:
batch = [{"id": 0}, {"id": 1}, {"id": 2}]
response = app.delete_batch(schema="sentence", batch=batch)

## Cleanup

In [None]:
from shutil import rmtree

rmtree(os.path.join(os.getcwd(), app_package.name), ignore_errors=True)
vespa_docker.container.stop()
vespa_docker.container.remove()