![Vespa Cloud logo](https://cloud.vespa.ai/assets/logos/vespa-cloud-logo-full-black.png)

# Text Search on Vespa Cloud - quickstart

This guide is the same as [getting-started-pyvespa](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa.html), but deploys to Vespa Cloud.

Refer to [troubleshooting](https://pyvespa.readthedocs.io/en/latest/troubleshooting.html) if you run into problems while running this guide.

Prerequisite: Create a tenant at [cloud.vespa.ai](https://cloud.vespa.ai/) and save the tenant name.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vespa-engine/pyvespa/blob/master/docs/sphinx/source/getting-started-pyvespa-cloud.ipynb)

# Install

Install [pyvespa](https://pyvespa.readthedocs.io/) >= 0.35
and the [Vespa CLI](https://docs.vespa.ai/en/vespa-cli.html).
The Vespa CLI is used for key management:

```python
!pip3 install "pyvespa>=0.35"
```

Install the Vespa CLI using Homebrew:

```python
!brew install vespa-cli
```

Alternatively, if running in Colab, download the Vespa CLI:

```python
import os
import requests

# Look up the latest Vespa release tag (e.g. "v8.x.y") and drop the leading "v"
res = requests.get(url="https://api.github.com/repos/vespa-engine/vespa/releases/latest").json()
os.environ["VERSION"] = res["tag_name"].replace("v", "")
# Download and unpack the Linux CLI into the current directory (/content in Colab)
!curl -fsSL https://github.com/vespa-engine/vespa/releases/download/v${VERSION}/vespa-cli_${VERSION}_linux_amd64.tar.gz | tar -zxf -
# Put the vespa binary on the PATH
!ln -sf /content/vespa-cli_${VERSION}_linux_amd64/bin/vespa /usr/local/bin/vespa
```
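The commands above build the download URL from the release tag. A minimal sketch of that step as a pure function follows; `cli_download_url` is a hypothetical helper, and the URL pattern is simply the one used in the `curl` command above. It strips only a leading `v`, whereas `str.replace("v", "")` would remove every `v` in the tag.

```python
def cli_download_url(tag_name: str, platform: str = "linux_amd64") -> str:
    """Build the Vespa CLI tarball URL for a release tag such as "v8.1.2".

    Hypothetical helper: mirrors the URL pattern used in the curl command above.
    """
    # Strip only a leading "v", not every "v" in the string
    version = tag_name[1:] if tag_name.startswith("v") else tag_name
    return (
        "https://github.com/vespa-engine/vespa/releases/download/"
        f"v{version}/vespa-cli_{version}_{platform}.tar.gz"
    )


print(cli_download_url("v8.1.2"))  # example tag, for illustration only
```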

# Configure application and keys

Configure the Vespa CLI for the application and create a Vespa Cloud data-plane certificate and key pair:

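```python
import os
os.environ["TENANT_NAME"] = "mytenant"  # Your tenant name here

# Point the CLI at Vespa Cloud and the <tenant>.textsearch application
!vespa config set target cloud
!vespa config set application ${TENANT_NAME}.textsearch
# Create the data-plane certificate and private key; -N skips adding them to an application package on disk
!vespa auth cert -N
```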

Create an API key for deployment and save the path to it:

```python
!vespa auth api-key

from pathlib import Path
# The API key is stored as ~/.vespa/<tenant>.api-key.pem
api_key_path = str(Path.home() / ".vespa" / f"{os.getenv('TENANT_NAME')}.api-key.pem")
```
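The API key path follows the `~/.vespa/<tenant>.api-key.pem` convention used in this guide. A small sketch that builds the same path with `pathlib` and checks that the file exists before deploying; `api_key_file` is a hypothetical helper, and the home directory is a parameter so the path logic can be checked without touching your real `~/.vespa`.

```python
from pathlib import Path
from typing import Optional


def api_key_file(tenant: str, home: Optional[Path] = None) -> Path:
    """Return the expected API key location: <home>/.vespa/<tenant>.api-key.pem."""
    home = Path.home() if home is None else home
    return home / ".vespa" / f"{tenant}.api-key.pem"


key = api_key_file("mytenant")
if not key.is_file():
    # Deployment needs this key, so fail early with a readable hint
    print(f"No API key at {key}; run `vespa auth api-key` first")
```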

## Create an application package

The [application package](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.package.ApplicationPackage)
holds all the Vespa configuration files.
Create one from scratch:

```python
from vespa.package import ApplicationPackage

app_name = "textsearch"
app_package = ApplicationPackage(name=app_name)
```

Note that the application name cannot contain `-` or `_`.

Creating the application package also creates an empty schema with the same name.
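To catch an invalid application name before deploying, a minimal check can be sketched in plain Python. `check_app_name` is a hypothetical helper that enforces only the naming rule stated above (no `-` or `_`); Vespa Cloud may impose further restrictions that it does not cover.

```python
def check_app_name(name: str) -> str:
    """Raise ValueError if name contains '-' or '_'; otherwise return it unchanged."""
    forbidden = [ch for ch in "-_" if ch in name]
    if forbidden:
        raise ValueError(f"Application name {name!r} must not contain {forbidden}")
    return name


app_name = check_app_name("textsearch")  # passes; "text-search" would raise
```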

## Add fields to the schema

Add [fields](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.package.Field)
to the [schema](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.package.Schema):

```python
from vespa.package import Field

app_package.schema.add_fields(
    Field(name="id", type="string", indexing=["attribute", "summary"]),
    Field(name="title", type="string", indexing=["index", "summary"], index="enable-bm25"),
    Field(name="body", type="string", indexing=["index", "summary"], index="enable-bm25"),
)
```

* `id` holds the document ids, while `title` and `body` are the text fields of the documents.

* Setting `"index"` in `indexing` creates a searchable index for `title` and `body`,
  `"attribute"` keeps `id` in memory for filtering, sorting, and grouping,
  and `"summary"` includes a field in the returned hits.
  Read more about [indexing options](https://docs.vespa.ai/en/reference/schema-reference.html#indexing).

* Setting `index = "enable-bm25"` precomputes the data needed to compute BM25 scores efficiently.
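To make the BM25 point concrete, here is a sketch of the standard Okapi BM25 score for a query against one field. Per-field statistics such as term occurrences and field length are what this formula consumes, which is why storing them up front speeds up scoring. The constants `k1 = 1.2` and `b = 0.75` are the commonly used defaults, and the statistics in the usage line are made up; treat this as an illustration of the formula, not Vespa's exact implementation.

```python
import math
from typing import Dict, List


def bm25_term(tf: int, field_len: int, avg_field_len: float,
              n_docs: int, docs_with_term: int,
              k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 contribution of a single query term to a single field."""
    # Inverse document frequency: rarer terms weigh more
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Term-frequency saturation, normalized by field length relative to the average
    norm = tf + k1 * (1 - b + b * field_len / avg_field_len)
    return idf * tf * (k1 + 1) / norm


def bm25_field(query_terms: List[str], field_tokens: List[str], avg_field_len: float,
               n_docs: int, doc_freq: Dict[str, int]) -> float:
    """Sum of per-term BM25 scores for one field; doc_freq maps term -> #docs containing it."""
    score = 0.0
    for term in query_terms:
        tf = field_tokens.count(term)
        if tf:  # terms absent from the field contribute nothing
            score += bm25_term(tf, len(field_tokens), avg_field_len, n_docs, doc_freq[term])
    return score


# Made-up corpus statistics, for illustration only
body = "lift keeps planes in the air".split()
print(bm25_field(["planes", "air"], body, avg_field_len=8.0, n_docs=10_000,
                 doc_freq={"planes": 120, "air": 900}))
```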

## Search multiple fields

A [FieldSet](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.package.FieldSet)
groups fields together for searching.
Naming it `default` makes queries that do not target a specific field look for matches in both the `title` and `body` fields of the documents:

```python
from vespa.package import FieldSet

app_package.schema.add_field_set(
    FieldSet(name="default", fields=["title", "body"])
)
```
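Conceptually, a query term matches the `default` fieldset if it occurs in any of its fields. The plain-Python model below is a simplification for illustration only: it lowercases and splits on whitespace, whereas Vespa applies its own linguistic processing such as tokenization and stemming. `matches_fieldset` is a hypothetical helper; it requires every term to match somewhere in the set, which corresponds to `type` set to `all` in the query used later in this guide.

```python
from typing import Dict, List


def matches_fieldset(doc: Dict[str, str], fields: List[str], query: str) -> bool:
    """Simplified model: every query term must appear in at least one field of the set."""
    tokens = set()
    for field in fields:
        tokens.update(doc.get(field, "").lower().split())
    return all(term in tokens for term in query.lower().split())


doc = {"title": "How airplanes fly", "body": "Lift keeps planes in the air"}
print(matches_fieldset(doc, ["title", "body"], "planes fly"))  # True: "fly" in title, "planes" in body
print(matches_fieldset(doc, ["title"], "planes fly"))          # False: "planes" is only in body
```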

## Define ranking

Specify how to rank the matched documents by defining a
[RankProfile](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.package.RankProfile).
The two rank profiles below can be selected per query with the `ranking` parameter:

```python
from vespa.package import RankProfile

app_package.schema.add_rank_profile(
    RankProfile(name="bm25", first_phase="bm25(title) + bm25(body)")
)
app_package.schema.add_rank_profile(
    RankProfile(name="native_rank", first_phase="nativeRank(title, body)")
)
```
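A rank profile only orders documents that already matched; its `first_phase` expression computes the score used for that order. The sketch below mimics `bm25(title) + bm25(body)` by summing per-field scores and sorting best first. The scores are made-up numbers and `rank_by_first_phase` is a hypothetical helper, not part of pyvespa.

```python
from typing import Callable, Dict, List, Tuple

# Made-up per-field scores for three matched documents (illustration only)
field_scores: Dict[str, Dict[str, float]] = {
    "doc1": {"title": 2.0, "body": 0.5},
    "doc2": {"title": 0.0, "body": 3.1},
    "doc3": {"title": 1.2, "body": 1.2},
}


def rank_by_first_phase(scores: Dict[str, Dict[str, float]],
                        expression: Callable[[Dict[str, float]], float]) -> List[Tuple[str, float]]:
    """Return (doc id, score) pairs sorted by the first-phase expression, best first."""
    ranked = [(doc_id, expression(fields)) for doc_id, fields in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Counterpart of first_phase = "bm25(title) + bm25(body)": doc2, then doc1, then doc3
print(rank_by_first_phase(field_scores, lambda f: f["title"] + f["body"]))
```

Swapping the expression (for example scoring on `title` alone) reorders the same matches, which is what selecting a different rank profile does.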

## Deploy

The text search app, with its fields, a fieldset grouping `title` and `body`, and two rank profiles,
is now defined and ready to deploy.
Deploy `app_package` to Vespa Cloud by creating an instance of
[VespaCloud](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.deployment.VespaCloud):

```python
from vespa.deployment import VespaCloud

vespa_cloud = VespaCloud(
    tenant=os.getenv("TENANT_NAME"),
    application=app_name,
    key_location=api_key_path,
    application_package=app_package,
)
```

```python
app = vespa_cloud.deploy(instance="default")
```

`app` now holds a reference to a [Vespa](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa) instance.

## Feed

Download approximately 10K documents into a pandas DataFrame, replacing missing values with empty strings:

```python
from pandas import read_csv

docs = read_csv(filepath_or_buffer="https://data.vespa.oath.cloud/blog/msmarco/sample_docs.csv").fillna('')
docs.head()
```

[Feed](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa.feed_df) the documents to the application:

```python
# Feed asynchronously, in batches of 100 documents
feed_res = app.feed_df(docs, asynchronous=True, batch_size=100)
```
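`feed_df` turns each DataFrame row into a Vespa document, by default taking the document id from the `id` column. A rough pure-Python sketch of that conversion follows, assuming the `{"id": ..., "fields": {...}}` batch format used by pyvespa's batch feeding in the versions this guide targets. `rows_to_batch` is a hypothetical helper, and the inline CSV is a tiny stand-in for the real dataset. The sketch also keeps `id` as a field, since the schema defines one.

```python
import csv
import io
from typing import Dict, List

# Tiny inline stand-in for sample_docs.csv (same columns: id, title, body)
SAMPLE_CSV = """id,title,body
d1,How planes fly,Lift keeps planes in the air
d2,,A document with an empty title
"""


def rows_to_batch(csv_text: str, id_field: str = "id") -> List[Dict]:
    """Convert CSV rows to {"id": ..., "fields": {...}} dicts, one per document."""
    batch = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Missing values become "", mirroring fillna('') on the DataFrame
        fields = {name: value or "" for name, value in row.items()}
        batch.append({"id": fields[id_field], "fields": fields})
    return batch


for doc in rows_to_batch(SAMPLE_CSV):
    print(doc)
```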

## Query

Query the text search app using the [Vespa Query Language](https://docs.vespa.ai/en/query-language.html)
by passing the query parameters in the `body` argument of
[Vespa.query](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa.query).
This example uses the `bm25` rank profile:

```python
query = {
    'yql': 'select * from sources * where userQuery()',
    'query': 'what keeps planes in the air',
    'ranking': 'bm25',
    'type': 'all',
    'hits': 10,
}
res = app.query(body=query)
res.hits[0]
```
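To compare the two rank profiles, the same query body can be sent with only `ranking` changed. Each entry in `res.hits` is a dict in Vespa's JSON result format, carrying a `relevance` score and the document `fields`. The sketch below builds the bodies and summarizes hits; `make_query` and `summarize_hits` are hypothetical helpers, and the sample hit is made up so the code runs without a deployment. With `app` from above, you could pass `app.query(body=bodies["native_rank"]).hits` to `summarize_hits`.

```python
from typing import Dict, List, Tuple


def make_query(text: str, ranking: str, hits: int = 10) -> Dict:
    """Build a query body like the one above, selecting a rank profile by name."""
    return {
        "yql": "select * from sources * where userQuery()",
        "query": text,
        "ranking": ranking,
        "type": "all",
        "hits": hits,
    }


def summarize_hits(hits: List[Dict]) -> List[Tuple[str, float]]:
    """Return (title, relevance) pairs from a list of Vespa hit dicts."""
    return [(hit["fields"].get("title", ""), hit["relevance"]) for hit in hits]


bodies = {profile: make_query("what keeps planes in the air", profile)
          for profile in ("bm25", "native_rank")}

# Made-up hit in Vespa's result format, so the example runs without a deployment
sample_hits = [{"id": "id:textsearch:textsearch::d1", "relevance": 12.3,
                "fields": {"id": "d1", "title": "How planes fly"}}]
print(summarize_hits(sample_hits))
```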

## Next steps

This is just an introduction to the capabilities of Vespa and pyvespa.
Browse the site to learn more about schemas, feeding, and queries,
and find more complex applications in
[examples](https://pyvespa.readthedocs.io/en/latest/examples.html).