# Query application

Python API to query Vespa applications.

This tutorial shows how to connect to an existing Vespa instance and use the Query API, using the https://cord19.vespa.ai/ app as an example. You can run this tutorial in Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vespa-engine/pyvespa/blob/master/docs/sphinx/source/query.ipynb)

```python
from vespa.application import Vespa

app = Vespa(url="https://api.cord19.vespa.ai")
```

## Specify the request body

Specifying the entire request body gives full flexibility. See the [Vespa Query API reference](https://docs.vespa.ai/en/reference/query-api-reference.html) for the available parameters.

```python
body = {
    'yql': 'select cord_uid, title, abstract from sources * where userQuery();',
    'hits': 5,
    'query': 'Is remdesivir an effective treatment for COVID-19?',
    'type': 'any',
    'ranking': 'bm25'
}
```

```python
results = app.query(body=body)
```
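Under the hood, a query like this is an HTTP request to Vespa with the body sent as JSON. The sketch below only builds (and does not send) such a request with the standard library. It assumes the body is POSTed to the `/search/` endpoint of the application URL. The helper `build_search_request` is hypothetical and does not describe pyvespa's internals.

```python
import json
import urllib.request


def build_search_request(base_url: str, body: dict) -> urllib.request.Request:
    """Build, but do not send, a POST request carrying a Vespa query body.

    Assumption: the query is POSTed as JSON to the /search/ endpoint.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/search/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = {
    "yql": "select cord_uid, title, abstract from sources * where userQuery();",
    "hits": 5,
    "query": "Is remdesivir an effective treatment for COVID-19?",
    "type": "any",
    "ranking": "bm25",
}
request = build_search_request("https://api.cord19.vespa.ai", body)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Keeping the request construction separate from sending it makes it easy to inspect the exact JSON before hitting the endpoint.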

Total number of documents that matched the query:

```python
results.number_documents_retrieved
```

```
9865
```

Number of hits returned in this response, capped by `'hits': 5` in the body:

```python
len(results.hits)
```

```
5
```
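The two numbers come from different parts of a standard Vespa JSON result. The total match count is reported under `root.fields.totalCount`, while the returned hits are the entries of `root.children`, of which there are at most `hits`. Here is a minimal sketch over a small, made-up response. The values are illustrative, not real output from the cord19 app, and the helper names are hypothetical.

```python
def total_count(response: dict) -> int:
    """Number of documents that matched the query (not just those returned)."""
    return response["root"]["fields"]["totalCount"]


def returned_hits(response: dict) -> list:
    """Hits included in this response; default to [] if 'children' is missing."""
    return response["root"].get("children", [])


# Hypothetical response, trimmed to the keys used above.
response = {
    "root": {
        "id": "toplevel",
        "fields": {"totalCount": 42},
        "children": [
            {"id": "hit-1", "relevance": 12.3, "fields": {"cord_uid": "doc-a"}},
            {"id": "hit-2", "relevance": 11.8, "fields": {"cord_uid": "doc-b"}},
        ],
    }
}
print(total_count(response), len(returned_hits(response)))
```

With a live query, the same helpers could be applied to `results.json`, assuming it holds the raw Vespa response.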

We can then retrieve specific fields from the hit list through `results.hits`, or access the entire Vespa response through `results.json`.

```python
[hit["fields"]["cord_uid"] for hit in results.hits]
```

```
['8n6eybze', '2lwzhqer', '8n6eybze', '8art2tyj', 'xej338lo']
```
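The list above contains `8n6eybze` twice, so distinct hits can share a `cord_uid`. If you only want each value once, a small helper can collect a field from the hits while skipping hits that lack it and dropping repeats in rank order. This is a sketch with a hypothetical helper name, run here on toy hits shaped like the entries of `results.hits`.

```python
def unique_field_values(hits: list, field: str) -> list:
    """Collect `field` from each hit, skip hits without it, and drop repeats
    while keeping the original (rank) order."""
    values = (hit.get("fields", {}).get(field) for hit in hits)
    # dict.fromkeys keeps insertion order, so the first occurrence wins.
    return list(dict.fromkeys(v for v in values if v is not None))


hits = [
    {"fields": {"cord_uid": "a"}},
    {"fields": {"cord_uid": "b"}},
    {"fields": {"cord_uid": "a"}},
    {"fields": {}},
]
print(unique_field_values(hits, "cord_uid"))
# With the query above: unique_field_values(results.hits, "cord_uid")
```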

## Specify a query model

### Query + term-matching + rank profile

```python
from learntorank.query import QueryModel, OR, Ranking

results = app.query(
    query="Is remdesivir an effective treatment for COVID-19?",
    query_model=QueryModel(
        match_phase=OR(),
        ranking=Ranking(name="bm25")
    )
)
```

```python
results.number_documents_retrieved
```

```
268740
```
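A query model is a higher-level way to describe a request body. The following is a rough, hand-written equivalent of OR-style term matching with the `bm25` rank profile. It is an assumption for illustration, not necessarily the exact body `learntorank` generates. It uses a YQL `userInput` annotated with `grammar: "any"` and passes the query text as a separate request parameter. The helper `or_bm25_body` is hypothetical.

```python
def or_bm25_body(query: str, hits: int = 10) -> dict:
    """Hand-written sketch of an OR-matching query ranked with the bm25 profile.

    Assumes the YQL `userInput` grammar annotation; `@userQuery` refers to the
    request parameter of the same name.
    """
    return {
        "yql": 'select * from sources * where {grammar: "any"}userInput(@userQuery)',
        "userQuery": query,
        "ranking": "bm25",
        "hits": hits,
    }


body = or_bm25_body("Is remdesivir an effective treatment for COVID-19?")
# With the app above, this could be sent as: app.query(body=body)
```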

### Query + term-matching + ANN operator + rank profile

```python
from random import random

from learntorank.query import QueryModel, QueryRankingFeature, ANN, WeakAnd, Union, Ranking

# Match documents retrieved either by weakAnd over the query terms or by
# approximate nearest-neighbor search on the title embedding.
match_phase = Union(
    WeakAnd(hits=10),
    ANN(
        doc_vector="title_embedding",
        query_vector="title_vector",
        hits=10,
        label="title"
    )
)
ranking = Ranking(name="bm25", list_features=True)
query_model = QueryModel(
    query_properties=[QueryRankingFeature(
        name="title_vector",
        # Placeholder: a random 768-dimensional vector instead of a real query embedding.
        mapping=lambda x: [random() for _ in range(768)]
    )],
    match_phase=match_phase,
    ranking=ranking
)
```

```python
results = app.query(
    query="Is remdesivir an effective treatment for COVID-19?",
    query_model=query_model
)
```

```python
results.number_documents_retrieved
```

```
1520
```
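To see what such a union roughly amounts to, the following sketch hand-writes a body. It ORs a `weakAnd` over the query terms with a `nearestNeighbor` search, both limited by `targetHits`, and labels the ANN part `"title"`. Several parts are assumptions, not the exact body `learntorank` produces: the `default` fieldset, the `input.query(title_vector)` parameter name, and passing the query tensor as a plain list. The helper `union_body` is hypothetical.

```python
import random
import re


def union_body(query: str, vector: list, hits: int = 10) -> dict:
    """Hand-written sketch: weakAnd over the query terms OR'ed with an ANN search.

    Assumptions: a `default` fieldset, a document tensor field `title_embedding`,
    and a query tensor `title_vector` passed via `input.query(title_vector)`.
    """
    # Keep word characters and hyphens only, so terms never contain quotes.
    terms = ", ".join(f'default contains "{t}"' for t in re.findall(r"[\w-]+", query))
    weak_and = f"({{targetHits: {hits}}}weakAnd({terms}))"
    ann = (
        f'({{targetHits: {hits}, label: "title"}}'
        "nearestNeighbor(title_embedding, title_vector))"
    )
    return {
        "yql": f"select * from sources * where {weak_and} or {ann}",
        "input.query(title_vector)": vector,
        "ranking": "bm25",
    }


# Random placeholder vector, mirroring the notebook's mapping above.
vector = [random.random() for _ in range(768)]
body = union_body("Is remdesivir an effective treatment for COVID-19?", vector)
print(body["yql"])
```

Because the query vector is random, the ANN half of the union returns essentially arbitrary titles. A real setup would embed the query with the same model used for `title_embedding`.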

## Recall specific documents

Let's take a look at the top 3 ids from the last query.

```python
top_ids = [hit["fields"]["id"] for hit in results.hits[:3]]
top_ids
```

```
[144384, 269386, 144385]
```

Suppose we now want to retrieve only the documents with the second and third ids above. We can do so with the `recall` argument.

```python
results_with_recall = app.query(
    query="Is remdesivir an effective treatment for COVID-19?",
    query_model=query_model,
    recall=("id", top_ids[1:3])
)
```

The query then only retrieves documents whose Vespa field `id` matches one of the values in the list inside the tuple.

```python
id_recalled = [hit["fields"]["id"] for hit in results_with_recall.hits]
id_recalled
```

```
[269386, 144385]
```
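Vespa's Query API has a `recall` parameter that restricts the result to documents matching an extra expression in the simple query language. The following is a minimal sketch of how an `("id", [...])` tuple could be turned into such an expression. It assumes the `recall` argument produces a required group of the form `+(id:... id:...)` that is read as "any of these values", which is consistent with both ids being recalled above. The helper name `recall_expression` is hypothetical.

```python
def recall_expression(recall: tuple) -> str:
    """Turn ("field", [v1, v2, ...]) into the group string +(field:v1 field:v2 ...)."""
    field, values = recall
    return "+(" + " ".join(f"{field}:{value}" for value in values) + ")"


print(recall_expression(("id", [269386, 144385])))
# A raw body could then carry it as: {"recall": recall_expression(...), ...}
```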