# pyvespa release: version 0.5.0

> Summary of the improvements introduced in pyvespa 0.5.0

- toc: true 
- badges: false
- comments: true
- categories: [vespa, pyvespa, search, search evaluation]

We just made a series of improvements to the pyvespa evaluation framework. This post summarizes the changes.

## Connect to a sample app

We will use the [cord19 app](https://cord19.vespa.ai/) in our demonstration. The focus is not on this specific app; we only use a few sample queries against it to illustrate the latest changes in pyvespa `0.5.0`.

In [1]:
from vespa.application import Vespa

app = Vespa(url = "https://api.cord19.vespa.ai")

## Flexible QueryModel

A [QueryModel](https://pyvespa.readthedocs.io/en/latest/reference-api.html#query-model) is an abstraction that encapsulates all the relevant information controlling how your app matches and ranks documents. A `QueryModel` can be used for [querying](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa.query), [evaluating](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa.evaluate) and [collecting data](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa.collect_training_data) from your app.

Before version `0.5.0`, the only way to build a `QueryModel` was by specifying arguments like `match_phase` and `rank_profile` using the pyvespa API, such as [match operators](https://pyvespa.readthedocs.io/en/latest/reference-api.html#match-phase). For example:

In [2]:
from vespa.query import QueryModel, RankProfile, OR

standard_query_model = QueryModel(
    name="or_bm25",
    match_phase = OR(),
    rank_profile = RankProfile(name="bm25")
)

Starting in version `0.5.0`, we can bypass the pyvespa high-level API and create a `QueryModel` with the full flexibility of the [Vespa Query API](https://docs.vespa.ai/en/reference/query-api-reference.html). This is useful for use cases not covered by the pyvespa API and for users who are familiar with the Vespa Query API and prefer to work with it directly.

In [3]:
def body_function(query):
    body = {'yql': 'select * from sources * where userQuery();',
            'query': query,
            'type': 'any',
            'ranking': {'profile': 'bm25', 'listFeatures': 'false'}}
    return body

flexible_query_model = QueryModel(body_function = body_function)

The `flexible_query_model` defined above is equivalent to the `standard_query_model`, as we can see when querying the `app`.

In [13]:
standard_result = app.query(query="this is a test", query_model=standard_query_model)
standard_result.get_hits().head(3)

| | qid | doc_id | score | rank |
|---|---|---|---|---|
| 0 | 0 | id:covid-19:doc::31328 | 11.282253 | 0 |
| 1 | 0 | id:covid-19:doc::142863 | 11.282253 | 1 |
| 2 | 0 | id:covid-19:doc::187156 | 11.266751 | 2 |


In [14]:
flexible_result = app.query(query="this is a test", query_model=flexible_query_model)
flexible_result.get_hits().head(3)

| | qid | doc_id | score | rank |
|---|---|---|---|---|
| 0 | 0 | id:covid-19:doc::31328 | 11.282253 | 0 |
| 1 | 0 | id:covid-19:doc::142863 | 11.282253 | 1 |
| 2 | 0 | id:covid-19:doc::187156 | 11.266751 | 2 |
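
Because `body_function` is a plain Python function that maps a query string to a request body, it can be generated programmatically. The sketch below is one possible pattern, not part of pyvespa: `make_body_function` is a hypothetical helper that binds a rank profile name and query type, and returns a function with the same shape as the `body_function` above.

```python
def make_body_function(rank_profile, query_type="any"):
    """Hypothetical helper: build a body function bound to one rank profile."""

    def body_function(query):
        # Same request body as the example above, with the profile parameterized.
        return {
            "yql": "select * from sources * where userQuery();",
            "query": query,
            "type": query_type,
            "ranking": {"profile": rank_profile, "listFeatures": "false"},
        }

    return body_function


bm25_body = make_body_function("bm25")
print(bm25_body("this is a test"))
```

Each returned function could then be passed to `QueryModel(body_function=...)` in the same way as `body_function` above.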


## Query output format

* PR: [Make it possible to format query results](https://github.com/vespa-engine/pyvespa/pull/119)

In [None]:
res = app.query(query = "this is a test", query_model=standard_query_model)

Full Vespa output

In [None]:
res.json

Vespa hits

In [None]:
res.hits

Get formatted hits

In [None]:
res.get_hits()

Choose the `id_field` to be returned in the `doc_id` column and specify the desired `qid`.

In [None]:
res.get_hits(id_field = "cord_uid", qid = 2)
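
To make the effect of `id_field` and `qid` concrete, here is a rough sketch of how a Vespa-style JSON response could be flattened into rows with the `qid`, `doc_id`, `score` and `rank` columns shown earlier. This is not pyvespa's implementation: `format_hits` is a hypothetical function, and `toy_response` contains made-up values that only mimic the shape of a Vespa result (`root` → `children`, each hit with `id`, `relevance` and `fields`).

```python
def format_hits(response, id_field=None, qid=0):
    """Hypothetical sketch: one row per hit, ranked in response order."""
    rows = []
    hits = response.get("root", {}).get("children", [])
    for rank, hit in enumerate(hits):
        # Use a document field as doc_id when id_field is given,
        # otherwise fall back to the Vespa document id.
        doc_id = hit["fields"][id_field] if id_field else hit["id"]
        rows.append(
            {"qid": qid, "doc_id": doc_id, "score": hit["relevance"], "rank": rank}
        )
    return rows


# Toy response with made-up values, shaped like a Vespa result.
toy_response = {
    "root": {
        "children": [
            {"id": "id:covid-19:doc::1", "relevance": 2.5, "fields": {"cord_uid": "abc"}},
            {"id": "id:covid-19:doc::2", "relevance": 1.0, "fields": {"cord_uid": "def"}},
        ]
    }
}

rows = format_hits(toy_response, id_field="cord_uid", qid=2)
for row in rows:
    print(row)
```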

## Evaluation framework

In [None]:
from vespa.evaluation import MatchRatio, Recall, ReciprocalRank

eval_metrics = [MatchRatio(), Recall(at=10), 
                ReciprocalRank(at=10)]
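
For readers less familiar with these metrics, the sketch below shows the textbook definitions of recall and reciprocal rank at a cutoff `k`. It is an illustration under those standard definitions and may differ in detail from what pyvespa's `Recall` and `ReciprocalRank` classes do internally; the function names and the ranked list are made up for the example. `MatchRatio` is left out because it depends on match counts reported by the app rather than on a ranked list alone.

```python
def recall_at(k, ranked_ids, relevant_ids):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    found = relevant.intersection(ranked_ids[:k])
    return len(found) / len(relevant)


def reciprocal_rank_at(k, ranked_ids, relevant_ids):
    """1 / position of the first relevant document in the top k, else 0."""
    relevant = set(relevant_ids)
    for position, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


# Made-up ranking: the relevant documents sit at positions 2 and 4.
ranked = [7, 120761, 3, 145189]
relevant = [120761, 145189]

print(recall_at(10, ranked, relevant))
print(recall_at(2, ranked, relevant))
print(reciprocal_rank_at(10, ranked, relevant))
```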

### Allow df as input to app.evaluate

* PR: [Support df as input to app.evaluate](https://github.com/vespa-engine/pyvespa/pull/120)

`app.evaluate` accepts `labeled_data` in two formats. The first is a DataFrame with the columns `qid`, `query`, `doc_id` and `relevance`.

In [None]:
from pandas import DataFrame

labeled_data_df = DataFrame(
    data={
        "qid": [0] * 2 + [1] * 2, 
        "query": ["Intrauterine virus infections and congenital heart disease"] * 2 + 
                 ["Clinical and immunologic studies in identical twins discordant for systemic lupus erythematosus"] * 2,
        "doc_id": [120761, 145189, 49, 11317],
        "relevance": [1,1,1,1]
    }
)
labeled_data_df

In [None]:
evaluation = app.evaluate(
    labeled_data = labeled_data_df,
    eval_metrics = eval_metrics,
    query_model = standard_query_model,
    id_field = "id",
)
evaluation

The second format is a list of dicts. It is more concise because the query id and query text do not need to be repeated for every relevant document.

In [None]:
labeled_data = [
    {
        "query_id": 0, 
        "query": "Intrauterine virus infections and congenital heart disease",
        "relevant_docs": [{"id": 120761, "score": 1}, {"id": 145189, "score": 1}]
    },
    {
        "query_id": 1, 
        "query": "Clinical and immunologic studies in identical twins discordant for systemic lupus erythematosus",
        "relevant_docs": [{"id": 49, "score": 1}, {"id": 11317, "score": 1}]
    }
]

In [None]:
evaluation = app.evaluate(
    labeled_data = labeled_data,
    eval_metrics = eval_metrics,
    query_model = standard_query_model,
    id_field = "id",
)
evaluation
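
The two labeled-data formats carry the same information. As a sketch of how they relate, the hypothetical helper `to_rows` below expands the list-of-dicts format into one row per (query, relevant document) pair, using the column names of the DataFrame format. It is only an illustration of the mapping, not a pyvespa function.

```python
def to_rows(labeled_data):
    """Hypothetical helper: expand the list-of-dicts format into flat rows."""
    rows = []
    for item in labeled_data:
        for doc in item["relevant_docs"]:
            rows.append(
                {
                    "qid": item["query_id"],
                    "query": item["query"],
                    "doc_id": doc["id"],
                    "relevance": doc["score"],
                }
            )
    return rows


labeled_data = [
    {
        "query_id": 0,
        "query": "Intrauterine virus infections and congenital heart disease",
        "relevant_docs": [{"id": 120761, "score": 1}, {"id": 145189, "score": 1}],
    },
    {
        "query_id": 1,
        "query": "Clinical and immunologic studies in identical twins discordant for systemic lupus erythematosus",
        "relevant_docs": [{"id": 49, "score": 1}, {"id": 11317, "score": 1}],
    },
]

rows = to_rows(labeled_data)
for row in rows:
    print(row["qid"], row["doc_id"], row["relevance"])
```

Passing `rows` to `pandas.DataFrame` would give a table with the `qid`, `query`, `doc_id` and `relevance` columns.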

### Make app.evaluate return simplified metrics as default

* PR: [Simplified metrics output by default with option for detailed metrics](https://github.com/vespa-engine/pyvespa/pull/121)

In [None]:
evaluation = app.evaluate(
    labeled_data = labeled_data,
    eval_metrics = eval_metrics,
    query_model = standard_query_model,
    id_field = "id",
    detailed_metrics = True
)
evaluation

### Allow multiple query models as input to evaluate

* PR: [Evaluate multiple query models](https://github.com/vespa-engine/pyvespa/pull/122)

In [None]:
from vespa.query import QueryModel, RankProfile, OR, AND

query_model_1 = QueryModel(
    name="or_bm25",
    match_phase = OR(),
    rank_profile = RankProfile(name="bm25", list_features=True)
)
query_model_2 = QueryModel(
    name="and_bm25",
    match_phase = AND(),
    rank_profile = RankProfile(name="bm25", list_features=True)
)

In [None]:
evaluation = app.evaluate(
    labeled_data = labeled_data,
    eval_metrics = eval_metrics,
    query_model = [query_model_1, query_model_2],
    id_field = "id",
)
evaluation

### Make app.evaluate return aggregate metrics by default and add per_query = True as argument

* PR: [Enable per_query argument. Default to per model summary](https://github.com/vespa-engine/pyvespa/pull/124)

In [None]:
evaluation = app.evaluate(
    labeled_data = labeled_data,
    eval_metrics = eval_metrics,
    query_model = [query_model_1, query_model_2],
    id_field = "id",
)
evaluation

In [None]:
evaluation = app.evaluate(
    labeled_data = labeled_data,
    eval_metrics = eval_metrics,
    query_model = [query_model_1, query_model_2],
    id_field = "id",
    aggregators = ["max", "min"]
)
evaluation

In [None]:
evaluation = app.evaluate(
    labeled_data = labeled_data,
    eval_metrics = eval_metrics,
    query_model = [query_model_1, query_model_2],
    id_field = "id",
    per_query = True
)
evaluation
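
The relationship between the per-query output and the aggregated summary can be sketched in a few lines of plain Python. Everything below is illustrative: the metric values are made up, the `recall_10` column name is an assumption, and `summarize` is a hypothetical function that groups per-query values by model and applies the requested aggregators, which is the general idea behind `aggregators = ["max", "min"]` rather than pyvespa's actual code.

```python
from statistics import mean, median

# Made-up per-query metric values for two query models.
per_query = [
    {"model": "or_bm25", "query_id": 0, "recall_10": 0.5},
    {"model": "or_bm25", "query_id": 1, "recall_10": 1.0},
    {"model": "and_bm25", "query_id": 0, "recall_10": 0.0},
    {"model": "and_bm25", "query_id": 1, "recall_10": 0.5},
]

# Aggregator names mapped to the functions that compute them.
AGGREGATORS = {"mean": mean, "median": median, "max": max, "min": min}


def summarize(rows, metric, aggregators):
    """Hypothetical sketch: aggregate one metric per model."""
    by_model = {}
    for row in rows:
        by_model.setdefault(row["model"], []).append(row[metric])
    return {
        model: {name: AGGREGATORS[name](values) for name in aggregators}
        for model, values in by_model.items()
    }


summary = summarize(per_query, "recall_10", ["max", "min"])
print(summary)
```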