## Connect to a sample app

In [2]:
from vespa.application import Vespa

app = Vespa(url = "https://api.cord19.vespa.ai")

## More flexible way to specify query models

* PR: [Make QueryModel more flexible by adding body_function argument](https://github.com/vespa-engine/pyvespa/pull/118)

Standard query model

In [3]:
from vespa.query import QueryModel, RankProfile, OR

query_model_1 = QueryModel(
    name="or_bm25",
    match_phase = OR(),
    rank_profile = RankProfile(name="bm25")
)

Newer flexible query model, which allows us to specify a parameterized version of the Vespa Query API.

In [4]:
def body_function(query):
    body = {'yql': 'select * from sources * where userQuery();',
            'query': query,
            'type': 'any',
            'ranking': {'profile': 'bm25', 'listFeatures': 'false'}}
    return body

query_model = QueryModel(body_function = body_function)
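Because `body_function` is ordinary Python, several related query bodies can be produced from one factory. The sketch below is only an illustration and is not part of pyvespa: `make_body_function` and its `rank_profile` and `hits` arguments are hypothetical names. The body keys mirror the cell above, plus `hits`, the Vespa Query API parameter that limits the number of returned results.

```python
def make_body_function(rank_profile="bm25", hits=10):
    """Hypothetical helper: build a function mapping a query string to a query body."""

    def body_function(query):
        # Same structure as the hand-written body above, with two values parameterized.
        return {
            "yql": "select * from sources * where userQuery();",
            "query": query,
            "type": "any",
            "hits": hits,
            "ranking": {"profile": rank_profile, "listFeatures": "false"},
        }

    return body_function


bm25_body = make_body_function(rank_profile="bm25", hits=5)
print(bm25_body("this is a test"))
```

Each function returned by the factory could then be passed as `QueryModel(body_function=...)`, as in the cell above.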

## Query output format

* PR: [Make it possible to format query results](https://github.com/vespa-engine/pyvespa/pull/119)

In [14]:
res = app.query(query = "this is a test", query_model=query_model)

Full Vespa output

In [15]:
res.json

{'root': {'id': 'toplevel',
  'relevance': 1.0,
  'fields': {'totalCount': 236369},
  'coverage': {'coverage': 100,
   'documents': 309201,
   'full': True,
   'nodes': 4,
   'results': 1,
   'resultsFull': 1},
  'children': [{'id': 'id:covid-19:doc::31328',
    'relevance': 11.29865209239005,
    'source': 'content',
    'fields': {'sddocname': 'doc',
     'body_text': 'Governments around the world are looking for a <hi>testing</hi> strategy for COVID-19; one that will ensure that the escape from lockdown is as fast as possible, given a<sep /> by Romer (2020b). Our proposal is ‘stratified’ because it focuses <hi>testing</hi> on groups who are at particular risk of spreading the infection. This may be because<sep />contact with others. Their basic reproduction number will, as a result, be very high and very frequent <hi>testing</hi> will be necessary to ensure that their effective reproduction number is low enough. As a result, there could be<sep />',
     'title': 'A workable strategy

Vespa hits

In [16]:
res.hits

[{'id': 'id:covid-19:doc::31328',
  'relevance': 11.29865209239005,
  'source': 'content',
  'fields': {'sddocname': 'doc',
   'body_text': 'Governments around the world are looking for a <hi>testing</hi> strategy for COVID-19; one that will ensure that the escape from lockdown is as fast as possible, given a<sep /> by Romer (2020b). Our proposal is ‘stratified’ because it focuses <hi>testing</hi> on groups who are at particular risk of spreading the infection. This may be because<sep />contact with others. Their basic reproduction number will, as a result, be very high and very frequent <hi>testing</hi> will be necessary to ensure that their effective reproduction number is low enough. As a result, there could be<sep />',
   'title': 'A workable strategy for COVID-19 <hi>testing</hi>: stratified periodic <hi>testing</hi> rather than universal random <hi>testing</hi>',
   'abstract': 'This paper argues for the regular <hi>testing</hi> of people in groups that are more likely to be expo

Get formatted hits

In [17]:
res.get_hits()

Unnamed: 0,qid,doc_id,score,rank
0,0,id:covid-19:doc::31328,11.298652,0
1,0,id:covid-19:doc::142863,11.291205,1
2,0,id:covid-19:doc::187156,11.275726,2
3,0,id:covid-19:doc::119791,10.895973,3
4,0,id:covid-19:doc::308195,10.895973,4
5,0,id:covid-19:doc::54708,10.776745,5
6,0,id:covid-19:doc::200685,10.721157,6
7,0,id:covid-19:doc::169325,10.689229,7
8,0,id:covid-19:doc::170582,10.659681,8
9,0,id:covid-19:doc::157719,10.617025,9


Choose which `id_field` is returned in the `doc_id` column and specify the desired `qid`.

In [18]:
res.get_hits(id_field = "cord_uid", qid = 2)

Unnamed: 0,qid,doc_id,score,rank
0,2,moy0u7n5,11.298652,0
1,2,0p6vrujx,11.291205,1
2,2,rhmywn8n,11.275726,2
3,2,2tafauc2,10.895973,3
4,2,jg45bnzq,10.895973,4
5,2,e9q86x6n,10.776745,5
6,2,lel5jqls,10.721157,6
7,2,w4f2qmje,10.689229,7
8,2,emogyuht,10.659681,8
9,2,9x85276w,10.617025,9
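To make the relationship between `res.hits` and the formatted table concrete, here is a minimal sketch of the flattening that the output above suggests. It is an assumption about the behavior, not pyvespa's actual implementation: `hits_to_rows` is a hypothetical function, and the sample hits are trimmed copies of the first two results shown above.

```python
def hits_to_rows(hits, id_field=None, qid=0):
    """Hypothetical helper: flatten Vespa hits into qid/doc_id/score/rank rows."""
    rows = []
    for rank, hit in enumerate(hits):
        # Use the Vespa document id unless a document field is requested instead.
        doc_id = hit["id"] if id_field is None else hit["fields"][id_field]
        rows.append(
            {"qid": qid, "doc_id": doc_id, "score": hit["relevance"], "rank": rank}
        )
    return rows


sample_hits = [
    {"id": "id:covid-19:doc::31328", "relevance": 11.298652,
     "fields": {"cord_uid": "moy0u7n5"}},
    {"id": "id:covid-19:doc::142863", "relevance": 11.291205,
     "fields": {"cord_uid": "0p6vrujx"}},
]

for row in hits_to_rows(sample_hits, id_field="cord_uid", qid=2):
    print(row)
```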


## Evaluation framework

In [19]:
from vespa.evaluation import MatchRatio, Recall, ReciprocalRank

eval_metrics = [MatchRatio(), Recall(at=10),
                ReciprocalRank(at=10)]
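For readers unfamiliar with these metrics, the sketch below shows their textbook definitions in plain Python. The function names are hypothetical, and pyvespa's own implementation may differ in detail. Match ratio is taken here as retrieved documents divided by documents available, which is consistent with the `match_ratio_retrieved_docs` and `match_ratio_docs_available` values reported later in this notebook.

```python
def match_ratio(retrieved_docs, docs_available):
    """Fraction of the corpus matched by the query."""
    return retrieved_docs / docs_available


def recall_at(ranked_ids, relevant_ids, at=10):
    """Fraction of the relevant documents found in the top `at` results."""
    relevant = set(relevant_ids)
    found = relevant.intersection(ranked_ids[:at])
    return len(found) / len(relevant)


def reciprocal_rank_at(ranked_ids, relevant_ids, at=10):
    """1 / position of the first relevant document in the top `at`, else 0."""
    relevant = set(relevant_ids)
    for position, doc_id in enumerate(ranked_ids[:at], start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


ranked = [7, 120761, 3, 9]      # made-up ranking for illustration
relevant = [120761, 145189]
print(recall_at(ranked, relevant))           # 0.5
print(reciprocal_rank_at(ranked, relevant))  # 0.5
print(match_ratio(236369, 309201))           # totalCount / documents from res.json
```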

### Allow df as input to app.evaluate

* PR: [Support df as input to app.evaluate](https://github.com/vespa-engine/pyvespa/pull/120)

`app.evaluate` accepts `labeled_data` in two formats. The first is a DataFrame with the columns `["qid", "query", "doc_id", "relevance"]`.

In [24]:
from pandas import DataFrame

labeled_data_df = DataFrame(
    data={
        "qid": [0] * 2 + [1] * 2, 
        "query": ["Intrauterine virus infections and congenital heart disease"] * 2 + 
                 ["Clinical and immunologic studies in identical twins discordant for systemic lupus erythematosus"] * 2,
        "doc_id": [120761, 145189, 49, 11317],
        "relevance": [1,1,1,1]
    }
)
labeled_data_df

Unnamed: 0,qid,query,doc_id,relevance
0,0,Intrauterine virus infections and congenital h...,120761,1
1,0,Intrauterine virus infections and congenital h...,145189,1
2,1,Clinical and immunologic studies in identical ...,49,1
3,1,Clinical and immunologic studies in identical ...,11317,1


In [25]:
evaluation = app.evaluate(
    labeled_data = labeled_data_df,
    eval_metrics = eval_metrics,
    query_model = query_model,
    id_field = "id",
)
evaluation

Unnamed: 0,model,default_name
match_ratio,mean,0.853521
match_ratio,median,0.853521
match_ratio,std,0.055102
recall_10,mean,0.75
recall_10,median,0.75
recall_10,std,0.353553
reciprocal_rank_10,mean,0.0
reciprocal_rank_10,median,0.0
reciprocal_rank_10,std,0.0


The second input type is a list of dicts. It is a more concise version where we do not need to repeat `query_id` and `query` for every relevant document.

In [26]:
labeled_data = [
    {
        "query_id": 0, 
        "query": "Intrauterine virus infections and congenital heart disease",
        "relevant_docs": [{"id": 120761, "score": 1}, {"id": 145189, "score": 1}]
    },
    {
        "query_id": 1, 
        "query": "Clinical and immunologic studies in identical twins discordant for systemic lupus erythematosus",
        "relevant_docs": [{"id": 49, "score": 1}, {"id": 11317, "score": 1}]
    }
]
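The two formats carry the same information. As a rough sketch, one row per (query, document) pair can be grouped into the list-of-dicts form as follows. `rows_to_labeled_data` is a hypothetical helper, not a pyvespa function; with pandas, the input rows could be obtained from `labeled_data_df.to_dict("records")`.

```python
def rows_to_labeled_data(rows):
    """Hypothetical helper: group flat (qid, query, doc_id, relevance) rows by query."""
    grouped = {}
    for row in rows:
        # Create the per-query entry on first sight; dicts keep insertion order.
        entry = grouped.setdefault(
            row["qid"],
            {"query_id": row["qid"], "query": row["query"], "relevant_docs": []},
        )
        entry["relevant_docs"].append({"id": row["doc_id"], "score": row["relevance"]})
    return list(grouped.values())


rows = [
    {"qid": 0, "query": "first query", "doc_id": 120761, "relevance": 1},
    {"qid": 0, "query": "first query", "doc_id": 145189, "relevance": 1},
    {"qid": 1, "query": "second query", "doc_id": 49, "relevance": 1},
    {"qid": 1, "query": "second query", "doc_id": 11317, "relevance": 1},
]
print(rows_to_labeled_data(rows))
```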

In [27]:
evaluation = app.evaluate(
    labeled_data = labeled_data,
    eval_metrics = eval_metrics,
    query_model = query_model,
    id_field = "id",
)
evaluation

Unnamed: 0,model,default_name
match_ratio,mean,0.853521
match_ratio,median,0.853521
match_ratio,std,0.055102
recall_10,mean,0.75
recall_10,median,0.75
recall_10,std,0.353553
reciprocal_rank_10,mean,0.0
reciprocal_rank_10,median,0.0
reciprocal_rank_10,std,0.0


### Make app.evaluate return simplified metrics by default

* PR: [Simplified metrics output by default with option for detailed metrics](https://github.com/vespa-engine/pyvespa/pull/121)

In [7]:
evaluation = app.evaluate(
    labeled_data = labeled_data,
    eval_metrics = eval_metrics,
    query_model = query_model,
    id_field = "id",
    detailed_metrics = True
)
evaluation

Unnamed: 0,model,default_name
match_ratio,mean,0.853521
match_ratio,median,0.853521
match_ratio,std,0.055102
match_ratio_retrieved_docs,mean,263909.5
match_ratio_retrieved_docs,median,263909.5
match_ratio_retrieved_docs,std,17037.737893
match_ratio_docs_available,mean,309201.0
match_ratio_docs_available,median,309201.0
match_ratio_docs_available,std,0.0
recall_10,mean,0.0


### Allow multiple query models as input to evaluate

* PR: [Evaluate multiple query models](https://github.com/vespa-engine/pyvespa/pull/122)

In [28]:
from vespa.query import QueryModel, RankProfile, OR, AND

query_model_1 = QueryModel(
    name="or_bm25",
    match_phase = OR(),
    rank_profile = RankProfile(name="bm25", list_features=True)
)
query_model_2 = QueryModel(
    name="and_bm25",
    match_phase = AND(),
    rank_profile = RankProfile(name="bm25", list_features=True)
)

In [29]:
evaluation = app.evaluate(
    labeled_data = labeled_data,
    eval_metrics = eval_metrics,
    query_model = [query_model_1, query_model_2],
    id_field = "id",
)
evaluation

Unnamed: 0,model,and_bm25,or_bm25
match_ratio,mean,3e-06,0.853521
match_ratio,median,3e-06,0.853521
match_ratio,std,0.0,0.055102
recall_10,mean,0.0,0.75
recall_10,median,0.0,0.75
recall_10,std,0.0,0.353553
reciprocal_rank_10,mean,0.0,0.0
reciprocal_rank_10,median,0.0,0.0
reciprocal_rank_10,std,0.0,0.0


### Make app.evaluate return aggregate metrics by default and add per_query = True as argument

* PR: [Enable per_query argument. Default to per model summary](https://github.com/vespa-engine/pyvespa/pull/124)

In [10]:
evaluation = app.evaluate(
    labeled_data = labeled_data,
    eval_metrics = eval_metrics,
    query_model = [query_model_1, query_model_2],
    id_field = "id",
)
evaluation

Unnamed: 0,model,and_bm25,or_bm25
match_ratio,mean,3e-06,0.853521
match_ratio,median,3e-06,0.853521
match_ratio,std,0.0,0.055102
recall_10,mean,0.0,0.0
recall_10,median,0.0,0.0
recall_10,std,0.0,0.0
reciprocal_rank_10,mean,0.0,0.0
reciprocal_rank_10,median,0.0,0.0
reciprocal_rank_10,std,0.0,0.0


In [12]:
evaluation = app.evaluate(
    labeled_data = labeled_data,
    eval_metrics = eval_metrics,
    query_model = [query_model_1, query_model_2],
    id_field = "id",
    aggregators = ["max", "min"]
)
evaluation

Unnamed: 0,model,and_bm25,or_bm25
match_ratio,max,3e-06,0.892484
match_ratio,min,3e-06,0.814558
recall_10,max,0.0,0.0
recall_10,min,0.0,0.0
reciprocal_rank_10,max,0.0,0.0
reciprocal_rank_10,min,0.0,0.0


In [11]:
evaluation = app.evaluate(
    labeled_data = labeled_data,
    eval_metrics = eval_metrics,
    query_model = [query_model_1, query_model_2],
    id_field = "id",
    per_query = True
)
evaluation

Unnamed: 0,model,query_id,match_ratio,recall_10,reciprocal_rank_10
0,or_bm25,0,0.814558,0.0,0
1,and_bm25,0,3e-06,0.0,0
2,or_bm25,1,0.892484,0.0,0
3,and_bm25,1,3e-06,0.0,0
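The aggregated tables can be reproduced from the per-query rows. The sketch below uses only the standard library and the `match_ratio` values shown above; `summarize` and `AGGREGATORS` are hypothetical names and not pyvespa's implementation. It assumes `std` is the sample standard deviation, which agrees with the reported 0.055102 for `or_bm25`.

```python
from statistics import mean, median, stdev

# Per-query match_ratio values copied from the per_query = True output above.
per_query = [
    {"model": "or_bm25", "query_id": 0, "match_ratio": 0.814558},
    {"model": "and_bm25", "query_id": 0, "match_ratio": 3e-06},
    {"model": "or_bm25", "query_id": 1, "match_ratio": 0.892484},
    {"model": "and_bm25", "query_id": 1, "match_ratio": 3e-06},
]

# Assumed mapping from aggregator names to functions.
AGGREGATORS = {"mean": mean, "median": median, "std": stdev, "max": max, "min": min}


def summarize(rows, metric, aggregators=("mean", "median", "std")):
    """Hypothetical helper: aggregate one metric per model across queries."""
    by_model = {}
    for row in rows:
        by_model.setdefault(row["model"], []).append(row[metric])
    return {
        model: {name: AGGREGATORS[name](values) for name in aggregators}
        for model, values in by_model.items()
    }


print(summarize(per_query, "match_ratio"))
print(summarize(per_query, "match_ratio", aggregators=("max", "min")))
```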
