# Preparation

In [1]:
!pip install --upgrade git+https://github.com/terrier-org/pyterrier.git#egg=python-terrier
#!pip install python-terrier

Collecting python-terrier
  Cloning https://github.com/terrier-org/pyterrier.git to /tmp/pip-install-_cprmknv/python-terrier
  Running command git clone -q https://github.com/terrier-org/pyterrier.git /tmp/pip-install-_cprmknv/python-terrier
Collecting pyjnius~=1.3.0
  Downloading https://files.pythonhosted.org/packages/d8/50/098cb5fb76fb7c7d99d403226a2a63dcbfb5c129b71b7d0f5200b05de1f0/pyjnius-1.3.0-cp36-cp36m-manylinux2010_x86_64.whl (1.1MB)
Collecting wget
  Downloading https://files.pythonhosted.org/packages/47/6a/62e288da7bcda82b935ff0c6cfe542970f04e29c756b0e147251b2fb251f/wget-3.2.zip
Collecting pytrec_eval
  Downloading https://files.pythonhosted.org/packages/36/0a/5809ba805e62c98f81e19d6007132712945c78e7612c11f61bac76a25ba3/pytrec_eval-0.4.tar.gz
Collecting matchpy
  Downloading https://files.pythonhosted.org/packages/47/95/d265b944ce391bb2fa9982d7506bbb197bb55c5088ea74448a5ffcaeefab/matchpy-0.5.1-py3-none-any

# Init 

You must call `pt.init()` before using any other PyTerrier functions or classes.

Arguments:
 - `version` - the Terrier IR version to use, e.g. "5.2"
 - `mem` - the memory allocated to Java, in megabytes, e.g. "4096"


In [2]:
import pyterrier as pt
if not pt.started():
  pt.init()

terrier-assemblies 5.2  jar-with-dependencies not found, downloading to /root/.pyterrier...
Done
terrier-python-helper 0.0.2  jar not found, downloading to /root/.pyterrier...
Done


# Vaswani_NPL

We're going to use a very old IR test collection called [Vaswani_NPL](http://ir.dcs.gla.ac.uk/resources/test_collections/npl/). This is included with Terrier, but we provide access here to pre-made indices, along with the topics and qrels:
 

In [0]:
vaswani_dataset = pt.datasets.get_dataset("vaswani")

# Load an existing index

In [4]:
indexref = vaswani_dataset.get_index()
index = pt.IndexFactory.of(indexref)

print(index.getCollectionStatistics().toString())

Downloading vaswani index to /root/.pyterrier/corpora/vaswani/index
Number of documents: 11429
Number of terms: 7756
Number of fields: 0
Field names: []
Number of tokens: 271581



# Retrieval

Normally, we would use `pt.io.read_topics(topics_path)` to parse a topics file:
``` python
topics_path = "./query-text.trec"
topics = pt.io.read_topics(topics_path)
```

However, the dataset object provides the topics and qrels already parsed:



In [5]:
topics = vaswani_dataset.get_topics()
topics.head(5)

Downloading vaswani topics to /root/.pyterrier/corpora/vaswani/query-text.trec


Unnamed: 0,qid,query
0,1,measurement of dielectric constant of liquids ...
1,2,mathematical analysis and design details of wa...
2,3,use of digital computers in the design of band...
3,4,systems of data coding for information transfer
4,5,use of programs in engineering testing of comp...
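To give a feel for what a topics parser produces, here is a minimal standard-library sketch that turns TREC-style topic markup into `qid`/`query` records like the rows above. It is not PyTerrier's implementation: the simplified tag layout, the `SAMPLE_TOPICS` string, the topic numbered 900, and the `parse_topics` helper are all assumptions made for illustration.

```python
import re

# Hypothetical sample in a simplified TREC-style layout (an assumption;
# real topic files may use other tags or leave tags unclosed).
SAMPLE_TOPICS = """
<top>
<num>4</num>
<title>
systems of data coding for information transfer
</title>
</top>
<top>
<num>900</num>
<title>hypothetical   example
query</title>
</top>
"""


def parse_topics(text):
    """Return one {"qid", "query"} dict per <top> block found in text."""
    topics = []
    for block in re.findall(r"<top>(.*?)</top>", text, flags=re.DOTALL):
        num = re.search(r"<num>(.*?)</num>", block, flags=re.DOTALL)
        title = re.search(r"<title>(.*?)</title>", block, flags=re.DOTALL)
        if num and title:
            topics.append({
                "qid": num.group(1).strip(),
                # Collapse line breaks and repeated spaces in the title.
                "query": " ".join(title.group(1).split()),
            })
    return topics


for topic in parse_topics(SAMPLE_TOPICS):
    print(topic["qid"], topic["query"])
```

Keeping `qid` as a string mirrors how query identifiers appear as dictionary keys in the per-query evaluation output later in this notebook.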


Create a `BatchRetrieve` object

You can optionally set controls and properties by passing dictionaries to the `controls` and `properties` arguments, or by calling the `setControl` or `setControls` methods on an existing object. If you set neither, the default controls are used.

Then call the `transform` method on the object, passing the topics as the argument.

In [0]:
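# Set the weighting model when constructing the retriever
retr = pt.BatchRetrieve(index, controls = {"wmodel": "TF_IDF"})

# Equivalently, set it afterwards: one control at a time, or several at once
retr.setControl("wmodel", "TF_IDF")
retr.setControls({"wmodel": "TF_IDF"})

# Run retrieval for every topic
res = retr.transform(topics)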

In [7]:
res

Unnamed: 0,qid,docid,docno,rank,score,query
0,1,8171,8172,0,13.746087,measurement of dielectric constant of liquids ...
1,1,9880,9881,1,12.352666,measurement of dielectric constant of liquids ...
2,1,5501,5502,2,12.178153,measurement of dielectric constant of liquids ...
3,1,1501,1502,3,10.993585,measurement of dielectric constant of liquids ...
4,1,9858,9859,4,10.271452,measurement of dielectric constant of liquids ...
...,...,...,...,...,...,...
91925,93,2226,2227,995,4.904950,high frequency oscillators using transistors t...
91926,93,6898,6899,996,4.899385,high frequency oscillators using transistors t...
91927,93,3473,3474,997,4.898796,high frequency oscillators using transistors t...
91928,93,3187,3188,998,4.893073,high frequency oscillators using transistors t...
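The `rank` and `score` columns can be illustrated with a toy ranker. The sketch below scores documents with a textbook TF-IDF sum (term frequency times `log(N / df)`) and numbers the ranks from 0, as in the table above. It is an assumption-laden illustration: Terrier's `TF_IDF` weighting model applies its own formula, so this is not expected to reproduce the scores shown, and `SAMPLE_DOCS` and `rank_tf_idf` are hypothetical names.

```python
import math
from collections import Counter

# Hypothetical three-document collection, keyed by docno.
SAMPLE_DOCS = {
    "d1": "light waves and sound waves",
    "d2": "sound in liquids",
    "d3": "transistor oscillators",
}


def rank_tf_idf(query, docs):
    """Rank docs (docno -> text) for a query using a textbook TF-IDF sum."""
    tokenised = {docno: text.lower().split() for docno, text in docs.items()}
    n_docs = len(tokenised)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenised.values() for term in set(tokens))
    scored = []
    for docno, tokens in tokenised.items():
        tf = Counter(tokens)
        score = sum(
            tf[term] * math.log(n_docs / df[term])
            for term in query.lower().split()
            if term in tf
        )
        if score > 0:  # keep only documents with a positive score
            scored.append((docno, score))
    # Highest score first; ties broken by docno for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [
        {"docno": docno, "rank": rank, "score": score}
        for rank, (docno, score) in enumerate(scored)
    ]


for row in rank_tf_idf("sound waves", SAMPLE_DOCS):
    print(row)
```

Only documents with a positive score are returned in this sketch, which is one reason a result list can be shorter than the collection.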


You can also pass a single query string or a list of query strings:

In [8]:
print(retr.transform("Light"))
print(retr.transform(["Light","Sound"]))

    qid  docid  docno  rank     score  query
0     1  10808  10809     0  5.537595  Light
1     1  11231  11232     1  5.535640  Light
2     1  11066  11067     2  5.497895  Light
3     1   5995   5996     3  5.486707  Light
4     1   4460   4461     4  5.464468  Light
..   ..    ...    ...   ...       ...    ...
120   1   4820   4821   120  1.964441  Light
121   1   9836   9837   121  1.927833  Light
122   1   7213   7214   122  1.910036  Light
123   1   6177   6178   123  1.892565  Light
124   1   7777   7778   124  1.251497  Light

[125 rows x 6 columns]
    qid  docid  docno  rank     score  query
0     1  10808  10809     0  5.537595  Light
1     1  11231  11232     1  5.535640  Light
2     1  11066  11067     2  5.497895  Light
3     1   5995   5996     3  5.486707  Light
4     1   4460   4461     4  5.464468  Light
..   ..    ...    ...   ...       ...    ...
211   2   6374   6375    86  2.505309  Sound
212   2   1695   1696    87  2.505309  Sound
213   2   6546   6547    88  2.

You can save the result to a file by using `saveResult(result, path)`

In [0]:
retr.saveResult(res,"result1.res")
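Result files in IR experiments are commonly stored in the TREC run format, with one line per retrieved document: `qid Q0 docno rank score run_name`. The sketch below writes rows in that layout using only the standard library. Whether `saveResult` emits exactly this layout is an assumption, and `write_trec_run` and the run name `tfidf` are hypothetical.

```python
import io


def write_trec_run(rows, handle, run_name="sketch"):
    """Write result rows to handle, one TREC-style line per retrieved document."""
    for row in rows:
        handle.write(
            f"{row['qid']} Q0 {row['docno']} {row['rank']} {row['score']} {run_name}\n"
        )


# The first two rows of the result table shown earlier.
rows = [
    {"qid": "1", "docno": "8172", "rank": 0, "score": 13.746087},
    {"qid": "1", "docno": "9881", "rank": 1, "score": 12.352666},
]

buffer = io.StringIO()  # stands in for an open file
write_trec_run(rows, buffer, run_name="tfidf")
print(buffer.getvalue(), end="")
```

Writing to any file-like object keeps the helper easy to test; passing `open("result1.res", "w")` instead of the buffer would write to disk.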

# Evaluation

Similarly, when working with a local test collection, we can use `pt.io.read_qrels(qrels_path)` to parse a qrels file:
```python
qrels_path = "./qrels"
qrels = pt.io.read_qrels(qrels_path)
```

However, for the Vaswani dataset, the qrels are provided ready to use:


In [10]:
qrels = vaswani_dataset.get_qrels()

Downloading vaswani qrels to /root/.pyterrier/corpora/vaswani/qrels
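For reference, a qrels file in the usual TREC layout has four whitespace-separated columns per line: query id, an iteration field (typically ignored), document number, and relevance label. The following sketch parses such lines into a nested dictionary. The sample lines and the `parse_qrels` helper are hypothetical, and this is not PyTerrier's own parser.

```python
# Hypothetical qrels lines: qid, iteration, docno, relevance label.
SAMPLE_QRELS = [
    "1 0 d10 1",
    "1 0 d42 1",
    "2 0 d7 0",
    "",  # blank lines are skipped
]


def parse_qrels(lines):
    """Return {qid: {docno: label}} from TREC-style qrels lines."""
    qrels = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _iteration, docno, label = parts
        qrels.setdefault(qid, {})[docno] = int(label)
    return qrels


print(parse_qrels(SAMPLE_QRELS))
```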


Use `pt.Utils.evaluate(results, qrels)` to evaluate the results.

Arguments:
 - `metrics` - the evaluation metrics to compute; default `["map", "ndcg"]`
 - `perquery` - whether to report each metric per query rather than as the mean over all queries; default `False`

In [11]:
# Named eval_results to avoid shadowing Python's built-in eval()
eval_results = pt.Utils.evaluate(res, qrels)
eval_results

{'map': 0.29090543005529873, 'ndcg': 0.6153667539666847}
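To make the `map` figure concrete, here is a minimal sketch of mean average precision with binary relevance. Average precision for one query is the mean of the precision values at each rank where a relevant document appears, divided by the number of relevant documents; MAP is the mean of that over queries. The helper names and toy data are hypothetical, and an evaluation library such as pytrec_eval may treat edge cases (for example, queries missing from the run) differently.

```python
def average_precision(ranked_docnos, relevant):
    """Average precision of one ranked list against a set of relevant docnos."""
    hits = 0
    precision_sum = 0.0
    for position, docno in enumerate(ranked_docnos, start=1):
        if docno in relevant:
            hits += 1
            precision_sum += hits / position  # precision at this rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(run, qrels):
    """run: qid -> ranked docnos; qrels: qid -> set of relevant docnos."""
    per_query = {
        qid: average_precision(run.get(qid, []), relevant)
        for qid, relevant in qrels.items()
    }
    mean = sum(per_query.values()) / len(per_query) if per_query else 0.0
    return mean, per_query


# Hypothetical toy run and judgments.
toy_run = {"1": ["a", "b", "c"], "2": ["x", "y"]}
toy_qrels = {"1": {"a", "c"}, "2": {"y"}}

mean, per_query = mean_average_precision(toy_run, toy_qrels)
print(mean, per_query)
```

For query `1` the relevant documents sit at positions 1 and 3, giving (1/1 + 2/3) / 2; for query `2` the single relevant document is at position 2, giving 1/2.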

We can also ask for per-query results.

In [12]:
# Named eval_results to avoid shadowing Python's built-in eval()
eval_results = pt.Utils.evaluate(res, qrels, metrics=["map"], perquery=True)
eval_results

{'1': {'map': 0.2688603632606692},
 '10': {'map': 0.1214856066519094},
 '11': {'map': 0.06799761023743447},
 '12': {'map': 0.2093716360982601},
 '13': {'map': 0.26945162856284827},
 '14': {'map': 0.3164929260069987},
 '15': {'map': 0.17479160483981196},
 '16': {'map': 0.07376769675516924},
 '17': {'map': 0.3965636483508813},
 '18': {'map': 0.16354405989238738},
 '19': {'map': 0.44669647488527836},
 '2': {'map': 0.056448212440045914},
 '20': {'map': 0.22061080821325293},
 '21': {'map': 0.5395186359625185},
 '22': {'map': 0.3874015813665481},
 '23': {'map': 0.34623362970302457},
 '24': {'map': 0.19184305434732396},
 '25': {'map': 0.17181819840273246},
 '26': {'map': 0.46224321892311115},
 '27': {'map': 0.3332977158611145},
 '28': {'map': 0.3248793014207182},
 '29': {'map': 0.3678434174356832},
 '3': {'map': 0.23945401361406524},
 '30': {'map': 0.3740405619725896},
 '31': {'map': 0.3659688796052433},
 '32': {'map': 0.5449193708233969},
 '33': {'map': 0.16758096895311753},
 '34': {'map': 0