# <a id='top'></a> REST API of the TREC Metadatabase

This notebook documents and exemplifies the use of the REST API and the metadatabase, including the following:  
1. [Setup](#setup)   
2. [Overview of the API endpoints](#overview)
3. [Use case example I: Fetching metadata with the API](#fetching_metadata)
4. [Use case example II: Downloading resources](#download) (i.e., input run files) with URLs from the metadata  
5. [Use case example III: Plotting the retrieval effectiveness](#plots) of run submissions (by the example of the TREC Deep Learning track)  

## 1. <a id='setup'>Setup</a>

**Start the Docker container with the REST API server and make sure it listens on `0.0.0.0:5000`, i.e., that it is reachable at `http://localhost:5000`, which all examples below use.**

In [None]:
!cd src/ && docker compose up -d
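`docker compose up -d` returns as soon as the container has started, which can be before the server accepts requests. The following is a minimal sketch of a readiness check. `wait_for_api` is a hypothetical helper (not part of this project), and it assumes that the `trec/api/v1/trecs` endpoint answers with HTTP 200 once the server is up. It relies only on the Python standard library, so it can run before the packages below are installed.

```python
import time
import urllib.request


def wait_for_api(url="http://localhost:5000/trec/api/v1/trecs",
                 timeout=30.0, interval=1.0, request_timeout=5.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds have passed.

    Returns True if the server responded in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # urlopen raises OSError subclasses (URLError, HTTPError,
            # ConnectionError, timeouts) while the server is not ready
            with urllib.request.urlopen(url, timeout=request_timeout) as response:
                if response.status == 200:
                    return True
        except OSError:
            pass  # not reachable yet; retry after a short pause
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_api()` right after the `docker compose` cell returns `True` once the API responds, or `False` after 30 seconds.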

**Install the required packages to run the notebook properly.**

In [None]:
%pip install requests==2.31.0 pandas==2.1.1 matplotlib-inline==0.1.6 seaborn==0.13.0 tqdm==4.66.1

## 2. <a id='overview'>Overview of the API endpoints</a>

[Go back to top](#top)

| API invocation | Example | Description | 
| --- | --- | --- | 
| `trec/api/v1/trecs` | [Go to cell](#api_trecs) | Get all identifiers of the TREC conferences in the database, e.g., `trec8` represents the eighth TREC conference in the year 1999. | 
| `trec/api/v1/<string:trec>/tracks` | [Go to cell](#api_trec_tracks) | Get all tracks of a TREC conference. This call will return track identifiers as keys and their full names as values, e.g., `deep` as the key of the "Deep Learning" track. | 
| `trec/api/v1/<string:trec>/<string:pid>/tracks` | [Go to cell](#api_trec_pid_tracks) | Get all tracks of a specified conference in which the participant took part. Some participants submit results to multiple tracks. | 
| `trec/api/v1/<string:trec>/participants` | [Go to cell](#api_trec_participants) | Get all participants of a TREC iteration. The call will return all identifiers of participants that took part in the specified TREC iteration. | 
| `trec/api/v1/<string:trec>/<string:track>/participants` | [Go to cell](#api_trec_track_participants) | Get all participants of a track. The call will return all identifiers of participants that took part in the specified track. | 
| `trec/api/v1/<string:trec>/publications` | [Go to cell](#api_trec_publications) | Get the metadata of all publications of a specified TREC iteration. The output is grouped by track. | 
| `trec/api/v1/<string:trec>/<string:track>/publications` | [Go to cell](#api_trec_track_publications) | Get the metadata of all publications of a specified track. The metadata will point to all run submissions that can be associated with the publication. | 
| `trec/api/v1/<string:trec>/<string:track>/results` | [Go to cell](#api_trec_track_results) | Get the evaluation results of the run submissions of a specified track. The output is grouped by participant. | 
| `trec/api/v1/<string:trec>/<string:track>/<string:pid>/results` | [Go to cell](#api_trec_track_pid_results) | Get the evaluation results of the runs that a specified participant submitted to a track. | 
| `trec/api/v1/<string:trec>/data` | [Go to cell](#api_trec_data) | Get the metadata of all data resources of a specified TREC iteration. The output is grouped by track. | 
| `trec/api/v1/<string:trec>/<string:track>/data` | [Go to cell](#api_trec_track_data) | Get the metadata of the data resources of a specified track. The output contains the metadata of the corpus, topics, relevance judgments, and others. | 
| `trec/api/v1/<string:trec>/<string:track>/runs` | [Go to cell](#api_trec_track_runs) | Get the metadata of all runs submitted to a track for a specified conference. The output contains one entry per run identifier. | 
| `trec/api/v1/<string:trec>/<string:track>/<string:pid>/runs` | [Go to cell](#api_trec_track_pid_runs) | Get the metadata of all runs of a specified participant at a specified track. The output has the same structure as the call above but is limited to the participant. | 
| `trec/api/v1/<string:trec>/<string:track>/<string:pid>/<string:runid>` | [Go to cell](#api_trec_track_pid_runid) | Get the metadata of the run with the specified identifier. The call will return the entire metadata available for the run. | 
| `trec/api/v1/<string:pid>/runs` | [Go to cell](#api_pid_runs) | Get the metadata of all runs from a specified participant. The output contains metadata of runs from possibly different tracks to which the participant contributed runs. | 
| `trec/api/v1/runs/<string:runid>` | [Go to cell](#api_runs_runid) | Get the metadata of all the runs that have the specified runid. The output includes metadata of runs by different participants submitted to different tracks. | 
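The examples below assemble their URLs by hand. A small helper can build them from the path segments listed in the table instead. This is a sketch under two assumptions: the server listens on `localhost:5000` as in the setup, and the endpoints return JSON. `api_url` and `fetch` are hypothetical names, and `fetch` uses the standard library's `urllib` instead of `requests` to stay dependency-free.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the REST API server from the setup step is reachable here.
BASE_URL = "http://localhost:5000/trec/api/v1"


def api_url(*segments, base=BASE_URL):
    """Join path segments onto the API base URL, percent-encoding each segment."""
    return "/".join([base.rstrip("/")] + [quote(str(s), safe="") for s in segments])


def fetch(*segments, timeout=30):
    """GET the endpoint built from `segments` and decode its JSON body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urlopen(api_url(*segments), timeout=timeout) as response:
        return json.load(response)
```

For example, `api_url('trec31', 'deep', 'runs')` returns `http://localhost:5000/trec/api/v1/trec31/deep/runs`, and `fetch('trec31', 'deep', 'runs')` would return the decoded JSON of that call. Because `urlopen` raises an `HTTPError` for 4xx/5xx responses, a mistyped identifier surfaces as an HTTP error rather than as a JSON decoding error.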

## 3. <a id='fetching_metadata'>Use case example I: Fetching metadata with the API</a>

[Go back to top](#top)

<a id='api_trecs'></a> `trec/api/v1/trecs`

_get all TREC conferences in the database_

[Go back to overview](#overview)

In [None]:
import json 
from requests import get

url = 'http://localhost:5000/trec/api/v1/trecs'

result = get(url=url).text
json.loads(result)
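Identifiers such as `trec8` and `trec31` do not sort chronologically as plain strings (`'trec31' < 'trec8'`). Assuming the call above returns a list of such strings, a sort key based on the trailing number restores chronological order. `trec_sort_key` is a hypothetical helper.

```python
import re


def trec_sort_key(trec_id):
    """Sort key for identifiers such as 'trec8' or 'trec31' by their trailing number.

    Identifiers without a trailing number sort after all numbered ones.
    """
    match = re.search(r"(\d+)$", trec_id)
    return (0, int(match.group(1)), trec_id) if match else (1, 0, trec_id)
```

Usage: `sorted(json.loads(result), key=trec_sort_key)`.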

<a id='api_trec_tracks'></a> `trec/api/v1/<string:trec>/tracks`  

_get all tracks of a TREC conference_

[Go back to overview](#overview)

In [None]:
import json
from requests import get

url = 'http://localhost:5000/trec/api/v1/trec31/tracks'

result = get(url=url).text
json.loads(result)

<a id='api_trec_pid_tracks'></a> `trec/api/v1/<string:trec>/<string:pid>/tracks`

_get all tracks in which a participant took part at a TREC conference_

[Go back to overview](#overview)

In [None]:
import json
from requests import get

url = 'http://localhost:5000/trec/api/v1/trec31/Webis/tracks'

result = get(url=url).text
json.loads(result)
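The `tracks` endpoint of a conference maps track identifiers to their full names, so the two calls can be combined to label a participant's tracks. This sketch assumes that the call above returns a list of track identifiers; `label_tracks` is a hypothetical helper.

```python
def label_tracks(track_ids, track_names):
    """Map each track identifier to its full name.

    `track_names` is the dictionary returned by `trec/api/v1/<trec>/tracks`;
    identifiers missing from it keep the identifier itself as the label.
    """
    return {tid: track_names.get(tid, tid) for tid in track_ids}
```

For example, `label_tracks(json.loads(result), track_names)`, where `track_names` holds the decoded response of `trec/api/v1/trec31/tracks`.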

<a id='api_trec_participants'></a> `trec/api/v1/<string:trec>/participants`

_get all participants of a TREC conference_

[Go back to overview](#overview)

In [None]:
import json 
from requests import get

url = 'http://localhost:5000/trec/api/v1/{}/{}'.format(
    'trec31',
    'participants'
    )

result = get(url=url).text
json.loads(result)

<a id='api_trec_track_participants'></a> `trec/api/v1/<string:trec>/<string:track>/participants`

_get all participants of a track_

[Go back to overview](#overview)

In [None]:
import json 
from requests import get

url = 'http://localhost:5000/trec/api/v1/{}/{}/{}'.format(
    'trec31',
    'neuclir',
    'participants'
    )

result = get(url=url).text
json.loads(result)
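Because participant lists are available per track, set operations answer questions such as which groups took part in more than one track. A minimal sketch, assuming each `participants` call returns a list of participant identifiers; `shared_participants` is a hypothetical helper.

```python
def shared_participants(*participant_lists):
    """Return the sorted participant identifiers that appear in every given list."""
    if not participant_lists:
        return []
    common = set(participant_lists[0]).intersection(*participant_lists[1:])
    return sorted(common)
```

Passing the decoded responses of `trec31/neuclir/participants` and `trec31/deep/participants` yields the groups that submitted to both tracks.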

<a id='api_trec_publications'></a> `trec/api/v1/<string:trec>/publications`  

_get all publications of a specified TREC iteration_

[Go back to overview](#overview)

In [None]:
import json
from requests import get

url = 'http://localhost:5000/trec/api/v1/trec31/publications'

result = get(url=url).text
json.loads(result)
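Since this output is grouped by track, it can be condensed into a quick overview of how many publications each track has. This sketch assumes the decoded response is a dictionary keyed by track identifier whose values are lists (or dictionaries) of publication entries; `publications_per_track` is a hypothetical helper.

```python
def publications_per_track(publications):
    """Count the entries per track in a response keyed by track identifier."""
    return {track: len(entries) for track, entries in sorted(publications.items())}
```

Usage: `publications_per_track(json.loads(result))`.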

<a id='api_trec_track_publications'></a> `trec/api/v1/<string:trec>/<string:track>/publications`  

_get all publications of a specified track_

[Go back to overview](#overview)

In [None]:
import json
from requests import get

url = 'http://localhost:5000/trec/api/v1/trec31/neuclir/publications'

result = get(url=url).text

json.loads(result)

<a id='api_trec_track_results'></a> `trec/api/v1/<string:trec>/<string:track>/results`  

_get all evaluation results of runs submitted to a track_

[Go back to overview](#overview)

In [None]:
import json
from requests import get

url = 'http://localhost:5000/trec/api/v1/trec31/deep/results'

result = get(url=url).text
json.loads(result)
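Evaluation results are nested (for example `results` → `ndcg` → `all` → `ndcg_cut_10` in the plotting cells of Section 5), which is inconvenient for tabular analysis. A generic flattening step turns each nested dictionary into a single level with dotted keys. This is a sketch; `flatten` is a hypothetical helper and makes no assumption about the exact nesting of the response.

```python
def flatten(nested, parent_key="", sep="."):
    """Flatten nested dictionaries into one dictionary with joined keys."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # descend into sub-dictionaries, extending the key prefix
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat
```

pandas also provides `pd.json_normalize`, which performs a similar flattening of nested records directly into a `DataFrame`.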

<a id='api_trec_track_pid_results'></a> `trec/api/v1/<string:trec>/<string:track>/<string:pid>/results`  

_get all evaluation results of runs submitted by a participant to a track_

[Go back to overview](#overview)

In [None]:
import json
from requests import get

url = 'http://localhost:5000/trec/api/v1/trec31/deep/UGA/results'

result = get(url=url).text
json.loads(result)

<a id='api_trec_data'></a> `trec/api/v1/<string:trec>/data`  

_get all data resources of a TREC conference_

[Go back to overview](#overview)

In [None]:
import json
from requests import get

url = 'http://localhost:5000/trec/api/v1/trec31/data'

result = get(url=url).text
json.loads(result)

<a id='api_trec_track_data'></a> `trec/api/v1/<string:trec>/<string:track>/data`  

_get all data resources of a track_

[Go back to overview](#overview)

In [None]:
import json
from requests import get

url = 'http://localhost:5000/trec/api/v1/trec31/deep/data'

result = get(url=url).text
json.loads(result)
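The data resources (corpus, topics, relevance judgments, and others) are described by nested metadata. Assuming that links to the resources appear as plain `http://` or `https://` strings somewhere in that metadata, a recursive walk collects them for download. `collect_urls` is a hypothetical helper.

```python
def collect_urls(node):
    """Recursively collect all http(s) URL strings from nested dicts and lists."""
    urls = []
    if isinstance(node, dict):
        for value in node.values():
            urls.extend(collect_urls(value))
    elif isinstance(node, list):
        for item in node:
            urls.extend(collect_urls(item))
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        urls.append(node)
    return urls
```

`collect_urls(json.loads(result))` lists the links in document order; as with the run files, some of them may require credentials.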

<a id='api_trec_track_runs'></a> `trec/api/v1/<string:trec>/<string:track>/runs`

_get all runs of a specified track_

[Go back to overview](#overview)

In [None]:
import json 
from requests import get

url = 'http://localhost:5000/trec/api/v1/{}/{}/{}'.format(
    'trec31',
    'neuclir',
    'runs'
    )

result = get(url=url).text
json.loads(result)

<a id='api_trec_track_pid_runs'></a> `trec/api/v1/<string:trec>/<string:track>/<string:pid>/runs`

_get all runs from a specified participant at a specified track_

[Go back to overview](#overview)

In [None]:
import json 
from requests import get

url = 'http://localhost:5000/trec/api/v1/{}/{}/{}/{}'.format(
    'trec31', 
    'neuclir', 
    'CFDA_CLIP',
    'runs'
    )

result = get(url=url).text

json.loads(result)

<a id='api_trec_track_pid_runid'></a> `trec/api/v1/<string:trec>/<string:track>/<string:pid>/<string:runid>`

_get the run with specified runid_

[Go back to overview](#overview)

In [None]:
import json 
from requests import get

url = 'http://localhost:5000/trec/api/v1/{}/{}/{}/{}'.format(
    'trec31', 
    'neuclir', 
    'CFDA_CLIP', 
    'CFDA_CLIP_dq'
    )

result = get(url=url).text
json.loads(result)

_another example_

In [None]:
import json 
from requests import get

url = 'http://localhost:5000/trec/api/v1/{}/{}/{}/{}'.format(
    'trec31', 
    'deep', 
    'Webis', 
    'webis-dl-duot5'
    )

result = get(url=url).text

# pretty-print the run metadata with an indentation of four spaces
print(json.dumps(json.loads(result), indent=4))


<a id='api_pid_runs'></a> `trec/api/v1/<string:pid>/runs`

_get all runs from a specified participant_

[Go back to overview](#overview)

In [None]:
import json 
from requests import get

url = 'http://localhost:5000/trec/api/v1/{}/{}'.format(
    'DOSSIER', 
    'runs'
    )

result = get(url=url).text
json.loads(result)

<a id='api_runs_runid'></a> `trec/api/v1/runs/<string:runid>`

_get all the runs with the specified runid (includes multiple tracks)_

[Go back to overview](#overview)

In [None]:
import json 
from requests import get

url = 'http://localhost:5000/trec/api/v1/{}/{}'.format(
    'runs', 
    'baseline'
    )

result = get(url=url).text

json.loads(result)

## 4. <a id='download'>Use case example II: Downloading resources</a>

[Go back to top](#top)

In [None]:
import json 
import time 
from pathlib import Path
from requests import get
from tqdm import tqdm 

# credentials can be obtained from the TREC program manager
u = '<//insert username//>'
p = '<//insert password//>'

url = 'http://localhost:5000/trec/api/v1/trec31/deep/runs'
result = get(url=url).text
runs = json.loads(result)
out_dir = Path('trec31/deep/')
out_dir.mkdir(parents=True, exist_ok=True)

for run in tqdm(runs):
    input_url = run.get('input_url')
    if not input_url:
        continue  # no input file is linked for this run
    file_name = input_url.split('/')[-1]
    time.sleep(1)  # pause between requests to avoid overloading the server
    r = get(input_url, auth=(u, p))  # download the run file itself, not the API URL
    r.raise_for_status()  # stop on wrong credentials or missing files
    (out_dir / file_name).write_bytes(r.content)
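Splitting the URL on `/` keeps query strings and percent-encoded characters in the local file name. A slightly more robust sketch uses `urllib.parse`; `local_file_name` is a hypothetical helper, and the example URLs in the test are placeholders rather than real TREC addresses.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def local_file_name(url):
    """Derive a file name from the path of a URL, ignoring query strings and fragments."""
    name = PurePosixPath(unquote(urlsplit(url).path)).name
    if not name:
        raise ValueError(f"URL has no file name in its path: {url!r}")
    return name
```

In the download loop, `file_name = local_file_name(input_url)` would replace the `split('/')` expression.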

## 5. <a id='plots'>Use case example III: Example plots</a>

[Go back to top](#top)

**TREC-31 Deep Learning - Passage Ranking**

In [None]:
import json 
from requests import get
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()

url = 'http://localhost:5000/trec/api/v1/trec31/deep/runs'

result = get(url=url).text
runs = json.loads(result)

df_data = []
for r in runs:
    if r.get('type') == 'auto' and r.get('task') == 'passages':
        df_data.append(
            {
                'runid': r.get('runid'),
                'nDCG@10': float(r.get('results').get('ndcg').get('all').get('ndcg_cut_10')),
            }
        )

df = pd.DataFrame(df_data)
df = df.sort_values('nDCG@10', ascending=False)
 
fig, axes = plt.subplots(figsize=(18,3))
plt.xticks(rotation='vertical')
plt.title('TREC Deep Learning 2022 - Passage Ranking')
_ = axes.stem(df['runid'], df['nDCG@10'], basefmt=' ', markerfmt = 'd', label='nDCG@10') 
_ = plt.legend()

**TREC-30 Deep Learning - Passage Ranking**

In [None]:
import json 
from requests import get
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()

url = 'http://localhost:5000/trec/api/v1/trec30/deep/runs'

result = get(url=url).text
runs = json.loads(result)

df_data = []
for r in runs:
    if r.get('type') == 'auto' and r.get('task') == 'passages':
        df_data.append(
            {
                'runid': r.get('runid'),
                'nDCG@10': float(r.get('results').get('passages-eval').get('all').get('ndcg_cut_10')),
            }
        )

df = pd.DataFrame(df_data)
df = df.sort_values('nDCG@10', ascending=False)
 
fig, axes = plt.subplots(figsize=(12,3))
plt.xticks(rotation='vertical')
plt.title('TREC Deep Learning 2021 - Passage Ranking')
_ = axes.stem(df['runid'], df['nDCG@10'], basefmt=' ', markerfmt = 'd', label='nDCG@10') 
_ = plt.legend()

**TREC-29 Deep Learning - Passage Ranking**

In [None]:
import json 
from requests import get
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()

url = 'http://localhost:5000/trec/api/v1/trec29/deep/runs'

result = get(url=url).text
runs = json.loads(result)

df_data = []
for r in runs:
    if r.get('type') == 'auto' and r.get('task') == 'passages':
        df_data.append(
            {
                'runid': r.get('runid'),
                'nDCG@10': float(r.get('results').get('passages-eval').get('all').get('ndcg_cut_10')),
            }
        )

df = pd.DataFrame(df_data)
df = df.sort_values('nDCG@10', ascending=False)
 
fig, axes = plt.subplots(figsize=(12,3))
plt.xticks(rotation='vertical')
plt.title('TREC Deep Learning 2020 - Passage Ranking')
_ = axes.stem(df['runid'], df['nDCG@10'], basefmt=' ', markerfmt = 'd', label='nDCG@10') 
_ = plt.legend()

**TREC-28 Deep Learning - Passage Ranking**

In [None]:
import json 
from requests import get
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()

url = 'http://localhost:5000/trec/api/v1/trec28/deep/runs'

result = get(url=url).text
runs = json.loads(result)

df_data = []
for r in runs:
    if r.get('type') == 'auto' and r.get('task') == 'passages':
        df_data.append(
            {
                'runid': r.get('runid'),
                'nDCG@10': float(r.get('results').get('passages-eval').get('all').get('ndcg_cut_10')),
            }
        )

df = pd.DataFrame(df_data)
df = df.sort_values('nDCG@10', ascending=False)
 
fig, axes = plt.subplots(figsize=(12,3))
plt.xticks(rotation='vertical')
plt.title('TREC Deep Learning 2019 - Passage Ranking')
_ = axes.stem(df['runid'], df['nDCG@10'], basefmt=' ', markerfmt = 'd', label='nDCG@10') 
_ = plt.legend()
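The four plotting cells differ mainly in the conference identifier, the plot title, and the name of the evaluation block inside `results` (`ndcg` for TREC-31, `passages-eval` for TREC-28 to TREC-30). A sketch of a shared extraction step follows; `passage_ndcg10` is a hypothetical helper that, unlike the cells above, skips runs lacking the score instead of raising an error.

```python
def passage_ndcg10(runs, results_key):
    """Return (runid, nDCG@10) pairs of automatic passage runs, best first.

    `results_key` names the evaluation block inside each run's 'results'
    entry, e.g. 'ndcg' or 'passages-eval'. Runs without a score are skipped.
    """
    scores = []
    for run in runs:
        if run.get("type") != "auto" or run.get("task") != "passages":
            continue
        block = (run.get("results") or {}).get(results_key) or {}
        score = (block.get("all") or {}).get("ndcg_cut_10")
        if score is not None:
            scores.append((run.get("runid"), float(score)))
    # highest nDCG@10 first, matching sort_values(ascending=False) in the cells
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

A plot cell then reduces to `df = pd.DataFrame(passage_ndcg10(runs, 'passages-eval'), columns=['runid', 'nDCG@10'])` followed by the same `axes.stem(...)` call.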