# PyTerrier Notebook for Full-Rank Submissions

In [13]:
import os

# Detect if we are in the TIRA sandbox
# Install the required dependencies if we are not in the sandbox.
if 'TIRA_DATASET_ID' not in os.environ:
    !pip3 install python-terrier tira ir_datasets
    !apt-get install ping
else:
    print('We are in the TIRA sandbox.')
    

Reading package lists... Done
Building dependency tree       
Reading state information... Done
Package ping is a virtual package provided by:
  inetutils-ping 2:1.9.4-3ubuntu0.1
  iputils-ping 3:20161105-1ubuntu3
You should explicitly select one to install.

E: Package 'ping' has no installation candidate


In [2]:
from tira.third_party_integrations import ir_datasets, ensure_pyterrier_is_loaded, persist_and_normalize_run

# this loads and starts pyterrier so that it also works in the TIRA
ensure_pyterrier_is_loaded()

# PyTerrier must be imported after the call to ensure_pyterrier_is_loaded in TIRA.
import pyterrier as pt

# For more detiled outputs
import logging
logging.basicConfig(level=logging.DEBUG)

Start PyTerrier with version=5.7, helper_version=0.0.7, no_download=True


PyTerrier 0.9.2 has loaded Terrier 5.7 (built by craigm on 2022-11-10 18:30) and terrier-helper 0.0.7

No etc/terrier.properties, using terrier.default.properties for bootstrap configuration.


In [3]:
data = ir_datasets.load('cranfield')

In [4]:
print([i for i in data.queries_iter()][:3])

[INFO] [starting] http://ir.dcs.gla.ac.uk/resources/test_collections/cran/cran.tar.gz
[INFO] [finished] http://ir.dcs.gla.ac.uk/resources/test_collections/cran/cran.tar.gz: [00:00] [507kB] [2.07MB/s]
                                                                                               

[GenericQuery(query_id='1', text='what similarity laws must be obeyed when constructing aeroelastic models\nof heated high speed aircraft .'), GenericQuery(query_id='2', text='what are the structural and aeroelastic problems associated with flight\nof high speed aircraft .'), GenericQuery(query_id='4', text='what problems of heat conduction in composite slabs have been solved so\nfar .')]




In [5]:
data.docs_iter().__next__()

CranfieldDoc(doc_id='1', title='experimental investigation of the aerodynamics of a\nwing in a slipstream .', text='experimental investigation of the aerodynamics of a\nwing in a slipstream .\n  an experimental study of a wing in a propeller slipstream was\nmade in order to determine the spanwise distribution of the lift\nincrease due to slipstream at different angles of attack of the wing\nand at different free stream to slipstream velocity ratios .  the\nresults were intended in part as an evaluation basis for different\ntheoretical treatments of this problem .\n  the comparative span loading curves, together with\nsupporting evidence, showed that a substantial part of the lift increment\nproduced by the slipstream was due to a /destalling/ or\nboundary-layer-control effect .  the integrated remaining lift\nincrement, after subtracting this destalling lift, was found to agree\nwell with a potential flow theory .\n  an empirical evaluation of the destalling effects was made for\nthe s

In [6]:
iter_indexer = pt.IterDictIndexer("./index", meta={'docno': 100})
!rm -Rf index
indexref = iter_indexer.index({'docno': i.doc_id, 'text': i.text} for i in data.docs_iter())

12:23:09.347 [ForkJoinPool-1-worker-3] WARN org.terrier.structures.indexing.Indexer - Indexed 2 empty documents


In [7]:
pt_data = pt.get_dataset('irds:cranfield')

bm25 = pt.BatchRetrieve(indexref, wmodel="BM25")

In [8]:
run = bm25(pt_data.get_topics())

BR(BM25):   0%|          | 0/225 [00:00<?, ?q/s]

BR(BM25): 100%|██████████| 225/225 [00:05<00:00, 40.05q/s]


In [9]:
persist_and_normalize_run(run, 'bm25-by-team-xyz')

I use the environment variable "TIRA_OUTPUT_DIRECTORY" to determine where I should store the run file using "." as default.
Done. run file is stored under "./run.txt".
