# Introduction to PyTerrier

_DSAIT4050: Information retrieval lecture, TU Delft_

**Part 1: Setup**

[Terrier](http://terrier.org) is an open-source information retrieval platform aimed at research and experimentation. In this lecture, we'll use [PyTerrier](https://pyterrier.readthedocs.io/), which provides a Python API for Terrier. This series of notebooks gives a brief introduction to PyTerrier.

## Installation

PyTerrier can be installed using `pip`:


In [None]:
pip install python-terrier==0.12.1

Consider installing into a virtual environment, such as [`venv`](https://docs.python.org/3/library/venv.html) or [`conda`](https://www.anaconda.com/download). You'll also need an up-to-date version of the [Java Development Kit (JDK)](https://www.oracle.com/java/technologies/downloads/) installed and the `JAVA_HOME` environment variable set. More detailed installation instructions and troubleshooting tips can be found in the [official installation guide](https://pyterrier.readthedocs.io/en/latest/installation.html).
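Before importing PyTerrier, it can help to check that Java is visible from Python. The following is a minimal sketch using only the standard library; the helper name `check_java_setup` is made up for this example and is not part of PyTerrier. It only reports what it finds and does not verify the Java version.

```python
import os
import shutil


def check_java_setup():
    """Return (java_home, java_executable); either value is None if not found."""
    # JAVA_HOME should point to the root directory of the JDK installation.
    java_home = os.environ.get("JAVA_HOME")
    # shutil.which looks up the `java` executable on the PATH.
    java_executable = shutil.which("java")
    return java_home, java_executable


java_home, java_executable = check_java_setup()
print("JAVA_HOME:", java_home or "not set")
print("java on PATH:", java_executable or "not found")
```

If `JAVA_HOME` is reported as not set, the installation guide linked above is the place to look first.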

Now you should be able to import `pyterrier`:


In [1]:
import pyterrier as pt
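To confirm which version ended up in the environment, one option is to query the package metadata through the standard library. This is a small sketch, not part of the original notebook; `installed_version` is a hypothetical helper, and `python-terrier` is the distribution name used in the `pip` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("python-terrier")
print("python-terrier version:", found or "not installed")
```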

## A test run

Time to test our setup! PyTerrier provides support for loading and indexing a large number of IR datasets (more on that later). Let's load the [ANTIQUE](https://arxiv.org/abs/1905.08957) dataset:


In [2]:
dataset = pt.get_dataset("irds:antique")

Now we can print one of the documents in the corpus:


In [3]:
from pprint import pprint

for doc in dataset.get_corpus_iter():
    pprint(doc)
    break

[INFO] Please confirm you agree to the authors' data usage agreement found at <https://ciir.cs.umass.edu/downloads/Antique/readme.txt>
[INFO] If you have a local copy of https://ciir.cs.umass.edu/downloads/Antique/antique-collection.txt, you can symlink it here to avoid downloading it again: C:\IR\ir_datasets\downloads\684f7015aff377062a758e478476aac8
[INFO] [starting] https://ciir.cs.umass.edu/downloads/Antique/antique-collection.txt
[INFO] [finished] https://ciir.cs.umass.edu/downloads/Antique/antique-collection.txt: [02:25] [93.6MB] [642kB/s]
antique documents:   0%|          | 0/403666 [02:26<?, ?it/s]

{'docno': '2020338_0',
 'text': 'A small group of politicians believed strongly that the fact that '
         'Saddam Hussien remained in power after the first Gulf War was a '
         'signal of weakness to the rest of the world, one that invited '
         'attacks and terrorism. Shortly after taking power with George Bush '
         'in 2000 and after the attack on 9/11, they were able to use the '
         'terrorist attacks to justify war with Iraq on this basis and '
         'exaggerated threats of the development of weapons of mass '
         'destruction. The military strength of the U.S. and the brutality of '
         "Saddam's regime led them to imagine that the military and political "
         'victory would be relatively easy.'}





If a document is printed above, congratulations: the setup was successful. If not, take a look at the [troubleshooting section](https://pyterrier.readthedocs.io/en/latest/installation.html#installation-troubleshooting) in the official documentation.
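The loop-and-`break` pattern used to print the first document can also be written with `itertools.islice`, which makes it easy to look at more than one item. The sketch below is an illustration only: `peek` is a hypothetical helper and `fake_corpus` is a stand-in generator, so the example runs without downloading any dataset.

```python
from itertools import islice
from pprint import pprint


def peek(iterable, n=1):
    """Return the first n items of an iterable as a list."""
    # islice stops after n items, so the remaining items are never produced.
    return list(islice(iterable, n))


def fake_corpus():
    """Stand-in for a corpus iterator: yields dicts with 'docno' and 'text' keys."""
    for i in range(1000):
        yield {"docno": f"doc_{i}", "text": f"text of document {i}"}


first_docs = peek(fake_corpus(), n=2)
pprint(first_docs)
```

In the notebook, the same helper could be applied to the real corpus, for example `peek(dataset.get_corpus_iter(), n=3)`.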
