# Wikifier Docker Runbook

## Steps to build, set up, and run the Wikifier Docker container

1. Download the git repository

```
git clone https://github.com/usc-isi-i2/wikidata-wikifier
```


2. Change directory to `wikidata-wikifier`

```
cd wikidata-wikifier
```


3. Build the docker image

```
docker build -t wikidata-wikifier .
```


4. Set up the environment variables in `docker-compose.yml`
      - `WIKIFIER_ES_URL`: Elasticsearch URL, e.g. `http://localhost:9200`
      - `WIKIFIER_ES_INDEX`: Elasticsearch index, e.g. `wikidatadwd-augmented-01`

5. Bring the wikifier container up

```
docker-compose up -d
```

6. Wikifier should be running at `http://localhost:1703`
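
Before sending requests, it can help to confirm that something is listening on the published port. The sketch below is a minimal check using only the standard library. The helper name `is_port_open` is hypothetical, and an open port shows only that a process is accepting connections, not that the wikifier or its Elasticsearch backend is healthy.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False


if __name__ == "__main__":
    # Port 1703 is the one this runbook says the wikifier listens on
    print(is_port_open("localhost", 1703))
```

If this prints `False` right after `docker-compose up -d`, the container may still be starting; `docker-compose logs` is the usual place to look next.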


# Call Wikidata Wikifier Service

```python
import os
import requests
import pandas as pd
from io import StringIO
```

## Setup parameters

```python
wikifier_service_url = "http://localhost:1703/wikify"
input_file = '/Users/amandeep/Github/wikidata-wikifier/wikifier/sample_files/cricketers.csv'
column_to_wikify = "cricketers"
```

## Peek at the input file

```python
pd.read_csv(input_file).fillna("")
```

| Unnamed: 0 | cricketers | teams | weight | dob |
|---|---|---|---|---|
| 0 | Virat Kohli | royal challengers bangalore | 152 | 5/11/88 |
| 1 | Tendulkar | mumbai indians | 137 | 24/04/1973 |
| 2 | Dhoni | chennai super kings | 154 | 7/7/81 |
| 3 | Jasprit Bumrah | mumbai indians | 154 | 6/12/93 |
| 4 | Ajinkya Rahane | rajasthan royals | 134 | 6/6/88 |
| 5 | Rohit Sharma | mumbai indians | 159 | 30/04/1987 |
| 6 | Bhuvneshwar Kumar | deccan chargers | 154 | 5/2/90 |
| 7 | Ravindra Jadeja | chennai super kings | 132 | 6/12/88 |
| 8 | Rishabh Pant | delhi capitals | 136 | 4/8/97 |
| 9 | Shikhar Dhawan | delhi capitals | 157 | 5/12/85 |
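
Since the service is told which column to link by name, a mistyped column name is an easy mistake to make. As a hedged sketch, the hypothetical helper `has_column` below reads only the header row with the standard `csv` module and reports whether a column is present. The inline sample is an abridged stand-in for `cricketers.csv`, not the real file.

```python
import csv
from io import StringIO


def has_column(csv_file, column):
    """Return True if `column` appears in the header row of an open CSV file."""
    # next() with a default avoids StopIteration on an empty file
    header = next(csv.reader(csv_file), [])
    return column in header


# Abridged stand-in for the input file: header plus one row
sample = StringIO(
    "cricketers,teams,weight,dob\n"
    "Virat Kohli,royal challengers bangalore,152,5/11/88\n"
)
print(has_column(sample, "cricketers"))
```

With a real file, the same helper can be called as `has_column(open(input_file, newline=""), column_to_wikify)` before uploading.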


## Call via Python

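```python
def call_wikifier(url, k=1):
    """POST the input file to the wikifier and return the response as a DataFrame."""
    file_name = os.path.basename(input_file)
    # k and the column to wikify are passed as query parameters
    params = {'k': k, 'columns': column_to_wikify}

    # Open the file in a context manager so the handle is closed after the upload
    with open(input_file, mode='rb') as f:
        files = {'file': (file_name, f, 'application/octet-stream')}
        resp = requests.post(url, params=params, files=files)

    # Fail loudly on an HTTP error instead of parsing an error page as CSV
    resp.raise_for_status()

    # The response body is parsed as CSV text
    data = StringIO(resp.content.decode('utf-8'))
    return pd.read_csv(data, header=None)
```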
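
The request carries `k` and `columns` in the query string. If a column name contains spaces or other reserved characters, it needs to be percent-encoded first. The following is a minimal sketch using `urllib.parse.urlencode` from the standard library; `build_wikify_url` is a hypothetical helper and not part of the wikifier, and whether the service accepts such column names is an assumption.

```python
from urllib.parse import urlencode


def build_wikify_url(base_url, column, k=1):
    """Append k and columns to base_url as an encoded query string."""
    # urlencode escapes reserved characters and joins pairs with '&'
    query = urlencode({"k": k, "columns": column})
    return f"{base_url}?{query}"


print(build_wikify_url("http://localhost:1703/wikify", "cricketers", k=3))
print(build_wikify_url("http://localhost:1703/wikify", "player name"))
```

Passing a `params` dictionary to `requests.post` performs the same encoding, so this helper is mainly useful when the URL is needed as a string, for example to hand it to another tool.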

```python
df = call_wikifier(wikifier_service_url, k=3)
df
```

```python
df.fillna("").to_csv('/tmp/linked_cricketers.csv', index=False)
```

## Call using `curl`

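```python
url = f'{wikifier_service_url}?k=3&columns={column_to_wikify}'
```

Quote the URL so the shell does not treat the `&` in the query string as a control operator:

```
curl -XPOST -F "file=@$input_file" "$url"
```

With the values filled in, the call looks like this (here against the instance at `https://dsbox02.isi.edu:8888/wikifier`):

```
curl -XPOST -F "file=@/Users/amandeep/Github/wikidata-wikifier/wikifier/sample_files/cricketers.csv" "https://dsbox02.isi.edu:8888/wikifier/wikify?k=3&columns=cricketers"
```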
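
When the `curl` command is generated from Python, `shlex.join` (Python 3.8+) can take care of shell quoting, which matters here because the query string contains `&`. This is a sketch only: the helper name `curl_command` is hypothetical, and the relative path below is a placeholder rather than a path from this repository.

```python
import shlex


def curl_command(input_file, url):
    """Return a shell-safe curl command that uploads input_file to url."""
    args = ["curl", "-XPOST", "-F", f"file=@{input_file}", url]
    # shlex.join quotes any argument containing shell metacharacters such as & or ?
    return shlex.join(args)


print(curl_command("sample_files/cricketers.csv",
                   "http://localhost:1703/wikify?k=3&columns=cricketers"))
```

The printed command wraps the URL in single quotes, so it can be pasted into a POSIX shell without the `&` splitting it into two commands.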
