# Snorkel Workshop: Extracting Spouse Relations from the News
## Part 2: Writing Labeling Functions

### <span style="color:red">IMPORTANT</span>
Please enter below the ID provided to you on check-in:

In [1]:
USER_ID = ""

In [2]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import os
import re
import sys
import numpy as np

# Connect to the database backend and initialize a Snorkel session
from lib.init import *

from snorkel.lf_helpers import (
    get_left_tokens, get_right_tokens, get_between_tokens,
    get_text_between, get_tagged_text,
)



Snorkel requires that we formally define a type for our candidate.

In [3]:
try:
    Spouse = candidate_subclass('Spouse', ['person1', 'person2'])
except:
    print>>sys.stderr,"Info: Candidate type already defined"

In [4]:
from snorkel.annotations import load_gold_labels
L_gold_dev = load_gold_labels(session, annotator_name='gold', split=1)

# I. Background

## 1. Preprocessing

In a real application, there is a lot of data preparation, parsing, and database loading that needs to be completed before we dive into writing labeling functions. Here we have already pre-generated a database instance for you. All _candidates_ and _gold labels_ (i.e., human-generated labels) are queried from this database for use in the tutorial.

See our preprocessing tutorial <a href="Workshop_5_Advanced_Preprocessing.ipynb">Workshop_5_Advanced_Preprocessing</a> for more details on how this database is built.

## 2. Using a _development set_ of human-labeled data

In our setting here, we will use the phrase _development set_ to refer to a set of examples (here, a subset of our training set) which we label by hand and use to help us develop and refine labeling functions.  Unlike the _test set_, which we do not look at and reserve for final evaluation, we can inspect the development set while writing labeling functions.

## 3. Load our training data candidates

In [5]:
candidates = session.query(Candidate).filter(Candidate.split == 0).all()
print len(candidates)

22254


# II. Development Sandbox

## 1. Writing Labeling Functions
----

In Snorkel, the primary way we provide training signal to the end extraction model is by writing **labeling functions (LFs)**, as opposed to hand-labeling massive training sets.  We'll go through some examples for our spouse extraction task below.

A labeling function isn't anything special. It's just a Python function that accepts a `Candidate` as the input argument and returns `1` if it says the `Candidate` should be marked as true, `-1` if it says the `Candidate` should be marked as false, and `0` if it doesn't know how to vote and abstains. In practice, many labeling functions are unipolar: they label only `1`s and `0`s, or only `-1`s and `0`s.
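
As a minimal sketch of this voting convention, the example below defines one positive and one negative unipolar LF. The `FakeCandidate` type and its `between` field are hypothetical stand-ins for a Snorkel `Candidate` and the output of `get_between_tokens`, used only so the example runs without a database.

```python
from collections import namedtuple

# Hypothetical stand-in for a Snorkel Candidate: it only carries the
# tokens found between the two person mentions.
FakeCandidate = namedtuple('FakeCandidate', ['between'])

def LF_husband(c):
    """Unipolar positive LF: votes 1 or abstains (0)."""
    return 1 if 'husband' in c.between else 0

def LF_sibling(c):
    """Unipolar negative LF: votes -1 or abstains (0)."""
    return -1 if {'brother', 'sister'} & set(c.between) else 0

examples = [
    FakeCandidate(['and', 'her', 'husband']),
    FakeCandidate(['and', 'his', 'sister']),
    FakeCandidate(['met', 'with']),
]

# One (LF_husband, LF_sibling) vote pair per candidate
votes = [(LF_husband(c), LF_sibling(c)) for c in examples]
print(votes)
```

The third example matches neither pattern, so both LFs abstain on it.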

Recall that our goal is to ultimately train a high-performance classification model that predicts which of our `Candidate`s are true mentions of spouse relations.  It turns out that we can do this by writing potentially low-quality labeling functions!

Please write your labeling function implementations below.  
Refer to our [Snorkel Cheatsheet](Workshop_1_Snorkel_API_Cheat_Sheet.ipynb) for API information.

### a) Pattern Matching Labeling Functions

Write your labeling functions below:

In [6]:
other = {'boyfriend', 'girlfriend'}

def LF_wife_in_sentence(c):
    """
    Test if the word "wife" occurs between the two person mentions.
    Despite the function name, tokens outside that span are not checked.
    """
    return 1 if "wife" in get_between_tokens(c) else 0

def LF_other_relationship(c):
    """
    Test if the term 'boyfriend' or 'girlfriend' occurs in the text between our
    Span objects.
    """
    return -1 if len(other.intersection(get_between_tokens(c))) > 0 else 0

### b) Data Exploration

How do we come up with good patterns to encode as labeling functions? One way is to actually explore our training data. We can do this easily by loading our training candidates into the `SentenceNgramViewer` object and examining them.

In [7]:
from snorkel.viewer import SentenceNgramViewer

SentenceNgramViewer(candidates[0:1000], session, n_per_page=1)


### c) Computing Individual LF Statistics
One simple metric we can compute quickly is our _coverage_, the number of candidates labeled by our LF, on our development set (or any other set), without saving it to the database.  For example, we can easily get every candidate for which an LF emits a non-zero label:

In [8]:
def coverage_LF(lf, split, gold=None):
    labeled = []
    cands = session.query(Spouse).filter(Spouse.split == split).order_by(Candidate.id).all()
    for i,c in enumerate(cands):
        if lf(c) != 0:
            if gold is not None and gold.size != 0:
                labeled.append((c, gold[i,0]))
            else:
                labeled.append(c)
    print "{} labeled candidates: {}".format(lf.__name__, len(labeled))
    return labeled

In [9]:
labeled = coverage_LF(LF_wife_in_sentence, 1)

LF_wife_in_sentence labeled candidates: 148


We can also view the set of candidates covered by a single labeling function using the `SentenceNgramViewer` just by passing in a list of candidates. 

In [11]:
SentenceNgramViewer(labeled, session, n_per_page=1)


### d) Formal Metrics
We can compute formal precision, recall, and F1 metrics for the output of a single labeling function.
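NOTE: These scores are un-adjusted: they appear to be computed only over the candidates the LF actually labels (note the TN and FN counts of 0 below), so they don't accurately measure recall.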

In [12]:
from snorkel.lf_helpers import test_LF
tp, fp, tn, fn = test_LF(session, LF_wife_in_sentence, split=1, annotator_name='gold')

Scores (Un-adjusted)
Pos. class accuracy: 1.0
Neg. class accuracy: 0.0
Precision            0.376
Recall               1.0
F1                   0.546
----------------------------------------
TP: 53 | FP: 88 | TN: 0 | FN: 0
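
To make "un-adjusted" concrete, here is a rough sketch of how such scores could come out when abstentions are dropped before scoring. The helper `unadjusted_scores` and the toy vote and gold lists are illustrative assumptions, not part of the Snorkel API or the workshop data.

```python
def unadjusted_scores(votes, gold):
    """Score one LF against gold labels, skipping candidates where it abstains (vote 0)."""
    tp = fp = tn = fn = 0
    for v, g in zip(votes, gold):
        if v == 0:
            continue  # abstentions are ignored, so missed positives never count as FN
        if v == 1 and g == 1:
            tp += 1
        elif v == 1:
            fp += 1
        elif g == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / float(tp + fp) if tp + fp else 0.0
    recall = tp / float(tp + fn) if tp + fn else 0.0
    return tp, fp, tn, fn, precision, recall

# Toy data: the LF votes on 3 of 6 candidates; two true pairs are never labeled.
votes = [1, 1, 1, 0, 0, 0]
gold = [1, -1, 1, 1, 1, -1]
tp, fp, tn, fn, precision, recall = unadjusted_scores(votes, gold)
```

In this toy case recall comes out as 1.0 even though two true pairs were never labeled, which is why a positive-only LF always shows perfect recall under this scoring.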



### e) Distant Supervision Labeling Functions

In addition to writing labeling functions that describe text pattern-based heuristics for labeling training examples, we can also write labeling functions that distantly supervise examples. Here, we'll load in a list of known spouse pairs and check to see if the candidate pair matches one of these.

**DBpedia**
http://wiki.dbpedia.org/
Our database of known spouses comes from DBpedia, which is a community-driven resource similar to Wikipedia but for curating structured data. We use a preprocessed snapshot as our knowledge base for all labeling function development.

In [13]:
from lib.dbpedia import known_spouses 

We can look at some of the example entries from DBpedia and use them in a simple distant supervision labeling function.

In [14]:
list(known_spouses)[0:10]

[('Eleanor Powell', 'Glenn Ford'),
 ('Andronikos Doukas', 'Maria of Bulgaria'),
 ('Marjorie Rambeau', 'Willard Mack'),
 ('Margo St. James', 'Paul Avery'),
 ('Joan of England', 'William II the Good'),
 ('Maiko Jeong Shun Lee', 'The Viscount Rothermere'),
 ('Heinrich von Coudenhove-Kalergi', 'Mitsuko Aoyama'),
 ('Kiran Nadar', 'Shiv Nadar ( )'),
 ('Cecilia Mnsdotter Eka', 'Erik Johansson Vasa'),
 ('Bonne of Bohemia', 'John the Good')]

In [15]:
def LF_distant_supervision(c):
    """
    Check if the *full names* of each person from the Spouse candidate  
    are also found in the DBpedia database of known spouses.
    """
    p1, p2 = c.person1.get_span(), c.person2.get_span()
    return 1 if (p1, p2) in known_spouses or (p2, p1) in known_spouses else 0

Evaluating this LF on our dev data, we see it has very low coverage (only 3 candidates), 2 of which are true.

In [16]:
ds_labeled = coverage_LF(LF_distant_supervision, 1)
tp, fp, tn, fn = test_LF(session, LF_distant_supervision, split=1, annotator_name='gold')

LF_distant_supervision labeled candidates: 3
Scores (Un-adjusted)
Pos. class accuracy: 1.0
Neg. class accuracy: 0.0
Precision            0.667
Recall               1.0
F1                   0.8
----------------------------------------
TP: 2 | FP: 1 | TN: 0 | FN: 0



## 2. Applying the Labeling Functions
---

Next, we need to actually run the LFs over **all** of our training candidates, producing a set of `Labels` and `LabelKeys` (just the names of the LFs) in the database.  We'll do this using the `LabelAnnotator` class, a UDF which we will again run with `UDFRunner`.
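
Conceptually, this step evaluates every LF on every candidate and collects the votes into a matrix with one row per candidate and one column per LF. The sketch below illustrates the idea with plain Python lists. The `apply_lfs` helper and the toy token-list candidates are hypothetical, and Snorkel's `LabelAnnotator` additionally persists the results in the database.

```python
def apply_lfs(lfs, candidates):
    """Return (matrix, keys): one row per candidate, one column per LF."""
    keys = [lf.__name__ for lf in lfs]  # the label keys are just the LF names
    matrix = [[lf(c) for lf in lfs] for c in candidates]
    return matrix, keys

# Toy LFs operating on plain token lists rather than Snorkel candidates
def LF_wife(tokens):
    return 1 if 'wife' in tokens else 0

def LF_girlfriend(tokens):
    return -1 if 'girlfriend' in tokens else 0

toy_candidates = [['his', 'wife'], ['his', 'girlfriend'], ['and']]
matrix, keys = apply_lfs([LF_wife, LF_girlfriend], toy_candidates)
print(keys)
print(matrix)
```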

### a) Computing the Label Matrix

First we put all our labeling functions into a list:

In [17]:
LFs = [
    LF_wife_in_sentence,
    LF_other_relationship,
    LF_distant_supervision
]

Then we set up the label annotator class:

In [19]:
from snorkel.annotations import LabelAnnotator
labeler = LabelAnnotator()

In [None]:
np.random.seed(1701)
%time L_train = labeler.apply(split=0, lfs=LFs, parallelism=4)
L_train.shape

%time L_dev = labeler.apply(split=1, lfs=LFs, parallelism=4)
L_dev.shape

Clearing existing...
Running UDF...


In [None]:
L_train = labeler.load_matrix(session, split=0)
L_train.shape

### b) Label Matrix Summary Statistics

We can also view statistics about the resulting label matrix.

* **Coverage** is the fraction of candidates that the labeling function emits a non-zero label for.
* **Overlap** is the fraction of candidates that the labeling function emits a non-zero label for and that another labeling function emits a non-zero label for.
* **Conflict** is the fraction of candidates that the labeling function emits a non-zero label for and that another labeling function emits a *conflicting* non-zero label for.
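
The three definitions above can be sketched directly on a small dense label matrix. This is an illustrative re-implementation under the definitions as stated here, not Snorkel's own `lf_stats` code; the `lf_summary` helper and the toy matrix `L` are assumptions.

```python
def lf_summary(L):
    """Compute (coverage, overlap, conflict) for each LF.

    L is a list of rows, one per candidate; each row holds one vote
    per LF, with values in {-1, 0, 1}.
    """
    n, m = len(L), len(L[0])
    stats = []
    for j in range(m):
        covered = overlaps = conflicts = 0
        for row in L:
            if row[j] == 0:
                continue  # LF j abstains on this candidate
            covered += 1
            # Non-zero votes from the other LFs on the same candidate
            others = [v for k, v in enumerate(row) if k != j and v != 0]
            if others:
                overlaps += 1
            if any(v != row[j] for v in others):
                conflicts += 1
        stats.append((covered / float(n), overlaps / float(n), conflicts / float(n)))
    return stats

# Toy label matrix: 4 candidates, 3 LFs
L = [
    [1, 0, 0],
    [1, -1, 0],
    [0, -1, 0],
    [1, 0, 1],
]
stats = lf_summary(L)
for j, s in enumerate(stats):
    print("LF {}: coverage={}, overlap={}, conflict={}".format(j, *s))
```

Here the first two LFs disagree on the second candidate, so each has a non-zero conflict rate, while the third LF only ever agrees with the first and has no conflicts.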

In [None]:
L_train.lf_stats(session)

### c) Label Matrix Empirical Accuracies

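If we have a small set of human-labeled data, such as our development set, we can also compute the empirical accuracy of each labeling function against those gold labels.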

In [None]:
L_dev.lf_stats(session, labels=L_gold_dev.toarray().ravel())
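
As a sketch of one reasonable definition of empirical accuracy, the example below scores each LF only on the candidates where it casts a non-zero vote. The `empirical_accuracy` helper and the toy data are assumptions for illustration; the exact columns reported by `lf_stats` may differ.

```python
def empirical_accuracy(L, gold):
    """Per-LF accuracy over the candidates it labels; None if the LF never votes."""
    accs = []
    for j in range(len(L[0])):
        # Keep only (vote, gold) pairs where LF j did not abstain
        voted = [(row[j], g) for row, g in zip(L, gold) if row[j] != 0]
        if not voted:
            accs.append(None)
        else:
            correct = sum(1 for v, g in voted if v == g)
            accs.append(correct / float(len(voted)))
    return accs

# Toy label matrix (4 candidates, 2 LFs) and matching gold labels
L = [[1, 0], [1, -1], [0, -1], [1, 0]]
gold = [1, -1, -1, 1]
accs = empirical_accuracy(L, gold)
print(accs)
```

The first LF is right on two of its three votes, while the second is right on both of its votes.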

## 3. Iterating on Labeling Function Design

When writing labeling functions, you will want to iterate on the process outlined above several times. You should focus on tuning individual LFs, based on empirical accuracy metrics, and adding new LFs to improve coverage.