<img align="left" src="imgs/logo.jpg" width="50px" style="margin-right:10px">
# Snorkel Workshop: Extracting Spouse Relations <br> from the News
## Part 2: Writing Labeling Functions

In Snorkel, the primary way we provide training signal to the end extraction model is by writing **labeling functions (LFs)**, rather than hand-labeling massive training sets. We'll go through some examples for our spouse extraction task below.

A labeling function isn't anything special. It's just a Python function that accepts a `Candidate` as its input argument and returns `1` if it votes that the `Candidate` should be marked as true, `-1` if it votes that the `Candidate` should be marked as false, and `0` if it doesn't know how to vote and abstains. In practice, many labeling functions are unipolar: they emit only `1`s and `0`s, or only `-1`s and `0`s.
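As a minimal sketch of this voting convention, the two toy functions below are both unipolar: one only votes true or abstains, the other only votes false or abstains. `ToyCandidate` and its `between_tokens` field are hypothetical stand-ins used for illustration; they are not Snorkel's `Candidate` class.

```python
from collections import namedtuple

# Hypothetical stand-in for a Snorkel Candidate: it only stores the tokens
# found between the two person mentions.
ToyCandidate = namedtuple('ToyCandidate', ['between_tokens'])

def LF_toy_married(c):
    """Vote 1 (true) if 'married' appears between the mentions, else abstain (0)."""
    return 1 if 'married' in c.between_tokens else 0

def LF_toy_sibling(c):
    """Vote -1 (false) if a sibling term appears between the mentions, else abstain (0)."""
    return -1 if {'brother', 'sister'} & set(c.between_tokens) else 0

c = ToyCandidate(between_tokens=['married', 'his', 'longtime', 'partner'])
print('married vote: %d, sibling vote: %d' % (LF_toy_married(c), LF_toy_sibling(c)))
```

Neither function needs to be right everywhere; each one only has to vote when its pattern applies and abstain otherwise.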

Recall that our goal is to ultimately train a high-performance classification model that predicts which of our `Candidate`s are true mentions of spouse relations.  It turns out that we can do this by writing potentially low-quality labeling functions!

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import re
import sys
import numpy as np

# Connect to the database backend and initialize a Snorkel session
from lib.init import *
from lib.scoring import *
from lib.lf_generators import *

from snorkel.lf_helpers import test_LF
from snorkel.annotations import load_gold_labels
from snorkel.lf_helpers import (
    get_left_tokens, get_right_tokens, get_between_tokens,
    get_text_between, get_tagged_text,
)

# initialize our candidate type
Spouse = candidate_subclass('Spouse', ['person1', 'person2'])

# I. Background

## A. Preprocessing the Database

In a real application, there is a lot of data preparation, parsing, and database loading that needs to be completed before we dive into writing labeling functions. Here we have pre-generated a database instance for you already. All _candidates_ and _gold labels_ (i.e., human-generated labels) are queried from this database for use in the tutorial.

See our preprocessing tutorial <a href="Workshop_5_Advanced_Preprocessing.ipynb">Workshop 5 Advanced Preprocessing</a> for more details on how this database is built.

## B. Using a _Development Set_ of Human-labeled Data

In our setting here, we will use the phrase _development set_ to refer to a set of examples (here, a subset of our training set) which we label by hand and use to help us develop and refine labeling functions.  Unlike the _test set_, which we do not look at and use only for final evaluation, we can inspect the development set while writing labeling functions. The gold labels for this set are a list of `{-1,1}` values.

In [2]:
L_gold_dev = load_gold_labels(session, annotator_name='gold', split=1)

## C. Data Exploration

How do we come up with good keywords and patterns to encode as labeling functions? One way is to manually explore our training data. Here we load a subset of our training candidates into a `SentenceNgramViewer` object to examine candidates in their parent context. Our goal is to build an intuition for patterns and keywords that are predictive of a candidate's true label. 

In [3]:
from snorkel.viewer import SentenceNgramViewer

# load our list of training candidates
train_cands = session.query(Candidate).filter(Candidate.split == 0).all()
dev_cands = session.query(Candidate).filter(Candidate.split == 1).all()

SentenceNgramViewer(train_cands[0:500], session, n_per_page=1)


## D. Labeling Function Metrics

### 1. Coverage
One simple metric we can compute quickly is _coverage_: the fraction of candidates that our LF labels (i.e., does not abstain on), measured on our training set or any other set.
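A minimal sketch of this computation, assuming coverage is reported as the share of candidates that receive a non-zero vote (which is how the `Coverage: ...%` lines printed later in this notebook read). The name `coverage_fraction` and the token-list candidates are hypothetical simplifications, not the workshop's `coverage` helper.

```python
def coverage_fraction(lf, candidates):
    """Fraction of candidates on which lf does not abstain (vote != 0)."""
    votes = [lf(c) for c in candidates]
    labeled = sum(1 for v in votes if v != 0)
    return labeled / float(len(votes))

# Toy candidates: each one is just a list of tokens (a simplification).
toy_candidates = [['his', 'wife'], ['and'], ['met', 'with'], ['her', 'husband']]

def toy_lf(tokens):
    """Vote 1 when a marriage term is present, otherwise abstain."""
    return 1 if {'husband', 'wife'} & set(tokens) else 0

print('Coverage: %.2f%%' % (100 * coverage_fraction(toy_lf, toy_candidates)))
```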

### 2. Precision / Recall / F1
If we have gold labeled data, we can also compute standard precision, recall, and F1 metrics for the output of a single labeling function.

\begin{equation*}
precision = \frac{tp}{(tp + fp)}
\end{equation*}

\begin{equation*}
recall = \frac{tp}{(tp + fn)}
\end{equation*}

\begin{equation*}
F1 = 2 \cdot \frac{ (precision \cdot recall)}{(precision + recall)}
\end{equation*}
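The three formulas above can be checked with a few lines of Python. This is a sketch with a hypothetical helper name, `precision_recall_f1`; returning `0.0` when a denominator is zero is an assumption made here for convenience. The example counts are the TP/FP/FN values reported for the DBpedia labeling function in section II.B.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw counts.

    Assumption: a metric with a zero denominator is reported as 0.0.
    """
    precision = tp / float(tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / float(tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1

# Counts taken from the DBpedia LF score output: TP=22, FP=4, FN=168
p, r, f1 = precision_recall_f1(tp=22, fp=4, fn=168)
print('Precision %.3f | Recall %.3f | F1 %.3f' % (p, r, f1))
```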

# II. Labeling Functions

## A. Pattern Matching Labeling Functions

One powerful form of labeling function design is defining sets of keywords or regular expressions that, as a human labeler, you know are correlated with the true label. In the terminology of [Bayesian inference](https://en.wikipedia.org/wiki/Statistical_inference#Bayesian_inference), this can be thought of as defining a [_prior_](https://en.wikipedia.org/wiki/Prior_probability) over your word features. 

For example, we could define a dictionary of terms that occur between person names in a candidate. One simple dictionary of terms indicating a true relation could be:
    
    marriage = {'husband', 'wife'}
 
We can then write a labeling function that checks for a match with these terms in the text that occurs between person names.

    def LF_marriage_terms_between(c):
        return 1 if len(marriage.intersection(get_between_tokens(c))) > 0 else 0
        
The idea is that we can easily create dictionaries that encode themes or categories describing all kinds of relationships between two people, and then use these objects to _weakly supervise_ our classification task.

    other_relationship = {'boyfriend', 'girlfriend'}
    

### 1. Labeling Function Generators (LFGs)
The above is a reasonable way to write labeling functions. However, this design pattern is so common that we rely on another abstraction to help us build LFs more quickly: _labeling function generators_ (LFGs). LFGs accept simple inputs, like dictionaries of terms or regular expressions, and automatically generate labeling functions.

The `MatchTerms` and `MatchRegex` LFGs require a few parameter definitions to set up:
    
    name:    a string that describes the category of terms/regular expressions
    rvalue:  patterns correlate with a True or False label (1 or -1) 
    search:  search a specific part of the sentence ('left'|'right'|'between'|'sentence')
    window:  the length of tokens to match against for ('left'|'right') search spaces 
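To make the generator idea concrete, here is a minimal sketch of a function factory that turns a term dictionary into a labeling function. `make_term_lf` is a hypothetical helper written for illustration; it is not the workshop's `MatchTerms` implementation, it operates on a plain list of tokens instead of a `Candidate`, and it ignores the `search` and `window` options.

```python
def make_term_lf(name, terms, rvalue):
    """Return a labeling function that votes rvalue when any term matches.

    Hypothetical helper: the generated function takes a list of tokens
    rather than a Snorkel Candidate.
    """
    terms = set(terms)

    def lf(tokens):
        # Vote rvalue on a match, otherwise abstain with 0
        return rvalue if terms.intersection(tokens) else 0

    # Give the generated function a descriptive name for reporting
    lf.__name__ = 'LF_TERMS_%s_%s' % (name, 'TRUE' if rvalue == 1 else 'FALSE')
    return lf

lf_marriage = make_term_lf('marriage', {'husband', 'wife'}, rvalue=1)
print(lf_marriage.__name__)
print(lf_marriage(['and', 'his', 'wife']))
```

The closure keeps the term set and return value bound to each generated function, so many LFs can be produced from a handful of dictionaries.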


### 2. Term Matching LFGs
We illustrate below how you can use the `MatchTerms` LFG to create and test an LF on training candidates. When examining candidates in the `SentenceNgramViewer`, notice that `husband` or `wife` often occurs between person names. That is the supervision signal encoded by this LF!

In [4]:
marriage  = {'husband', 'wife'}

# we'll initialize our LFG and test its coverage on training candidates
lf1 = MatchTerms(name='marriage', terms=marriage, rvalue=1, search='between').lf()

# what candidates are covered by this LF?
labeled = coverage(session, lf1, split=0)

# now let's view what this LF labeled
SentenceNgramViewer(labeled, session, n_per_page=1)

Coverage: 7.75% (1724/22254)



In [5]:
other_relationship  = {'boyfriend', 'girlfriend'}

lf2 = MatchTerms(name='other_relationship', terms=other_relationship, rvalue=-1, search='left', window=1).lf()
labeled = coverage(session, lf2, split=1)

# now let's view what this LF labeled
SentenceNgramViewer(labeled, session, n_per_page=1)

Coverage: 0.39% (11/2811)



### 3. Regular Expression LFGs

Sometimes we want to express more generic textual patterns to match against in candidates. Perhaps we want to match a specific phrase like 'power couple' or look for modifier prefixes like 'ex' wife, husband, etc. 

We can generate this supervision in the same way as above using sets of [regular expressions](https://en.wikipedia.org/wiki/Regular_expression) -- a formal language for string matching.
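One detail worth checking when writing such patterns is the difference between a group and a character class. The sketch below uses only Python's standard `re` module. The patterns are illustrative variants that use `\b` word boundaries, and the example sentences are made up.

```python
import re

# A group (husband|wife) matches either whole word after "ex-" or "ex ".
ex_spouse = re.compile(r'\bex[- ](husband|wife)\b')

# A character class [husband|wife] matches only ONE character drawn from
# the set {h, u, s, b, a, n, d, |, w, i, f, e}, so it also fires on
# unrelated words such as "ex-boss".
char_class = re.compile(r'\bex[- ][husband|wife]')

for text in ['her ex-husband John', 'his ex wife Mary', 'an ex-boss of hers']:
    print('%-20s group=%-5s class=%s' % (
        text, bool(ex_spouse.search(text)), bool(char_class.search(text))))
```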

In [6]:
# use a group (husband|wife); a character class [husband|wife] matches a single character
exes_rgxs = {r' ex[- ](husband|wife)'}

lf3 = MatchRegex(name='exes', rgxs=exes_rgxs, rvalue=-1, search='between').lf()
labeled = coverage(session, lf3, split=1)

# now let's view what this LF labeled
SentenceNgramViewer(labeled, session, n_per_page=1)

Coverage: 0.89% (25/2811)



## B. Distant Supervision Labeling Functions

In addition to using LFGs that encode pattern matching heuristics, we can also write labeling functions that _distantly supervise_ examples. Here, we'll load in a list of known spouse pairs and check to see if the candidate pair matches one of these.

**DBpedia**
http://wiki.dbpedia.org/
Our database of known spouses comes from DBpedia, a community-driven resource similar to Wikipedia but for curating structured data. We use a preprocessed snapshot as our knowledge base for all labeling function development.

We can look at some of the example entries from DBpedia and use them in a simple distant supervision labeling function.

In [7]:
from lib.dbpedia import known_spouses 

list(known_spouses)[0:5]

[('Eleanor Powell', 'Glenn Ford'),
 ('Andronikos Doukas', 'Maria of Bulgaria'),
 ('Marjorie Rambeau', 'Willard Mack'),
 ('Margo St. James', 'Paul Avery'),
 ('Joan of England', 'William II the Good')]

If we have gold labeled data, which we do for our dev set, we can evaluate formal metrics. 

In [8]:
lf4 = DistantSupervision("dbpedia", kb=known_spouses).lf()
labeled = coverage(session, lf4, split=1)

# score our LF against dev set labels
score(session, lf4, split=1, gold=L_gold_dev)

SentenceNgramViewer(labeled, session, n_per_page=1)

Coverage: 0.92% (26/2811)
LF Score
Pos. class accuracy: 0.116
Neg. class accuracy: 0.998
Precision            0.846
Recall               0.116
F1                   0.204
----------------------------------------
TP: 22 | FP: 4 | TN: 2509 | FN: 168




## C. Writing Custom Labeling Functions

The strength of LFs is that you can write any arbitrary function and use it to supervise a classification task. This approach can combine many of the same strategies discussed above or encode other information. For example, we observe that when mentions of person names occur far apart in a sentence, this is a good indicator that the candidate's label is False. 



In [9]:
def LF_too_far_apart(c):
    """Person mentions occur at a distance > 40 words"""
    return -1 if len(list(get_between_tokens(c))) > 40 else 0


def LF_distant_supervision(c):
    """
    Check if the full names of both people in the Spouse candidate
    are also found (in either order) in the DBpedia database of known spouses.
    """
    p1, p2 = c.person1.get_span(), c.person2.get_span()
    return 1 if (p1, p2) in known_spouses or (p2, p1) in known_spouses else 0

In [10]:
score(session, LF_too_far_apart, split=1, gold=L_gold_dev)

LF Score
Pos. class accuracy: 0.0
Neg. class accuracy: 1.0
Precision            0.0
Recall               0.0
F1                   0.0
----------------------------------------
TP: 0 | FP: 0 | TN: 2225 | FN: 176



# III. Development Sandbox
----

## A. Writing Your Own Labeling Functions

Using the information above, write your own labeling functions for this task. 

In [11]:
# PLACE YOUR LFs HERE

spouses = {'spouse', 'wife', 'husband', 'ex-wife', 'ex-husband'}
family = {'father', 'mother', 'sister', 'brother', 'son', 'daughter',
              'grandfather', 'grandmother', 'uncle', 'aunt', 'cousin'}
family = family | {f + '-in-law' for f in family}
other = {'boyfriend', 'girlfriend', 'boss', 'employee', 'secretary', 'co-worker'}

# Helper function to get last name
def last_name(s):
    name_parts = s.split(' ')
    return name_parts[-1] if len(name_parts) > 1 else None    

def LF_husband_wife(c):
    return 1 if len(spouses.intersection(get_between_tokens(c))) > 0 else 0

def LF_husband_wife_left_window(c):
    if len(spouses.intersection(get_left_tokens(c[0], window=2))) > 0:
        return 1
    elif len(spouses.intersection(get_left_tokens(c[1], window=2))) > 0:
        return 1
    else:
        return 0
    
def LF_same_last_name(c):
    p1_last_name = last_name(c.person1.get_span())
    p2_last_name = last_name(c.person2.get_span())
    if p1_last_name and p2_last_name and p1_last_name == p2_last_name:
        if c.person1.get_span() != c.person2.get_span():
            return 1
    return 0

def LF_no_spouse_in_sentence(c):
    return -1 if np.random.rand() < 0.75 and len(spouses.intersection(c.get_parent().words)) == 0 else 0

def LF_and_married(c):
    return 1 if 'and' in get_between_tokens(c) and 'married' in get_right_tokens(c) else 0
    
def LF_familial_relationship(c):
    return -1 if len(family.intersection(get_between_tokens(c))) > 0 else 0

def LF_family_left_window(c):
    if len(family.intersection(get_left_tokens(c[0], window=2))) > 0:
        return -1
    elif len(family.intersection(get_left_tokens(c[1], window=2))) > 0:
        return -1
    else:
        return 0

def LF_other_relationship(c):
    return -1 if len(other.intersection(get_between_tokens(c))) > 0 else 0

import bz2

# Function to remove special characters from text
def strip_special(s):
    return ''.join(c for c in s if ord(c) < 128)

# Read in known spouse pairs and save as set of tuples
with bz2.BZ2File('data/spouses_dbpedia_workshop.csv.bz2', 'rb') as f:
    known_spouses = set(
        tuple(strip_special(x).strip().split(',')) for x in f.readlines()
    )
# Last name pairs for known spouses
last_names = set([(last_name(x), last_name(y)) for x, y in known_spouses if last_name(x) and last_name(y)])
    
def LF_distant_supervision(c):
    p1, p2 = c.person1.get_span(), c.person2.get_span()
    return 1 if (p1, p2) in known_spouses or (p2, p1) in known_spouses else 0

def LF_distant_supervision_last_names(c):
    p1, p2 = c.person1.get_span(), c.person2.get_span()
    p1n, p2n = last_name(p1), last_name(p2)
    return 1 if (p1 != p2) and ((p1n, p2n) in last_names or (p2n, p1n) in last_names) else 0


## B. Applying Labeling Functions
---

Next, we need to actually run the LFs over **all** of our training candidates, producing a set of `Labels` and `LabelKeys` (just the names of the LFs) in the database.  We'll do this using the `LabelAnnotator` class, a UDF which we will again run with `UDFRunner`.

### 1. Preparing your Labeling Functions

First we put all our labeling functions into a list:

In [17]:
LFs = [
    DistantSupervision(name="dbpedia", kb=known_spouses).lf(),
    
    MatchRegex(name='exes', rgxs=exes_rgxs, rvalue=-1, search='between').lf(),
    MatchRegex(name='exes', rgxs=exes_rgxs, rvalue=-1, search='left', window=1).lf(),
    MatchRegex(name='exes', rgxs=exes_rgxs, rvalue=-1, search='left', window=2).lf(),
    MatchRegex(name='exes', rgxs=exes_rgxs, rvalue=-1, search='left', window=3).lf(),
    MatchRegex(name='exes', rgxs=exes_rgxs, rvalue=-1, search='right', window=1).lf(),
    MatchRegex(name='exes', rgxs=exes_rgxs, rvalue=-1, search='right', window=2).lf(),
    MatchRegex(name='exes', rgxs=exes_rgxs, rvalue=-1, search='right', window=3).lf(),
    
    MatchTerms(name='other_relationship', terms=other_relationship, rvalue=-1, search='between').lf(),
    MatchTerms(name='other_relationship', terms=other_relationship, rvalue=-1, search='left', window=1).lf(),
    MatchTerms(name='other_relationship', terms=other_relationship, rvalue=-1, search='left', window=2).lf(),
    MatchTerms(name='other_relationship', terms=other_relationship, rvalue=-1, search='left', window=3).lf(),
    MatchTerms(name='other_relationship', terms=other_relationship, rvalue=-1, search='right', window=1).lf(),
    MatchTerms(name='other_relationship', terms=other_relationship, rvalue=-1, search='right', window=2).lf(),
    MatchTerms(name='other_relationship', terms=other_relationship, rvalue=-1, search='right', window=3).lf(),
    
    MatchTerms(name='marriage', terms=marriage, rvalue=1, search='left', window=1).lf(),
    MatchTerms(name='marriage', terms=marriage, rvalue=1, search='left', window=2).lf(),
    MatchTerms(name='marriage', terms=marriage, rvalue=1, search='left', window=3).lf(),
    
    # place your lf names here
]

for lf in LFs:
    print lf.__name__

# LFs = [
#     LF_distant_supervision, LF_distant_supervision_last_names, 
#     LF_husband_wife, LF_husband_wife_left_window, LF_same_last_name,
#     LF_no_spouse_in_sentence, LF_and_married, LF_familial_relationship, 
#     LF_family_left_window, LF_other_relationship
# ]

LF_TERMS_other_relationship_[between|words]_FALSE
LF_TERMS_other_relationship_[left|words|window=1]_FALSE
LF_TERMS_other_relationship_[left|words|window=2]_FALSE
LF_TERMS_other_relationship_[left|words|window=3]_FALSE
LF_TERMS_other_relationship_[right|words|window=1]_FALSE
LF_TERMS_other_relationship_[right|words|window=2]_FALSE
LF_TERMS_other_relationship_[right|words|window=3]_FALSE


Then we set up the label annotator class:

In [18]:
from snorkel.annotations import LabelAnnotator
labeler = LabelAnnotator()

### 2. Generating the Label Matrix

In [19]:
np.random.seed(1701)

%time L_train = labeler.apply(split=0, lfs=LFs, parallelism=4)
print L_train.shape

%time L_dev = labeler.apply_existing(split=1, lfs=LFs, parallelism=4)
print L_dev.shape

Clearing existing...
Running UDF...
CPU times: user 10.1 s, sys: 3 s, total: 13.1 s
Wall time: 1min 2s
(22254, 7)
Clearing existing...
Running UDF...
CPU times: user 3.89 s, sys: 440 ms, total: 4.33 s
Wall time: 11 s
(2811, 7)


In [20]:
L_train.lf_stats(session)

                                                          j  Coverage  Overlaps  Conflicts
LF_TERMS_other_relationship_[between|words]_FALSE         0  0.007999  0.000225        0.0
LF_TERMS_other_relationship_[left|words|window=1]_FALSE   1  0.002786  0.002786        0.0
LF_TERMS_other_relationship_[left|words|window=2]_FALSE   2  0.003235  0.003235        0.0
LF_TERMS_other_relationship_[left|words|window=3]_FALSE   3   0.00328  0.003235        0.0
LF_TERMS_other_relationship_[right|words|window=1]_FALSE  4     9e-05     9e-05        0.0
LF_TERMS_other_relationship_[right|words|window=2]_FALSE  5  0.000539  0.000539        0.0
LF_TERMS_other_relationship_[right|words|window=3]_FALSE  6  0.000584  0.000539        0.0
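To build intuition for the Coverage, Overlaps, and Conflicts columns, the sketch below computes them for a toy label matrix. It assumes the usual reading of these statistics: coverage is the fraction of candidates an LF votes on, overlaps is the fraction where at least one other LF also votes, and conflicts is the fraction where another LF casts a different non-zero vote. `lf_summary` is a hypothetical helper, not the `lf_stats` implementation.

```python
# Toy label matrix: 4 candidates (rows) x 3 LFs (columns); 0 means abstain.
L = [
    [1,  0,  0],
    [1, -1,  0],
    [0, -1, -1],
    [0,  0,  0],
]

def lf_summary(L):
    """Return per-LF (coverage, overlaps, conflicts) as fractions of all rows."""
    n = len(L)
    stats = []
    for j in range(len(L[0])):
        covered = overlaps = conflicts = 0
        for row in L:
            if row[j] == 0:
                continue  # LF j abstains on this candidate
            covered += 1
            # Non-zero votes cast by the other LFs on the same candidate
            other_votes = [v for k, v in enumerate(row) if k != j and v != 0]
            if other_votes:
                overlaps += 1
            if any(v != row[j] for v in other_votes):
                conflicts += 1
        stats.append((covered / float(n), overlaps / float(n), conflicts / float(n)))
    return stats

for j, (cov, ovl, con) in enumerate(lf_summary(L)):
    print('LF %d: coverage=%.2f overlaps=%.2f conflicts=%.2f' % (j, cov, ovl, con))
```

Under this reading, a column of all-zero conflicts (as in the table above) is what you would expect when every LF in the list votes with the same polarity.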


### 3. Label Matrix Empirical Accuracies

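If we have a small set of human-labeled data, such as our development set, we can also compute the empirical accuracy of each labeling function against those gold labels.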

In [21]:
L_dev.lf_stats(session, labels=L_gold_dev.toarray().ravel())

                                                          j  Coverage  Overlaps  Conflicts  TP  FP  FN  TN  Empirical Acc.
LF_TERMS_other_relationship_[between|words]_FALSE         0   0.01174       0.0        0.0   0   0   6  24             0.8
LF_TERMS_other_relationship_[left|words|window=1]_FALSE   1  0.003913  0.003913        0.0   0   0   0  11             1.0
LF_TERMS_other_relationship_[left|words|window=2]_FALSE   2  0.004269  0.004269        0.0   0   0   0  12             1.0
LF_TERMS_other_relationship_[left|words|window=3]_FALSE   3  0.005692  0.004269        0.0   0   0   0  16             1.0
LF_TERMS_other_relationship_[right|words|window=1]_FALSE  4       0.0       0.0        0.0   0   0   0   0
LF_TERMS_other_relationship_[right|words|window=2]_FALSE  5       0.0       0.0        0.0   0   0   0   0
LF_TERMS_other_relationship_[right|words|window=3]_FALSE  6       0.0       0.0        0.0   0   0   0   0
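A minimal sketch of how an empirical accuracy figure can be derived, assuming it is the fraction of an LF's non-abstaining votes that agree with the gold labels. That reading is consistent with the first row above (24 correct out of 30 votes gives 0.8). `empirical_accuracy` is a hypothetical helper, and the vote and gold lists are constructed to mirror that row.

```python
def empirical_accuracy(votes, gold):
    """Accuracy of an LF over the candidates it labels; None if it labels none."""
    pairs = [(v, g) for v, g in zip(votes, gold) if v != 0]
    if not pairs:
        return None  # the LF abstained everywhere, so accuracy is undefined
    correct = sum(1 for v, g in pairs if v == g)
    return correct / float(len(pairs))

# A negative-only LF that votes on 30 candidates (24 truly false, 6 truly true)
# and abstains on 10 more.
votes = [-1] * 30 + [0] * 10
gold = [-1] * 24 + [1] * 6 + [1] * 5 + [-1] * 5
print(empirical_accuracy(votes, gold))
```

An LF with zero coverage on the labeled set has no votes to score, which is why the sketch returns `None` in that case.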


## C. Iterating on Labeling Function Design

When writing labeling functions, you will want to iterate on the process outlined above several times. You should focus on tuning individual LFs based on empirical accuracy metrics, and on adding new LFs to improve coverage.