# **Weak Supervision and Labeling Functions** 
### A brief overview of Snorkel by Monika Daryani
---

## Weak Supervision 

Many traditional machine learning approaches have an insatiable appetite for labeled training data. The key problem with the common approach of having SMEs (Subject Matter Experts) label large amounts of data is that it is very expensive, especially in specialty fields like medicine. Letting crowdsourced workers label the data manually instead poses a privacy risk. Several approaches have been studied to alleviate this bottleneck, such as semi-supervised learning [1], transfer learning [2], multi-task learning [3], [4] and active learning [5]. Weak supervision, however, takes a different approach: it leverages the concepts of data programming [6] to model and unify multiple sources of weak labels into a single, stronger label. Weak labels can come from heuristic rules, distant supervision techniques, keyword and/or pattern matches, third-party models, noisy labels from crowd workers, weak classifiers and more.


## Snorkel and Labeling Functions

> **Snorkel is a system that combines various sources of weak supervision to learn a generative model to apply labels to a data-set programmatically.**

In the Snorkel paper [7], Ratner et al. introduce the concept of Labeling Functions (LFs): black-box snippets of code, written by SMEs, that each label a subset of the unlabeled data. Each labeling function is a weak label generator based on one of the methods of obtaining weak labels mentioned above. Since a single LF can produce noisy, less-than-ideal training labels, Snorkel learns a generative model that combines the outputs of multiple LFs into probabilistic labels. This also addresses the heterogeneity problem: multiple weak label sources can overlap or conflict on a data point, which makes consolidating their labels by hand difficult and cumbersome. Ratner et al. also show that a powerful, flexible discriminative model (such as a deep neural network) trained on these probabilistic labels can generalize beyond the signal expressed in the labeling functions.
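To make "probabilistic labels" concrete, here is a minimal sketch of how conflicting LF votes can be turned into a probability. It is not Snorkel's label model: Snorkel *learns* each LF's accuracy from the agreements and disagreements among LFs, without ground truth, whereas this sketch simply assumes the accuracies are known (the values below are hypothetical) and that the LFs are conditionally independent given the true label.

```python
import math

ABSTAIN, HELPFUL, NOT_HELPFUL = -1, 0, 1

def probabilistic_label(votes, accuracies, prior_helpful=0.5):
    """Return P(y = HELPFUL | votes) under a naive, conditionally independent model.

    votes: one label per LF (HELPFUL, NOT_HELPFUL or ABSTAIN).
    accuracies: assumed probability that each LF is correct when it does not abstain.
    """
    # Start from the prior log-odds of HELPFUL versus NOT_HELPFUL.
    log_odds = math.log(prior_helpful / (1 - prior_helpful))
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue  # an abstaining LF contributes no evidence
        weight = math.log(acc / (1 - acc))  # more accurate LFs count for more
        log_odds += weight if vote == HELPFUL else -weight
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical accuracies for three LFs that conflict on one review
accuracies = [0.9, 0.6, 0.7]
print(round(probabilistic_label([HELPFUL, NOT_HELPFUL, ABSTAIN], accuracies), 3))  # prints 0.857
```

The more accurate LF wins the conflict, but the result stays a probability rather than a hard label; soft labels like this are what the downstream discriminative model can be trained on.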

**Enough theory; let's get practical. Let's dig in!!**

# Labeling with Snorkel

We have divided our labeling procedure into 6 steps:
1. Load data
2. Look at the data and deduce the required labeling functions
3. Write labeling functions
4. Analyze labeling function results and improve them
5. Combine LFs and assign labels
6. Split into training - validation blocks and run a classifier

## Flow-chart for Labeling data via Snorkel

![](Flowchart.png)

**Basic information**

**Dataset :** Amazon Customer Reviews Dataset (Mobile Electronics) [9]

**Task :** Detecting helpful reviews

We define helpful reviews as reviews that share important information about the product, information that can help a user decide whether to buy it.

```python
# Constants representing the class labels: helpful, not helpful and abstain
# (Snorkel uses -1 for ABSTAIN and 0..k-1 for the classes)
ABSTAIN = -1
HELPFUL = 0
NOT_HELPFUL = 1
```

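## Pre-step : Install Snorkel

Snorkel, and TextBlob (used in Step 3), can be installed from PyPI with `pip install snorkel textblob`.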

## Step 1 : Load data

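```python
import csv
import pandas as pd

## Loading data ##
# on_bad_lines="skip" replaces error_bad_lines=False, which was deprecated in
# pandas 1.3 and removed in pandas 2.0
data_df = pd.read_csv(
    "amazon_reviews_us_Mobile_Electronics_v1_00.tsv",
    sep="\t",
    quoting=csv.QUOTE_NONE,
    on_bad_lines="skip",
)
```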

```python
data_df.head()
```

| | marketplace | customer_id | review_id | product_id | product_parent | product_title | product_category | star_rating | helpful_votes | total_votes | vine | verified_purchase | review_headline | review_body | review_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | US | 20422322 | R8MEA6IGAHO0B | B00MC4CED8 | 217304173 | BlackVue DR600GW-PMP | Mobile_Electronics | 5 | 0 | 0 | N | Y | Very Happy! | As advertised. Everything works perfectly, I'm... | 2015-08-31 |
| 1 | US | 40835037 | R31LOQ8JGLPRLK | B00OQMFG1Q | 137313254 | GENSSI GSM / GPS Two Way Smart Phone Car Alarm... | Mobile_Electronics | 5 | 0 | 1 | N | Y | five star | it's great | 2015-08-31 |
| 2 | US | 51469641 | R2Y0MM9YE6OP3P | B00QERR5CY | 82850235 | iXCC Multi pack Lightning cable | Mobile_Electronics | 5 | 0 | 0 | N | Y | great cables | These work great and fit my life proof case fo... | 2015-08-31 |
| 3 | US | 4332923 | RRB9C05HDOD4O | B00QUFTPV4 | 221169481 | abcGoodefg® FBI Covert Acoustic Tube Earpiece ... | Mobile_Electronics | 4 | 0 | 0 | N | Y | Work very well but couldn't get used to not he... | Work very well but couldn't get used to not he... | 2015-08-31 |
| 4 | US | 44855305 | R26I2RI1GFV8QG | B0067XVNTG | 563475445 | Generic Car Dashboard Video Camera Vehicle Vid... | Mobile_Electronics | 2 | 0 | 0 | N | Y | Cameras has battery issues | Be careful with these products, I have bought ... | 2015-08-31 |


```python
data_df = data_df[['product_id','product_title','star_rating','helpful_votes','total_votes','vine','verified_purchase','review_headline','review_body']]
data_df = data_df.dropna()
data_df.head()
```

| | product_id | product_title | star_rating | helpful_votes | total_votes | vine | verified_purchase | review_headline | review_body |
|---|---|---|---|---|---|---|---|---|---|
| 0 | B00MC4CED8 | BlackVue DR600GW-PMP | 5 | 0 | 0 | N | Y | Very Happy! | As advertised. Everything works perfectly, I'm... |
| 1 | B00OQMFG1Q | GENSSI GSM / GPS Two Way Smart Phone Car Alarm... | 5 | 0 | 1 | N | Y | five star | it's great |
| 2 | B00QERR5CY | iXCC Multi pack Lightning cable | 5 | 0 | 0 | N | Y | great cables | These work great and fit my life proof case fo... |
| 3 | B00QUFTPV4 | abcGoodefg® FBI Covert Acoustic Tube Earpiece ... | 4 | 0 | 0 | N | Y | Work very well but couldn't get used to not he... | Work very well but couldn't get used to not he... |
| 4 | B0067XVNTG | Generic Car Dashboard Video Camera Vehicle Vid... | 2 | 0 | 0 | N | Y | Cameras has battery issues | Be careful with these products, I have bought ... |


## Step 2: Look at the data and deduce required labeling functions

We can draw a small random sample from the data frame, re-running the cell several times, to look at examples we would consider helpful and examples we would not.

```python
data_df.sample(10)
```

| | product_id | product_title | star_rating | helpful_votes | total_votes | vine | verified_purchase | review_headline | review_body |
|---|---|---|---|---|---|---|---|---|---|
| 39633 | B007WDQFB6 | i-BLASON Barnes & Noble NOOK Simple Touch and ... | 5 | 0 | 0 | N | Y | good looking and durable | I have had this cover for over a year and stil... |
| 51330 | B00B4MGPHU | i-Blason 8 Pin Lightning Connector Integrated ... | 5 | 1 | 3 | N | Y | Fantastic Speaker Dock | Since the day I bought it, I haven't stopped u... |
| 9035 | B00PHBVIN0 | Disnix T66n Fm Transmitter with Dual 5v/2.1a U... | 1 | 0 | 0 | N | Y | Is it available again? | This item is total junk DOA |
| 19961 | B00I6DLIB8 | MSI GTX 750 TI 2GB DDR5 OC 128Bit DVI-D/D-SUB/... | 5 | 0 | 1 | N | Y | Five Stars | Love it! |
| 85578 | B0043TAMAM | Bundle Monster Borders Kobo (1st Generation) E... | 5 | 0 | 0 | N | Y | Great Product | Great product. It fits my Kobo ereader perfec... |
| 77940 | B0056QYBRQ | Jensen Jmc-180 Wall-Mountable Cd System With A... | 2 | 59 | 62 | N | N | Save your money | 1. The remote only controls a few items on the... |
| 41451 | B00F4HLIMS | Axess PB2704 Blue Portable Boombox MP3/CD Play... | 1 | 3 | 5 | N | Y | has no output connections | love the number of input options, however, nev... |
| 3948 | B00V49YO2Q | For GMC Sereies Chevrolet Chevy Avalanche 07-1... | 5 | 0 | 1 | N | Y | Five Stars | We r still working out the problem |
| 41370 | B008CURD46 | 3D RED LED Hyundai Logo Badge Light Car Trunk ... | 1 | 0 | 0 | N | Y | I wish it was better but its not. | When this arrived it was poorly packaged. This... |
| 52153 | B00AE0FTDO | HHI Silicone Skin Case for iPod Nano 7th Gener... | 5 | 3 | 3 | N | Y | Adorable! | I'm a penguin lover, so this was perfect for m... |


## Step 3 : Write Labeling Functions

Writing labeling functions is an iterative process: you write a labeling function, apply it, check its output and statistics, and then write more labeling functions or refine the existing ones. Here we discuss the types of labeling functions you can write.

```python
from snorkel.labeling import labeling_function
```

**(i) General Labeling Functions** 


This is the simplest form of labeling function. It labels a review (a row) as helpful if it has more than 5 helpful_votes or more than 5 total_votes, and abstains otherwise.


```python
# labels reviews with many votes as helpful, abstains otherwise
@labeling_function()
def vote(x):
    return HELPFUL if int(x.helpful_votes) > 5 or int(x.total_votes) > 5 else ABSTAIN
```


**(ii) Keyword Search** 


This is another simple form of labeling function: it labels a review as helpful if it contains a keyword match and abstains otherwise.

Let us say helpful reviews discuss the quality and durability of the product. Reviews containing images or videos are also helpful.

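```python
# checks if the review discusses quality of product
@labeling_function()
def quality(x):
    return HELPFUL if "quality" in str(x.review_body).lower() else ABSTAIN

# checks if the review discusses durability of product
@labeling_function()
def durability(x):
    return HELPFUL if "durable" in str(x.review_body).lower() else ABSTAIN
```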

# checks if the review discusses durability of product
@labeling_function()
def durability(x):
    return HELPFUL if "durable" in str(x.review_body).lower() else ABSTAIN


```python
# checks if review contains video
# (assumes embedded videos appear in review_body as a VIDEOID tag)
@labeling_function()
def video(x):
    return HELPFUL if "videoid" in str(x.review_body).lower() else ABSTAIN
```

**(iii) Regex / Pattern matching**

In natural language processing, regular expressions help us detect patterns that plain keyword matching can miss. Reviews discussing durability can use many phrasings, such as "durable", "durability", "lasts long", "long-lasting" or "lasted long". We therefore redefine the durability function above.

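```python
import re

# Matches "durable"/"durability", "lasts long"/"lasted long" (loosely, via last.*long)
# and "long-lasting"/"long lasting", which last.*long alone would miss
DURABILITY_RE = re.compile(r"durab|last.*long|long[- ]?lasting", flags=re.I)

# checks if the review discusses durability of product
@labeling_function()
def durability(x):
    return HELPFUL if DURABILITY_RE.search(str(x.review_body)) else ABSTAIN
```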
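Before turning a pattern into an LF, it can help to run it over a few hand-written phrases. This sketch uses a hypothetical `DURABILITY_PATTERN` covering the phrasings listed above. Note that "long-lasting" needs its own alternative, because a pattern like `last.*long` only matches when "last" comes before "long".

```python
import re

# Hypothetical pattern covering the durability phrasings listed above
DURABILITY_PATTERN = re.compile(r"durab|last(s|ed)?\s+long|long[- ]?lasting", flags=re.I)

phrases = [
    "Very durable case",
    "Durability is excellent",
    "The battery lasts long",
    "It lasted long enough",
    "A long-lasting charger",
    "Arrived fast, works fine",
]

# Record which phrases the pattern would label HELPFUL
matches = {text: bool(DURABILITY_PATTERN.search(text)) for text in phrases}
for text, matched in matches.items():
    print(matched, text)
```

The last phrase is a deliberate negative example; adding a few of those helps catch patterns that are too loose.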


**(iv) Heuristics**

We can also write labeling functions based on heuristics, such as the length of the review. We expect unhelpful reviews to be short. Note that this LF only assigns NOT_HELPFUL (and otherwise abstains), because a long review is not necessarily helpful.

```python
# labels reviews with fewer than 10 words as not helpful
@labeling_function()
def short_review(x):
    return NOT_HELPFUL if len(str(x.review_body).split()) < 10 else ABSTAIN
```


**(v) Using third-party models**

We can use various third-party models such as TextBlob, spaCy, etc.

Here we assume that highly subjective reviews are not helpful, since they mostly express personal feelings rather than information about the product.

```python
from textblob import TextBlob

# checks if the review is a highly subjective personal opinion
@labeling_function()
def subjectivity(x):
    # TextBlob subjectivity ranges from 0.0 (objective) to 1.0 (subjective)
    return NOT_HELPFUL if TextBlob(str(x.review_body)).sentiment.subjectivity > 0.8 else ABSTAIN
```

**(vi) Using preprocessors**

A Snorkel preprocessor pre-processes a data point and maps it to a new data point, which an LF can then use for further processing. Preprocessors can also use memoization (caching) to avoid re-executing for every LF.

We can rewrite the subjectivity function above to use a preprocessor.

```python
from snorkel.preprocess import preprocessor
from textblob import TextBlob

# computes TextBlob sentiment once per data point and caches it (memoize=True)
@preprocessor(memoize=True)
def textblob_sentiment(x):
    scores = TextBlob(str(x.review_body))
    x.polarity = scores.sentiment.polarity
    x.subjectivity = scores.sentiment.subjectivity
    return x
```

```python
# checks if the review is a personal opinion
@labeling_function(pre=[textblob_sentiment])
def subjectivity(x):
    return NOT_HELPFUL if x.subjectivity > 0.8 else ABSTAIN
```


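We also assume that very neutral reviews are not helpful, since we want to capture people's positive and negative opinions. We use TextBlob polarity (which ranges from -1.0 to 1.0) for this.

Because the preprocessor is memoized, applying this LF together with `subjectivity` reuses the cached TextBlob results instead of recomputing them, so it runs faster than computing sentiment separately in each LF.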

```python
# labels near-neutral reviews (polarity close to 0) as not helpful
@labeling_function(pre=[textblob_sentiment])
def polarity(x):
    return NOT_HELPFUL if -0.2 < x.polarity < 0.2 else ABSTAIN
```
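In Snorkel, `@preprocessor(memoize=True)` handles the caching for you. To illustrate the idea without Snorkel or TextBlob, here is a plain-Python analogy using `functools.lru_cache`: a hypothetical `toy_sentiment` scorer (simple keyword counts) stands in for TextBlob, and two LFs share it. Because results are cached by review text, each review is scored only once even though both LFs read its sentiment. This is an analogy, not Snorkel's implementation.

```python
from functools import lru_cache

ABSTAIN, NOT_HELPFUL = -1, 1
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"bad", "junk", "awful"}
call_count = 0  # how often the "expensive" scorer actually runs

@lru_cache(maxsize=None)
def toy_sentiment(text):
    """Hypothetical stand-in for an expensive scorer such as TextBlob.
    Returns (polarity, subjectivity) computed from keyword counts."""
    global call_count
    call_count += 1
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    n = max(len(words), 1)
    return (pos - neg) / n, (pos + neg) / n

def subjectivity_lf(text):
    _, subjectivity = toy_sentiment(text)
    return NOT_HELPFUL if subjectivity > 0.8 else ABSTAIN

def polarity_lf(text):
    polarity, _ = toy_sentiment(text)
    return NOT_HELPFUL if -0.2 < polarity < 0.2 else ABSTAIN

reviews = ["love", "this cable fits my case", "great sound but awful battery"]
labels = [[lf(r) for lf in (subjectivity_lf, polarity_lf)] for r in reviews]
print(labels)                          # prints [[1, -1], [-1, 1], [-1, 1]]
print("scorer calls:", call_count)     # prints scorer calls: 3
```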


Snorkel supports more domain-specific or complex preprocessors, such as SpacyPreprocessor, spark.make_spark_preprocessor and more. Please see the documentation [10] for details.

## Step 4 : Analyze Labeling Function results and improve them

For **each** labeling function, spot-check its outputs for errors such as false positives; for the **combined** labeling functions, check the overall results and improve them as required. The goal is to improve both accuracy and coverage.

For this, Snorkel provides tooling for common LF analyses through the LFAnalysis utility [8]:
- Polarity: The set of unique labels this LF outputs (excluding abstains)
- Coverage: The fraction of the dataset the LF labels
- Overlaps: The fraction of the dataset where this LF and at least one other LF label
- Conflicts: The fraction of the dataset where this LF and at least one other LF label and disagree
- Correct: The number of data points this LF labels correctly (if gold labels are provided)
- Incorrect: The number of data points this LF labels incorrectly (if gold labels are provided)
- Empirical Accuracy: The empirical accuracy of this LF (if gold labels are provided)

Note that in our current setup we can't compute the Correct, Incorrect and Empirical Accuracy statistics, because we don't have any ground-truth labels.

Each LF should produce as few false positives as possible, and the combination of LFs should then have high coverage and accuracy. Here we prioritize accuracy over coverage.
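The column definitions listed above can be reproduced on a toy label matrix. The following NumPy sketch is only meant to make the definitions concrete, not to replicate Snorkel's implementation; the matrix and gold labels are made up. It also shows how empirical accuracy would be computed if a small hand-labeled set were available (Snorkel's `lf_summary` accepts such gold labels through its `Y` argument).

```python
import numpy as np

ABSTAIN = -1

def lf_summary(L, gold=None):
    """Per-LF coverage, overlaps, conflicts (and empirical accuracy if gold labels given).

    L is an (n_points, n_lfs) label matrix with ABSTAIN (-1) for abstentions.
    """
    labeled = L != ABSTAIN                          # where each LF voted
    votes_per_point = labeled.sum(axis=1)           # how many LFs voted on each point
    coverage = labeled.mean(axis=0)
    # Overlap: this LF voted and at least one other LF voted on the same point
    overlaps = (labeled & (votes_per_point[:, None] > 1)).mean(axis=0)
    conflicts = np.zeros(L.shape[1])
    for j in range(L.shape[1]):
        others = np.delete(L, j, axis=1)
        # Another LF disagrees if it voted and its vote differs from LF j's vote
        disagree = ((others != ABSTAIN) & (others != L[:, [j]])).any(axis=1)
        conflicts[j] = (labeled[:, j] & disagree).mean()
    summary = {"coverage": coverage, "overlaps": overlaps, "conflicts": conflicts}
    if gold is not None:
        correct = (L == np.asarray(gold)[:, None]) & labeled
        # Accuracy over the points each LF actually labeled (NaN if it never voted)
        with np.errstate(invalid="ignore", divide="ignore"):
            summary["empirical_accuracy"] = correct.sum(axis=0) / labeled.sum(axis=0)
    return summary

# Toy matrix: 4 data points, 3 LFs (0 = HELPFUL, 1 = NOT_HELPFUL)
L = np.array([[0, -1, 1], [0, 0, -1], [-1, -1, 1], [-1, -1, -1]])
summary = lf_summary(L, gold=[0, 0, 1, 1])
for name, values in summary.items():
    print(name, values)
```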

```python
from snorkel.labeling import PandasLFApplier

# Define the set of labeling functions (LFs); each LF becomes one column of the label matrix
lfs = [vote, video, short_review, quality, durability, subjectivity]
```

```python
applier = PandasLFApplier(lfs=lfs)
L_train = applier.apply(df=data_df)
```

```
100%|██████████| 104972/104972 [00:14<00:00, 7152.83it/s]
```


```python
from snorkel.labeling import LFAnalysis

LFAnalysis(L=L_train, lfs=lfs).lf_summary()
```

| | j | Polarity | Coverage | Overlaps | Conflicts |
|---|---|---|---|---|---|
| vote | 0 | [0] | 0.062359 | 0.016242 | 0.002829 |
| video | 1 | [0] | 0.001839 | 0.001096 | 0.000124 |
| short_review | 2 | [1] | 0.108562 | 0.02354 | 0.005401 |
| quality | 3 | [0] | 0.110544 | 0.023768 | 0.007145 |
| durability | 4 | [0] | 0.032847 | 0.009765 | 0.001505 |
| subjectivity | 5 | [1] | 0.0726 | 0.024749 | 0.006611 |


To spot-check two LFs against each other, `get_label_buckets` groups data points by the combination of labels they receive. Columns 3 and 2 of `L_train` hold the `quality` and `short_review` LFs (following the order of `lfs`). Since `short_review` only ever outputs NOT_HELPFUL or ABSTAIN, we look at the bucket `(HELPFUL, ABSTAIN)`: reviews that `quality` labels helpful while `short_review` abstains.

```python
from snorkel.analysis import get_label_buckets

# Column 3 = quality, column 2 = short_review (same order as `lfs`)
buckets = get_label_buckets(L_train[:, 3], L_train[:, 2])
data_df.iloc[buckets[(HELPFUL, ABSTAIN)]].sample(10, random_state=1)
```

| | product_id | product_title | star_rating | helpful_votes | total_votes | vine | verified_purchase | review_headline | review_body |
|---|---|---|---|---|---|---|---|---|---|
| 74147 | B0079LGLOC | JVC Bluetooth DVD/CD/USB/SD Receiver with 7-in... | 4 | 26 | 28 | N | Y | Decent Unit for the money. | I installed this into a 2004 Ford F150 Lariat ... |
| 40918 | B0067XVNTG | Generic Car Dashboard Video Camera Vehicle Vid... | 4 | 0 | 0 | N | Y | Excellent for the Money. | Mixed reviews had me concerned but I took a ch... |
| 83103 | B004BD6VSW | SanDisk Sansa Fuze 4 GB Video MP3 Player (White) | 5 | 0 | 0 | N | N | Excellent MP3 player | I absolutely loved my white Sansa Fuze player,... |
| 82616 | B0050MW0Z6 | LCD Display and Touch Digitizer Screen Assembl... | 5 | 0 | 2 | N | Y | LCD and Touch screen Digitizer | Received part good quality, installed the same... |
| 5385 | B00TR246AC | Ls-4167 | 4 | 1 | 1 | N | Y | Great value | As with most underpriced Bluetooth speakers, t... |
| 100952 | B001268ZMQ | Sonic Impact Video-55 Video Player w/7'' LCD f... | 2 | 0 | 0 | N | N | Disappointed | Just purchased two of these for my kids to vie... |
| 90799 | B002X53B60 | Cbus Wireless Three Silicone Cases / Skins / C... | 4 | 0 | 0 | N | Y | good price for 3 covers | Fair price for a 3 pack of cases. The quality ... |
| 14482 | B00RFMRWW0 | Blusmart Car Fm Bluetooth Radio Transmitter, H... | 5 | 2 | 2 | N | N | Recommended | I really love this product, I was looking for ... |
| 46818 | B008Y6LG8W | MiniGuard Samsung Galaxy Note 10.1 Inch Tablet... | 4 | 0 | 0 | N | Y | A great value! | I didn't have very high expectations for these... |
| 69360 | B007TOFPZK | VIGO Digital Wireless Automobile FM Transmitte... | 4 | 0 | 0 | N | Y | Power sucker but works well | Full range of FM frequencies means I can find ... |
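Conceptually, a label bucket maps each combination of labels to the indices of the data points that received it. A plain-Python sketch of that idea follows; the helper `label_buckets` and the toy votes are hypothetical and only meant to show why buckets such as `(HELPFUL, NOT_HELPFUL)` are a convenient way to find conflicts worth inspecting.

```python
from collections import defaultdict

ABSTAIN, HELPFUL, NOT_HELPFUL = -1, 0, 1

def label_buckets(*label_columns):
    """Map each tuple of labels (one per LF) to the row indices that received it."""
    buckets = defaultdict(list)
    for i, labels in enumerate(zip(*label_columns)):
        buckets[labels].append(i)
    return dict(buckets)

# Hypothetical votes of two LFs on four reviews
quality_votes = [HELPFUL, ABSTAIN, HELPFUL, ABSTAIN]
short_review_votes = [ABSTAIN, NOT_HELPFUL, NOT_HELPFUL, ABSTAIN]
buckets = label_buckets(quality_votes, short_review_votes)
print(buckets[(HELPFUL, NOT_HELPFUL)])  # prints [2], the review where the two LFs conflict
```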


```python
L_train
```

```
array([[-1, -1, -1, ..., -1, -1, -1],
       [-1, -1, -1, ...,  1, -1, -1],
       [-1, -1, -1, ..., -1, -1, -1],
       ...,
       [ 0, -1, -1, ..., -1, -1, -1],
       [ 0, -1, -1, ..., -1, -1,  1],
       [ 0, -1, -1, ..., -1, -1,  1]])
```
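Each row of `L_train` corresponds to one review and each column to one LF, in the order of `lfs`; `-1` marks an abstain. A minimal plain-Python sketch of how an applier builds such a label matrix, using simplified dict-based versions of two of the LFs above (the names `short_review_lf`, `quality_lf`, `apply_lfs` and the toy rows are hypothetical):

```python
ABSTAIN, HELPFUL, NOT_HELPFUL = -1, 0, 1

def short_review_lf(x):
    return NOT_HELPFUL if len(x["review_body"].split()) < 10 else ABSTAIN

def quality_lf(x):
    return HELPFUL if "quality" in x["review_body"].lower() else ABSTAIN

def apply_lfs(rows, lfs):
    """Build a label matrix: one row per data point, one column per LF."""
    return [[lf(row) for lf in lfs] for row in rows]

rows = [
    {"review_body": "it's great"},
    {"review_body": "Good sound quality and the battery easily lasts a full day of use"},
]
L = apply_lfs(rows, [short_review_lf, quality_lf])
print(L)  # prints [[1, -1], [-1, 0]]
```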

## Step 5 : Combining LFs and applying labels

```python
# In recent Snorkel releases, MajorityLabelVoter lives in snorkel.labeling.model
from snorkel.labeling.model import MajorityLabelVoter

majority_model = MajorityLabelVoter()
preds_train = majority_model.predict(L=L_train)
```
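The majority vote itself is simple. Below is a plain-Python sketch of one reasonable policy, assumed here for illustration rather than taken from Snorkel's source: each data point gets the most common non-abstain label, and points where every LF abstained, or where the top labels tie, get ABSTAIN. Points like these remain unlabeled, which is why the cells below filter on `label != ABSTAIN`.

```python
from collections import Counter

ABSTAIN = -1

def majority_vote(row):
    """Most common non-abstain label; ABSTAIN on ties or if every LF abstained."""
    counts = Counter(label for label in row if label != ABSTAIN).most_common()
    if not counts:
        return ABSTAIN  # no LF voted on this data point
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # tie between the top labels
    return counts[0][0]

label_matrix = [[0, -1, 1], [0, 0, 1], [-1, -1, -1], [1, -1, -1]]
print([majority_vote(row) for row in label_matrix])  # prints [-1, 0, -1, 1]
```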

```python
data_df["label"] = preds_train
```

```python
data_df[data_df.label!=ABSTAIN]
```

| | product_id | product_title | star_rating | helpful_votes | total_votes | vine | verified_purchase | review_headline | review_body | label |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | B00OQMFG1Q | GENSSI GSM / GPS Two Way Smart Phone Car Alarm... | 5 | 0 | 1 | N | Y | five star | it's great | 1 |
| 4 | B0067XVNTG | Generic Car Dashboard Video Camera Vehicle Vid... | 2 | 0 | 0 | N | Y | Cameras has battery issues | Be careful with these products, I have bought ... | 1 |
| 6 | B00MJCDPM2 | Sentey LS-4460 B-Trek S8 Bluetooth Portable St... | 3 | 0 | 1 | N | Y | Didn't love the first one | First one arrived as a brick. Wouldn't work, ... | 1 |
| 7 | B00ET5AWBY | iPad Car Headrest Mount Holder for iPad 2/ iPa... | 5 | 0 | 0 | N | Y | Five Stars | Worked great for vacation | 1 |
| 9 | B00YO3UYXW | Jensen MCR-100 Cassette Player/Recorder 1 Touc... | 5 | 164 | 168 | N | Y | I LOVE my recorder | I LOVE my recorder. Bought it obviously becaus... | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 104962 | B00005OTZQ | Royal SE 2800 Hand-Held Spot Cleaner | 1 | 0 | 0 | N | N | Don't waste your money...... | This machine has broken down on me 4 times dur... | 1 |
| 104964 | B00005OTZQ | Royal SE 2800 Hand-Held Spot Cleaner | 5 | 41 | 45 | N | N | A Dream Spot-Cleaning Machine | This little spot-scrubber is one hard-working ... | 0 |
| 104966 | B00005OTZQ | Royal SE 2800 Hand-Held Spot Cleaner | 5 | 0 | 0 | N | N | FABULOUS | Works wonders!! Wish I had gotten one sooner a... | 1 |
| 104971 | B00005OTZQ | Royal SE 2800 Hand-Held Spot Cleaner | 5 | 10 | 11 | N | N | Well worth [it] | We live in an apartment with hardwood floors a... | 0 |


### Let us look at our helpful reviews

```python
data_df[data_df.label==HELPFUL].sample(10)
```

| | product_id | product_title | star_rating | helpful_votes | total_votes | vine | verified_purchase | review_headline | review_body | label |
|---|---|---|---|---|---|---|---|---|---|---|
| 25814 | B00L87LJ54 | 32gb Blue Slim 1.8" 4th LCD Mp3 Mp4 Player Fm ... | 3 | 85 | 87 | N | Y | the GUI is simple but at first hard to operate... | Well you get what you paid for true to adverti... | 0 |
| 44191 | B009X24URK | HHI Silicone Game Boy Case for iPod Touch 5th ... | 5 | 0 | 0 | N | Y | Great looking iPod cover. | Bought this for my daughter and she loves it. ... | 0 |
| 18182 | B00M69QUY8 | Bluetooth Speakers Dylan™ Cocoon Portable Blue... | 3 | 1 | 1 | N | Y | Not bad for the price | Not bad for the price. Loud, but sound quality... | 0 |
| 27305 | B00F2GMWCG | E-PRANCE® Mini 0801 Car DVR Dash Camera Ambare... | 5 | 1 | 5 | N | Y | Great Camera | This item arrived pretty fast and it's video q... | 0 |
| 49831 | B00A73KQPY | Ion Tape Express Usb Cassette Tape To Mp3 Conv... | 3 | 7 | 8 | N | Y | It works, but background noise with audio tapes | Product worked as advertised but I tried to us... | 0 |
| 45108 | B00DHC5UGC | Ruichen Black Unisex Stylish Jelly Silicone St... | 5 | 0 | 0 | N | Y | Perfect! | I have to admit that for the prize I was watin... | 0 |
| 103678 | B000LI0QFU | Wireless Remote Control for iPod | 4 | 6 | 6 | N | Y | Best Price & Design for an iPod to Stereo Hookup | I was looking for a remote control for my iPod... | 0 |
| 10746 | B003JW0C7C | Sennheiser HMEC 250 NoiseGard Pilot Headset | 2 | 1 | 1 | N | Y | arrived early | Good quality, came with out the extra boom mic... | 0 |
| 87850 | B004911E9M | Wall AC Charger USB Sync Data Cable for iPhone... | 1 | 146 | 156 | N | Y | Very dissapointed, the adapter plug does not w... | I wanted this to take to work for charging my ... | 0 |
| 54420 | B00C0LRWTY | HD 720P Car DVR Camera Dash Cam Video Recorder... | 2 | 0 | 0 | N | Y | Good picture quality but hard to set time/date... | Originally, I thought about giving one star si... | 0 |


### Let us look at our not helpful reviews

```python
data_df[data_df.label==NOT_HELPFUL].sample(10)
```

| | product_id | product_title | star_rating | helpful_votes | total_votes | vine | verified_purchase | review_headline | review_body | label |
|---|---|---|---|---|---|---|---|---|---|---|
| 3700 | B0034DIZ4S | FM Transmitter /Car Charger/ Holder for iPhone... | 1 | 0 | 0 | N | Y | One Star | Very bad quality | 1 |
| 20105 | B00KTALNA2 | Car Flush Mount Rear View & Side View Dual Use... | 3 | 0 | 0 | N | Y | Three Stars | It's okay Not Bad | 1 |
| 33506 | B00H460MO2 | Processing time 2 days-10 Pieces FR4 Copper Cl... | 5 | 0 | 0 | N | Y | Excellent seller | I like this beautiful item is excellent and cl... | 1 |
| 15452 | B0047SK8GM | Cassette Tape Adapter For Apple Ipod Mini-3.5mm | 5 | 0 | 0 | N | Y | Five Stars | Great product | 1 |
| 18111 | B00IAA81P8 | Bose SoundLink III Bluetooth Speaker with Soft... | 5 | 0 | 0 | N | Y | Five Stars | excellent | 1 |
| 13966 | B00MJCDPM2 | Sentey LS-4460 B-Trek S8 Bluetooth Portable St... | 4 | 0 | 0 | N | Y | Four Stars | I love it! The look and sound is great | 1 |
| 102435 | B000ML3I2Y | Headphone/ Earphone/ Earbud Smart Wrap for App... | 3 | 2 | 4 | N | Y | MP3 cord winder | Nice product; but I must say the shipping char... | 1 |
| 16675 | B00J46XO9U | iXCC Lightning Cable 3ft, iPhone charger, for ... | 5 | 0 | 0 | N | N | Five Stars | Work very well. Thank you! | 1 |
| 26816 | B000HHERSC | AC to DC Car Cigarette Lighter Socket Adapter | 5 | 0 | 0 | N | Y | Five Stars | Works great!!! | 1 |
| 13199 | B00IHS9BPM | Waterproof Bluetooth Shower Speaker Portable S... | 4 | 0 | 0 | N | Y | Four Stars | Wish it came with a charger | 1 |


We can see that the reviews labeled helpful are generally descriptive of the product (what worked well and what did not), while the ones labeled not helpful are not.

Thus, we labeled our data-set without having any ground-truth labels. **Hurray!!**

## Step 6 : Split into training - validation blocks and run a classifier

Once we have labels, any machine learning or deep learning method can be trained on them. This step is not covered in detail here, since it is not technically part of Snorkel.
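The classifier itself depends on your choice of model, but the split can be sketched. The helper below (hypothetical, plain Python) drops the reviews that received ABSTAIN and splits the rest into training and validation lists with a fixed seed. Keep in mind that a validation set built from weak labels measures agreement with the LFs rather than true accuracy.

```python
import random

ABSTAIN = -1

def train_valid_split(examples, labels, valid_fraction=0.2, seed=0):
    """Drop abstained points, shuffle reproducibly, and split into train/validation lists."""
    labeled = [(x, y) for x, y in zip(examples, labels) if y != ABSTAIN]
    random.Random(seed).shuffle(labeled)
    n_valid = int(len(labeled) * valid_fraction)
    return labeled[n_valid:], labeled[:n_valid]

reviews = [f"review {i}" for i in range(10)]
labels = [0, 1, -1, 0, 1, 1, -1, 0, 0, 1]
train, valid = train_valid_split(reviews, labels)
print(len(train), len(valid))  # prints 7 1
```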

## Summary and other functionalities

Thus, we can see that Snorkel is a powerful tool for labeling unlabeled data. Labeling functions are the core of Snorkel, and their quality depends heavily on the user: users are responsible for writing good labeling functions, which in turn yield a better-labeled dataset. A good practice is to first reduce false positives for each labeling function and then improve overall accuracy and coverage. Keep in mind that this is weak supervision, so the labels will generally not be as accurate as labels produced by SMEs.

So far, we have only used Snorkel's labeling functions (LFs). Snorkel also has other operators, such as Transformation Functions (TFs) and Slicing Functions (SFs). Transformation functions are applied to a training data point to create another valid training data point of the same class; TFs are used for data augmentation. Slicing functions define slices (subsets) of the dataset, which can be used to check or improve performance on those slices.
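The shape of these two ideas can be illustrated in plain Python. The sketch below is not Snorkel's API (Snorkel provides its own decorators for TFs and SFs); it uses a hypothetical synonym table to show that a TF produces a new, label-preserving data point, while an SF is just a predicate marking which data points belong to a slice.

```python
import random

# Hypothetical synonym table used only for this illustration
SYNONYMS = {"great": "excellent", "cheap": "inexpensive", "fast": "quick"}

def swap_synonym(text, rng):
    """TF-style augmentation: replace one known word with a synonym.

    The new text is assumed to keep the original label. Returns None when no
    word can be replaced, i.e. no new data point is produced.
    """
    words = text.split()
    candidates = [i for i, w in enumerate(words) if w.lower() in SYNONYMS]
    if not candidates:
        return None
    i = rng.choice(candidates)
    words[i] = SYNONYMS[words[i].lower()]
    return " ".join(words)

def short_review_slice(text):
    """SF-style predicate: True if the data point belongs to the 'short review' slice."""
    return len(text.split()) < 10

rng = random.Random(0)
augmented = swap_synonym("great cable and fast charging", rng)
print(augmented)
print(short_review_slice("works fine"))  # prints True
```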

Snorkel provides various other functionalities too. It can be used for recommender system, information extraction and image classification tasks. Snorkel supports distributed environments via Dask and Spark, which makes it practical for industrial set-ups. Snorkel can also be used together with crowdsourced labels to label a dataset.

For more details, please check <https://www.snorkel.org/> and <https://snorkel.readthedocs.io/en/v0.9.3/>

> **by Monika Daryani**   
> (Spotlight submission for CSCE 670, Texas A&M University)

## Bibliography

[1] Chapelle, Olivier, Bernhard Scholkopf, and Alexander Zien. "Semi-supervised learning (chapelle, o. et al., eds.; 2006)[book reviews]." IEEE Transactions on Neural Networks 20.3 (2009): 542-542.

[2] Pan, Sinno Jialin, and Qiang Yang. "A survey on transfer learning." IEEE Transactions on knowledge and data engineering 22.10 (2009): 1345-1359. 

[3] Caruana, Richard A. "Multitask learning: A knowledge-based source of inductive bias." Machine Learning Proceedings 1993 (1993): 41-48. doi:10.1016/b978-1-55860-307-3.50012-5.

[4] Augenstein, Isabelle, Andreas Vlachos, and Diana Maynard. "Extracting relations between non-standard entities using distant supervision and imitation learning." Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2015. 

[5] Settles, Burr. Active learning literature survey. University of Wisconsin-Madison Department of Computer Sciences, 2009.

[6] Ratner, Alexander J., et al. "Data programming: Creating large training sets, quickly." Advances in neural information processing systems. 2016.

[7] Ratner, Alexander, et al. "Snorkel: Rapid training data creation with weak supervision." The VLDB Journal 29.2 (2020): 709-730.

[8] https://www.snorkel.org/use-cases/01-spam-tutorial

[9] https://s3.amazonaws.com/amazon-reviews-pds/tsv/index.txt

[10] https://snorkel.readthedocs.io/en/master/packages/preprocess.html