### Explore annotated airline tweet data provided by CrowdFlower
A Super Handy CrowdFlower Glossary of Terms can be found [here](https://success.crowdflower.com/hc/en-us/articles/202703305-Glossary-of-Terms)!

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
%matplotlib inline

#### Read-In Jobs-Level Data (from CrowdFlower's *Data for Everyone* [library](https://www.crowdflower.com/data-for-everyone/))

In [2]:
cf = pd.read_csv("http://cdn2.hubspot.net/hub/346378/file-2612489700-csv/DFE_CSVs/Airline-Full-Non-Ag-DFE-Sentiment.csv")
print(cf.columns)
cf.head(2)

Index([u'_unit_id', u'_created_at', u'_golden', u'_id', u'_missed',
       u'_started_at', u'_tainted', u'_channel', u'_trust', u'_worker_id',
       u'_country', u'_region', u'_city', u'_ip', u'airline_sentiment',
       u'negativereason', u'airline', u'airline_sentiment_gold', u'name',
       u'negativereason_gold', u'retweet_count', u'text', u'tweet_coord',
       u'tweet_created', u'tweet_id', u'tweet_location', u'user_timezone'],
      dtype='object')


Unnamed: 0,_unit_id,_created_at,_golden,_id,_missed,_started_at,_tainted,_channel,_trust,_worker_id,...,airline_sentiment_gold,name,negativereason_gold,retweet_count,text,tweet_coord,tweet_created,tweet_id,tweet_location,user_timezone
0,681448150,2/25/2015 04:52:40,False,1575073003,,2/25/2015 04:49:12,False,elite,0.8108,31110645,...,,cairdin,,0,@VirginAmerica What @dhepburn said.,,2015-02-24 11:35:52 -0800,570306133677760513,,Eastern Time (US & Canada)
1,681448150,2/25/2015 05:22:10,False,1575093916,,2/25/2015 05:19:59,False,prodege,0.8919,1908948,...,,cairdin,,0,@VirginAmerica What @dhepburn said.,,2015-02-24 11:35:52 -0800,570306133677760513,,Eastern Time (US & Canada)


#### Look at _golden and _missed flags

In [None]:
print(cf._golden.value_counts(dropna=False))  # golden = test tweets
print(cf._missed.value_counts(dropna=False))  # not sure what this is; it does not necessarily mean the tweet was missed

#### Look at "Tainted" Tweets and worker ID Trust Scores

In [13]:
print(cf._tainted.value_counts(dropna=False))  # no tweets marked as tainted
print(cf._trust.describe())  # all trust scores in range 70% - 100%; "tainted" judgements dropped

False    43786
True     11997
Name: _golden, dtype: int64
NaN     53924
True     1859
Name: _missed, dtype: int64
False    55783
Name: _tainted, dtype: int64
count    55783.000000
mean         0.850374
std          0.066688
min          0.700000
25%          0.809500
50%          0.857100
75%          0.892900
max          1.000000
Name: _trust, dtype: float64
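
One way to probe what `_missed` means is to cross-tabulate it against `_golden`. If `_missed` only ever appears on golden rows, that would suggest it flags test tweets a worker answered incorrectly, but that reading is an assumption to check against the glossary. The sketch below uses a hypothetical helper, `flag_crosstab`, and a made-up toy frame in place of `cf`, so it runs without downloading the data.

```python
import pandas as pd


def flag_crosstab(df, row="_golden", col="_missed"):
    """Cross-tabulate two flag columns, keeping NaN as its own category."""
    # Convert every value to a string label so NaN is counted
    # instead of being silently dropped by crosstab.
    as_label = lambda v: "NaN" if pd.isna(v) else str(v)
    return pd.crosstab(df[row].map(as_label), df[col].map(as_label))


# Toy data for illustration only (not taken from the real file).
toy = pd.DataFrame({
    "_golden": [True, True, False, False],
    "_missed": [True, None, None, None],
})
table = flag_crosstab(toy)
print(table)
```

On the real data the call would be `flag_crosstab(cf)`.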


In [19]:
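# Look at each worker's progression over time.
# Note: _started_at is read in as a string, so this sort is lexicographic,
# not strictly chronological.
cols = ["_worker_id", "_started_at", "_created_at", "tweet_id", "text",
        "_golden", "airline_sentiment", "airline_sentiment_gold", "_trust"]
cf.sort_values(by=["_worker_id", "_started_at"])[cols].head(200)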

Unnamed: 0,_worker_id,_started_at,_created_at,tweet_id,text,_golden,airline_sentiment,airline_sentiment_gold,_trust
3954,1908948,2/25/2015 03:16:13,2/25/2015 03:32:53,569851578276048896,"@united I'm aware of the flight details, thank...",True,negative,negative,0.8919
6398,1908948,2/25/2015 03:16:13,2/25/2015 03:32:53,569473998519578624,@united flighted delayed for hours. 10pm arriv...,True,negative,negative,0.8919
10921,1908948,2/25/2015 03:16:13,2/25/2015 03:32:53,568637541513089024,@united rebooked 24 hours after original fligh...,True,negative,negative,0.8919
19831,1908948,2/25/2015 03:16:13,2/25/2015 03:32:53,568752276040495104,@SouthwestAir If a travel advisory is posted f...,True,neutral,neutral,0.8919
23682,1908948,2/25/2015 03:16:13,2/25/2015 03:32:53,567717985092395008,@southwestair - kind of early but any idea whe...,True,neutral,neutral,0.8919
29994,1908948,2/25/2015 03:16:13,2/25/2015 03:32:53,568182544124014592,@JetBlue I am heading to JFK now just on princ...,True,negative,negative,0.8919
44130,1908948,2/25/2015 03:16:13,2/25/2015 03:32:53,568824537338417154,@AmericanAir - how long does it take to get cr...,True,negative,negative,0.8919
44347,1908948,2/25/2015 03:16:13,2/25/2015 03:32:53,568551906634797056,@AmericanAir Hopefully you ll see bad ones as ...,True,neutral,positive,0.8919
14307,1908948,2/25/2015 03:32:54,2/25/2015 03:33:59,567778009013178368,@united So what do you offer now that my fligh...,True,negative,negative,0.8919
43466,1908948,2/25/2015 03:32:54,2/25/2015 03:33:59,569601363799359488,@AmericanAir should reconsider #usairways acqu...,True,negative,negative,0.8919


* Each worker starts with a series of test (golden) tweets to determine a trust score, followed by occasional spot checks with a test tweet.
* The trust score doesn't appear to fluctuate with performance in the judgement-level data.
Confirm that there is only one trust score per worker.
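
A minimal sketch of that check: count the distinct `_trust` values per `_worker_id`, and for comparison compute each worker's share of golden tweets where `airline_sentiment` matches `airline_sentiment_gold`. The helper name `worker_summary` and the toy frame are hypothetical. The accuracy column assumes the gold column holds a single accepted label per tweet, which may not be true of the real file.

```python
import pandas as pd


def worker_summary(df):
    """Per worker: number of distinct trust scores and accuracy on golden tweets."""
    n_trust = (df.groupby("_worker_id")["_trust"]
                 .nunique()
                 .rename("n_trust_scores"))
    # Restrict to test (golden) tweets, where a gold label is available.
    golden = df[df["_golden"]]
    # Assumption: an exact string match means the worker answered correctly.
    correct = golden["airline_sentiment"] == golden["airline_sentiment_gold"]
    accuracy = (correct.groupby(golden["_worker_id"])
                       .mean()
                       .rename("golden_accuracy"))
    return pd.concat([n_trust, accuracy], axis=1)


# Toy data for illustration only (not taken from the real file).
toy = pd.DataFrame({
    "_worker_id": [1, 1, 1, 2, 2],
    "_trust": [0.9, 0.9, 0.9, 0.8, 0.75],
    "_golden": [True, True, False, True, False],
    "airline_sentiment": ["negative", "neutral", "positive", "neutral", "negative"],
    "airline_sentiment_gold": ["negative", "positive", None, "neutral", None],
})
summary = worker_summary(toy)
# Workers with more than one trust score would contradict the hypothesis.
violators = summary[summary["n_trust_scores"] > 1]
print(summary)
print(violators)
```

On the real data, `worker_summary(cf)` with an empty `violators` frame would confirm one trust score per worker.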