# Today: EDA GROUPS!

Choose a team, and then spend some time looking at data.  We want you to explore the data using the techniques we have learned so far, including:

- Grouping / subsetting / segmentation
- Summary statistics
    - Histograms
    - Plotting
- Slicing
- Cleaning data
    - assessing proper types
    - expected values
    - object conversion
   

At the end of our exploratory analysis, each group will give a 10-minute presentation on its findings to the rest of the class.


In [2]:
import pandas as pd, numpy as np, seaborn as sns

%matplotlib inline

## Team Alpha Drone

Since the API from `api.dronestre.am` provides data on drone strikes in near real time, this **might** be useful for holding President Obama accountable to his promise of reducing drone strikes.  Your mission is to explore the drone strike data, do any accompanying research your analysis needs, and report back any good summary statistics.

Also, we would like to know:
 - Is this a good source of data?
     - Why / why not?
     
*Politics aside -- let's keep it to what is measurable in our dataset.  This isn't meant to prove or disprove anything.  It's a **fun** dataset to look at more than a motivator of political discourse.*

In [3]:
# First we need to fetch some data using Python requests from API
# Read more about Python requests:
# http://docs.python-requests.org/en/master/user/quickstart/

import requests

response = requests.get("http://api.dronestre.am/data")
json_data = response.json()
drone_df = pd.DataFrame(json_data['strike'])

In [6]:
drone_df.shape

(629, 22)
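
Whether strikes become more or less frequent over time is a counting question once each record has a parsed date. The sketch below assumes the `date` field is an ISO 8601 string such as `2002-11-03T00:00:00.000Z`; the small `toy` frame is a stand-in for `drone_df`, not the real data.

```python
import pandas as pd

# Toy stand-in for drone_df with the same date string format.
toy = pd.DataFrame({
    'country': ['Yemen', 'Pakistan', 'Pakistan', 'Pakistan', 'Pakistan'],
    'date': ['2002-11-03T00:00:00.000Z',
             '2004-06-17T00:00:00.000Z',
             '2005-05-08T00:00:00.000Z',
             '2005-11-05T00:00:00.000Z',
             '2005-12-01T00:00:00.000Z'],
})

# Parse the strings; anything unparseable becomes NaT instead of raising.
toy['date'] = pd.to_datetime(toy['date'], errors='coerce')

# Count records per calendar year.
strikes_per_year = toy.groupby(toy['date'].dt.year).size()
print(strikes_per_year)
```

The same grouping on `drone_df` would give a per-year series that can be plotted or compared across time periods.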

In [12]:
drone_df.country.value_counts()

Pakistan                       429
Yemen                          178
Somalia                         21
Pakistan-Afghanistan Border      1
Name: country, dtype: int64

In [16]:
drone_df.bureau_id.value_counts().count()

615
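
With 615 distinct values across 629 rows, `bureau_id` cannot be a unique key: some IDs are blank or repeated. A rough way to find them, shown on a made-up series rather than the real column:

```python
import pandas as pd

# Made-up IDs: one repeated value and two blanks.
ids = pd.Series(['B1', 'B2', 'B2', '', '', 'YEM001'], name='bureau_id')

n_unique = ids.nunique()                    # distinct values, blanks count as one
n_blank = (ids.str.strip() == '').sum()     # rows with no ID at all

# Non-blank IDs that appear more than once.
repeated = ids[ids.duplicated(keep=False) & (ids.str.strip() != '')]

print(n_unique, n_blank)
print(repeated)
```

Inspecting the rows behind `repeated` on the real data is one way to judge whether this is a good source: repeated IDs may be genuine duplicates or separate strikes filed under one report.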

In [17]:
drone_df.location.value_counts().count()

38

In [20]:
drone_df.location.value_counts()

North Waziristan                         303
South Waziristan                          98
Abyan Province                            45
Shabwah Province                          39
Hadhramaut Province                       32
Al Bayda Province                         23
Marib Province                            20
Al Jawf Province                           8
Lower Shabelle                             7
Kurram Agency                              7
Khyber Agency                              6
Bajaur Agency                              4
Lower Juba                                 4
Middle Juba                                3
Bannu Frontier                             3
Gedo                                       2
Sanaa Province                             2
Lahj Province                              2
Unknown                                    2
Khyber Pakhtunkhwa                         1
Lower Kurram                               1
South-central Somalia                      1
          

In [23]:
drone_df.bureau_id.value_counts().sort_index()

            4
B1          1
B10         1
B11         1
B12         1
B13         1
B14         1
B15         1
B16         1
B17         1
B18         1
B19         1
B2          1
B20         1
B21         1
B22         1
B23         1
B24         1
B25         1
B26         1
B27         1
B28         1
B29         1
B3          1
B30         1
B31         1
B32         1
B33         1
B34         1
B35         1
B36         1
B37         1
B38         1
B39         1
B4          1
B40         1
B41         1
B42         1
B43         1
B44         1
B45         1
B46         1
B47         1
B48         1
B49         1
B4c         1
B5          1
B50         1
B51         1
B6          1
B7          1
B8          1
B9          1
Ob1         1
Ob10        1
Ob100       1
Ob101       1
Ob102       1
Ob103       1
Ob104       1
Ob105       1
Ob106       1
Ob107       1
Ob108       1
Ob109       1
Ob11        1
Ob110       1
Ob111       1
Ob112       1
Ob113       1
Ob114       1
Ob115 

In [24]:
drone_df.town.value_counts()   #.sort_index()

Datta Khel                                37
                                          26
Miranshah                                 23
Shawal                                    19
Danda Darpakhel                           13
Mir Ali                                   12
Azam Warsak                                9
Mukalla                                    9
Al Mahfad                                  8
Ghulam Khan                                7
Radaa                                      7
Machi Khel                                 7
Wadi Abeeda                                6
Al Qatn                                    6
Darga Mandi                                6
Norak                                      6
Khushali                                   6
Jaar                                       5
Wadi Abida                                 5
Zinjibar                                   5
Ladha                                      5
Mohammad Khel                              5
Doga Mada 

In [29]:
drone_df.bij_link[2]

u'http://www.thebureauinvestigates.com/2011/08/10/the-bush-years-2004-2009/'

In [31]:
drone_df[drone_df.deaths_max.str.isdigit() == False]

Unnamed: 0,_id,articles,bij_link,bij_summary_short,bureau_id,children,civilians,country,date,deaths,deaths_max,deaths_min,injuries,lat,location,lon,names,narrative,number,target,town,tweet_id
282,55c79e721cbee48856a309a0,[],http://www.thebureauinvestigates.com/2012/03/2...,Six civilians were wounded in an apparent dron...,YEM015,,,Yemen,2011-06-18T00:00:00.000Z,0,,,6.0,13.218055,Abyan Province,45.307832,[],At least 6 Yemeni civilians wounded by a US dr...,283,,Jaar,299605330252398593
313,55c79e721cbee48856a309bf,[],http://www.thebureauinvestigates.com/2012/02/2...,The United States launched a series of drone a...,SOM013,,,Somalia,2011-09-25T00:00:00.000Z,Unknown,?,0,,-0.354098,Lower Juba,42.545328,[],US drones attacked the port city of Kismayo. R...,314,,Kismayo,301525050341855235
334,55c79e721cbee48856a309d4,[],http://www.thebureauinvestigates.com/2012/03/2...,The militant stronghold of Rumeila was targete...,YEM038,,,Yemen,2011-11-08T00:00:00.000Z,Unknown,?,?,,13.612304,Abyan Province,46.106282,[],Five US drone strikes' killed an unknown numbe...,335,,Rumeila,309463189454716928
401,55c79e721cbee48856a30a17,[],http://www.thebureauinvestigates.com/2012/05/0...,The Yemen Times reported a US drone strike hit...,YEM097,,,Yemen,2012-06-14T00:00:00.000Z,Unknown,?,?,,14.33936309,Shabwah Province,47.44163168,[],A US drone strike hit the town of Azzan. 'Casu...,402,,Azzan,337636403427028992


In [49]:
# Coerce unparseable entries (e.g. '?' or '') to NaN, then treat them as 0
drone_df.deaths_max = pd.to_numeric(drone_df.deaths_max, errors='coerce').fillna(0)

In [53]:
# Coerce unparseable entries (e.g. '?' or '') to NaN, then treat them as 0
drone_df.deaths_min = pd.to_numeric(drone_df.deaths_min, errors='coerce').fillna(0)
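
Filling unknown counts with 0 makes "unknown" look the same as "nobody died", which pulls averages down. As an alternative sketch on made-up values, the NaNs can be left in place so that pandas skips them in summary statistics:

```python
import pandas as pd

# Made-up raw values: two known counts, one '?' and one empty string.
raw = pd.Series(['6', '?', '', '10'])

kept = pd.to_numeric(raw, errors='coerce')   # unknowns stay NaN
filled = kept.fillna(0)                      # unknowns become 0

print(kept.mean())    # mean over the known values only
print(filled.mean())  # mean with unknowns counted as zero
```

Which choice is right depends on the question being asked; it is worth stating in the presentation which one was used.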

In [56]:
type(drone_df.loc[0, 'children'])

unicode

In [54]:
drone_df[drone_df.children.str.isdigit() == False]

Unnamed: 0,_id,articles,bij_link,bij_summary_short,bureau_id,children,civilians,country,date,deaths,deaths_max,deaths_min,injuries,lat,location,lon,names,narrative,number,target,town,tweet_id
0,55c79e711cbee48856a30886,[],http://www.thebureauinvestigates.com/2012/03/2...,In the first known US targeted assassination u...,YEM001,,0,Yemen,2002-11-03T00:00:00.000Z,6,6.0,6.0,,15.47467,Marib Province,45.322755,"[Qa'id Salim Sinan al-Harithi, Abu Ahmad al-Hi...",In the first known US targeted assassination u...,1,,,2.785446894838907e+17
2,55c79e711cbee48856a30888,[],http://www.thebureauinvestigates.com/2011/08/1...,"Two killed, including Haitham al-Yemeni an al ...",B2,,,Pakistan,2005-05-08T00:00:00.000Z,2,2.0,2.0,,32.98677989,North Waziristan,70.26082993,"[Haitham al-Yemeni, Samiullah Khan]",2 people killed in a Predator strike which rep...,3,Haitham al-Yemeni,Toorikhel,2.785448122553672e+17
8,55c79e721cbee48856a3088e,[],http://www.thebureauinvestigates.com/2011/08/1...,Initial claims of 30 Taliban killed were lower...,B7,0-1,8,Pakistan,2007-01-16T00:00:00.000Z,8,8.0,8.0,,32.83575063,South Waziristan,69.55581665,"[Katoor Khan, Taj Alam]","8 people, many thought to be innocent woodcutt...",9,,Zamazola,2.785450977909801e+17
9,55c79e721cbee48856a3088f,[],http://www.thebureauinvestigates.com/2011/08/1...,"Attack on a house and madrassa killed 3-4, rep...",B8,,4,Pakistan,2007-04-27T00:00:00.000Z,3-4,4.0,3.0,9,33.09499311,North Waziristan,70.05912781,[],An attack on a religious school killed four pe...,10,Maulvi Noor Mohammed,Saidgai,2.785451579163566e+17
10,55c79e721cbee48856a30890,[],http://www.thebureauinvestigates.com/2011/08/1...,"20-34 killed in attack on a madrassa, includin...",B9,Possibly,0-34,Pakistan,2007-06-19T00:00:00.000Z,20-34,34.0,20.0,15,32.98879572,North Waziristan,70.28743744,[],"20-50 people, including children, were killed ...",11,,Mami Rogha,2.7854519936024576e+17
11,55c79e721cbee48856a30891,[],http://www.thebureauinvestigates.com/2011/08/1...,CIA drone strike kills 5-10 alleged Haqqani Ne...,B10,,,Pakistan,2007-11-02T00:00:00.000Z,5-10,10.0,5.0,12,33.020179,North Waziristan,70.07286072,[],A missile fired from an unmanned aerial drone ...,12,Jalalludin Haqqani,Danda Darpakhel,2.785452811030528e+17
12,55c79e721cbee48856a30892,[],http://www.thebureauinvestigates.com/2011/08/1...,Strike injures Egyptian al Qaeda leader and id...,B11,,,Pakistan,2007-12-03T00:00:00.000Z,Unknown,0.0,0.0,1,32.80285902,Bannu Frontier,70.48690796,[],A US drone fired at least 2 missiles early Wed...,13,Shaykh Issa al-Masri,Jani Khel,2.7854532195976806e+17
14,55c79e721cbee48856a30894,[],http://www.thebureauinvestigates.com/2011/08/1...,"At least 8 killed, including possibly students...",B13,,0-5,Pakistan,2008-02-28T00:00:00.000Z,8-13,13.0,8.0,16,32.30222379,South Waziristan,69.40544128,[],A 2am attack on a house killed up to 13 people.,15,,Azam Warsak,2.7854540612685005e+17
15,55c79e721cbee48856a30895,[],http://www.thebureauinvestigates.com/2011/08/1...,At least 12 alleged militants in an attack on ...,B14,,0-4,Pakistan,2008-03-16T00:00:00.000Z,12-20,20.0,12.0,9,32.30802742,South Waziristan,69.45762634,[],12-15 people were killed in a strike outside W...,16,,Dhook Pir Bagh,2.7854549939979056e+17
17,55c79e721cbee48856a30897,[],http://www.thebureauinvestigates.com/2011/08/1...,"One killed in strike on Makeen, South Waziristan.",B16,,,Pakistan,2008-06-14T00:00:00.000Z,1,1.0,1.0,,32.62780989,South Waziristan,69.84283447,[],1 person was killed as unmanned US drones fire...,18,Baitullah Mehsud,Makeen,2.78545584460288e+17


In [52]:
drone_df[drone_df.deaths.str.isdigit() == False]

Unnamed: 0,_id,articles,bij_link,bij_summary_short,bureau_id,children,civilians,country,date,deaths,deaths_max,deaths_min,injuries,lat,location,lon,names,narrative,number,target,town,tweet_id
1,55c79e711cbee48856a30887,[],http://www.thebureauinvestigates.com/2011/08/1...,First known drone strike in Pakistan kills at ...,B1,2,2,Pakistan,2004-06-17T00:00:00.000Z,6-8,8.0,6,1,32.30512565,South Waziristan,69.57624435,"[Nek Mohammad, Fakhar Zaman, Azmat Khan, Marez...",The first known fatal US drone strike inside P...,2,Nek Mohammed,Wana,2.7854475086753382e+17
6,55c79e721cbee48856a3088c,[],http://www.thebureauinvestigates.com/2011/08/1...,A strike targeted a possible militant commande...,B5,5,10-18,Pakistan,2006-01-13T00:00:00.000Z,13-22,22.0,13,,34.81549453,Bajaur Agency,71.4969635,[],"18 civilians, including 6 children, were kille...",7,Ayman al-Zawahiri; Abu Khabab al-Masri; Abd Ra...,Damadola,2.7854501604401152e+17
7,55c79e721cbee48856a3088d,[],http://www.thebureauinvestigates.com/2011/08/1...,"An attack on a madrassa (allegedly a ""Taliban ...",B6,69,80-82,Pakistan,2006-10-30T00:00:00.000Z,81-83,83.0,81,3,34.83634999,Bajaur Agency,71.49215698,"[Maulvi Liaqat, Mohammad Tahir (16), Maulvi Kh...","80-83 civilians, including 69 children, report...",8,Maulvi Liaqat,Chenegai,2.7854505521461248e+17
9,55c79e721cbee48856a3088f,[],http://www.thebureauinvestigates.com/2011/08/1...,"Attack on a house and madrassa killed 3-4, rep...",B8,,4,Pakistan,2007-04-27T00:00:00.000Z,3-4,4.0,3,9,33.09499311,North Waziristan,70.05912781,[],An attack on a religious school killed four pe...,10,Maulvi Noor Mohammed,Saidgai,2.785451579163566e+17
10,55c79e721cbee48856a30890,[],http://www.thebureauinvestigates.com/2011/08/1...,"20-34 killed in attack on a madrassa, includin...",B9,Possibly,0-34,Pakistan,2007-06-19T00:00:00.000Z,20-34,34.0,20,15,32.98879572,North Waziristan,70.28743744,[],"20-50 people, including children, were killed ...",11,,Mami Rogha,2.7854519936024576e+17
11,55c79e721cbee48856a30891,[],http://www.thebureauinvestigates.com/2011/08/1...,CIA drone strike kills 5-10 alleged Haqqani Ne...,B10,,,Pakistan,2007-11-02T00:00:00.000Z,5-10,10.0,5,12,33.020179,North Waziristan,70.07286072,[],A missile fired from an unmanned aerial drone ...,12,Jalalludin Haqqani,Danda Darpakhel,2.785452811030528e+17
12,55c79e721cbee48856a30892,[],http://www.thebureauinvestigates.com/2011/08/1...,Strike injures Egyptian al Qaeda leader and id...,B11,,,Pakistan,2007-12-03T00:00:00.000Z,Unknown,0.0,0,1,32.80285902,Bannu Frontier,70.48690796,[],A US drone fired at least 2 missiles early Wed...,13,Shaykh Issa al-Masri,Jani Khel,2.7854532195976806e+17
13,55c79e721cbee48856a30893,[],http://www.thebureauinvestigates.com/2011/08/1...,"Abu Laith al-Libi, a senior al Qaeda figure, w...",B12,3,4-6,Pakistan,2008-01-29T00:00:00.000Z,12-15,15.0,12,1,32.95826549,North Waziristan,70.24108887,[],"5 civilians, including 3 children, were report...",14,Abu Laith al-Libi,Mir Ali,2.7854535946523846e+17
14,55c79e721cbee48856a30894,[],http://www.thebureauinvestigates.com/2011/08/1...,"At least 8 killed, including possibly students...",B13,,0-5,Pakistan,2008-02-28T00:00:00.000Z,8-13,13.0,8,16,32.30222379,South Waziristan,69.40544128,[],A 2am attack on a house killed up to 13 people.,15,,Azam Warsak,2.7854540612685005e+17
15,55c79e721cbee48856a30895,[],http://www.thebureauinvestigates.com/2011/08/1...,At least 12 alleged militants in an attack on ...,B14,,0-4,Pakistan,2008-03-16T00:00:00.000Z,12-20,20.0,12,9,32.30802742,South Waziristan,69.45762634,[],12-15 people were killed in a strike outside W...,16,,Dhook Pir Bagh,2.7854549939979056e+17
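
Most of the non-digit values above are ranges such as `6-8`, along with a few labels like `Unknown`. One possible way to turn such a column into numeric lower and upper bounds is a regular expression; `split_range` is a hypothetical helper written for this sketch, and the input series is made up.

```python
import pandas as pd

def split_range(series):
    """Split strings like '6-8' or '5' into numeric low/high columns.

    Anything that is not a number or a 'low-high' range becomes NaN.
    """
    parts = series.astype(str).str.strip().str.extract(
        r'^(?P<low>\d+)(?:-(?P<high>\d+))?$')
    low = pd.to_numeric(parts['low'], errors='coerce')
    # A single number has no upper part, so reuse the lower bound.
    high = pd.to_numeric(parts['high'], errors='coerce').fillna(low)
    return pd.DataFrame({'low': low, 'high': high})

toy = pd.Series(['6', '6-8', 'Unknown', '', '20-34', 'Possibly'])
bounds = split_range(toy)
print(bounds)
```

The same helper could be tried on `children` and `civilians`, which show similar range and free-text values.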


In [50]:
drone_df.deaths_max.describe()

count    629.000000
mean       8.736089
std       11.025462
min        0.000000
25%        4.000000
50%        6.000000
75%       10.000000
max      200.000000
Name: deaths_max, dtype: float64
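
Once `deaths_max` is numeric, the same kind of summary can be split by group. A minimal sketch of grouping by `country`, using a made-up frame in place of `drone_df`:

```python
import pandas as pd

# Made-up rows; only the column names mirror drone_df.
toy = pd.DataFrame({
    'country': ['Pakistan', 'Pakistan', 'Yemen', 'Somalia'],
    'deaths_max': [8.0, 22.0, 6.0, 0.0],
})

# One row per country: number of records, total and median of deaths_max.
summary = toy.groupby('country')['deaths_max'].agg(['count', 'sum', 'median'])
print(summary)
```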

In [5]:
drone_df.head().transpose()

Unnamed: 0,0,1,2,3,4
_id,55c79e711cbee48856a30886,55c79e711cbee48856a30887,55c79e711cbee48856a30888,55c79e721cbee48856a30889,55c79e721cbee48856a3088a
articles,[],[],[],[],[]
bij_link,http://www.thebureauinvestigates.com/2012/03/2...,http://www.thebureauinvestigates.com/2011/08/1...,http://www.thebureauinvestigates.com/2011/08/1...,http://www.thebureauinvestigates.com/2011/08/1...,http://www.thebureauinvestigates.com/2011/08/1...
bij_summary_short,In the first known US targeted assassination u...,First known drone strike in Pakistan kills at ...,"Two killed, including Haitham al-Yemeni an al ...","Failed strike against Abu Hamza Rabia (""al Qae...","Syrian Abu Hamza Rabia, the senior al Qaeda op..."
bureau_id,YEM001,B1,B2,B3,B4
children,,2,,3,2
civilians,0,2,,3-8,2
country,Yemen,Pakistan,Pakistan,Pakistan,Pakistan
date,2002-11-03T00:00:00.000Z,2004-06-17T00:00:00.000Z,2005-05-08T00:00:00.000Z,2005-11-05T00:00:00.000Z,2005-12-01T00:00:00.000Z
deaths,6,6-8,2,8,5


In [48]:
drone_df.loc[313, 'deaths_max']

nan

In [14]:
pd.options.display.max_columns = 999
pd.options.display.max_rows = 999

In [58]:
for i in reversed(range(6, 10)):
    print(i)

9
8
7
6


## Team Popcorn

You're a force to be reckoned with when you `read_csv` into your `movie_df` dataframe.  You are team "Popcorn".  It would be nice to know:

 - Which movies remained in the top 10 the longest?
 - Which movies were good investments?
 
 Bonus:
 - Do any holidays impact sales performance or position?


_[There's a data dictionary available!](http://www.amstat.org/publications/jse/v17n1/datasets.mclaren.html)_

In [61]:
movie_df = pd.read_csv("../assets/data/movie_weekend.csv")
movie_df.head(100)

Unnamed: 0,NUMBER,MOVIE,WEEK_NUM,WEEKEND_PER_THEATER,WEEKEND_DATE
0,1.0,A Beautiful Mind,1.0,701.0,12/21/01
1,1.0,A Beautiful Mind,2.0,14820.0,12/28/01
2,1.0,A Beautiful Mind,3.0,8940.0,1/4/02
3,1.0,A Beautiful Mind,4.0,6850.0,1/11/02
4,1.0,A Beautiful Mind,5.0,5280.0,1/18/02
5,1.0,A Beautiful Mind,6.0,5155.0,1/25/02
6,1.0,A Beautiful Mind,7.0,3735.0,2/1/02
7,1.0,A Beautiful Mind,8.0,2840.0,2/8/02
8,1.0,A Beautiful Mind,9.0,3890.0,2/15/02
9,1.0,A Beautiful Mind,10.0,2565.0,2/22/02


In [60]:
movie_df.shape

(1281, 5)
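
A possible starting point for the longevity and holiday questions is sketched below. It assumes each row is one weekend in which the movie was listed, so the largest `WEEK_NUM` per movie approximates how long it stayed, and that `WEEKEND_DATE` is in month/day/two-digit-year form. The second movie and its numbers are invented for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    'MOVIE': ['A Beautiful Mind'] * 3 + ['Hypothetical Film'] * 2,
    'WEEK_NUM': [1.0, 2.0, 3.0, 1.0, 2.0],
    'WEEKEND_PER_THEATER': [701.0, 14820.0, 8940.0, 5000.0, 2500.0],
    'WEEKEND_DATE': ['12/21/01', '12/28/01', '1/4/02', '1/4/02', '1/11/02'],
})

# Parse the dates so weekends can be grouped by month or matched to holidays.
toy['WEEKEND_DATE'] = pd.to_datetime(toy['WEEKEND_DATE'], format='%m/%d/%y')

# Longest runs first.
weeks_listed = toy.groupby('MOVIE')['WEEK_NUM'].max().sort_values(ascending=False)

# Average per-theater weekend take by calendar month.
by_month = toy.groupby(toy['WEEKEND_DATE'].dt.month)['WEEKEND_PER_THEATER'].mean()

print(weeks_listed)
print(by_month)
```

Check the data dictionary before relying on either assumption; the column meanings there take precedence over this sketch.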

## Team Titanic

Known for its honesty, the Titanic dataset is a very common dataset for doing classification prediction of fatalities.  For our challenge, why don't we try to focus on the latent characteristics?

For the record, this is how much we know:

![](http://www.glencoe.com/sec/math/studytools/books/0-07-829631-5/images/IQ02-003W-8228662.gif)

Certainly there is a better story to tell.

**Bonus**
 - Can you pull out titles (e.g., Mr., Miss, Mrs.) from the feature "Name" and assign them to a new variable? We think there could be something interesting to look at in aggregate based on titles!

In [9]:
titanic_df = pd.read_csv("../assets/data/titanic.csv")
titanic_df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S
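
For the bonus question, one way to pull the title out of `Name` is a regular expression that captures the text between the comma and the first period. This is a sketch on four of the rows shown above; the `Title` column name is our own choice.

```python
import pandas as pd

toy = pd.DataFrame({
    'Name': ['Braund, Mr. Owen Harris',
             'Heikkinen, Miss. Laina',
             'Futrelle, Mrs. Jacques Heath (Lily May Peel)',
             'Allen, Mr. William Henry'],
    'Survived': [0, 1, 1, 0],
})

# Capture everything after ", " up to (not including) the first "."
toy['Title'] = toy['Name'].str.extract(r',\s*([^.]+)\.', expand=False).str.strip()

# Aggregate by title: how many passengers, and what share survived.
survival_by_title = toy.groupby('Title')['Survived'].agg(['count', 'mean'])
print(survival_by_title)
```

On the full dataset, rarer titles will likely appear as well, so it may help to inspect `value_counts()` on the new column before aggregating.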
