<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">


# Telecom EDA Challenge Lab

_Author: Alex Combs (NYC)_

---

Let's do some Exploratory Data Analysis (EDA)! As a data scientist, you will often be handed a data set you've never seen before and asked to analyze it quickly. That is today's goal.

# Prompt

You work for a telecommunications company. The company has been storing metadata about customer phone usage, as part of the regular course of business. Currently, this data is sitting in an unsecured database. The company doesn't want to pay to increase their database security, because they don't think there's really anything to be learned from the metadata.

They are under pressure from "right to privacy" organizations to beef up the database security. These organizations argue that you can learn a lot about a person from their cell phone metadata.

The telecom company wants to understand if this is true, and they want your help. They will give you one person's metadata for 2014 and want to see what you can learn from it.

Working in teams, create a report revealing everything you can about the person. Prepare a presentation, with slides, showcasing your findings.


# The Data

The [person's metadata](./datasets/metadata.csv) has the following fields:

| Field Name               | Description                              |
| ---                      | ---                                      |
| **Cell Cgi**             | cell phone tower identifier              |
| **Cell Tower Location**  | cell phone tower location                |
| **Comm Identifier**      | de-identified recipient of communication |
| **Comm Timedate String** | time of communication                    |
| **Comm Type**            | type of communication                    |
| **Latitude**             | latitude of communication                |
| **Longitude**            | longitude of communication               |


# Hints

This is totally open-ended! Look below for prompts only if you're completely stumped. As a starting point, given that you have geolocations, consider investigating ways to display this type of information (e.g. mapping functionality).

<font color='white'>
Well, for starters, he's in Australia!

Ideas for things to look into:
- where does he work?
- where does he live?
- who does he contact most often?
- what hours does he work?
- did he move?
- did he go on holiday?  If so, where did he go?
- did he get a new phone?

Challenges:
- how does he get to work?
- where does his family live?
- if he went on holiday, can you find which flights he took?
- can you guess who some of his contacts are, based on the frequency, location, time and mode (phone/text) of communications?


If you're stuck on how to map the data, you can try "basemap" or "gmplot", or anything else you find online.
</font>
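
As one possible starting point for the contact-related questions, the sketch below counts communications per recipient and per mode. The five-row frame is made-up illustration data (the shortened identifiers and the `SMS` value are hypothetical); only the column names `Comm Identifier` and `Comm Type` come from the dataset.

```python
import pandas as pd

# Hypothetical sample rows; the real data lives in ./datasets/metadata.csv
sample = pd.DataFrame({
    "Comm Identifier": ["aaa", "bbb", "aaa", "aaa", "ccc"],
    "Comm Type": ["Phone", "Phone", "SMS", "Phone", "SMS"],
})

# Number of communications per de-identified recipient, most frequent first
contact_counts = sample["Comm Identifier"].value_counts()

# Recipient-by-mode table: rows are recipients, columns are communication types
mode_table = pd.crosstab(sample["Comm Identifier"], sample["Comm Type"])

print(contact_counts.head())
print(mode_table)
```

On the full data, the same two calls applied to `meta` would rank contacts and show whether each one is reached mostly by phone or by another mode.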

In [2]:
import numpy as np
import pandas as pd

from sklearn.linear_model import Ridge, Lasso, ElasticNet, LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
meta = pd.read_csv('./datasets/metadata.csv')

In [4]:
print('\n Describe:')
print(meta.describe())


 Describe:
           Latitude     Longitude
count  10476.000000  10476.000000
mean     -35.136188    150.612841
std        3.141723      1.470169
min      -42.884810    144.848243
25%      -33.884603    151.202296
50%      -33.796610    151.266540
75%      -33.788150    151.266540
max      -33.557310    151.289340


In [5]:
!pip install folium

!pip install geopy

# pygmaps

!pip install gpxpy
import gpxpy.geo



In [6]:
meta.head()

Unnamed: 0,Cell Cgi,Cell Tower Location,Comm Identifier,Comm Timedate String,Comm Type,Latitude,Longitude
0,50501015388B9,REDFERN TE,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 9:40,Phone,-33.892933,151.202296
1,50501015388B9,REDFERN TE,62157ccf2910019ffd915b11fa037243b75c1624,4/1/14 9:42,Phone,-33.892933,151.202296
2,505010153111F,HAYMARKET #,c8f92bd0f4e6fb45ed7fce96fc831b283db2b642,4/1/14 13:13,Phone,-33.880329,151.20569
3,505010153111F,HAYMARKET #,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 13:13,Phone,-33.880329,151.20569
4,5.05E+106,HAYMARKET #,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 17:27,Phone,-33.880329,151.20569


In [7]:
# Split on the space so dates of any width (e.g. 10/12/14) are kept whole
meta['comm_date'] = meta['Comm Timedate String'].str.split(' ').str[0]

In [8]:
meta.head()

Unnamed: 0,Cell Cgi,Cell Tower Location,Comm Identifier,Comm Timedate String,Comm Type,Latitude,Longitude,comm_date
0,50501015388B9,REDFERN TE,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 9:40,Phone,-33.892933,151.202296,4/1/14
1,50501015388B9,REDFERN TE,62157ccf2910019ffd915b11fa037243b75c1624,4/1/14 9:42,Phone,-33.892933,151.202296,4/1/14
2,505010153111F,HAYMARKET #,c8f92bd0f4e6fb45ed7fce96fc831b283db2b642,4/1/14 13:13,Phone,-33.880329,151.20569,4/1/14
3,505010153111F,HAYMARKET #,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 13:13,Phone,-33.880329,151.20569,4/1/14
4,5.05E+106,HAYMARKET #,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 17:27,Phone,-33.880329,151.20569,4/1/14


In [9]:
# Take the part after the space so the time is correct for dates of any width
meta['comm_time'] = meta['Comm Timedate String'].str.split(' ').str[1]

In [10]:
meta.head()

Unnamed: 0,Cell Cgi,Cell Tower Location,Comm Identifier,Comm Timedate String,Comm Type,Latitude,Longitude,comm_date,comm_time
0,50501015388B9,REDFERN TE,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 9:40,Phone,-33.892933,151.202296,4/1/14,9:40
1,50501015388B9,REDFERN TE,62157ccf2910019ffd915b11fa037243b75c1624,4/1/14 9:42,Phone,-33.892933,151.202296,4/1/14,9:42
2,505010153111F,HAYMARKET #,c8f92bd0f4e6fb45ed7fce96fc831b283db2b642,4/1/14 13:13,Phone,-33.880329,151.20569,4/1/14,13:13
3,505010153111F,HAYMARKET #,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 13:13,Phone,-33.880329,151.20569,4/1/14,13:13
4,5.05E+106,HAYMARKET #,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 17:27,Phone,-33.880329,151.20569,4/1/14,17:27
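
String columns are enough for display, but questions about working hours or days of the week are easier with real timestamps. The sketch below parses a few of the strings shown above with `pd.to_datetime`. The format `%m/%d/%y %H:%M` is an assumption (month first, two-digit year, 24-hour time); if the data is actually day-first, the format string would need to change.

```python
import pandas as pd

# Timestamps copied from rows shown in the notebook output
sample = pd.DataFrame({
    "Comm Timedate String": ["4/1/14 9:40", "4/1/14 13:13", "4/2/14 19:18"],
})

# Assumption: month/day/two-digit-year followed by 24-hour time
sample["comm_dt"] = pd.to_datetime(
    sample["Comm Timedate String"], format="%m/%d/%y %H:%M"
)

# Derived fields that are handy for routine analysis
sample["hour"] = sample["comm_dt"].dt.hour
sample["dayofweek"] = sample["comm_dt"].dt.dayofweek  # Monday=0 ... Sunday=6

# Communications per hour of day, a rough proxy for daily routine
print(sample.groupby("hour").size())
```

Grouping the full data by `hour` and `dayofweek` would be one way to look for a commuting or working pattern.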


In [11]:
meta.shape

(10476, 9)

In [12]:
meta['Latitude'].unique()

array([-33.89293336, -33.88032891, -33.88417103, -33.88024   ,
       -33.86113   , -33.79661   , -33.796679  , -33.88964   ,
       -33.87829   , -33.8864    , -33.93416   , -42.83762   ,
       -42.84338   , -42.85984   , -42.85307   , -42.8606    ,
       -33.86655   , -33.87932   , -33.89605   , -33.89233   ,
       -33.87814   , -33.87055   , -33.8703    , -33.87215   ,
       -33.793648  , -33.80638   , -33.83096   , -33.924799  ,
       -42.88194   , -42.88101   , -33.884603  , -37.670418  ,
       -37.7778    , -37.7929    , -33.828073  , -33.79948   ,
       -33.879     , -33.779333  , -33.86633   , -33.78815   ,
       -33.87513   , -33.79345   , -33.79275   , -33.934674  ,
       -33.923217  , -33.937558  , -33.94674   , -33.93285   ,
       -33.90533   , -33.85947   , -33.8294    , -33.791965  ,
       -42.87457   , -42.88029   , -33.83415   , -33.88058   ,
       -42.82842   , -42.88481   , -33.75949   , -33.63599   ,
       -33.71763   , -33.57839   , -33.6038    , -33.55

In [13]:
meta.describe()

Unnamed: 0,Latitude,Longitude
count,10476.0,10476.0
mean,-35.136188,150.612841
std,3.141723,1.470169
min,-42.88481,144.848243
25%,-33.884603,151.202296
50%,-33.79661,151.26654
75%,-33.78815,151.26654
max,-33.55731,151.28934


In [14]:
meta.groupby("Latitude").size()

Latitude
-42.884810       4
-42.881940      15
-42.881010       1
-42.880290       4
-42.874570       7
-42.860600       3
-42.859840     501
-42.853070     197
-42.843380     723
-42.837620      17
-42.828420       4
-37.792900       1
-37.777800       1
-37.670418       1
-36.502180      29
-36.356700     112
-36.331075      20
-36.167930       7
-33.946740      33
-33.937558      65
-33.934674       2
-33.934160      22
-33.932850      13
-33.924799       1
-33.923217       5
-33.911320       2
-33.905330       5
-33.896050       1
-33.892933     712
-33.892330       1
              ... 
-33.878140       6
-33.875130      22
-33.872150       5
-33.870550       4
-33.870300       1
-33.866550       3
-33.866330      15
-33.862850       1
-33.861130      49
-33.859470       8
-33.834150       6
-33.830960      14
-33.829400      29
-33.828073       6
-33.806380      20
-33.799480      92
-33.796679     231
-33.796610     454
-33.793648     106
-33.793450      30
-33.792750      19
-33
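
Grouping on `Latitude` alone can merge distinct towers that happen to share a latitude, and the float keys are hard to read. A possible refinement, sketched below under the assumption that `Cell Tower Location`, `Latitude`, and `Longitude` together identify a place, groups on all three and sorts by count. The row counts in the sample frame are illustrative, not taken from the real data.

```python
import pandas as pd

# Tower names and coordinates as shown in meta.head(); the repetition is illustrative
sample = pd.DataFrame({
    "Cell Tower Location": ["REDFERN TE", "REDFERN TE", "HAYMARKET #",
                            "HAYMARKET #", "HAYMARKET #", "CHIPPENDALE"],
    "Latitude": [-33.892933, -33.892933, -33.880329,
                 -33.880329, -33.880329, -33.884171],
    "Longitude": [151.202296, 151.202296, 151.20569,
                  151.20569, 151.20569, 151.20235],
})

# One row per named place with its communication count, busiest first
place_counts = (
    sample.groupby(["Cell Tower Location", "Latitude", "Longitude"])
          .size()
          .reset_index(name="n_comms")
          .sort_values("n_comms", ascending=False)
          .reset_index(drop=True)
)
print(place_counts)
```

The busiest places in such a table are natural candidates for home and work.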

In [16]:
locations = meta[['Latitude',"Longitude"]]
len(locations)

10476

In [25]:
locations.head()

Unnamed: 0,Latitude,Longitude
0,-33.892933,151.202296
1,-33.892933,151.202296
2,-33.880329,151.20569
3,-33.880329,151.20569
4,-33.880329,151.20569
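
With latitude/longitude pairs available, distances between towers can help separate everyday movement from longer trips. The notebook imports `gpxpy.geo`, presumably for this purpose; the sketch below is instead a dependency-free haversine implementation, assuming a spherical Earth with a mean radius of 6371 km. The function name `haversine_km` is introduced here for illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    radius_km = 6371.0  # assumed mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Rows 0 and 2 of locations.head(): two nearby towers, roughly 1.4 km apart
d = haversine_km(-33.892933, 151.202296, -33.880329, 151.20569)
print(round(d, 2))
```

Applying the function between each record and a presumed home location would flag the records that lie far from the usual area.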


In [33]:
locations['Latitude'].dtype

dtype('float64')

In [34]:
locations['Longitude'].dtype

dtype('float64')

In [18]:
import folium

In [44]:
locations_new = locations.dropna()

In [45]:
# drop=True discards the old index instead of adding it as a new column
locations_new.reset_index(drop=True, inplace=True)

In [49]:
locations_new = locations_new[["Latitude","Longitude"]]

In [56]:
locations_new = locations_new.dropna()

In [62]:
type(locations_new)

pandas.core.frame.DataFrame

In [63]:
meta.dropna()

Unnamed: 0,Cell Cgi,Cell Tower Location,Comm Identifier,Comm Timedate String,Comm Type,Latitude,Longitude,comm_date,comm_time
0,50501015388B9,REDFERN TE,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 9:40,Phone,-33.892933,151.202296,4/1/14,9:40
1,50501015388B9,REDFERN TE,62157ccf2910019ffd915b11fa037243b75c1624,4/1/14 9:42,Phone,-33.892933,151.202296,4/1/14,9:42
2,505010153111F,HAYMARKET #,c8f92bd0f4e6fb45ed7fce96fc831b283db2b642,4/1/14 13:13,Phone,-33.880329,151.205690,4/1/14,13:13
3,505010153111F,HAYMARKET #,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 13:13,Phone,-33.880329,151.205690,4/1/14,13:13
4,5.05E+106,HAYMARKET #,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 17:27,Phone,-33.880329,151.205690,4/1/14,17:27
5,5050101532B23,CHIPPENDALE,6bbc17070aa91e2dab7909b96c6eecbd6109ba56,4/1/14 17:36,Phone,-33.884171,151.202350,4/1/14,17:36
6,5050101536E5E,CHIPPENDALE,6bbc17070aa91e2dab7909b96c6eecbd6109ba56,4/1/14 17:40,Phone,-33.884171,151.202350,4/1/14,17:40
7,5050101531F08,REDFERN TE,7cb96eadd3ff95e25406d24794027c443c0661c5,4/2/14 19:18,Phone,-33.892933,151.202296,4/2/14,19:18
8,505010153111F,HAYMARKET #,de40c5c1f9249f95f7fb216931db58747afef74f,4/3/14 14:35,Phone,-33.880329,151.205690,4/3/14,14:35
9,505010153111F,HAYMARKET #,66f32c1163d0e597983b65c51f5a477070ad3785,4/3/14 14:36,Phone,-33.880329,151.205690,4/3/14,14:36


In [66]:
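# The original call passed the whole Latitude/Longitude Series to Marker, raising
# "TypeError: cannot convert the series to <type 'float'>"; index each Series by row instead.
fmap = folium.Map(location=[-35.136188, 150.612841], zoom_start=12)  # centered on the mean coordinates
for point in range(len(meta)):
    folium.Marker(location=[meta["Latitude"][point], meta["Longitude"][point]],
                  popup=meta['comm_date'][point]).add_to(fmap)
fmap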
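
Plotting one marker per row means more than ten thousand markers, many stacked on the same tower. One alternative, sketched here, is to reduce the data to unique tower positions first and draw one circle per position. The helper names `unique_locations` and `build_map` and the output file name are hypothetical; `folium` is imported inside `build_map` so that the pandas step can run without it.

```python
import pandas as pd

def unique_locations(df):
    """Keep one row per distinct tower name and coordinate pair."""
    cols = ["Cell Tower Location", "Latitude", "Longitude"]
    return df[cols].drop_duplicates().reset_index(drop=True)

def build_map(points, zoom_start=12):
    """Draw one circle per unique location; requires the folium package."""
    import folium  # imported lazily so unique_locations works without it
    center = [points["Latitude"].mean(), points["Longitude"].mean()]
    fmap = folium.Map(location=center, zoom_start=zoom_start)
    for lat, lon, name in zip(points["Latitude"], points["Longitude"],
                              points["Cell Tower Location"]):
        folium.CircleMarker(location=[lat, lon], radius=5,
                            popup=name, fill=True).add_to(fmap)
    return fmap

# Small sample modeled on meta.head(); the real frame would be `meta`
sample = pd.DataFrame({
    "Cell Tower Location": ["REDFERN TE", "REDFERN TE", "HAYMARKET #"],
    "Latitude": [-33.892933, -33.892933, -33.880329],
    "Longitude": [151.202296, 151.202296, 151.20569],
})
points = unique_locations(sample)
print(points)

# With folium installed, the map could then be built and saved:
# build_map(points).save("towers.html")  # hypothetical file name
```

Centering on the mean of the unique points, rather than the mean of all rows, keeps heavily used towers from dominating the initial view.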