# event classification notebook

This notebook was created to support data science work at <a href="https://evolytics.com">Evolytics</a>.

Our goal is to extract event data from the Google Calendar API, clean it, and use each event's description, summary, start time, and other features to classify the event as billable to a given client.

In [14]:
import datetime
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools
from dateutil import parser
import pandas as pd

SCOPES = 'https://www.googleapis.com/auth/calendar.readonly'

creds = 'credentials.json'

Authenticate against the Google API to generate a service that we'll use to pull event data:

In [3]:
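def authenticate(credsfile):
    # Cached OAuth token: written on the first run and reused afterwards
    store = file.Storage('token.json')
    creds = store.get()

    # No cached credentials, or they are invalid: run the OAuth consent flow
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets(credsfile, SCOPES)
        creds = tools.run_flow(flow, store)

    # Build a Calendar v3 service that sends authorized HTTP requests
    service = build('calendar', 'v3', http=creds.authorize(Http()))

    return service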

In [6]:
service = authenticate(creds)

Query events related to an internal Evolytics project, the "data dojo":

In [None]:
results = service.events().list(calendarId='primary', singleEvents=True, q='dojo').execute()

In [40]:
events = results.get('items', [])
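
A single `list` call returns only one page of results. The Calendar API signals that more pages exist with a `nextPageToken` field in the response, and that token is passed back as `pageToken` on the next request. Below is a minimal sketch of collecting every page. `list_all_events` is a hypothetical helper name, not part of the Google client library, and the sketch assumes `service` is the object returned by `authenticate`.

```python
def list_all_events(service, **params):
    """Collect events from every result page (hypothetical helper)."""
    events = []
    page_token = None
    while True:
        # pageToken=None requests the first page
        response = service.events().list(pageToken=page_token, **params).execute()
        events.extend(response.get('items', []))
        # The API omits nextPageToken on the last page
        page_token = response.get('nextPageToken')
        if not page_token:
            break
    return events
```

With this helper, the query above could be written as `events = list_all_events(service, calendarId='primary', singleEvents=True, q='dojo')`.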

Helper functions that extract attendee emails from an event and build a pandas DataFrame from the relevant event metadata:

In [106]:
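def get_emails(event):
    # Events with no invitees have no 'attendees' key, so default to an empty list
    attendees = event.get('attendees', [])
    emails = [str(attendee['email']) for attendee in attendees]
    # Returned as a string so the whole list fits in a single DataFrame cell
    return str(emails)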


def pandify(event):
    # Timed events carry 'dateTime'; all-day events carry only 'date'
    start = event['start'].get('dateTime', event['start'].get('date'))
    s = parser.parse(str(start))

    end = event['end'].get('dateTime', event['end'].get('date'))
    e = parser.parse(str(end))

    dictionary = {
        "id": str(event['id']),
        "start": s,
        "end": e,
        # 'description' is optional; keep None when it is missing
        "description": event.get('description'),
        "summary": str(event['summary']),
        "attendees": get_emails(event)
    }

    # One-row DataFrame per event
    df = pd.DataFrame(data=dictionary, index=[0])
    return df
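
Start and end times are also the basis for an obvious classification feature, the event's duration. The sketch below derives it with the standard library only, reading `dateTime` for timed events and `date` for all-day events. `event_time` and `duration_minutes` are hypothetical helper names, and the sketch assumes the timestamps arrive as ISO 8601 / RFC 3339 strings.

```python
from datetime import datetime

def event_time(event, key):
    """Return the 'start' or 'end' of an event as a datetime (hypothetical helper)."""
    when = event[key]
    # Timed events use 'dateTime'; all-day events use 'date'
    raw = when.get('dateTime') or when.get('date')
    # Older versions of fromisoformat reject a trailing 'Z', so spell out UTC
    return datetime.fromisoformat(raw.replace('Z', '+00:00'))

def duration_minutes(event):
    """Length of the event in minutes."""
    delta = event_time(event, 'end') - event_time(event, 'start')
    return delta.total_seconds() / 60
```

For an event running from `2018-12-20T08:00:00-06:00` to `2018-12-20T08:30:00-06:00`, `duration_minutes` returns `30.0`.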
    

The resulting DataFrame, ready for analysis:

In [110]:
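frames = []

# Build one single-row DataFrame per event
for event in events:
    df = pandify(event)
    frames.append(df)

# Stack the per-event rows into a single DataFrame
frames = pd.concat(frames)

# Every row was created with index 0, so reset_index() renumbers the rows
# and keeps the old index as an 'index' column
frames.reset_index()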

| | index | id | start | end | description | summary | attendees |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 28t6j2868lfsgoiebp1krg8uuv | 2018-12-20 08:00:00-06:00 | 2018-12-20 08:30:00-06:00 | `<br><b>1. </b>How are things going? :) Any que...` | Joe & Sarah Touch Base | `['sowen@evolytics.com', 'jgruenbaum@evolytics....` |
| 1 | 0 | 3ff76qccgr5begjanc487bs9lp | 2019-01-03 15:00:00-06:00 | 2019-01-03 16:00:00-06:00 | `\nHey Joe,\nJosh and I are conducting some use...` | [Data Dojo] User Testing | `['evolytics.com_36383832383530383839@resource....` |
| 2 | 0 | 1kkk1tno32as9t0cr9lqav3jjc | 2019-01-07 13:00:00-06:00 | 2019-01-07 14:30:00-06:00 | | [data dojo] Onboarding | `['jgruenbaum@evolytics.com', 'evermilyea@evoly...` |
| 3 | 0 | 6dum1v57a199hdeqi6u63btvee | 2019-01-14 13:30:00-06:00 | 2019-01-14 14:30:00-06:00 | | [dojo] user dashboard progress | `['jlehne@evolytics.com', 'evermilyea@evolytics...` |
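
Some descriptions in the output above still contain HTML markup (`<br>`, `<b>`) and newlines, which would add noise to any text features. As one possible cleaning step, the sketch below strips tags with the standard library's `html.parser` and collapses whitespace. `strip_html` is a hypothetical helper name, and the sketch assumes a missing description is stored as `None`.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping the tags."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(description):
    """Return the description text without markup (hypothetical helper)."""
    if description is None:
        return None
    extractor = _TextExtractor()
    extractor.feed(description)
    # close() flushes any text still buffered by the parser
    extractor.close()
    # Collapse runs of whitespace, including newlines, into single spaces
    return ' '.join(''.join(extractor.parts).split())
```

Under that assumption, it could be applied to the whole column with `frames['description'].apply(strip_html)`.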
