# Introduction
The main goal of this notebook is to lay the foundation for the Propensity to Buy model that will be developed. The project's objective is to identify Google Analytics sessions that have a higher-than-average propensity to buy.

In this notebook, basic Exploratory Data Analysis will be performed before moving on to the more advanced modelling and predictions. This is a preliminary notebook, written without prior knowledge of which techniques and models will perform best.

The foundations laid here will lead to other areas of interest, such as click-through rate (CTR) on AdWords content and the propensity of a customer to churn, which will require other data sources.

First, a Google Analytics connection needs to be established to gather the required data.

### Import Packages and Configuration

In [5]:
# Import necessary packages
import os
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Initializing Google Analytics Reporting API

In [7]:
# define key variables used in analysis
SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
KEY_FILE_LOCATION = os.path.join(os.path.abspath(os.path.join(os.getcwd(), os.pardir)), 'bhn-seo-tools-analytics-key.json')
VIEW_ID = '140617562'
DIMENSIONS = ['ga:source', 'ga:medium']
METRICS = ['ga:users', 'ga:sessions']
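
`KEY_FILE_LOCATION` is resolved relative to the parent of the current working directory, so it depends on where the notebook is launched from. The sketch below is one possible guard that fails early with a readable message when the key file is missing. `require_key_file` is a hypothetical helper introduced here for illustration; it is not part of the original notebook or of any Google library.

```python
import os


def require_key_file(path):
    """Return path unchanged if it points to an existing file, else raise.

    Hypothetical helper (an assumption, not part of the original notebook).
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(
            'Service account key not found at {}. Current working '
            'directory: {}'.format(path, os.getcwd()))
    return path
```

It could be applied as `KEY_FILE_LOCATION = require_key_file(KEY_FILE_LOCATION)` before the API connection is created.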


In [8]:
# create API connection
def initialize_analytics_reporting():
  """Initializes an Analytics Reporting API V4 service object.

  Returns:
    An authorized Analytics Reporting API V4 service object.
  """
  credentials = ServiceAccountCredentials.from_json_keyfile_name(
      KEY_FILE_LOCATION, SCOPES)

  # Build the service object.
  analytics = build('analyticsreporting', 'v4', credentials=credentials)

  return analytics

### Gather Data

In [9]:
# define report to return
def get_report(analytics):
  """Queries the Analytics Reporting API V4.

  See https://ga-dev-tools.appspot.com/dimensions-metrics-explorer/ for all
  available metrics and dimensions.

  Args:
    analytics: An authorized Analytics Reporting API V4 service object.
  Returns:
    The Analytics Reporting API V4 response.
  """
  return analytics.reports().batchGet(
      body={
          'reportRequests': [{
              'viewId': VIEW_ID,
              'dateRanges': [{'startDate': '7daysAgo', 'endDate': 'today'}],
              'metrics': [{'expression': i} for i in METRICS],
              'dimensions': [{'name': j} for j in DIMENSIONS],
          }]
      }
  ).execute()
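
The printed result further down ends at index 999, that is 1,000 rows, which matches the Reporting API's default page size, so the report may be truncated. The sketch below shows one way to follow `nextPageToken` until the last page. `get_all_reports` and its `page_size` parameter are hypothetical names introduced here, and the function assumes the request body contains exactly one report request, as `get_report` does.

```python
def get_all_reports(analytics, body, page_size=1000):
    """Fetch every page of a single-report batchGet request.

    Hypothetical helper (an assumption, not part of the original notebook).
    Returns a list of raw API responses, one per page.
    """
    responses = []
    page_token = None
    while True:
        # Copy the request so the caller's body is left untouched.
        request = dict(body['reportRequests'][0], pageSize=page_size)
        if page_token:
            request['pageToken'] = page_token
        response = analytics.reports().batchGet(
            body={'reportRequests': [request]}).execute()
        responses.append(response)
        # nextPageToken is absent on the last page.
        reports = response.get('reports') or [{}]
        page_token = reports[0].get('nextPageToken')
        if not page_token:
            return responses
```

Each returned page can be passed to `convert_to_dataframe` and the resulting frames combined with `pd.concat(..., ignore_index=True)`.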


# Data Pre-Processing

In [10]:
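def convert_to_dataframe(response):
    """Flattens an Analytics Reporting API V4 response into a DataFrame.

    Args:
      response: An Analytics Reporting API V4 response.
    Returns:
      A DataFrame with one row per dimension combination. Metric values are
      kept as the strings returned by the API.
    """
    # Initialise outside the loop so an empty response yields an empty frame
    # and rows from every report are kept.
    final_rows = []

    for report in response.get('reports', []):
        column_header = report.get('columnHeader', {})
        dimension_headers = column_header.get('dimensions', [])
        metric_headers = [i.get('name') for i in column_header.get('metricHeader', {}).get('metricHeaderEntries', [])]

        for row in report.get('data', {}).get('rows', []):
            dimensions = row.get('dimensions', [])
            # Only one date range is requested, so read the first entry.
            metrics = (row.get('metrics') or [{}])[0].get('values', [])
            row_object = {}

            for header, dimension in zip(dimension_headers, dimensions):
                row_object[header] = dimension

            for metric_header, metric in zip(metric_headers, metrics):
                row_object[metric_header] = metric

            final_rows.append(row_object)

    return pd.DataFrame(final_rows)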
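
The API returns metric values as strings, so columns such as `ga:users` and `ga:sessions` arrive in the DataFrame as text and cannot be summed or plotted directly. The sketch below shows one possible conversion step using `pd.to_numeric`. `metrics_to_numeric` is a hypothetical helper introduced for illustration, and the sample rows are the first two rows of the notebook output.

```python
import pandas as pd


def metrics_to_numeric(df, metric_columns):
    """Return a copy of df with the given metric columns cast to numbers.

    Hypothetical helper (an assumption, not part of the original notebook).
    """
    out = df.copy()
    for column in metric_columns:
        # errors='coerce' turns unparseable values into NaN instead of raising.
        out[column] = pd.to_numeric(out[column], errors='coerce')
    return out


# First two rows of the notebook output, with metrics as strings.
sample = pd.DataFrame({
    'ga:source': ['(direct)', '10-Cycling-Myths-Uncovered'],
    'ga:medium': ['(none)', 'Social'],
    'ga:users': ['13605', '7'],
    'ga:sessions': ['19818', '7'],
})

numeric = metrics_to_numeric(sample, ['ga:users', 'ga:sessions'])
print(numeric.dtypes)
```

In the notebook this would be called as `metrics_to_numeric(df, METRICS)`, since the metric column names match the entries in `METRICS`.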

In [None]:
# parse JSON response from API and print
# (debugging aid; convert_to_dataframe above returns the same data as a pandas df)
def print_response(response):
  """Parses and prints the Analytics Reporting API V4 response.

  Args:
    response: An Analytics Reporting API V4 response.
  """
  for report in response.get('reports', []):
    column_header = report.get('columnHeader', {})
    dimension_headers = column_header.get('dimensions', [])
    metric_headers = column_header.get('metricHeader', {}).get('metricHeaderEntries', [])

    for row in report.get('data', {}).get('rows', []):
      dimensions = row.get('dimensions', [])
      date_range_values = row.get('metrics', [])

      for header, dimension in zip(dimension_headers, dimensions):
        print(header + ': ' + dimension)

      for i, values in enumerate(date_range_values):
        print('Date range: ' + str(i))
        for metricHeader, value in zip(metric_headers, values.get('values')):
          print(metricHeader.get('name') + ': ' + value)

In [13]:
# run the gatherer, convert the response to a DataFrame and print it
def main():
  analytics = initialize_analytics_reporting()
  response = get_report(analytics)
  df = convert_to_dataframe(response)
  print(df)

if __name__ == '__main__':
  main()


                                        ga:source   ga:medium ga:users  \
0                                        (direct)      (none)    13605   
1                      10-Cycling-Myths-Uncovered      Social        7   
2      10-Things-cyclists-wish-drivers-understood  newsletter        1   
3    10-ways-to-encourage-people-to-cycle-to-work      social        1   
4                               10.100.1.66:15871    referral        1   
..                                            ...         ...      ...   
995                         www-csuk.grgcloud.net    referral        2   
996      www-cyclingweekly-com.cdn.ampproject.org    referral       13   
997        www-telegraph-co-uk.cdn.ampproject.org    referral        2   
998                              www1.emarsys.net    referral        8   
999                               www2.worc.ac.uk    referral        1   

    ga:sessions  
0         19818  
1             7  
2             1  
3             1  
4             1  
.. 

# Exploratory Data Analysis


# Feature Relevance


# Feature Engineering


# Modelling


## Training


## Evaluation


## Prediction


# Conclusion