<a href="https://colab.research.google.com/github/thursy/GCP-BQML/blob/master/BQML_Code_GettingStarted.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# BigQuery Machine Learning

In this tutorial, you use the Google Analytics sample dataset for BigQuery to create a model that predicts whether a website visitor will make a transaction. The tutorial uses:

*   BigQuery ML to create a binary logistic regression model using the CREATE MODEL statement
*   The ML.EVALUATE function to evaluate the ML model
*   The ML.PREDICT function to make predictions using the ML model
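A binary logistic regression model computes a weighted sum of an example's feature values, passes it through the sigmoid function to get a probability between 0 and 1, and predicts label 1 when that probability reaches a cutoff. The plain-Python sketch below illustrates only that computation. The feature names `pageviews` and `is_mobile` mirror the query columns, but the weights and bias are made-up numbers, not values learned by BigQuery ML. It assumes a 0.5 cutoff, which is the default threshold `ML.PREDICT` applies to binary logistic regression models.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability in [0, 1] without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # safe for large negative z (underflows to 0.0)
    return e / (1.0 + e)


def predict(weights: dict, bias: float, features: dict, threshold: float = 0.5):
    """Return (probability, predicted_label) for one example.

    weights and features are keyed by feature name. The weights used below
    are hypothetical, not values learned by BigQuery ML.
    """
    score = bias + sum(weights[name] * value for name, value in features.items())
    probability = sigmoid(score)
    predicted_label = 1 if probability >= threshold else 0
    return probability, predicted_label


# Hypothetical weights for two numeric features (is_mobile encoded as 0/1)
weights = {"pageviews": 0.3, "is_mobile": -0.5}
bias = -4.0

print(predict(weights, bias, {"pageviews": 2, "is_mobile": 1}))
print(predict(weights, bias, {"pageviews": 20, "is_mobile": 0}))
```

BigQuery ML also handles the string columns (`os`, `country`) for you by encoding them into numeric features during training, which this sketch leaves out.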
### If you are using Google Colab, authenticate your user account by running the cell below

```python
# Colab only: prompts you to sign in so the notebook can call BigQuery as your user
from google.colab import auth
auth.authenticate_user()
```

After you create a Google Cloud project and enable the BigQuery API, follow these steps to do machine learning with BigQuery:

* Step one: Create your dataset
* Step two: Create your model
* Step three: Run and get training statistics
* Step four: Evaluate your model
* Step five: Use your model to predict outcomes
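Steps two, four, and five query the same columns from the Google Analytics sample tables; they differ only in the date range and in whether the `label` column is included (the prediction query omits it). One way to keep those queries in sync is to generate the `SELECT` from Python, as sketched below. `feature_query` is a hypothetical helper written for this illustration, not part of BigQuery or the `google-cloud-bigquery` library. It relies on `_TABLE_SUFFIX` values such as `20160801` being zero-padded `YYYYMMDD` strings, so the string comparison done by `BETWEEN` follows calendar order.

```python
import re

TABLE = "bigquery-public-data.google_analytics_sample.ga_sessions_*"


def feature_query(start_suffix: str, end_suffix: str, include_label: bool = True) -> str:
    """Build the SELECT used for training, evaluation, or prediction.

    Hypothetical helper: start_suffix and end_suffix are YYYYMMDD strings
    matched against _TABLE_SUFFIX (inclusive on both ends, like BETWEEN).
    """
    for suffix in (start_suffix, end_suffix):
        if not re.fullmatch(r"\d{8}", suffix):
            raise ValueError(f"expected a YYYYMMDD suffix, got {suffix!r}")
    if start_suffix > end_suffix:  # string order equals date order for YYYYMMDD
        raise ValueError("start_suffix must not be after end_suffix")

    columns = [
        'IFNULL(device.operatingSystem, "") AS os',
        "device.isMobile AS is_mobile",
        'IFNULL(geoNetwork.country, "") AS country',
        "IFNULL(totals.pageviews, 0) AS pageviews",
    ]
    if include_label:
        # Label is 1 when totals.transactions is not NULL, else 0
        columns.insert(0, "IF(totals.transactions IS NULL, 0, 1) AS label")

    select_list = ",\n  ".join(columns)
    return (
        f"SELECT\n  {select_list}\n"
        f"FROM\n  `{TABLE}`\n"
        f"WHERE\n  _TABLE_SUFFIX BETWEEN '{start_suffix}' AND '{end_suffix}'"
    )


train_sql = feature_query("20160801", "20170630")
predict_sql = feature_query("20170701", "20170801", include_label=False)
print(train_sql)
```

The generated text can be pasted into the `%%bigquery` cells, or sent with `client.query(...)` once a `bigquery.Client` exists.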


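### Provide the project ID for this tutorial and verify the available datasets

The dataset that will hold the model (`bqml_tutorial` in this tutorial) must already exist in your project; `CREATE MODEL` fails if it does not.

```python
from google.cloud import bigquery

# Replace with your own project ID:
# https://cloud.google.com/resource-manager/docs/creating-managing-projects
project_id = "tutorial-medium-2020"
client = bigquery.Client(project=project_id)

# Print the ID of every dataset in the project
for dataset in client.list_datasets():
    print(dataset.dataset_id)
```

```
bqml_tutorial
```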


### Run the cell below to create a model

* In the first line, replace the project ID (`tutorial-medium-2020`) with your own project ID.
* In the second line, replace the dataset and model name (`bqml_tutorial.sample_model`) with your own.

```
%%bigquery --project tutorial-medium-2020 df
CREATE MODEL `bqml_tutorial.sample_model`
OPTIONS(model_type='logistic_reg') AS
SELECT
  -- The column named label is used as the training label by default
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, "") AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  -- Train on sessions from 1 August 2016 to 30 June 2017
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'
```
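The `SELECT` statement turns each session row into one training example. As an illustration only, the same row-level logic can be written in Python. The sketch assumes a session is a nested dictionary with the same field names as the query (`totals`, `device`, `geoNetwork`) and uses `None` for SQL `NULL`; `session_to_example` is a hypothetical name and the sample session is made up.

```python
def session_to_example(session: dict) -> dict:
    """Mirror the SELECT list of the CREATE MODEL query for one session."""
    totals = session.get("totals") or {}
    device = session.get("device") or {}
    geo = session.get("geoNetwork") or {}

    def ifnull(value, default):
        # Same behaviour as SQL IFNULL(value, default)
        return default if value is None else value

    return {
        # IF(totals.transactions IS NULL, 0, 1) AS label
        "label": 0 if totals.get("transactions") is None else 1,
        "os": ifnull(device.get("operatingSystem"), ""),
        # device.isMobile is passed through unchanged (it may stay None)
        "is_mobile": device.get("isMobile"),
        "country": ifnull(geo.get("country"), ""),
        "pageviews": ifnull(totals.get("pageviews"), 0),
    }


# Hypothetical session with no transaction and a missing operating system
session = {
    "totals": {"transactions": None, "pageviews": 3},
    "device": {"operatingSystem": None, "isMobile": True},
    "geoNetwork": {"country": "Canada"},
}
print(session_to_example(session))
# {'label': 0, 'os': '', 'is_mobile': True, 'country': 'Canada', 'pageviews': 3}
```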

### Run the following cell to evaluate the model

The evaluation uses sessions from 1 July to 1 August 2017, which fall outside the training range (1 August 2016 to 30 June 2017).

```
%%bigquery --project tutorial-medium-2020
SELECT
  *
FROM
  ML.EVALUATE(MODEL `bqml_tutorial.sample_model`, (
    SELECT
      IF(totals.transactions IS NULL, 0, 1) AS label,
      IFNULL(device.operatingSystem, "") AS os,
      device.isMobile AS is_mobile,
      IFNULL(geoNetwork.country, "") AS country,
      IFNULL(totals.pageviews, 0) AS pageviews
    FROM
      `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE
      _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
```


| precision | recall | accuracy | f1_score | log_loss | roc_auc |
|---|---|---|---|---|---|
| 0.468504 | 0.110801 | 0.985343 | 0.179217 | 0.046242 | 0.982727 |
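The evaluation metrics are related to one another. For example, `f1_score` is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The short check below does that arithmetic with the rounded numbers from the `ML.EVALUATE` output; `f1_score` here is a plain helper written for this illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the ML.EVALUATE output
precision = 0.468504
recall = 0.110801

f1 = f1_score(precision, recall)
print(round(f1, 6))  # 0.179217, matching the f1_score column
```

Read together, the numbers say that when the model predicts a purchase it is right a little under half of the time (precision about 0.47), but it finds only about 11% of the sessions that actually had a transaction (recall about 0.11). An accuracy of about 0.985 alongside such low recall is only possible if sessions with a transaction are rare in this data, so accuracy alone overstates how well the model finds purchasers.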


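### Prediction

The cell below uses the model to predict whether each session from 1 July to 1 August 2017 includes a transaction, sums the predicted purchases per country, and returns the ten countries with the most predicted purchases.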

```
%%bigquery --project tutorial-medium-2020
SELECT
  country,
  SUM(predicted_label) AS total_predicted_purchases
FROM
  ML.PREDICT(MODEL `bqml_tutorial.sample_model`, (
    SELECT
      IFNULL(device.operatingSystem, "") AS os,
      device.isMobile AS is_mobile,
      IFNULL(totals.pageviews, 0) AS pageviews,
      IFNULL(geoNetwork.country, "") AS country
    FROM
      `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE
      _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY country
ORDER BY total_predicted_purchases DESC
LIMIT 10
```


| country | total_predicted_purchases |
|---|---|
| United States | 220 |
| Taiwan | 8 |
| Canada | 7 |
| India | 2 |
| Turkey | 2 |
| Japan | 2 |
| Australia | 1 |
| Guyana | 1 |
| St. Lucia | 1 |
| Thailand | 1 |
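The prediction query produces one `predicted_label` (0 or 1) per session, sums the labels per country, and keeps the ten largest totals. The equivalent aggregation in plain Python is sketched below, assuming the predictions have already been fetched as `(country, predicted_label)` pairs; `top_countries` is a hypothetical helper and the sample rows are made up, not real model output.

```python
from collections import Counter


def top_countries(predictions, limit=10):
    """SUM(predicted_label) GROUP BY country ORDER BY total DESC LIMIT limit."""
    totals = Counter()
    for country, predicted_label in predictions:
        totals[country] += predicted_label
    # most_common sorts by total, highest first; ties keep first-seen order
    return totals.most_common(limit)


# Hypothetical predictions: (country, predicted_label) per session
predictions = [
    ("United States", 1),
    ("Canada", 0),
    ("United States", 1),
    ("Canada", 1),
    ("Japan", 0),
]
print(top_countries(predictions, limit=2))
```

Unlike `Counter.most_common`, the SQL `ORDER BY` does not define an order among tied totals (such as India, Turkey, and Japan with 2 each), so tied rows may come back in a different order on another run.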
