# Predict Visitor Purchases with a Classification Model in BQML
From https://www.cloudskillsboost.google/focuses/1794?parent=catalog

**To run this notebook**, open the GCP console and go to Vertex AI / Workbench / New Notebook.
(A small machine type is sufficient.)

### About BigQuery
BigQuery is Google's fully managed, NoOps, low cost analytics database. With BigQuery you can query terabytes and terabytes of data without having any infrastructure to manage or needing a database administrator. BigQuery uses SQL and can take advantage of the pay-as-you-go model. BigQuery allows you to focus on analyzing data to find meaningful insights.

### About BigQuery ML
BigQuery Machine Learning (BQML) is a BigQuery feature that lets data analysts create, train, evaluate, and predict with machine learning models with minimal coding.

### Using Google Analytics Merchandise Store Dataset
There is a newly available ecommerce dataset that has millions of Google Analytics records for the Google Merchandise Store loaded into BigQuery. In this lab you will use this data to run some typical queries that businesses would want to know about their customers' purchasing habits.
https://support.google.com/analytics/answer/3437719?hl=en

## Business Problem
Your data analyst team exported the Google Analytics logs for an ecommerce website into BigQuery and created a new table of all the raw ecommerce visitor session data for you to explore. Using this data, you'll try to answer a few questions.


In [1]:
%%bigquery
SELECT
  *
FROM
  `data-to-insights.ecommerce.web_analytics`
ORDER BY
  fullVisitorId
LIMIT
  5



Unnamed: 0,visitorId,visitNumber,visitId,visitStartTime,date,totals,trafficSource,device,geoNetwork,customDimensions,hits,fullVisitorId,userId,channelGrouping,socialEngagementType
0,,2,1501536952,1501536952,20170731,"{'visits': 1, 'hits': 19, 'pageviews': 15, 'ti...","{'referralPath': '/', 'campaign': '(not set)',...","{'browser': 'Chrome', 'browserVersion': 'not a...","{'continent': 'Americas', 'subContinent': 'Nor...","[{'index': 4, 'value': 'North America'}]","[{'hitNumber': 1, 'time': 0, 'hour': 14, 'minu...",334692759449,,Referral,Not Socially Engaged
1,,3,1501541392,1501541392,20170731,"{'visits': 1, 'hits': 19, 'pageviews': 11, 'ti...","{'referralPath': '/', 'campaign': '(not set)',...","{'browser': 'Chrome', 'browserVersion': 'not a...","{'continent': 'Americas', 'subContinent': 'Nor...","[{'index': 4, 'value': 'North America'}]","[{'hitNumber': 1, 'time': 0, 'hour': 15, 'minu...",334692759449,,Referral,Not Socially Engaged
2,,1,1501106611,1501106611,20170726,"{'visits': 1, 'hits': 20, 'pageviews': 16, 'ti...","{'referralPath': '/', 'campaign': '(not set)',...","{'browser': 'Chrome', 'browserVersion': 'not a...","{'continent': 'Americas', 'subContinent': 'Nor...","[{'index': 4, 'value': 'North America'}]","[{'hitNumber': 1, 'time': 0, 'hour': 15, 'minu...",334692759449,,Referral,Not Socially Engaged
3,,1,1477029466,1477029466,20161020,"{'visits': 1, 'hits': 11, 'pageviews': 8, 'tim...","{'referralPath': None, 'campaign': '(not set)'...","{'browser': 'Chrome', 'browserVersion': 'not a...","{'continent': 'Oceania', 'subContinent': 'Aust...",[],"[{'hitNumber': 1, 'time': 0, 'hour': 22, 'minu...",10278554503158,,Organic Search,Not Socially Engaged
4,,1,1480578901,1480578901,20161130,"{'visits': 1, 'hits': 17, 'pageviews': 13, 'ti...","{'referralPath': None, 'campaign': '(not set)'...","{'browser': 'Chrome', 'browserVersion': 'not a...","{'continent': 'Americas', 'subContinent': 'Sou...",[],"[{'hitNumber': 1, 'time': 0, 'hour': 23, 'minu...",20424342248747,,Organic Search,Not Socially Engaged


## Let's look at one customer and their sessions

In [4]:
%%bigquery

SELECT
  *
FROM
  `data-to-insights.ecommerce.web_analytics`
WHERE
  fullVisitorId="5541178787326080377"
ORDER BY date



Unnamed: 0,visitorId,visitNumber,visitId,visitStartTime,date,totals,trafficSource,device,geoNetwork,customDimensions,hits,fullVisitorId,userId,channelGrouping,socialEngagementType
0,,1,1490471460,1490471460,20170325,"{'visits': 1, 'hits': 1, 'pageviews': 1, 'time...","{'referralPath': None, 'campaign': '(not set)'...","{'browser': 'Chrome', 'browserVersion': 'not a...","{'continent': 'Europe', 'subContinent': 'Weste...","[{'index': 4, 'value': 'EMEA'}]","[{'hitNumber': 1, 'time': 0, 'hour': 12, 'minu...",5541178787326080377,,Organic Search,Not Socially Engaged
1,,2,1490821290,1490821290,20170329,"{'visits': 1, 'hits': 43, 'pageviews': 33, 'ti...","{'referralPath': None, 'campaign': '(not set)'...","{'browser': 'Chrome', 'browserVersion': 'not a...","{'continent': 'Europe', 'subContinent': 'Weste...","[{'index': 4, 'value': 'EMEA'}]","[{'hitNumber': 1, 'time': 0, 'hour': 14, 'minu...",5541178787326080377,,Organic Search,Not Socially Engaged
2,,3,1492463313,1492463313,20170417,"{'visits': 1, 'hits': 8, 'pageviews': 6, 'time...","{'referralPath': None, 'campaign': '(not set)'...","{'browser': 'Chrome', 'browserVersion': 'not a...","{'continent': 'Europe', 'subContinent': 'Weste...","[{'index': 4, 'value': 'EMEA'}]","[{'hitNumber': 1, 'time': 0, 'hour': 14, 'minu...",5541178787326080377,,Organic Search,Not Socially Engaged
3,,4,1493917093,1493917093,20170504,"{'visits': 1, 'hits': 8, 'pageviews': 6, 'time...","{'referralPath': None, 'campaign': '(not set)'...","{'browser': 'Chrome', 'browserVersion': 'not a...","{'continent': 'Europe', 'subContinent': 'Weste...","[{'index': 4, 'value': 'EMEA'}]","[{'hitNumber': 1, 'time': 0, 'hour': 9, 'minut...",5541178787326080377,,Organic Search,Not Socially Engaged
4,,5,1493921698,1493921698,20170504,"{'visits': 1, 'hits': 5, 'pageviews': 3, 'time...","{'referralPath': None, 'campaign': '(not set)'...","{'browser': 'Chrome', 'browserVersion': 'not a...","{'continent': 'Europe', 'subContinent': 'Weste...","[{'index': 4, 'value': 'EMEA'}]","[{'hitNumber': 1, 'time': 0, 'hour': 11, 'minu...",5541178787326080377,,Organic Search,Not Socially Engaged
5,,7,1494449721,1494449721,20170510,"{'visits': 1, 'hits': 17, 'pageviews': 13, 'ti...","{'referralPath': None, 'campaign': '(not set)'...","{'browser': 'Chrome', 'browserVersion': 'not a...","{'continent': 'Europe', 'subContinent': 'Weste...","[{'index': 4, 'value': 'EMEA'}]","[{'hitNumber': 1, 'time': 0, 'hour': 13, 'minu...",5541178787326080377,,Organic Search,Not Socially Engaged
6,,6,1494445989,1494445989,20170510,"{'visits': 1, 'hits': 1, 'pageviews': 1, 'time...","{'referralPath': None, 'campaign': '(not set)'...","{'browser': 'Chrome', 'browserVersion': 'not a...","{'continent': 'Europe', 'subContinent': 'Weste...","[{'index': 4, 'value': 'EMEA'}]","[{'hitNumber': 1, 'time': 0, 'hour': 12, 'minu...",5541178787326080377,,Organic Search,Not Socially Engaged
7,,8,1494607306,1494607306,20170512,"{'visits': 1, 'hits': 1, 'pageviews': 1, 'time...","{'referralPath': None, 'campaign': '(not set)'...","{'browser': 'Chrome', 'browserVersion': 'not a...","{'continent': 'Europe', 'subContinent': 'Weste...","[{'index': 4, 'value': 'EMEA'}]","[{'hitNumber': 1, 'time': 0, 'hour': 9, 'minut...",5541178787326080377,,Organic Search,Not Socially Engaged
8,,9,1494690780,1494690780,20170513,"{'visits': 1, 'hits': 1, 'pageviews': 1, 'time...","{'referralPath': None, 'campaign': '(not set)'...","{'browser': 'Chrome', 'browserVersion': 'not a...","{'continent': 'Europe', 'subContinent': 'Weste...","[{'index': 4, 'value': 'EMEA'}]","[{'hitNumber': 1, 'time': 0, 'hour': 8, 'minut...",5541178787326080377,,Organic Search,Not Socially Engaged
9,,10,1494797731,1494797731,20170514,"{'visits': 1, 'hits': 1, 'pageviews': 1, 'time...","{'referralPath': None, 'campaign': '(not set)'...","{'browser': 'Chrome', 'browserVersion': 'not a...","{'continent': 'Europe', 'subContinent': 'Weste...","[{'index': 4, 'value': 'EMEA'}]","[{'hitNumber': 1, 'time': 0, 'hour': 14, 'minu...",5541178787326080377,,Organic Search,Not Socially Engaged


### Out of the total visitors who visited our website, what % made a purchase?

In [3]:
%%bigquery df

WITH visitors AS(
SELECT
COUNT(DISTINCT fullVisitorId) AS total_visitors
FROM `data-to-insights.ecommerce.web_analytics`
),

purchasers AS(
SELECT
COUNT(DISTINCT fullVisitorId) AS total_purchasers
FROM `data-to-insights.ecommerce.web_analytics`
WHERE totals.transactions IS NOT NULL
)
SELECT
  total_visitors,
  total_purchasers,
  total_purchasers / total_visitors AS conversion_rate
FROM visitors, purchasers



In [4]:
df.head()

Unnamed: 0,total_visitors,total_purchasers,conversion_rate
0,741721,20015,0.026985
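
The `conversion_rate` column is simply the ratio of the two counts. As a quick sanity check, the arithmetic can be reproduced in plain Python. The variable names are illustrative, and the counts are copied from the output above.

```python
# Counts copied from the query output above.
total_visitors = 741_721
total_purchasers = 20_015

# Fraction of distinct visitors with at least one transaction.
conversion_rate = total_purchasers / total_visitors
print(f"{conversion_rate:.6f} ({conversion_rate:.2%})")
```

Roughly 2.7% of distinct visitors made at least one purchase in this dataset.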


### What are the top 5 selling products?

In [6]:
%%bigquery

SELECT
  p.v2ProductName,
  p.v2ProductCategory,
  SUM(p.productQuantity) AS units_sold,
  ROUND(SUM(p.localProductRevenue/1000000),2) AS revenue
FROM
  `data-to-insights.ecommerce.web_analytics`,
  UNNEST(hits) AS h,  -- UNNEST converts an ARRAY into a set of rows, also known as "flattening"
  UNNEST(h.product) AS p
GROUP BY
  1,
  2
ORDER BY
  revenue DESC
LIMIT
  5;



Unnamed: 0,v2ProductName,v2ProductCategory,units_sold,revenue
0,Nest® Learning Thermostat 3rd Gen-USA - Stainl...,Nest-USA,17651,870976.95
1,Nest® Cam Outdoor Security Camera - USA,Nest-USA,16930,684034.55
2,Nest® Cam Indoor Security Camera - USA,Nest-USA,14155,548104.47
3,Nest® Protect Smoke + CO White Wired Alarm-USA,Nest-USA,6394,178937.6
4,Nest® Protect Smoke + CO White Battery Alarm-USA,Nest-USA,6340,178572.4


### How many visitors bought on subsequent visits to the website?

In [2]:
%%bigquery

WITH all_visitor_stats AS (

SELECT
  fullvisitorid,
  # 741,721 unique visitors
IF
  (COUNTIF(totals.transactions > 0 AND totals.newVisits IS NULL) > 0, -- true if they returned and bought something; newVisits is NULL when it is not the first visit
    1,
    0) AS will_buy_on_return_visit -- to create a zero/one variable
FROM
  `data-to-insights.ecommerce.web_analytics`
GROUP BY
  fullvisitorid
)

SELECT
  COUNT(DISTINCT fullvisitorid) AS total_visitors,
  will_buy_on_return_visit
FROM all_visitor_stats
GROUP BY will_buy_on_return_visit



Unnamed: 0,total_visitors,will_buy_on_return_visit
0,732524,0
1,9197,1


Analyzing the results, you can see that (9197 / 741721) ≈ 1.2% of total visitors will return and purchase from the website. This includes the subset of visitors who bought on their very first session and then came back and bought again.
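
The share of returning purchasers can be recomputed from the two rows of the output table above. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
# Row counts copied from the query output table above.
visitors_label_0 = 732_524  # will_buy_on_return_visit = 0
visitors_label_1 = 9_197    # will_buy_on_return_visit = 1

# The two groups partition the distinct visitors.
total_visitors = visitors_label_0 + visitors_label_1
return_purchase_share = visitors_label_1 / total_visitors
print(total_visitors, f"{return_purchase_share:.2%}")
```

The two groups add up to the 741,721 distinct visitors counted earlier, which is a useful check that the `GROUP BY` did not drop or duplicate anyone.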

## Objective

### Business objective
Help visitors buy more, especially those you are "almost sure" are going to buy. They might just need the right incentive :)

### Data Science objective 
To predict whether or not a new user is likely to purchase in the future. <br>
The idea is that identifying these high-value users can help your marketing team to target them with special promotions and ad campaigns to ensure a conversion while they comparison shop between visits to your ecommerce site.

### Exploratory Data Analysis for Feature Engineering
Your team decides to test whether these two fields are good inputs for your classification model:

`totals.bounces` (whether the visitor left the website immediately)

`totals.timeOnSite` (how long the visitor was on our website)

While training a model on just these two fields is a start, you will see if they're good enough to produce an accurate model.

The value of building an ML model is to get the probability of future purchase based on the data gleaned about their first session.
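
To make the label definition concrete, here is a minimal pure-Python sketch of the same logic as `IF(COUNTIF(totals.transactions > 0 AND totals.newVisits IS NULL) > 0, 1, 0)`. The session rows are hand-made, hypothetical data, and `label_visitors` is an illustrative helper, not part of BigQuery.

```python
from collections import defaultdict

# Hypothetical session rows: (visitor_id, is_new_visit, transactions).
sessions = [
    ("A", True, 0),   # first visit, no purchase
    ("A", False, 1),  # return visit with a purchase -> label 1
    ("B", True, 1),   # bought on the first visit only -> label 0
    ("C", True, 0),
    ("C", False, 0),  # returned but did not buy -> label 0
]

def label_visitors(rows):
    """Return {visitor_id: 1 if any return visit had a purchase, else 0}."""
    labels = defaultdict(int)
    for visitor_id, is_new_visit, transactions in rows:
        bought_on_return = (not is_new_visit) and transactions > 0
        labels[visitor_id] = max(labels[visitor_id], int(bought_on_return))
    return dict(labels)

labels = label_visitors(sessions)
print(labels)
```

Note that visitor B, who bought only during the first session, gets label 0: the label captures purchases on return visits, not purchases in general.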

In [6]:
%%bigquery

SELECT
  * EXCEPT(fullVisitorId)
FROM
  # Predictors (Xs): time on site and whether they bounced, taken from the first visit
  (SELECT
    IFNULL(totals.timeOnSite, 0) AS time_on_site,
    IFNULL(totals.bounces, 0) AS bounces,
    fullVisitorId,
    
  FROM
    `data-to-insights.ecommerce.web_analytics`
  WHERE
    totals.newVisits = 1)
  JOIN
  ( # Response (Y): 1 if they bought in a later session, 0 otherwise
    SELECT
    fullvisitorid,
    IF(COUNTIF(totals.transactions > 0 AND totals.newVisits IS NULL) > 0, 1, 0) AS will_buy_on_return_visit
  FROM
      `data-to-insights.ecommerce.web_analytics`
  GROUP BY fullvisitorid)
  USING (fullVisitorId)
ORDER BY time_on_site DESC
LIMIT 10;



Unnamed: 0,time_on_site,bounces,will_buy_on_return_visit
0,15047,0,0
1,12136,0,0
2,11201,0,0
3,10046,0,0
4,9974,0,0
5,9564,0,0
6,9520,0,0
7,9275,0,1
8,9138,0,0
9,8872,0,0


It's often too early to tell before training and evaluating the model, but at first glance only 1 of the 10 visitors with the highest time_on_site returned to buy, which isn't very promising.

## Train a model

In [None]:
%%bigquery

CREATE SCHEMA IF NOT EXISTS ml_reco_system

In [None]:
%%bigquery

CREATE OR REPLACE MODEL
  `ml_reco_system.classification_model` OPTIONS 
  ( model_type='logistic_reg',
    input_label_cols = ['will_buy_on_return_visit'] ) AS

SELECT
  * EXCEPT(fullVisitorId)
FROM
  (
  SELECT
    fullVisitorId,
    IFNULL(totals.bounces,
      0) AS bounces,
    IFNULL(totals.timeOnSite,
      0) AS time_on_site
  FROM
    `data-to-insights.ecommerce.web_analytics`
  WHERE
    totals.newVisits = 1
    AND date BETWEEN '20160801'
    AND '20170430') # train on first 9 months
JOIN (
  SELECT
    fullvisitorid,
  IF
    (COUNTIF(totals.transactions > 0
        AND totals.newVisits IS NULL) > 0,
      1,
      0) AS will_buy_on_return_visit
  FROM
    `data-to-insights.ecommerce.web_analytics`
  GROUP BY
    fullvisitorid)
USING
  (fullVisitorId) ;
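
The notebook splits the data by date: the training query above uses 20160801 to 20170430, while the evaluation and prediction queries use later windows. A minimal sketch of that split, assuming the `date` field is a zero-padded `YYYYMMDD` string as in the filters (`assign_split` is a hypothetical helper):

```python
def assign_split(date_str):
    """Map a YYYYMMDD string to the split used in this notebook."""
    # Zero-padded YYYYMMDD strings sort the same way as the dates they
    # encode, so plain string comparison is enough.
    if "20160801" <= date_str <= "20170430":
        return "train"   # first 9 months
    if "20170501" <= date_str <= "20170630":
        return "eval"    # next 2 months
    if "20170701" <= date_str <= "20170801":
        return "test"    # final month
    return "unused"

for d in ("20161020", "20170510", "20170731"):
    print(d, assign_split(d))
```

Splitting by time rather than at random keeps the evaluation closer to how the model would be used: trained on the past, applied to future visitors.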

### Select your performance criteria
For classification problems in ML, you want to minimize the False Positive Rate (predict that the user will return and purchase and they don't) and maximize the True Positive Rate (predict that the user will return and purchase and they do).
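
As an illustration of the two rates, the sketch below computes them from hand-made labels and 0/1 predictions. The data and the `tpr_fpr` helper are hypothetical, and the function assumes both classes appear in `y_true`.

```python
def tpr_fpr(y_true, y_pred):
    """Return (true positive rate, false positive rate) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # TPR: share of actual buyers that were flagged.
    # FPR: share of actual non-buyers that were wrongly flagged.
    return tp / (tp + fn), fp / (fp + tn)

y_true = [1, 1, 0, 0, 0, 0, 1, 0]  # did the visitor buy on a return visit?
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]  # what the model predicted
tpr, fpr = tpr_fpr(y_true, y_pred)
print(f"TPR={tpr:.3f} FPR={fpr:.3f}")
```

Lowering the decision threshold flags more visitors, which tends to raise both rates at once. That trade-off is why the two are considered together.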

This relationship is visualized with a ROC (Receiver Operating Characteristic) curve, where you try to maximize the area under the curve, or AUC. The query below reports it as `roc_auc`:

In [8]:
%%bigquery

SELECT
  roc_auc,
  CASE
    WHEN roc_auc > .9 THEN 'good'
    WHEN roc_auc > .8 THEN 'fair'
    WHEN roc_auc > .7 THEN 'decent'
    WHEN roc_auc > .6 THEN 'not great'
  ELSE 'poor' END AS model_quality
FROM
  ML.EVALUATE(MODEL ml_reco_system.classification_model,  (
SELECT
  * EXCEPT(fullVisitorId)
FROM
  # features
  (SELECT
    fullVisitorId,
    IFNULL(totals.bounces, 0) AS bounces,
    IFNULL(totals.timeOnSite, 0) AS time_on_site
  FROM
    `data-to-insights.ecommerce.web_analytics`
  WHERE
    totals.newVisits = 1
    AND date BETWEEN '20170501' AND '20170630') # eval on 2 months
  JOIN
  (SELECT
    fullvisitorid,
    IF(COUNTIF(totals.transactions > 0 AND totals.newVisits IS NULL) > 0, 1, 0) AS will_buy_on_return_visit
  FROM
      `data-to-insights.ecommerce.web_analytics`
  GROUP BY fullvisitorid)
  USING (fullVisitorId)
));



Unnamed: 0,roc_auc,model_quality
0,0.723861,decent
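
One way to build intuition for `roc_auc`: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting as half. The sketch below computes it by brute force on toy data. It is an illustration of the metric, not how BigQuery ML computes it internally.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so a value near 0.72 means the two features carry some signal but leave plenty of room for improvement.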


## Improve model performance with Feature Engineering

- How far the visitor got in the checkout process on their first visit
- Where the visitor came from (traffic source: organic search, referring site etc..)
- Device category (mobile, tablet, desktop)
- Geographic information (country)
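
The first feature in this list is derived in the SQL below as `MAX(CAST(h.eCommerceAction.action_type AS INT64))`, that is, the highest ecommerce action code seen across all hits in a session. A minimal sketch of that aggregation, using hypothetical hit data:

```python
# Hypothetical hits for two sessions. action_type is kept as a string here
# to mirror the CAST(... AS INT64) in the SQL.
hits_by_session = {
    "session-1": ["0", "2", "3", "0"],
    "session-2": ["0", "0"],
}

def latest_ecommerce_progress(action_types):
    """Highest action_type code seen in the session, as an integer."""
    return max(int(a) for a in action_types)

progress = {
    session_id: latest_ecommerce_progress(action_types)
    for session_id, action_types in hits_by_session.items()
}
print(progress)
```

Collapsing many hits into one number per session is what lets the model treat "how far the visitor got" as a single numeric feature.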

In [None]:
%%bigquery
CREATE OR REPLACE MODEL `ml_reco_system.classification_model_2`
OPTIONS
  (model_type='logistic_reg', input_label_cols = ['will_buy_on_return_visit']) AS
WITH all_visitor_stats AS (
SELECT
  fullvisitorid,
  IF(COUNTIF(totals.transactions > 0 AND totals.newVisits IS NULL) > 0, 1, 0) AS will_buy_on_return_visit
  FROM `data-to-insights.ecommerce.web_analytics`
  GROUP BY fullvisitorid
)
# add in new features
SELECT * EXCEPT(unique_session_id) FROM (
  SELECT
      CONCAT(fullvisitorid, CAST(visitId AS STRING)) AS unique_session_id,
      # labels
      will_buy_on_return_visit,
      MAX(CAST(h.eCommerceAction.action_type AS INT64)) AS latest_ecommerce_progress,
      # behavior on the site
      IFNULL(totals.bounces, 0) AS bounces,
      IFNULL(totals.timeOnSite, 0) AS time_on_site,
      IFNULL(totals.pageviews, 0) AS pageviews,
      # where the visitor came from
      trafficSource.source,
      trafficSource.medium,
      channelGrouping,
      # mobile or desktop
      device.deviceCategory,
      # geographic
      IFNULL(geoNetwork.country, "") AS country
  FROM `data-to-insights.ecommerce.web_analytics`,
     UNNEST(hits) AS h
    JOIN all_visitor_stats USING(fullvisitorid)
  WHERE 1=1
    # only predict for new visits
    AND totals.newVisits = 1
    AND date BETWEEN '20160801' AND '20170430' # train 9 months
  GROUP BY
  unique_session_id,
  will_buy_on_return_visit,
  bounces,
  time_on_site,
  totals.pageviews,
  trafficSource.source,
  trafficSource.medium,
  channelGrouping,
  device.deviceCategory,
  country
);

In [10]:
%%bigquery
#standardSQL
SELECT
  roc_auc,
  CASE
    WHEN roc_auc > .9 THEN 'good'
    WHEN roc_auc > .8 THEN 'fair'
    WHEN roc_auc > .7 THEN 'decent'
    WHEN roc_auc > .6 THEN 'not great'
  ELSE 'poor' END AS model_quality
FROM
  ML.EVALUATE(MODEL ml_reco_system.classification_model_2,  (
WITH all_visitor_stats AS (
SELECT
  fullvisitorid,
  IF(COUNTIF(totals.transactions > 0 AND totals.newVisits IS NULL) > 0, 1, 0) AS will_buy_on_return_visit
  FROM `data-to-insights.ecommerce.web_analytics`
  GROUP BY fullvisitorid
)
# add in new features
SELECT * EXCEPT(unique_session_id) FROM (
  SELECT
      CONCAT(fullvisitorid, CAST(visitId AS STRING)) AS unique_session_id,
      # labels
      will_buy_on_return_visit,
      MAX(CAST(h.eCommerceAction.action_type AS INT64)) AS latest_ecommerce_progress,
      # behavior on the site
      IFNULL(totals.bounces, 0) AS bounces,
      IFNULL(totals.timeOnSite, 0) AS time_on_site,
      totals.pageviews,
      # where the visitor came from
      trafficSource.source,
      trafficSource.medium,
      channelGrouping,
      # mobile or desktop
      device.deviceCategory,
      # geographic
      IFNULL(geoNetwork.country, "") AS country
  FROM `data-to-insights.ecommerce.web_analytics`,
     UNNEST(hits) AS h
    JOIN all_visitor_stats USING(fullvisitorid)
  WHERE 1=1
    # only predict for new visits
    AND totals.newVisits = 1
    AND date BETWEEN '20170501' AND '20170630' # eval 2 months
  GROUP BY
  unique_session_id,
  will_buy_on_return_visit,
  bounces,
  time_on_site,
  totals.pageviews,
  trafficSource.source,
  trafficSource.medium,
  channelGrouping,
  device.deviceCategory,
  country
)
));



Unnamed: 0,roc_auc,model_quality
0,0.909488,good


In [None]:
%%bigquery
SELECT
*
FROM
  ML.PREDICT(MODEL `ml_reco_system.classification_model_2`,
   (
WITH all_visitor_stats AS (
SELECT
  fullvisitorid,
  IF(COUNTIF(totals.transactions > 0 AND totals.newVisits IS NULL) > 0, 1, 0) AS will_buy_on_return_visit
  FROM `data-to-insights.ecommerce.web_analytics`
  GROUP BY fullvisitorid
)
  SELECT
      CONCAT(fullvisitorid, '-',CAST(visitId AS STRING)) AS unique_session_id,
      # labels
      will_buy_on_return_visit,
      MAX(CAST(h.eCommerceAction.action_type AS INT64)) AS latest_ecommerce_progress,
      # behavior on the site
      IFNULL(totals.bounces, 0) AS bounces,
      IFNULL(totals.timeOnSite, 0) AS time_on_site,
      totals.pageviews,
      # where the visitor came from
      trafficSource.source,
      trafficSource.medium,
      channelGrouping,
      # mobile or desktop
      device.deviceCategory,
      # geographic
      IFNULL(geoNetwork.country, "") AS country
  FROM `data-to-insights.ecommerce.web_analytics`,
     UNNEST(hits) AS h
    JOIN all_visitor_stats USING(fullvisitorid)
  WHERE
    # only predict for new visits
    totals.newVisits = 1
    AND date BETWEEN '20170701' AND '20170801' # test 1 month
  GROUP BY
  unique_session_id,
  will_buy_on_return_visit,
  bounces,
  time_on_site,
  totals.pageviews,
  trafficSource.source,
  trafficSource.medium,
  channelGrouping,
  device.deviceCategory,
  country
)
)
ORDER BY
  predicted_will_buy_on_return_visit DESC;


## Predictions!
- **predicted_will_buy_on_return_visit:** whether the model thinks the visitor will buy later (1 = yes)
- **predicted_will_buy_on_return_visit_probs.label:** the class (1 = yes, 0 = no) that the probability refers to
- **predicted_will_buy_on_return_visit_probs.prob:** the probability the model assigns to that label (1 = 100%)
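
To rank visitors, it is usually the probability of the positive class that matters. The sketch below pulls it out of one prediction row. The row is hypothetical and assumes the `_probs` column arrives in Python as a list of `{"label", "prob"}` dictionaries. The exact shape depends on how the result is downloaded.

```python
# Hypothetical prediction row; values are made up for illustration.
row = {
    "predicted_will_buy_on_return_visit": 0,
    "predicted_will_buy_on_return_visit_probs": [
        {"label": 1, "prob": 0.12},
        {"label": 0, "prob": 0.88},
    ],
}

def positive_probability(prediction_row, positive_label=1):
    """Return the probability assigned to the positive class."""
    for entry in prediction_row["predicted_will_buy_on_return_visit_probs"]:
        if entry["label"] == positive_label:
            return entry["prob"]
    raise ValueError("positive label not found in prediction row")

print(positive_probability(row))
```

Sorting by this probability, rather than by the 0/1 predicted label, gives a finer-grained ranking of which visitors to target first.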

## Results
Of the top 6% of first-time visitors (sorted in decreasing order of predicted probability), more than 6% make a purchase in a later visit.

These users represent nearly 50% of all first-time visitors who make a purchase in a later visit.

Overall, only 0.7% of first-time visitors make a purchase in a later visit.

Targeting the top 6% of first-time visitors increases marketing ROI by 9x vs targeting them all!
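
The roughly 9x figure follows from the rates quoted in this section. A back-of-the-envelope sketch, treating "more than 6%" as a lower bound of exactly 6% (an assumption made here only for the arithmetic):

```python
baseline_rate = 0.007    # 0.7% of all first-time visitors buy later
segment_share = 0.06     # top 6% of visitors by predicted probability
segment_rate = 0.06      # lower bound: "more than 6%" of them buy later

# Lift: how much more likely a targeted visitor is to buy than average.
lift = segment_rate / baseline_rate

# Share of all later purchasers that fall inside the targeted segment.
purchasers_captured = segment_share * segment_rate / baseline_rate

print(f"lift ~ {lift:.1f}x, purchasers captured ~ {purchasers_captured:.0%}")
```

With the 6% lower bound the lift comes out a little under 9x and the captured share at about half of all later purchasers, which is consistent with the rounded figures quoted here.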



## Additional information
Tip: add warm_start = true to your model options if you are retraining an existing model on new data, for faster training times. Note that you cannot change the feature columns (this would necessitate a new model).

roc_auc is just one of the performance metrics available during model evaluation. Also available are accuracy, precision, and recall. Knowing which performance metric to rely on is highly dependent on what your overall objective or goal is.
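
To see why the choice of metric matters on imbalanced data like this (under 1% positives), the sketch below compares two hypothetical confusion matrices for 1,000 visitors, 7 of whom buy later. All counts are made up for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when nothing is predicted positive
    # or when there are no actual positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# A "model" that always predicts "will not buy".
always_no = classification_metrics(tp=0, fp=0, fn=7, tn=993)
# A model that flags 60 visitors and catches 4 of the 7 buyers.
some_model = classification_metrics(tp=4, fp=56, fn=3, tn=937)

print("always_no :", always_no)
print("some_model:", some_model)
```

The always-negative baseline has the higher accuracy yet finds no buyers at all, which is why recall, precision, or roc_auc are usually more informative here than accuracy alone.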