# A/B Test Case Study

### Applying what you have learned in a real Case Study
In this lesson, you will be led through the design process, including:

* Building a funnel
* Deciding on your metrics
* Thinking about the size and duration of your test
* Analyzing your collected data with inferential statistics
* Drawing conclusions about the experiment

### Learning Objectives
By the end of this lesson, you will be able to:

* Describe how A/B testing works and identify its limitations
* Recognize and handle bias
* Apply multiple comparison techniques
* Analyze the results of an experiment

## Scenario Description

Let's say that you're working for a fictional productivity software company that is looking for ways to increase the number of people who pay for their software. The way that the software is currently set up, users can download and use the software free of charge, for a 7-day trial. After the end of the trial, users are required to pay for a license to continue using the software.

One idea that the company wants to try is to change the layout of the homepage to emphasize more prominently and higher up on the page that there is a 7-day trial available for the company's software. The current fear is that some potential users are missing out on using the software because of a lack of awareness of the trial period. If more people download the software and use it in the trial period, the hope is that this entices more people to make a purchase after seeing what the software can do.


## Building a Funnel
Before we do anything else, the first thing we should do is specify the objective or goal of our study:

> Revising the structure of the homepage will increase the number of people
that download the software and, ultimately, the number of people that purchase
a license.

Now, we should think about the activities that a user will take on the site that are relevant to measuring our objective. This path or funnel will help us figure out how we will create experimental condition groups and which metrics we'll need to track to measure the experiment's effect. To help you construct the funnel, here's some information about the way the company's website is structured, and how the software induces users to purchase a license.

The company's website has five main sections:

1. the homepage;
2. a section with additional information, gallery, and examples;
3. a page for users to download the software;
4. a page for users to purchase a license; and
5. a support sub-site with documentation and FAQs for the software.

For the software itself, the website requires that users create an account in order to download the program. The program can be used freely for seven days after download. When the trial period ends, the program will bring up a dialog box that takes the user to the license page. After purchasing a license, the user will receive a unique code associated with their site account. This code can then be used to register the program to that user, and the program can be used thereafter without issue.

## Building a Funnel Solution

> What steps do you expect typical visitors to take from their initial visit to the webpage through purchasing a license for continued use of the program? Are there any 'typical' steps that certain visitors might not take?

A straightforward flow might include the following steps:

* Visit homepage
* Visit download page
* Sign up for an account
* Download software
* After the 7-day trial, the software takes the user to a license-purchase page
* Purchase license

Note that it is possible for the visitor to drop from the flow after each step, forming a funnel. There might be additional steps that a user takes between visiting the homepage and visiting the download page that aren't accounted for in the above flow. For example, someone might want to check out the additional informational pages before visiting the download page, or even visit the license purchase page to check the license price before deciding to download. Considering the amount of browsing that a visitor could perform on the site, it might be simplest just to track whether or not a user gets to the download page at some point, without worrying about the many paths that they could have taken to get there.
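
The idea of tracking whether a visitor reaches a stage "at some point", regardless of path, can be sketched as follows. This is a minimal illustration only: the stage names and the event log are hypothetical and are not taken from the company's actual tracking setup.

```python
# Funnel stages in order. These names are illustrative assumptions,
# not the company's real event names.
FUNNEL = ["homepage", "download_page", "account_created", "download", "license_purchased"]

# Hypothetical event log: (cookie_id, event) pairs. Off-funnel pages such as
# "info_page" or "license_page" can appear anywhere in a visitor's history.
events = [
    ("c1", "homepage"), ("c1", "info_page"), ("c1", "download_page"),
    ("c2", "homepage"),
    ("c3", "homepage"), ("c3", "license_page"), ("c3", "download_page"),
    ("c3", "account_created"), ("c3", "download"), ("c3", "license_purchased"),
]

def funnel_counts(events):
    """Count how many distinct cookies reached each funnel stage at least once."""
    seen = {}  # cookie_id -> set of events observed for that cookie
    for cookie_id, event in events:
        seen.setdefault(cookie_id, set()).add(event)
    # The path taken is ignored: only membership in the set matters.
    return {stage: sum(stage in evs for evs in seen.values()) for stage in FUNNEL}

counts = funnel_counts(events)
for stage in FUNNEL:
    print(f"{stage:>18}: {counts[stage]}")
```

Because each cookie's events are stored as a set, detours through the information or license pages do not affect the counts; only reaching a stage does.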


> Consider the webpage as a whole. What kinds of events might occur outside of the expected flow for the experiment that might interfere with measuring the effects of our manipulation?

There are a few events in the expected flow that might not correspond with the visitors we want to target. For example, there might be users on the homepage who aren't new users. Users who already have a license might just be visiting the homepage as a way to access the support sub-site. A user who wants to buy a license might also come into the license page through the homepage, rather than directly from the software.

When it comes to license purchasing, it's possible that users don't come back after exactly seven days. Some users might come back early and make their purchase during their trial period. Alternatively, a user might end up taking more than seven days to decide to make their purchase, coming back days after the end of the trial. Anticipating scenarios like this can be useful for planning the design, and coming up with metrics that come as close as possible to measuring desired effects.




## Deciding on Metrics
From our user funnel, we should consider two things:

* Where and how we should split users into experiment groups
* What metrics we will use to track the success or failure of the experimental manipulation

The choice of unit of diversion (the point at which we divide observations into groups) may affect what metrics we can use, and whether the metrics we record should be considered invariant or evaluation metrics. To start, decide on a unit of diversion and brainstorm some ideas for metrics to capture.

To be clear, the overall plan is to test the effect of the new homepage using a true experiment; in particular, we'll be using an A/B testing framework. This means that prospective users should be split into two groups. The control, or 'A' group, will see the old homepage, while the experimental, or 'B' group, will see the new homepage that emphasizes the 7-day trial.

### Categories of diversion
Three main categories of diversion were presented in the course: event-based diversion, cookie-based diversion, and account-based diversion.

* An **event-based** diversion (like a pageview) can provide many observations to draw conclusions from but doesn't quite hit the mark for this case. If the condition changes on each pageview, then a visitor might get a different experience on each homepage visit. Event-based diversion is much better when the changes aren't as easily visible to users, to avoid disruption of experience. In addition, pageview-based diversion would let us know how many times the download page was accessed from each condition, but can't go any further in tracking how many actual downloads were generated from each condition.
* Diverting based on **account or user ID** can be stable, but it's not the right choice in this case. Since visitors only register after getting to the download page, this is too late to introduce the new homepage to people who should be assigned to the experimental condition.
* So this leaves the consideration of **cookie-based** diversion, which feels like the right choice. We can assign a cookie to each visitor upon their first page hit, which allows them to be separated into the control and experimental groups. Cookies also allow tracking of each visitor hitting each page, recording whether or not they eventually hit the download page and then whether or not they actually register an account and perform the download. That's not to say that cookie-based diversion is perfect. The usual cookie-based diversion issues apply: we can get some inconsistency in counts if users enter the site via an incognito window, use different browsers, or have cookies that expire or get deleted before they make a download. This kind of assignment 'dilution' could dampen the true effect of our experimental manipulation. As a simplification, however, we'll assume that this kind of assignment dilution will be small, and ignore its potential effects.

> A cookie-based diversion seems best in this case for dividing visitors into experimental groups since we can split visitors on their initial visit and it's fairly reliable for tracking.
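
One common way to implement a stable cookie-based split is to hash the cookie ID, so that the same cookie always lands in the same group. The sketch below assumes this hashing approach; the function name `assign_group`, the experiment label, and the cookie ID format are all hypothetical, and the source does not say how the company actually assigns groups.

```python
import hashlib

def assign_group(cookie_id: str, experiment: str = "homepage_trial_v1") -> str:
    """Deterministically map a cookie ID to 'control' or 'experiment'.

    The experiment label is mixed into the hash so that different
    experiments would split the same cookies independently.
    """
    digest = hashlib.sha256(f"{experiment}:{cookie_id}".encode("utf-8")).hexdigest()
    # Treat the hash as a large integer: even -> control, odd -> experiment.
    return "control" if int(digest, 16) % 2 == 0 else "experiment"

# The same cookie always gets the same homepage on repeat visits.
first_visit = assign_group("cookie-42")
repeat_visit = assign_group("cookie-42")

# Over many cookies, the split should be close to 50/50.
groups = [assign_group(f"cookie-{i}") for i in range(10_000)]
share = groups.count("experiment") / len(groups)
print(f"experiment share: {share:.3f}")
```

A visitor who clears cookies or switches browsers would receive a new cookie ID and could be reassigned, which is the assignment dilution described above.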

### Key metrics
In terms of metrics, we might want to keep track of the number of cookies that are recorded in different parts of the website. In particular, the number of cookies on the homepage, download page, and account registration page (in order to actually make the download) could prove useful. We can track the number of licenses purchased through the user accounts, each of which can be linked back to a particular condition. Though it hasn't been specified, it's also possible that the software includes usage statistics that we could track.

The above metrics are all based on absolute counts. We could instead perform our analysis on ratios of those counts. For example, we could be interested in the proportion of downloads out of all homepage visits. License purchases could be stated as a ratio against the number of registered users (downloads) or the original number of cookies.

## Selecting Invariant and Evaluation Metrics

Below, you will decide for each of the proposed metrics whether you would want to use it as an invariant metric or an evaluation metric. Remember:

* An **invariant metric** is an objective measure that you should expect will not vary between conditions and that indicates equivalence between groups.
* **Evaluation metrics**, on the other hand, represent measures where you expect there will be differences between groups, and whose differences should say something meaningful about your experimental manipulation.

The one invariant metric that stands out is the number of cookies that hit the homepage. The two evaluation metrics are a bit trickier, but taking a ratio of the number of downloads and licenses to the number of cookies makes the most sense to us.

#### Invariant metric
There's one invariant metric that really stands out here, and that's the number of cookies that hit the homepage. If we've done things correctly, each visitor should have an equal chance of seeing each homepage, and that means that the number of cookies assigned to each group should be about the same. Since visitors come in without any additional information (e.g. account info) and the change introduced by the experimental manipulation comes in right at the start, there aren't other invariant metrics we should worry about.
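
A check of this invariant metric could look like the sketch below, which tests whether the observed cookie counts are consistent with a 50/50 assignment using a normal approximation to the binomial. The counts are made-up illustrative values, not data from the experiment, and `invariant_check` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def invariant_check(n_control: int, n_experiment: int) -> float:
    """Two-sided p-value for H0: each cookie had a 50% chance of either group."""
    n_total = n_control + n_experiment
    expected = 0.5 * n_total          # expected control count under H0
    sd = sqrt(n_total * 0.5 * 0.5)    # binomial standard deviation under H0
    z = (n_control - expected) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical cookie counts for illustration only.
p_value = invariant_check(n_control=11_400, n_experiment=11_250)
print(f"p-value: {p_value:.3f}")
```

A large p-value gives no evidence of an imbalance between groups. A very small one would suggest a problem with the assignment or tracking that should be investigated before looking at the evaluation metrics.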

#### Evaluation metrics
Selecting evaluation metrics is a trickier proposition. Count-based metrics at other parts of the process seem like natural choices: the number of times the software was downloaded and the number of licenses purchased are exactly what we want to change with the new homepage. The issue is that even though we expect the number of cookies assigned to each group to be about the same, it's much more likely than not they won't be exactly the same. Instead, we should prefer using the download rate (# downloads / # cookies) and purchase rate (# licenses / # cookies) relative to the number of cookies as evaluation metrics. Using these ratios allows us to account for slight imbalances between groups.

As for the other proposed metrics, the ratio between the number of licenses and number of downloads is potentially interesting, but not as direct as the other two ratios discussed above. It's possible that the manipulation increases both the number of downloads and the number of licenses, but increases the former by a much larger proportion. In this case, the licenses-to-downloads ratio might be worse off for the new homepage compared to the old, even though the new homepage has our desired effects. There's no such inconsistency issue with the ratios that use the number of cookies in the denominator.
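
A small worked example makes this inconsistency concrete. The counts below are invented purely for illustration: both downloads and licenses go up under the new homepage, yet the licenses-to-downloads ratio goes down.

```python
# Hypothetical counts, chosen only to illustrate the argument above.
control = {"cookies": 10_000, "downloads": 1_600, "licenses": 200}
experiment = {"cookies": 10_000, "downloads": 2_000, "licenses": 220}

def rates(group):
    """Return the three candidate ratio metrics for one condition."""
    return {
        "download_rate": group["downloads"] / group["cookies"],
        "purchase_rate": group["licenses"] / group["cookies"],
        "licenses_per_download": group["licenses"] / group["downloads"],
    }

control_rates = rates(control)
experiment_rates = rates(experiment)

for name in control_rates:
    print(f"{name:>22}: {control_rates[name]:.3f} -> {experiment_rates[name]:.3f}")
```

Here the download rate and purchase rate both improve, while licenses per download drops from 0.125 to 0.110, so judging the experiment by that last ratio alone would point the wrong way.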

Product usage statistics like the average time the software was used in the trial period are potentially interesting features but aren't directly related to our experiment. We might not have a strong feeling about what kind of effect the homepage will have on people that actually download the software. Stated differently, product usage isn't a direct target of homepage manipulation. Certainly, these statistics might help us dig deeper into the reasons for observed effects after an experiment is complete. They might even point toward future changes and experiments to conduct. But in terms of experiment success, product usage shouldn't be considered an invariant or evaluation metric.


## Experiment Sizing
We have now selected our main metrics: the number of cookies as an invariant metric, and the download rate and license purchase rate (relative to the number of cookies) as evaluation metrics. Next, we should take a look at the feasibility of the experiment in terms of the amount of time it will take to run. We can use historical data as a baseline to see what it might take to detect our desired levels of change.

Recent history shows that there are about 3250 unique visitors per day, with slightly more visitors on Friday through Monday than the rest of the week. There are about 520 software downloads per day (a 0.16 rate) and about 65 licenses purchased each day (a 0.02 rate). In an ideal case, both the download rate and license purchase rate should increase with the new homepage; a statistically significant negative change should be a sign to not deploy the homepage change. However, if only one of our metrics shows a statistically significant positive change, we should be happy enough to deploy the new homepage.
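
The baseline rates quoted above follow directly from the daily counts. The short calculation below reproduces them; the even 50/50 traffic split per group is an assumption consistent with the A/B design described earlier.

```python
# Daily figures from recent history, as stated in the text.
visitors_per_day = 3250
downloads_per_day = 520
licenses_per_day = 65

download_rate = downloads_per_day / visitors_per_day          # 0.16
license_rate = licenses_per_day / visitors_per_day            # 0.02
licenses_per_download = licenses_per_day / downloads_per_day  # 0.125

# Assuming an even split, each condition sees half of the daily traffic.
visitors_per_group_per_day = visitors_per_day / 2             # 1625.0

print(f"download rate: {download_rate:.3f}")
print(f"license rate:  {license_rate:.3f}")
print(f"visitors per group per day: {visitors_per_group_per_day:.0f}")
```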

Since we're willing to deploy the homepage with an increase in only one of our two metrics (download rate, license purchase rate), we are effectively running two tests, so we need to apply the Bonferroni correction to keep the overall false positive rate from being inflated by multiple testing.
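
To see what the correction does, the sketch below compares the chance of at least one false positive with and without it. The overall error rate of 0.05 is an assumption (it is not stated explicitly, though it matches the per-test `alpha = .025` used in the sizing call), and treating the two tests as independent is a simplification for illustration. The Bonferroni bound itself does not require independence.

```python
alpha_overall = 0.05  # assumed overall (family-wise) Type I error rate
n_tests = 2           # download rate and license purchase rate

# Bonferroni: divide the overall error rate evenly across the tests.
alpha_per_test = alpha_overall / n_tests

def familywise_error(alpha: float, m: int) -> float:
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

uncorrected = familywise_error(alpha_overall, n_tests)  # each test run at 0.05
corrected = familywise_error(alpha_per_test, n_tests)   # each test run at 0.025

print(f"per-test alpha: {alpha_per_test}")
print(f"family-wise error, uncorrected: {uncorrected:.4f}")
print(f"family-wise error, corrected:   {corrected:.4f}")
```

Without the correction, the chance of a spurious "win" on at least one metric is close to 10% under these assumptions; with it, the chance stays just under 5%.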

```python
# example of using statsmodels for sample size calculation
import numpy as np
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# leave out the "nobs" parameter to solve for it
users_per_group = NormalIndPower().solve_power(
    effect_size=proportion_effectsize(.175, .16),
    alpha=.025,
    power=0.8,
    alternative='larger',
)
users_per_day = 3250
groups = 2

print(f'days: {np.ceil(users_per_group/users_per_day * groups)}')
```

```
days: 6.0
```
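
As a cross-check, the same sizing can be reproduced by hand with only the standard library. This sketch assumes the usual normal-approximation formula for a one-sided, two-sample test with equal group sizes, using the arcsine effect size (Cohen's h). The helper names are hypothetical.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist

def cohens_h(p_new: float, p_base: float) -> float:
    """Arcsine-transformed effect size for a difference between two proportions."""
    return 2 * asin(sqrt(p_new)) - 2 * asin(sqrt(p_base))

def n_per_group(p_base: float, p_new: float, alpha: float, power: float) -> float:
    """Per-group sample size for a one-sided two-sample z-test with equal groups."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)      # quantile for the desired power
    h = cohens_h(p_new, p_base)
    return 2 * ((z_alpha + z_power) / h) ** 2

# Same inputs as the statsmodels call: 0.16 -> 0.175 download rate,
# Bonferroni-corrected alpha of 0.025, power of 0.8.
n = n_per_group(p_base=0.16, p_new=0.175, alpha=0.025, power=0.8)
days = ceil(2 * n / 3250)  # two groups sharing 3250 visitors per day

print(f"users per group: {ceil(n)}")
print(f"days: {days}")
```

The number of days should agree with the statsmodels result of 6. Since traffic is slightly higher from Friday through Monday, the days of the week covered by the test may matter when interpreting a run of this length.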


## Validity, Bias, and Ethics Exercise Solution


Before getting to the data and its analysis, let's review a few of the conceptual points that go into the creation of an experiment: validity, bias, and ethics.

* We probably don't have too much to worry about in terms of **validity**. For conceptual validity, the evaluation metrics are directly aligned with the experimental goals; no abstraction is needed. Internal validity is maintained by performing an experiment with properly handled randomization and controls. We don't really need to answer to external validity since we're drawing from the full site population, and there's no other population we're looking to generalize to.

* As for **biases**, we might think of novelty bias as being a potential issue. However, we don't expect users to come back to the homepage regularly. Downloading and license purchasing are actions we expect to only occur once per user, so there's no real 'return rate' to worry about. One possibility, however, is that if more people download the software under the new homepage, the expanded user base is qualitatively different from the people who came to the page under the original homepage. This might cause more homepage hits from people looking for the support pages on the site, causing the number of unique cookies under each condition to differ. If we do see something wrong or out of place in the invariant metric (number of cookies), then this might be an area to explore in further investigations.

* Finally, for **ethical** issues, the changes to the homepage should be benign and present no risk to users. Our experiment objectives are also clearly stated. Considering the low risks of the experiment, informed consent is at worst a minor concern; a standard popup to let visitors know that cookies are used to track user experience on the site will likely suffice. The largest ethics principle we should be concerned about is data sensitivity. We shouldn't get any sensitive data out of the cookie assignment and collection, though some information will be collected from the user when they go to download the software. No sensitive data is required for the metrics we've laid out, so what we should do is just aggregate daily visits, downloads, and purchase counts without looking at any individual outcomes.
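
The aggregation described in the last point can be sketched as follows: raw events are collapsed into daily counts per condition, and the cookie identifier is dropped along the way. The record layout, dates, and event names are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical raw events: (date, condition, event_type, cookie_id).
raw_events = [
    ("2024-03-01", "control", "visit", "c1"),
    ("2024-03-01", "control", "download", "c1"),
    ("2024-03-01", "experiment", "visit", "c2"),
    ("2024-03-01", "experiment", "visit", "c3"),
    ("2024-03-02", "experiment", "download", "c3"),
    ("2024-03-08", "experiment", "purchase", "c3"),
]

def aggregate_daily(raw_events):
    """Collapse raw events to per-day, per-condition counts, dropping identifiers."""
    totals = Counter()
    for date, condition, event_type, _cookie_id in raw_events:
        # The cookie ID is deliberately not part of the key, so the
        # aggregated table cannot be traced back to an individual.
        totals[(date, condition, event_type)] += 1
    return totals

daily = aggregate_daily(raw_events)
for (date, condition, event_type), count in sorted(daily.items()):
    print(date, condition, event_type, count)
```

Only these aggregated counts are needed to compute the invariant and evaluation metrics, so the analysis never has to touch individual outcomes.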