4. Let’s assume an unengaged user is a churned user. Now suppose we use your model to identify unengaged users and implement some business actions to try to convert them to engaged users (commonly known as reducing churn).  
a. How would you set up a test/experiment to check whether we are actually reducing churn?      
b. What metrics and techniques would you use to assess the impact of the business action?

## Intro notes

Suppose that, as in the last period of data I used to build the model, I've identified 13952 unengaged users.  
I know from analyses of previous periods that around 9% of them will convert to engaged users without any extra actions.
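As a rough worked figure from the two numbers above, the 9% baseline means a sizable group would become engaged with no intervention at all:

$$
13\,952 \times 0.09 \approx 1\,256 \text{ users}
$$

These organic conversions are why the effect of a business action has to be measured against an untreated comparison group, rather than by counting conversions among the users who received the action.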

To check whether the business action works, I'm going to use an A/B test.

1. First, I'll identify a number of users (from my population) who are not included in any other tests or special actions.

2. Then I'll run a simulation to choose the sample size and check all assumptions.

3. Then I'll run the experiment and act on the conclusions.



# Identify users that are not involved in any other test/experiment

Suppose that there are **6000** users who are not involved in any other experiments.
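A minimal sketch of drawing the champion and challenger groups from these users, assuming they are available as a list of IDs. The `user_{i}` IDs and the `split_champion_challenger` helper are hypothetical, used only for illustration; the group size of 2000 matches the final design used later.

```python
import random

# Hypothetical user IDs standing in for the 6000 users not enrolled in any other test
eligible_users = [f"user_{i}" for i in range(6000)]


def split_champion_challenger(users, group_size, seed=42):
    """Randomly draw two disjoint groups of equal size from the eligible users."""
    if 2 * group_size > len(users):
        raise ValueError("not enough eligible users for two groups of this size")
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    drawn = rng.sample(users, 2 * group_size)  # sampling without replacement
    return drawn[:group_size], drawn[group_size:]


champion, challenger = split_champion_challenger(eligible_users, group_size=2000)
print(len(champion), len(challenger), len(set(champion) & set(challenger)))
# prints: 2000 2000 0
```

Sampling once without replacement and then slicing guarantees that no user lands in both groups.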

# Experiment simulation

**Experiment setup and initial assumptions**:      
* method - A/B testing,
* sample size (initially) 2000: 1000 randomly selected as champion (no action), 1000 randomly selected as challenger (business action applied),
* H0: null hypothesis - engagement rates for the champion and challenger are equal; the business action has no effect,
* H1: alternative hypothesis - the engagement rate for the challenger is higher than for the champion; the business action has a positive effect (a one-sided test),
* significance level for the experiment - 5%, i.e. a 95% confidence level,
* minimum detectable effect - the engagement rate will be higher by at least 2 percentage points (from 9% to 11%, a ~22% relative rise).
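A common way to fix the sample size from these assumptions is the power-based formula for a one-sided two-proportion z-test. The sketch below is an alternative to the simulation that follows, not the method used in this notebook; the 80% power is an assumed conventional value that the setup above does not state, and `required_size_per_group` is a hypothetical helper.

```python
import math

from scipy.stats import norm


def required_size_per_group(p_a, mde, alpha=0.05, power=0.80):
    """Per-group sample size for a one-sided two-proportion z-test (unpooled variance)."""
    p_b = p_a + mde
    z_alpha = norm.ppf(1 - alpha)  # one-sided critical value
    z_beta = norm.ppf(power)       # standard normal quantile for the desired power
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)


n = required_size_per_group(p_a=0.09, mde=0.02)
print(n)  # about 2780 users per group
```

Under these assumptions two groups of about 2780 users (about 5560 in total) would still fit within the 6000 eligible users.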


### A class that performs the experiment calculations
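For reference, the class below computes the unpooled two-proportion z statistic from the expected rates $p_A$ and $p_B = p_A + \mathrm{MDE}$ and the group sizes $n_A$, $n_B$:

$$
SE_A = \sqrt{\frac{p_A(1-p_A)}{n_A}}, \qquad
SE_B = \sqrt{\frac{p_B(1-p_B)}{n_B}}, \qquad
z = \frac{|p_B - p_A|}{\sqrt{SE_A^2 + SE_B^2}}
$$

It declares the result significant when $\Phi(z) > 1 - \alpha$, where $\Phi$ is the standard normal CDF; for $\alpha = 0.05$ this means $z > 1.645$.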

```python
import math
import scipy.stats


class Experiment:

    def __init__(self,
                 significance_lvl: float,
                 champion_size_A: int,
                 challenger_size_B: int,
                 expected_champion_A: float,
                 minimum_detectable_effect: float):
        """Performs a hypothetical A/B test experiment

        :param significance_lvl: Significance level for the experiment
        :type significance_lvl: float
        :param champion_size_A: Champion size A
        :type champion_size_A: int
        :param challenger_size_B: Challenger size B
        :type challenger_size_B: int
        :param expected_champion_A: Expected champion rate (A)
        :type expected_champion_A: float
        :param minimum_detectable_effect: Minimum detectable effect; the
            expected challenger rate is
            expected_champion_A + minimum_detectable_effect
        :type minimum_detectable_effect: float
        """

        self.significance_lvl = significance_lvl
        self.champion_size_A = champion_size_A
        self.challenger_size_B = challenger_size_B
        self.expected_champion_A = expected_champion_A
        self.minimum_detectable_effect = minimum_detectable_effect

    def perform(self):
        """Perform calculations"""
        # calculate the expected challenger rate by adding the expected
        # increase at the challenger to the champion rate
        self.expected_challenger_B = self.expected_champion_A + \
            self.minimum_detectable_effect
        # calculate the standard errors, i.e. the standard deviations of
        # the sampling distributions of each group's rate
        self.StandardErrorA = math.sqrt(
            self.expected_champion_A*(1-self.expected_champion_A) /
            self.champion_size_A)
        self.StandardErrorB = math.sqrt(
            self.expected_challenger_B*(1-self.expected_challenger_B) /
            self.challenger_size_B)
        # calculate the (unpooled) z statistic for the difference in rates
        self.z = abs(self.expected_challenger_B-self.expected_champion_A) /\
            math.sqrt(self.StandardErrorA**2+self.StandardErrorB**2)
        # NOTE: despite its name, this is Phi(z), the standard normal CDF at z,
        # i.e. 1 - (one-sided p-value for H1: rate B > rate A). report()
        # rejects H0 when it exceeds 1 - significance_lvl, which is
        # equivalent to a one-sided p-value below significance_lvl.
        self.p_value = scipy.stats.norm(0, 1).cdf(self.z)

    def report(self):
        """Print report"""

        print("Significance level for the experiment - %0.2f" % \
              self.significance_lvl)
        print("Champion size A - %d" % self.champion_size_A)
        print("Challenger size B - %d" % self.challenger_size_B)
        print("Expected champion rate (A) - %0.2f" % self.expected_champion_A)
        print("Minimum detectable effect on challenger is + %0.2f -> expected "
              "challenger rate (B) %0.2f" % (self.minimum_detectable_effect,
                                             self.expected_challenger_B))
        print("StandardErrorA %0.4f (The standard deviation of the sampling) "
              "distribution)" % self.StandardErrorA)
        print("StandardErrorB %0.4f (The standard deviation of the sampling) "
              "distribution)" % self.StandardErrorB)
        print("Z statistic %0.6f" % self.z)
        print("P-value %0.3f" % self.p_value)

        param = {
            "expected_challenger_B": self.expected_challenger_B,
            "expected_champion_A": self.expected_champion_A,
            "p_value": self.p_value,
            "significance_lvl": self.significance_lvl
        }
        if self.p_value > (1-self.significance_lvl):
            print("Challenger made {expected_challenger_B} engagement rate, "
                  "champion {expected_champion_A}.\n"
                  "There are {p_value:.0%} certainty that engagement rate "
                  "differs between champion and challenger\n"
                  "Test is statistically significant at level "
                  "{significance_lvl:.0%}".format(**param))
        else:
            print("Test is no evidence to claim that champion and "
                  "challenger has different engagement rate "
                  "(at significance level {:.0%}.".\
                  format(self.significance_lvl))
```
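`perform()` stores `scipy.stats.norm(0, 1).cdf(self.z)` as `p_value`. That is $\Phi(z)$, which is one minus the one-sided p-value for H1 (rate B > rate A). A short sketch of the conventional p-value, using `scipy.stats.norm.sf` (the survival function, 1 - CDF) on the two z statistics reported in the outputs of the class runs; `one_sided_p_value` is a hypothetical helper name:

```python
from scipy.stats import norm


def one_sided_p_value(z):
    """P(Z >= z) under H0: the one-sided p-value for H1: rate_B > rate_A."""
    return norm.sf(z)  # survival function, numerically safer than 1 - norm.cdf(z)


# z statistics reported for the 2 x 1000 and 2 x 2000 runs
for z in (1.491541, 2.109357):
    p = one_sided_p_value(z)
    print(f"z={z:.3f}  Phi(z)={norm.cdf(z):.3f}  p-value={p:.3f}  "
          f"reject H0 at 5%: {p < 0.05}")
```

The one-sided p-values are about 0.068 and 0.017. Because `report()` compares $\Phi(z)$ with $1 - \alpha$, its verdicts match the usual $p < \alpha$ rule. However, a phrase such as "98% certainty" should be read as "one-sided p-value of about 0.017", not as the probability that the two rates differ.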


```python
experiment_1_test = Experiment(
    significance_lvl=0.05,
    champion_size_A=1000,
    challenger_size_B=1000,
    expected_champion_A=0.09,
    minimum_detectable_effect=0.02)
experiment_1_test.perform()
experiment_1_test.report()
```

```
Significance level for the experiment - 0.05
Champion size A - 1000
Challenger size B - 1000
Expected champion rate (A) - 0.09
Minimum detectable effect on challenger is + 0.02 -> expected challenger rate (B) 0.11
StandardErrorA 0.0090 (The standard deviation of the sampling) distribution)
StandardErrorB 0.0099 (The standard deviation of the sampling) distribution)
Z statistic 1.491541
P-value 0.932
Test is no evidence to claim that champion and challenger has different engagement rate (at significance level 5%.
```


**Since the sample is too small to detect the effect I consider a success (+2 percentage points) at the 5% significance level, I need to increase the sample size.**

```python
experiment_2_test = Experiment(
    significance_lvl=0.05,
    champion_size_A=2000,
    challenger_size_B=2000,
    expected_champion_A=0.09,
    minimum_detectable_effect=0.02)
experiment_2_test.perform()
experiment_2_test.report()
```

```
Significance level for the experiment - 0.05
Champion size A - 2000
Challenger size B - 2000
Expected champion rate (A) - 0.09
Minimum detectable effect on challenger is + 0.02 -> expected challenger rate (B) 0.11
StandardErrorA 0.0064 (The standard deviation of the sampling) distribution)
StandardErrorB 0.0070 (The standard deviation of the sampling) distribution)
Z statistic 2.109357
P-value 0.983
Challenger made 0.11 engagement rate, champion 0.09.
There are 98% certainty that engagement rate differs between champion and challenger
Test is statistically significant at level 5%
```


**Because I don't know exactly what the real result will be, I increase each group to 2000 users (up from the roughly 1300 per group required to pass the test under the assumed values), leaving a safety margin.**
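The figure of about 1300 can be reproduced by scanning group sizes until the expected effect passes the one-sided test. The step of 100 is an assumption. The same sketch also estimates the approximate power of the 2 x 2000 design, i.e. the chance of a significant result if the true lift is exactly +2 points. Power is not part of the original analysis; it uses the normal approximation and ignores the negligible opposite tail. `z_stat` is a hypothetical helper mirroring the class's formula.

```python
import math

from scipy.stats import norm


def z_stat(p_a, p_b, n):
    """Unpooled two-proportion z statistic with n users in each group (as in Experiment)."""
    se = math.sqrt(p_a * (1 - p_a) / n + p_b * (1 - p_b) / n)
    return (p_b - p_a) / se


p_a, p_b, alpha = 0.09, 0.11, 0.05
z_crit = norm.ppf(1 - alpha)  # one-sided critical value, about 1.645

# Smallest group size (in steps of 100) at which the expected effect passes the test
n_min = next(n for n in range(100, 10001, 100) if z_stat(p_a, p_b, n) > z_crit)

# Approximate power with 2000 users per group if the true lift is exactly +2 points
power_2000 = norm.cdf(z_stat(p_a, p_b, 2000) - z_crit)
print(n_min, round(power_2000, 2))
```

At the "just passing" size the expected z sits barely above the critical value, so the power there is only about 50%. The margin up to 2000 per group raises it to roughly 68%, which is still below the conventional 80%.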

# Experiment run

After choosing and checking all the parameters required to run the A/B test responsibly, I start the experiment.


## scenario 1 - success


**In the first theoretical scenario, the champion engagement rate is 0.1 and the challenger engagement rate is 0.05 higher (0.15).**
With samples of 2 x 2000 users, there is enough evidence to claim that the difference between the subpopulations is statistically significant and that the business action has a meaningful effect on customer engagement.  
**Now we can apply the business action to all unengaged users.**


```python
experiment_1_prod = Experiment(
    significance_lvl=0.05,
    champion_size_A=2000,
    challenger_size_B=2000,
    expected_champion_A=0.1,
    minimum_detectable_effect=0.05)
experiment_1_prod.perform()
experiment_1_prod.report()
```

```
Significance level for the experiment - 0.05
Champion size A - 2000
Challenger size B - 2000
Expected champion rate (A) - 0.10
Minimum detectable effect on challenger is + 0.05 -> expected challenger rate (B) 0.15
StandardErrorA 0.0067 (The standard deviation of the sampling) distribution)
StandardErrorB 0.0080 (The standard deviation of the sampling) distribution)
Z statistic 4.794633
P-value 1.000
Challenger made 0.15000000000000002 engagement rate, champion 0.1.
There are 100% certainty that engagement rate differs between champion and challenger
Test is statistically significant at level 5%
```
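Question (b) also asks which metrics to use for assessing impact, beyond a significant/not-significant verdict. A minimal sketch of a common effect-size summary: absolute lift, relative lift, and a 95% Wald confidence interval for the difference in rates. It is applied to the scenario-1 rates as if they were the observed results. `lift_summary` is a hypothetical helper, and the Wald interval is one choice among several.

```python
import math

from scipy.stats import norm


def lift_summary(rate_a, n_a, rate_b, n_b, confidence=0.95):
    """Absolute lift, relative lift and a Wald confidence interval for rate_b - rate_a."""
    diff = rate_b - rate_a
    se = math.sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    z = norm.ppf(1 - (1 - confidence) / 2)  # two-sided critical value, about 1.96
    return {
        "absolute_lift": diff,
        "relative_lift": diff / rate_a,
        "ci_low": diff - z * se,
        "ci_high": diff + z * se,
    }


summary = lift_summary(rate_a=0.10, n_a=2000, rate_b=0.15, n_b=2000)
print({k: round(v, 4) for k, v in summary.items()})
```

Here the interval spans roughly +3.0 to +7.0 percentage points, so even its lower end clears the +2-point minimum detectable effect. That range, rather than the point estimate, is what a business case for rolling out the action should rely on. A two-sided interval is used even though the test itself is one-sided.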


## scenario 2 - no success

**In the second theoretical scenario, the champion engagement rate is 0.1 and the challenger engagement rate is 0.01 higher (0.11).**
With samples of 2 x 2000 users, there is **not** enough evidence, at the desired significance level, to claim that the difference between the subpopulations is statistically significant and that the business action has a meaningful effect on customer engagement.  
**We can relax the significance level (i.e. raise it above 5%) and accept a higher risk that the 1-percentage-point increase will not hold in the whole population, increase the sample size, or try different business actions.**


```python
experiment_2_prod = Experiment(
    significance_lvl=0.05,
    champion_size_A=2000,
    challenger_size_B=2000,
    expected_champion_A=0.1,
    minimum_detectable_effect=0.01)
experiment_2_prod.perform()
experiment_2_prod.report()
```

```
Significance level for the experiment - 0.05
Champion size A - 2000
Challenger size B - 2000
Expected champion rate (A) - 0.10
Minimum detectable effect on challenger is + 0.01 -> expected challenger rate (B) 0.11
StandardErrorA 0.0067 (The standard deviation of the sampling) distribution)
StandardErrorB 0.0070 (The standard deviation of the sampling) distribution)
Z statistic 1.031696
P-value 0.849
Test is no evidence to claim that champion and challenger has different engagement rate (at significance level 5%.
```
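The options listed for scenario 2 can be checked numerically. The sketch below reuses the scenario's rates (0.10 vs 0.11, 2 x 2000 users). For the sample-size part it assumes 80% power, a conventional value not stated in this notebook.

```python
import math

from scipy.stats import norm

p_a, p_b, n = 0.10, 0.11, 2000
se = math.sqrt(p_a * (1 - p_a) / n + p_b * (1 - p_b) / n)
z = (p_b - p_a) / se  # about 1.03, matching the report above

# Option 1: relax the significance level (one-sided test)
for alpha in (0.05, 0.10, 0.20):
    print(f"alpha={alpha:.2f}: reject H0 = {z > norm.ppf(1 - alpha)}")

# Option 2: enlarge the groups (one-sided alpha=0.05, assumed 80% power)
z_sum = norm.ppf(0.95) + norm.ppf(0.80)
variance = p_a * (1 - p_a) + p_b * (1 - p_b)
n_needed = math.ceil(z_sum ** 2 * variance / (p_b - p_a) ** 2)
print(f"users needed per group: {n_needed}")
```

The one-sided p-value here is about 0.15, so relaxing the level to 10% still does not produce a significant result. Only a level above roughly 15% would, with a correspondingly higher false-positive risk. A properly powered test of a 1-point lift would need about 11,600 users per group, far more than the 6000 users available for testing. Under these assumptions, a larger sample would require a longer collection period, and trying a stronger business action may be the more practical route.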
