In [None]:
import numpy as np
from scipy.stats import uniform, poisson, geom, ttest_ind

## Problem

An app with microtransactions offers a free but limited tier. The vast majority of customers do not spend any money on the app, a small group spends a small amount, and another, even smaller group consists of very heavy spenders. Let's suppose all spending is integer valued.

We can model the pmf of this as follows:

$$p(x) = p_1 \cdot 1_{\{x=0\}} + p_2 \cdot g_2(x) + p_3 \cdot g_3(x)$$

where $p_1 + p_2 + p_3 = 1$, $g_2$ is the pmf of a Poisson distribution with parameter $\lambda$, and $g_3$ is the pmf of $5+Y$ where $Y\sim geo(q)$.

Currently, the parameters here are:

$$p_1 = 0.98,\quad p_2 = 0.019,\quad p_3 = 0.001,\quad \lambda = 3,\quad q = 0.1$$

We are never able to observe which group a person is in, only their revenue.

Product management has proposed a change to the conversion funnel that they believe will increase conversions, particularly into the second group. They want to run an A/B test of this change with $n$ customers per group. Because the distribution of revenues is heavily skewed, they wish to explicitly compute the power of the test, i.e. the probability of getting a significant result when a particular alternative is true. The alternative of interest is that the test group has the same values of $\lambda$ and $q$, but $p_1 = 0.975, p_2=0.024, p_3=0.001$. Assume they are doing a one-sided two-sample t-test with level $\alpha=0.01$.
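
As a quick check on the effect size being tested, the mean revenue per customer follows directly from the mixture: $E[X] = p_2 \lambda + p_3 (5 + E[Y])$. The sketch below evaluates this for both parameter sets. It assumes $Y$ is supported on $\{1, 2, \dots\}$, so that $E[Y] = 1/q$ (the convention used by scipy.stats.geom); mixture_mean is a hypothetical helper, not one of the functions requested below.

```python
def mixture_mean(p, lam, q):
    """Mean revenue under the mixture, with p = (p_1, p_2, p_3)."""
    # Assumption: Y ~ geo(q) has support {1, 2, ...}, so E[Y] = 1/q.
    return p[1] * lam + p[2] * (5 + 1 / q)

control_mean = mixture_mean((0.98, 0.019, 0.001), lam=3, q=0.1)
test_mean = mixture_mean((0.975, 0.024, 0.001), lam=3, q=0.1)

# Roughly 0.072 for control and 0.087 for test, a difference of about 0.015.
print(control_mean, test_mean, test_mean - control_mean)
```

Under this assumption the alternative shifts the mean by about 0.015 per customer, which is small relative to the spread of the revenue distribution.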

Write the following functions.

- generate_counterfactuals(n, lam, q) which returns an n x 2 array, where the first column is a Poisson random variable with parameter lam for each person (a draw from $g_2$), and the second column is 5 plus a geometric random variable with parameter q for each person (a draw from $g_3$).

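- generate_revenues(n, lam, q, p) which calls generate_counterfactuals, and then generates a revenue for each individual: 0, the first column, or the second column, chosen according to the mixing parameters p = ($p_1$, $p_2$, $p_3$).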

- run_experiment(n, lam, q, p1, p2) which creates the two test groups of size n each and performs a one-sided two-sample t-test comparing their mean revenues. It should return 1 if we reject the null hypothesis that the groups have the same mean, and 0 otherwise. Here p1 is the vector of mixing parameters for the control group and p2 is the vector of mixing parameters for the experimental group. For the t-test use scipy.stats.ttest_ind, and do not assume equal variance between the two groups.

- calc_power(n, m, lam, q, p1, p2) which runs the experiment m times and estimates the power as the fraction of runs that reject the null hypothesis.

In [None]:
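def generate_counterfactuals(n, lam, q):
    # Potential spend of every customer under each paying group
    X = np.zeros((n, 2))
    X[:,0] = poisson.rvs(lam, size=n)  # group 2: Poisson(lam)
    X[:,1] = 5 + geom.rvs(q, size=n)   # group 3: 5 + Y; scipy's geom has support {1, 2, ...}
    return X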
    
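def generate_revenues(n, lam, q, p):
    # p = (p_1, p_2, p_3), zero-indexed: p[1] is p_2 and p[2] is p_3
    X = generate_counterfactuals(n, lam, q)
    Y = np.zeros(n)  # non-spenders keep revenue 0
    U = uniform.rvs(size=n)
    ind_1 = U < p[1]         # group 2, probability p_2
    ind_2 = U < p[1] + p[2]  # group 2 or group 3
    Y[ind_1] = X[ind_1, 0]
    Y[~ind_1 & ind_2] = X[~ind_1 & ind_2, 1]  # group 3, probability p_3
    return Y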
    
def run_experiment(n, lam, q, p1, p2):
    grp1 = generate_revenues(n, lam, q, p1)  # control
    grp2 = generate_revenues(n, lam, q, p2)  # experimental
    # One-sided Welch t-test; alternative: mean(grp1) < mean(grp2)
    result = ttest_ind(grp1, grp2, alternative='less', equal_var=False)
    # Reject at level alpha = 0.01
    return int(result.pvalue < 0.01)

def calc_power(n, m, lam, q, p1, p2):
    # Fraction of m simulated experiments that reject the null
    rejections = [run_experiment(n, lam, q, p1, p2) for _ in range(m)]
    return np.mean(rejections)
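
A rough analytic cross-check for the simulated power is the normal approximation: treat the difference in sample means as Gaussian with variance $(\sigma_1^2 + \sigma_2^2)/n$, and compare the true mean difference to the critical value. The sketch below does this with the exact mixture moments. It assumes $Y$ has support $\{1, 2, \dots\}$ (scipy's convention), and mixture_moments and approx_power are hypothetical helpers, not part of the problem. Because revenues are heavily skewed, this approximation may be poor for moderate $n$, which is the reason for simulating in the first place.

```python
import numpy as np
from scipy.stats import norm

def mixture_moments(p, lam, q):
    """Mean and variance of the revenue mixture, with p = (p_1, p_2, p_3).

    Assumes Y ~ geo(q) has support {1, 2, ...}, so E[Y] = 1/q and
    Var(Y) = (1 - q) / q**2.
    """
    mean_heavy = 5 + 1 / q
    second_heavy = (1 - q) / q**2 + mean_heavy**2  # E[(5 + Y)^2]
    mean = p[1] * lam + p[2] * mean_heavy
    second = p[1] * (lam + lam**2) + p[2] * second_heavy  # E[X^2]
    return mean, second - mean**2

def approx_power(n, lam, q, p1, p2, alpha=0.01):
    """Normal-approximation power of the one-sided two-sample test."""
    m1, v1 = mixture_moments(p1, lam, q)
    m2, v2 = mixture_moments(p2, lam, q)
    se = np.sqrt((v1 + v2) / n)  # standard error of the mean difference
    return norm.cdf((m2 - m1) / se - norm.ppf(1 - alpha))

p_control = (0.98, 0.019, 0.001)
p_test = (0.975, 0.024, 0.001)
for n in (10_000, 50_000, 100_000):
    print(n, approx_power(n, 3, 0.1, p_control, p_test))
```

Comparing these values with calc_power at the same $n$ gives a sense of how much the skewness distorts the t-test's behaviour; when p1 and p2 are identical, the approximation returns $\alpha$, as it should.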