
# PARLA

## Problem
How many mutually independent experiments can be run simultaneously if 10,000 observations in total are collected while the experiments run?

For example, if we decide to run 10 experiments, each experiment can be allocated 1,000 observations, so each group (A and B) would contain 500 observations.
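The allocation arithmetic above can be sketched as a small helper. The function name `group_size` and its parameters are illustrative and do not appear in the notebook; the sketch assumes observations are split evenly and that every experiment has exactly two groups.

```python
def group_size(total_observations: int, n_experiments: int, n_groups: int = 2) -> int:
    """Observations available to each group when the total is split evenly."""
    per_experiment = total_observations // n_experiments
    return per_experiment // n_groups


# group sizes for the example above and for the experiment counts simulated later
for n_experiments in (10, 20, 21, 22):
    print(n_experiments, group_size(10_000, n_experiments))
```

For 10 experiments this gives 500 per group, as in the example; for 20, 21, and 22 experiments it gives 250, 238, and 227.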

**Experiment parameters:**
* We are testing the hypothesis of equal means;
* Significance level — 0.05;
* Acceptable probability of Type II error — 0.1;
* Expected effect — a 3% increase in values;
* The method for adding the effect in synthetic A/B tests — multiplication by a constant.

We will assume that the distribution of the measured values follows a **normal distribution** with a **mean of 100** and a **standard deviation of 10**.

As your answer, enter the **maximum possible number of experiments** that can be run with the parameters specified above.

## Action
- calculated the minimum required sample size per group for a single A/B test
- calculated the maximum number of experiments that can run simultaneously
- ran multiple A/A tests to empirically estimate the probability of a type I error
- ran multiple A/B tests to empirically estimate the probability of a type II error
- calculated confidence intervals for both error probabilities:
    - each iteration of an A/A test or an A/B test has a binary outcome ('True' for error, 'False' for no error)
    - therefore, each outcome is a Bernoulli random variable
    - since the iterations are independent, the number of errors follows a Binomial distribution
    - therefore, for a large number of iterations, the proportion 'n_errors / n' (i.e. the sample mean) approximately follows a Normal distribution
    - therefore, I calculated confidence intervals using the asymptotic normal approximation
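The last step in the list above can be checked by hand. The sketch below is a standard-library version of the normal-approximation (Wald) interval; `wald_interval` is an illustrative name, not a function from the notebook, which uses `sm.stats.proportion_confint` with `method='normal'` instead. The input of 510 errors in 10,000 iterations corresponds to an empirical error rate of 0.0510.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(n_errors: int, n_trials: int, alpha: float = 0.05) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = n_errors / n_trials
    # two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # estimated standard error of a Bernoulli proportion
    half_width = z * sqrt(p_hat * (1 - p_hat) / n_trials)
    return p_hat - half_width, p_hat + half_width


low, high = wald_interval(n_errors=510, n_trials=10_000)
print(f"[{low:0.4f}, {high:0.4f}]")  # [0.0467, 0.0553]
```

The approximation is reasonable here because the number of iterations is large and the error rates are not extremely close to 0 or 1.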

## Result
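- Calculated the maximum possible number of experiments: the formula gives a minimum group size of 234, which allows at most 21 simultaneous experiments (10,000 / (2 × 234) ≈ 21.4)
- Assessed whether the probabilities of type I and type II errors are controlled: in the simulations with 20, 21, and 22 experiments, the empirical type I error stayed close to 0.05, while the empirical type II error was 0.0890 for 20 experiments, 0.1061 for 21, and 0.1191 for 22
- Calculated confidence intervals for the probabilities of type I and type II errors: all three type I intervals contain 0.05; the type II interval for 21 experiments, [0.1001, 0.1121], lies just above the target of 0.1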
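The maximum number of experiments follows from the usual two-sample sample-size formula, which is also what `get_minimal_sample_size` in the code below implements. The worked version here assumes equal standard deviations in both groups and a two-sided test. The symbols $\varepsilon$ (absolute effect, $0.03 \cdot 100 = 3$), $N$ (total observations), $n$ (group size), and $k$ (number of experiments) are introduced only for this illustration.

$$
n = \left\lceil \frac{2\sigma^2 \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{\varepsilon^2} \right\rceil
  \approx \left\lceil \frac{2 \cdot 10^2 \cdot (1.9600 + 1.2816)^2}{3^2} \right\rceil
  = \lceil 233.5 \rceil = 234
$$

$$
k = \left\lfloor \frac{N}{2n} \right\rfloor
  = \left\lfloor \frac{10\,000}{2 \cdot 234} \right\rfloor
  = \lfloor 21.4 \rfloor = 21
$$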

## Learning
- I revised the relevant Python, NumPy, SciPy, and statsmodels functionality
- I learned how to calculate the maximum possible number of experiments for a given total number of observations
- I learned how to assess whether the probabilities of type I and type II errors are controlled
- I learned how to calculate confidence intervals for the probabilities of type I and type II errors using the normal approximation

## Application
- I can apply the relevant Python, NumPy, SciPy, and statsmodels functionality to similar data-related problems
- I can calculate the number of parallel experiments using real-world data
- I can assess whether the probabilities of type I and type II errors are controlled using real-world data
- I can calculate confidence intervals for the error probabilities


In [2]:

import numpy as np
import scipy as sp
import scipy.stats  # load the stats submodule explicitly so that sp.stats is available
import statsmodels.api as sm


In [3]:

alpha = 0.05  # probability of type-I error
beta = 0.1  # probability of type-II error
confidence_level = 1 - alpha  # probability of correctly NOT REJECTING null hypothesis
power = 1 - beta  # probability of correctly REJECTING null hypothesis
effect = 0.03  # expected effect size
mu = 100  # mean of normal distribution in synthetic tests
sigma = 10  # standard deviation of normal distribution in synthetic test
population_size = 10 ** 4  # total number of observations


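def get_minimal_sample_size(effect, std, alpha, beta):
    """
    Get minimal sample size per group (A and B) for a two-sided test of equal means

    Assumes that both groups have the same standard deviation.

    :param effect: expected absolute difference between group means
    :param std: standard deviation of the measured values
    :param alpha: probability of type-I error
    :param beta: probability of type-II error
    :return: minimal sample size per group, rounded up
    """
    # standard normal quantiles for the two-sided significance level and for the power
    ppf_alpha = sp.stats.norm.ppf(1 - alpha / 2, loc=0, scale=1)
    ppf_beta = sp.stats.norm.ppf(1 - beta, loc=0, scale=1)
    # variance of the difference between two independent observations, one per group
    var = 2 * std ** 2
    sample_size = np.ceil((ppf_alpha + ppf_beta) ** 2 * var / (effect ** 2))
    return sample_size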

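# calculate minimal required sample size per group;
# the 3% relative effect is converted to an absolute difference in means
sample_size = get_minimal_sample_size(mu * effect, sigma, alpha, beta)
print(f'sample_size = {sample_size}')

# calculate the maximum number of experiments that can run simultaneously
# (each experiment needs two groups of sample_size observations)
max_experiments = population_size / (sample_size * 2)
print(f'max_experiments = {max_experiments:0.1f}')
max_experiments = int(max_experiments)  # round down to a whole number of experiments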

# estimate:
# - empirical probability of type-I error
# - empirical probability of type-II error
# - confidence intervals for errors
for n_experiments in [max_experiments - 1, max_experiments, max_experiments + 1]:
    aa_type_1_errors = []
    ab_type_2_errors = []
    # group size available when observations are split evenly across experiments
    sample_size = int(population_size / (n_experiments * 2))

    for _ in range(10 ** 4):
        # generate groups A and B from normal distribution
        a = np.random.normal(loc=mu, scale=sigma, size=sample_size)
        b = np.random.normal(loc=mu, scale=sigma, size=sample_size)
        b_effect = b * (1 + effect)

        # we use AA-tests to empirically assess type I error
        # we use AB-tests to empirically assess type II error
        # each iteration has a binary outcome ('True' for error, 'False' for no error),
        # therefore, it is a Bernoulli random variable
        aa_type_1_errors.append(sp.stats.ttest_ind(a, b).pvalue < alpha)
        ab_type_2_errors.append(sp.stats.ttest_ind(a, b_effect).pvalue >= alpha)

    # estimate empirical probabilities of errors:
    # since the iterations are independent Bernoulli trials, the number of errors follows a Binomial distribution
    # therefore, for a large number of iterations, the proportion 'n_errors / n' (i.e. the sample mean)
    # approximately follows a Normal distribution
    p_type_1_error = np.mean(aa_type_1_errors)
    p_type_2_error = np.mean(ab_type_2_errors)

    # calculate confidence intervals, using asymptotic normal approximation
    ci_type_1_error = sm.stats.proportion_confint(
        np.sum(aa_type_1_errors),
        len(aa_type_1_errors),
        alpha=alpha,
        method='normal'
    )
    ci_type_2_error = sm.stats.proportion_confint(
        np.sum(ab_type_2_errors),
        len(ab_type_2_errors),
        alpha=alpha,
        method='normal'
    )

    # print
    print('')
    print(f'n_experiments = {n_experiments}')
    print(f'sample_size = {sample_size}')
    print(f'empirical probability of type I error = {p_type_1_error:0.4f}')
    print(f'confidence interval = [{ci_type_1_error[0]:0.4f}, {ci_type_1_error[1]:0.4f}]')
    print(f'empirical probability of type II error = {p_type_2_error:0.4f}')
    print(f'confidence interval = [{ci_type_2_error[0]:0.4f}, {ci_type_2_error[1]:0.4f}]')


sample_size = 234.0
max_experiments = 21.4

n_experiments = 20
sample_size = 250
empirical probability of type I error = 0.0510
confidence interval = [0.0467, 0.0553]
empirical probability of type II error = 0.0890
confidence interval = [0.0834, 0.0946]

n_experiments = 21
sample_size = 238
empirical probability of type I error = 0.0514
confidence interval = [0.0471, 0.0557]
empirical probability of type II error = 0.1061
confidence interval = [0.1001, 0.1121]

n_experiments = 22
sample_size = 227
empirical probability of type I error = 0.0525
confidence interval = [0.0481, 0.0569]
empirical probability of type II error = 0.1191
confidence interval = [0.1128, 0.1254]
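One possible reason why the type II error at 21 experiments sits slightly above 0.1 is that multiplying group B by a constant also scales its standard deviation (from 10 to 10.3), whereas the sample-size formula assumes a standard deviation of 10 in both groups. The sketch below is not part of the original notebook. It assumes the same z-based formula with the two group variances summed, and `min_group_size` is an illustrative helper name.

```python
from math import ceil
from statistics import NormalDist

alpha, beta = 0.05, 0.1
mu, sigma, effect = 100, 10, 0.03

# sum of the standard normal quantiles used in the sample-size formula
z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(1 - beta)
abs_effect = mu * effect  # absolute difference in means


def min_group_size(var_a: float, var_b: float) -> int:
    """Minimal group size when the two groups may have different variances."""
    return ceil(z_sum ** 2 * (var_a + var_b) / abs_effect ** 2)


# equal variances, as assumed by the notebook's formula
n_equal = min_group_size(sigma ** 2, sigma ** 2)
# group B multiplied by (1 + effect), so its standard deviation is scaled too
n_scaled = min_group_size(sigma ** 2, (sigma * (1 + effect)) ** 2)

print(f"equal variances:   {n_equal}")   # 234
print(f"scaled B variance: {n_scaled}")  # 241
```

Under these assumptions, the group size of 238 available with 21 experiments lies between the two values, which would be consistent with an empirical type II error just above the target.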
