
# PARLA

## Problem
- Write a function that estimates the probabilities of type-I and type-II errors for different methods of adding an effect
    - Two methods for adding an effect should be implemented:
        - 'all_const': increase all values in group B by a constant (mean * effect / 100).
        - 'all_percent': multiply all values in group B by (1 + effect / 100).
    - The function should return:
        - pvalues_aa: list of p-values from AA-tests (group A vs. unmodified group B, no effect).
        - pvalues_ab: list of p-values comparing group A to modified group B (with effect).
        - type_one_error: estimated Type I error probability.
        - type_two_error: estimated Type II error probability.
    - For more details, see the function's docstring below
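
A minimal sketch of the two methods, using made-up values for group B (the array `b` and `effect = 10` are illustrative assumptions, not part of the task data). Both methods raise the mean by the same amount, but only 'all_percent' also stretches the spread of the values:

```python
import numpy as np

# Hypothetical metric values for group B (illustration only)
b = np.array([10.0, 20.0, 30.0, 40.0])
effect = 10  # effect size in percent

# 'all_const': add the same constant (mean * effect / 100) to every value
b_const = b + b.mean() * effect / 100

# 'all_percent': multiply every value by (1 + effect / 100)
b_percent = b * (1 + effect / 100)

# Compare the means and the standard deviations of the two variants
print(b_const.mean(), b_percent.mean())
print(b.std(), b_const.std(), b_percent.std())
```

Because 'all_percent' increases the variance of group B, it may yield slightly larger p-values than 'all_const' for the same effect size.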

## Action
- for each pair of control and experimental groups yielded by the generator:
    - calculated the p-value for the AA-test
    - for the 'all_const' method:
        - added the effect to the experimental group by adding a constant to all values
        - calculated the p-value for the AB-test
    - for the 'all_percent' method:
        - added the effect to the experimental group by multiplying all values by a constant
        - calculated the p-value for the AB-test
- calculated the type-I error as the mean of the false-positive flags (AA-test p-value below alpha)
- calculated the type-II error as the mean of the false-negative flags (AB-test p-value not below alpha)
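
The last two steps can be sketched as follows. The p-values here are invented for illustration, and treating a p-value exactly equal to alpha as "not rejected" is an assumed convention:

```python
import numpy as np

alpha = 0.05

# Hypothetical p-values (illustration only)
pvalues_aa = np.array([0.80, 0.03, 0.40, 0.60])   # no effect present
pvalues_ab = np.array([0.01, 0.20, 0.04, 0.002])  # effect present

# False positive: the test rejects H0 although there is no effect
type_one_error = (pvalues_aa < alpha).mean()

# False negative: the test fails to reject H0 although there is an effect
type_two_error = (pvalues_ab >= alpha).mean()

print(type_one_error, type_two_error)
```

Taking the mean of a boolean array gives the share of `True` flags, so each result is an estimated error probability between 0 and 1.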

## Result
- The function was implemented and successfully passed all tests

## Learning
- I revised relevant Python and Pandas functionality
    - I learned how to specify a type hint for a generator function
- I learned how to compare different methods of adding an effect in synthetic AB-testing
- I learned how to calculate type-I and type-II errors
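
The generator type hint mentioned above can be sketched as follows; `squares` and `squares_short` are hypothetical helper names used only for illustration. `Generator` takes three parameters (yield, send, and return types). When a generator only yields values, `Iterator[YieldType]` is a shorter annotation that type checkers also accept:

```python
from typing import Generator, Iterator, Tuple


def squares(n: int) -> Generator[Tuple[int, int], None, None]:
    """Lazily yield (i, i squared) for i in 0..n-1."""
    for i in range(n):
        yield i, i * i


def squares_short(n: int) -> Iterator[Tuple[int, int]]:
    """Same behavior, annotated with the shorter Iterator form."""
    yield from squares(n)


# Values are produced only when the generator is consumed
pairs = list(squares(3))
first = next(squares_short(3))
print(pairs, first)
```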

## Application
- I can apply relevant Python and Pandas functionality for similar data-related problems
- I can compare how different methods of adding an effect in synthetic AB-testing behave on real-world data
- I can assess type-I and type-II errors on real-world data


In [2]:

from typing import Generator, Tuple, List

import numpy as np
import pandas as pd
from scipy import stats


In [3]:

def create_group_generator(
    metrics: pd.DataFrame,
    sample_size: int,
    n_iter: int
    # Returns a generator, i.e. the function uses 'yield' to produce values lazily
    # Generator[YieldType, SendType, ReturnType]:
    # YieldType: type of each yielded value
    # SendType: None, meaning the generator is not driven with .send(), only with next()
    # ReturnType: None, meaning the generator finishes without a return value (plain StopIteration)
) -> Generator[Tuple[np.ndarray, np.ndarray], None, None]:
    """
    Generator of random groups.

    :param metrics: DataFrame containing user metrics. Must include columns ['user_id', 'metric'].
    :param sample_size: Number of users in each group.
    :param n_iter: Number of iterations to generate random groups.
    :return: Generator yielding tuples of two arrays with metric values for groups A and B.
    """

    # get all unique user_ids
    user_ids = metrics['user_id'].unique()

    for _ in range(n_iter):
        # select user_ids for control and experimental groups without repetitions
        a_user_ids, b_user_ids = np.random.choice(user_ids, size=(2, sample_size), replace=False)

        # generate control and experimental groups
        a_metric_values = metrics.loc[metrics['user_id'].isin(a_user_ids), 'metric'].values
        b_metric_values = metrics.loc[metrics['user_id'].isin(b_user_ids), 'metric'].values

        # save function state and return groups
        yield a_metric_values, b_metric_values


In [4]:

# testing create_group_generator()
metrics = pd.DataFrame({'user_id': [1, 2, 3, 4], 'metric': [5, 6, 8, 9.1] })
sample_size = 2
n_iter = 3
group_generator = create_group_generator(metrics, sample_size, n_iter)

# the loop will run three times (n_iter)
for metrics_a_group, metrics_b_group in group_generator:
    print(metrics_a_group, metrics_b_group)

# generator should produce two arrays (control and experimental), with two values each (sample_size), three times (n_iter)
# output should look like this (but with different random values):
# >>> [8.  9.1] [5. 6.]
# >>> [5.  9.1] [6. 8.]
# >>> [5. 6.] [8.  9.1]


[6.  9.1] [5. 8.]
[8.  9.1] [5. 6.]
[8.  9.1] [5. 6.]


In [8]:

def estimate_errors(
    group_generator: Generator[Tuple[np.ndarray, np.ndarray], None, None],
    effect_add_type: str,
    effect: float,
    alpha: float
) -> Tuple[List[float], List[float], float, float]:
    """
    Estimate Type I and Type II error probabilities.

    :param group_generator: Generator that yields tuples of metric values for two groups.
    :param effect_add_type: How to apply the effect to group B:
        - 'all_const': increase all values in group B by a constant (mean * effect / 100).
        - 'all_percent': multiply all values in group B by (1 + effect / 100).
    :param effect: Effect size in percent.
        For example, effect=3 means a 3% expected increase in the mean.
    :param alpha: Significance level (e.g., 0.05).
    :return:
        - pvalues_aa: list of p-values from AA-tests (group A vs. unmodified group B, no effect).
        - pvalues_ab: list of p-values comparing group A to modified group B (with effect).
        - type_one_error: estimated Type I error probability.
        - type_two_error: estimated Type II error probability.
    """

    pvalues_aa = []
    pvalues_ab = []

    # for each pair of control and experimental groups yielded by the generator
    for metrics_a_group, metrics_b_group in group_generator:

        # calculate p-value for the AA-test (no effect added)
        pvalue_aa = np.round(
            stats.ttest_ind(metrics_a_group, metrics_b_group).pvalue,
            decimals=3
        )

        # 'all_const': add the effect to the experimental group by adding a constant to all values,
        # then calculate p-value for the AB-test
        if effect_add_type == 'all_const':
            pvalue_ab = np.round(
                stats.ttest_ind(metrics_a_group, metrics_b_group + metrics_b_group.mean() * effect / 100).pvalue,
                decimals=3
            )
        # 'all_percent': add the effect to the experimental group by multiplying all values by a constant,
        # then calculate p-value for the AB-test
        elif effect_add_type == 'all_percent':
            pvalue_ab = np.round(
                stats.ttest_ind(metrics_a_group, metrics_b_group * (1 + effect / 100)).pvalue,
                decimals=3
            )
        else:
            raise ValueError("parameter 'effect_add_type' can only be 'all_const' or 'all_percent'.")

        # Add p-values
        pvalues_aa.append(pvalue_aa)
        pvalues_ab.append(pvalue_ab)

    # calculate type-I error, by taking mean of false positive flags
    type_one_error = (np.array(pvalues_aa) < alpha).astype(int).mean()

    # calculate type-II error, by taking mean of false negative flags
    # (>= alpha is the exact complement of the rejection rule p < alpha used above)
    type_two_error = (np.array(pvalues_ab) >= alpha).astype(int).mean()

    return pvalues_aa, pvalues_ab, type_one_error, type_two_error



In [17]:

# testing function estimate_errors()
sample_size = 100
n_iter = 10
effect = 6
alpha = 0.05


# test case 01 #########################################################################
# test 'add constant' method
effect_add_type = 'all_const'

# deterministic generator for testing purposes only
group_generator = (
    (np.arange(sample_size, dtype=float), np.arange(sample_size, dtype=float) + x,)
    for x in range(n_iter)
)

# get values
pvalues_aa, pvalues_ab, type_one_error, type_two_error = estimate_errors(
    group_generator, effect_add_type, effect, alpha
)

# test values
if (pvalues_aa == [1.0, 0.808, 0.626, 0.466, 0.331, 0.224, 0.145, 0.09, 0.053, 0.029] and
    pvalues_ab == [0.47, 0.327, 0.216, 0.135, 0.08, 0.045, 0.024, 0.012, 0.006, 0.003] and
    type_one_error == 0.1 and
    type_two_error == 0.5):
    print('test_01: passed')
else:
    print('test_01: failed')


# test case 02 #########################################################################
# test 'multiply by effect' method
effect_add_type = 'all_percent'

# deterministic generator for testing purposes only
group_generator = (
    (np.arange(sample_size, dtype=float), np.arange(sample_size, dtype=float) + x,)
    for x in range(n_iter)
)

# get values
pvalues_aa, pvalues_ab, type_one_error, type_two_error = estimate_errors(
    group_generator, effect_add_type, effect, alpha
)

# test values
if (pvalues_aa == [1.0, 0.808, 0.626, 0.466, 0.331, 0.224, 0.145, 0.09, 0.053, 0.029] and
    pvalues_ab == [0.483, 0.342, 0.23, 0.147, 0.09, 0.052, 0.028, 0.015, 0.007, 0.003] and
    type_one_error == 0.1 and
    type_two_error == 0.6):
    print('test_02: passed')
else:
    print('test_02: failed')


test_01: passed
test_02: passed
