
# PARLA

## Problem
On one hand, removing outliers reduces variance, which increases the sensitivity of the test.
On the other hand, removing outliers reduces the sample size, which decreases sensitivity.
Compare the power of tests with different proportions of removed data:
- Use the backend response time data from `2022-04-01T12_df_web_logs.csv` for the period from March 1 to March 8, 2022.
- The significance level is 0.05.
- Group sizes are 1,000 users (the sample size will be larger because each user has many data points).
- We are testing the hypothesis of equal means using Student’s t-test.
- **The expected effect is a 1% increase in processing time.**
- **However, instead of adding the effect to the whole experimental group, add it to 1% of the experimental group.**
- **In the synthetic A/B tests, the effect must be added by adding a constant to 1% of the experimental group.**

As your answer, enter the option numbers sorted by **decreasing test power**.
For example, "12345" means that option 1 has the highest power, and option 5 the lowest:
1. Remove 0.02% of outliers
2. Remove 0.2% of outliers
3. Remove 2% of outliers
4. Remove 10% of outliers
5. Remove 20% of outliers

Removing 2% of outliers means removing 1% of the lowest and 1% of the highest values from the sample.
That is, keep the values that lie between `np.quantile(values, 0.01)` and `np.quantile(values, 0.99)`.
Quantiles should be computed separately for each group.
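
The trimming rule described above can be sketched as a small helper. This is a minimal sketch, not the notebook's exact code: the name `trim_outliers` and the demo array are assumptions made for illustration, and the bounds are kept inclusive here, while the notebook's loop uses strict inequalities.

```python
import numpy as np


def trim_outliers(values, share):
    """Drop `share` of the data in total: share / 2 from each tail."""
    values = np.asarray(values, dtype=float)
    half = share / 2
    left = np.quantile(values, half)
    right = np.quantile(values, 1 - half)
    # keep the values that lie between the two quantiles (bounds included)
    return values[(values >= left) & (values <= right)]


# illustrative data only: the integers 1..1000, not the web-logs data
demo = np.arange(1, 1001)
trimmed = trim_outliers(demo, 0.02)
print(len(trimmed), trimmed.min(), trimmed.max())
```

Because quantiles must be computed separately for each group, a helper like this would be called once on the control values and once on the experimental values.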

## Action
To calculate power for different percentages of removed outliers, I:
- loaded, transformed, and filtered the web-logs dataset
- performed synthetic A/B testing 10^4 times:
    - sampled control and experimental groups
    - added the effect to the experimental group
    - for each percentage of outliers:
        - calculated the left and right quantiles separately for each group
        - filtered the outliers out of both groups
        - calculated the p-value by performing a t-test
- for each percentage, found the empirical probability of a type II error and the empirical power
- ordered the powers in descending order
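
The steps above can be sketched end to end on synthetic data. This is a simplified illustration under stated assumptions, not the notebook's implementation: the lognormal "load times", `group_size`, the short list of `shares`, and the small `n_iter` are all hypothetical choices made so the sketch runs quickly without the CSV file.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible
alpha = 0.05
n_iter = 200                     # kept small for speed (assumption)
group_size = 2000                # hypothetical number of observations per group
shares = [0.002, 0.02, 0.2]      # total share of trimmed observations


def trim(values, share):
    """Keep the values strictly between the share/2 and 1 - share/2 quantiles."""
    left, right = np.quantile(values, [share / 2, 1 - share / 2])
    return values[(values > left) & (values < right)]


rejections = {s: 0 for s in shares}
for _ in range(n_iter):
    # synthetic skewed "load times", NOT the web-logs data
    a = rng.lognormal(mean=4.0, sigma=0.5, size=group_size)
    b = rng.lognormal(mean=4.0, sigma=0.5, size=group_size)

    # add a constant to a random 1% of group B so that its mean grows by 1%
    idx = rng.choice(group_size, size=group_size // 100, replace=False)
    b[idx] += b.mean() * 0.01 * (group_size / len(idx))

    # trim each group separately, then test equality of means
    for s in shares:
        pvalue = stats.ttest_ind(trim(a, s), trim(b, s)).pvalue
        rejections[s] += int(pvalue < alpha)

# empirical power = share of iterations in which the null hypothesis is rejected
powers = {s: rejections[s] / n_iter for s in shares}
print(powers)
```

The numbers this prints describe the synthetic distribution only and should not be compared with the results on the real logs.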

## Result
Correctly assessed empirical powers:
- of independent t-tests
- when the effect is added by adding a constant to 1% of the experimental group
- for different shares of removed outliers

## Learning
- I revised relevant Python, Numpy, Pandas, and Scipy functionality
- I learned how to remove a certain percentage of outliers from both sides of a distribution by using percentiles
- I learned that the way the effect is added to the experimental group influences statistical power
    - In the previous study, where the effect was added by multiplying the whole experimental group by a constant, there was a clear pattern: the more outliers we removed, the higher the power
    - In the current study, where a constant is added to 1% of the experimental group, there is no such pattern
- Therefore, I learned that in real-world work one has to run thorough synthetic A/B tests on historical data to find out how the effect is likely to appear and how that, combined with outlier removal, affects statistical power
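
To make the difference between the two ways of adding an effect concrete, here is a minimal sketch on synthetic data. The lognormal sample and all variable names are assumptions for illustration. Both variants raise the group mean by the same 1%, but the additive variant concentrates the whole shift in a few observations, which may push them toward the right tail where trimming operates. That is one plausible reason, not a verified one, for the missing pattern.

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic skewed data, illustrative only (not the web-logs data)
values = rng.lognormal(mean=4.0, sigma=0.5, size=10_000)
effect = 0.01

# (a) multiplicative effect on every observation
b_mult = values * (1 + effect)

# (b) additive effect on 1% of observations, scaled to give the same mean shift
share = 0.01
n_affected = int(len(values) * share)
idx = rng.choice(len(values), size=n_affected, replace=False)
b_add = values.copy()
b_add[idx] += values.mean() * effect * (len(values) / n_affected)

# both ratios are 1.01 up to floating-point error
ratio_mult = b_mult.mean() / values.mean()
ratio_add = b_add.mean() / values.mean()
print(ratio_mult, ratio_add)

# share of the affected observations that now lie above the 99th percentile
share_in_tail = np.mean(b_add[idx] > np.quantile(b_add, 0.99))
print(share_in_tail)
```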

## Application
- I can apply relevant Python, Numpy, Pandas and Scipy functionality for similar data-related problems
- I can apply this technique of removing outliers to real-world data-related problems


In [81]:

from datetime import datetime

import numpy as np
import pandas as pd
import scipy as sp
import scipy.stats  # make sure sp.stats is loaded on older SciPy versions


In [86]:

# load, transform, and filter the web-logs dataset
df = pd.read_csv('data/2022-04-01T12_df_web_logs.csv')
df.date = pd.to_datetime(df.date)
df = df[(df.date >= datetime(2022, 3, 1)) & (df.date < datetime(2022, 3, 8))]
df = df[['user_id', 'load_time']]
print(len(df))
df.head()


245851


| | user_id | load_time |
|---|---|---|
| 885082 | 434cf2 | 69.8 |
| 885083 | 80fa93 | 86.3 |
| 885084 | 434cf2 | 58.0 |
| 885085 | a0aaab | 85.2 |
| 885086 | a22f92 | 92.5 |


In [83]:

alpha = 0.05  # significance level, probability of type-I error
sample_size = 1000  # size of control and experimental groups
percentage = [0.0002, 0.002, 0.02, 0.1, 0.2]  # what percentage of outliers to remove
pvalues_dict = {p: [] for p in percentage}  # dict of lists, containing p-values for each percentage
powers = {}  # resulting powers for different percentage of outliers

# add effect in synthetic AB-testing to a small percent of data
percent_of_data_with_effect = 0.01  # percent of data in experimental group to add effect to
effect = 0.01

# calculate unique user-ids for sampling control and experimental groups later
user_ids = df.user_id.unique()
print(f'len(user_ids) = {len(user_ids)}')

for i in range(10**4):
    # SAMPLE CONTROL AND EXPERIMENTAL GROUPS ######################################################
    # sample user-ids for control and experimental groups without repetitions
    a_user_ids, b_user_ids = np.random.choice(user_ids, size=(2, sample_size), replace=False)

    # get metric values for control and experimental groups
    a_values = df.loc[df['user_id'].isin(a_user_ids), 'load_time'].values
    b_values = df.loc[df['user_id'].isin(b_user_ids), 'load_time'].values

    # get number of values for control and experimental groups
    # this number can be different each time,
    # since different users have different number of metric-values
    a_size = len(a_values)
    b_size = len(b_values)

    # calculate the mean of the experimental group (used below to size the effect)
    b_mean = np.mean(b_values)

    # ADD EFFECT TO THE EXPERIMENTAL GROUP ########################################################
    # get size of indexes with effect
    b_indexes_with_effect_size = int(b_size * percent_of_data_with_effect)

    # get random indexes to add effect to
    b_indexes_with_effect = np.random.choice(
        np.arange(b_size),
        size=b_indexes_with_effect_size,
        replace=False
    )

    # add effect to the indexes in the experimental group
    # since we must add effect not to the whole experimental-group,
    # but only to a small percent of values,
    # we must scale the effect value,
    # i.e. we must add much more weight to each value
    effect_value = b_mean * effect
    effect_scale = b_size / b_indexes_with_effect_size
    b_values[b_indexes_with_effect] += effect_value * effect_scale

    # FILTER OUTLIERS FROM CONTROL AND EXPERIMENTAL GROUPS ########################################
    for p in percentage:
        # filter control group
        half = p / 2
        left = np.quantile(a_values, half)
        right = np.quantile(a_values, 1 - half)
        a_values_clean = a_values[(left < a_values) & (a_values < right)]

        # filter experimental group
        left = np.quantile(b_values, half)
        right = np.quantile(b_values, 1 - half)
        b_values_clean = b_values[(left < b_values) & (b_values < right)]

        # calculate p-value, by performing t-test
        pvalue = sp.stats.ttest_ind(a_values_clean, b_values_clean).pvalue
        pvalues_dict[p].append(pvalue)

# calculate power
for p in percentage:
    pvalues = np.array(pvalues_dict[p])
    pvalues = pvalues > alpha  # find type-II errors (false negatives)
    beta = pvalues.astype(int).mean()  # find empirical probability of type-II error
    power = 1 - beta  # find empirical power
    power = np.round(power, decimals=2)
    powers[p] = power

for i in powers.items():
    print(i)


len(user_ids) = 35086
(0.0002, np.float64(0.09))
(0.002, np.float64(0.35))
(0.02, np.float64(0.46))
(0.1, np.float64(0.3))
(0.2, np.float64(0.33))
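
Some of these powers are close to each other (0.30 vs 0.33), so it helps to gauge how noisy each estimate is. The sketch below computes the standard error of a proportion for each power, assuming the 10**4 iterations of the loop above behave like independent Bernoulli trials. The powers are copied from the output above, already rounded to two decimals. Since all five estimates come from the same simulated samples, they are correlated, so this is only a rough guide.

```python
import math

n_iter = 10**4  # number of iterations in the loop above
# powers copied from the printed output (rounded to two decimals)
powers = {0.0002: 0.09, 0.002: 0.35, 0.02: 0.46, 0.1: 0.30, 0.2: 0.33}

# standard error of a proportion: sqrt(p * (1 - p) / n)
std_errors = {p: math.sqrt(v * (1 - v) / n_iter) for p, v in powers.items()}

for p, se in std_errors.items():
    print(f'{p}: power = {powers[p]:.2f}, standard error = {se:.4f}')
```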


In [84]:

# sort powers in descending order
powers_sorted = sorted(powers.items(), key=lambda i: i[1], reverse=True)
powers_sorted = dict(powers_sorted)
for i in powers_sorted.items():
    print(i)


(0.02, np.float64(0.46))
(0.002, np.float64(0.35))
(0.2, np.float64(0.33))
(0.1, np.float64(0.3))
(0.0002, np.float64(0.09))
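
The problem asks for the option numbers sorted by decreasing power. A minimal sketch of that last step follows, with the powers copied from the output above. The `option_numbers` mapping is an assumption that follows the numbering in the problem statement.

```python
# powers copied from the printed output above
powers = {0.0002: 0.09, 0.002: 0.35, 0.02: 0.46, 0.1: 0.30, 0.2: 0.33}

# option numbers from the problem statement (1 = 0.02%, ..., 5 = 20%)
option_numbers = {0.0002: 1, 0.002: 2, 0.02: 3, 0.1: 4, 0.2: 5}

# sort the shares by decreasing power and join the option numbers
ranked = sorted(powers, key=powers.get, reverse=True)
answer = ''.join(str(option_numbers[p]) for p in ranked)
print(answer)  # prints 32541
```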
