
# PARLA

## Problem

1. Implement a function that calculates profit for an online store:
    - formula for the metric (profit): M = R - C - 110 * S
    - where:
        - price (R)
        - cost (C)
        - whether the customer contacted support (S)
        - all variables R, C, and S are non-negative
        - S equals 0 if there was no support request, and 1 if there was
    - on average, each support request costs 110 rubles — this includes operator time and compensation for customer inconvenience

2. Using the function from Step 1 and the provided `df_control` and `df_pilot` data:
    - **Determine the threshold value of the support request cost** at which a slight deviation changes the result of the hypothesis test.
    - In other words, find the support request cost at which:
        - the p-value equals the significance level
        - the mean in the experimental group is greater than in the control group
        - the answer is rounded to the nearest whole number
    - We are testing the hypothesis of equal means for the metric calculated in the previous task:
        - The test used is **Student’s t-test**
        - The **significance level is 0.05**
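
The metric from Step 1 is plain per-order arithmetic. A minimal sketch with a hypothetical scalar helper `order_profit` (not part of the original notebook), using the default support cost of 110 rubles:

```python
def order_profit(revenue: float, cost: float, support: int, support_cost: float = 110) -> float:
    """Profit of one order: M = R - C - support_cost * S (hypothetical helper)."""
    # S must be 0 (no support request) or 1 (support request)
    if support not in (0, 1):
        raise ValueError("support must be 0 or 1")
    return revenue - cost - support_cost * support


# An order with R = 1500 and C = 1300 that triggered a support request
print(order_profit(1500, 1300, 1))  # 90
# The same order without a support request
print(order_profit(1500, 1300, 0))  # 200
```

The vectorized notebook function below applies the same formula to every row of a DataFrame at once.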

## Action
1. Implemented the function that calculates the required profit metric
2. Calculated the threshold value of the support request cost:
    - to do that, for each tested support cost (310 to 329 rubles), I:
        - computed the profit values for the control group
        - computed the profit values for the experimental group
        - calculated the delta between the experimental and control means
        - performed an independent two-sample Student’s t-test on the control and experimental groups
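
`scipy.stats.ttest_ind` runs Student’s t-test by default (`equal_var=True`), i.e. it pools the two sample variances. The sketch below recomputes the pooled-variance statistic by hand on two small, made-up samples (illustrative values only, not the notebook data) to show what the test compares; `student_t` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def student_t(a: np.ndarray, b: np.ndarray) -> tuple[float, float]:
    """Two-sided two-sample Student's t-test with pooled variance."""
    n_a, n_b = len(a), len(b)
    # Pooled variance built from the unbiased (ddof=1) sample variances
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    t = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    # Two-sided p-value from the t distribution with n_a + n_b - 2 degrees of freedom
    p = 2 * stats.t.sf(abs(t), df=n_a + n_b - 2)
    return float(t), float(p)


a = np.array([90.0, 600.0, 500.0, 420.0, 310.0])   # made-up control values
b = np.array([200.0, 640.0, 530.0, 470.0, 390.0])  # made-up experimental values

t_manual, p_manual = student_t(a, b)
result = stats.ttest_ind(a, b)  # equal_var=True by default, so this is Student's t-test
print(t_manual, result.statistic)
print(p_manual, result.pvalue)
```

Passing `equal_var=False` would switch SciPy to Welch’s t-test instead, which the task does not ask for.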

## Result
1. The implemented function successfully passed all tests.
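2. Calculated the threshold value of the support request cost: 317 rubles. At this cost the p-value (rounded to three decimals) equals 0.05, and the experimental mean is still higher than the control mean (delta 19.39).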

## Learning
- I reviewed the relevant Python and pandas functionality
- I learned that changing a parameter that enters the metric (here, the support request cost) shifts the metric itself, the delta between the groups, and the p-value, i.e. both the size of the difference between the control and experimental groups and its statistical significance
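
Because the support cost $c$ enters the metric linearly, its effect on the delta can be written out. With bars denoting group means, A the control group and B the experimental group:

$$
\Delta(c) = \bar{M}_B - \bar{M}_A = \left[(\bar{R}_B - \bar{C}_B) - (\bar{R}_A - \bar{C}_A)\right] - c\,(\bar{S}_B - \bar{S}_A)
$$

In the notebook data, 600 of 1000 control orders and 700 of 1000 pilot orders have `support = 1`, so $\bar{S}_B - \bar{S}_A = 0.1$ and the delta falls by 0.1 for every extra ruble of support cost, consistent with the printed deltas (20.09 at cost 310, 19.99 at cost 311). The p-value moves as well, because both the delta and the variance of $M$ depend on $c$.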

## Application
- I can apply the relevant Python, NumPy, and SciPy functionality to similar data-related problems
- In a real-world setting, one can iterate through parameter values (such as the support cost) and find the point at which profit becomes significantly better or significantly worse
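
A minimal sketch of the per-value decision such a scan needs, assuming the metric for each group has already been recomputed for the parameter value being tested. The helper name `classify_effect` and the shifted `np.arange` samples are hypothetical, for illustration only:

```python
import numpy as np
from scipy import stats


def classify_effect(control: np.ndarray, pilot: np.ndarray, alpha: float = 0.05) -> str:
    """Label the pilot as significantly better, significantly worse, or not significant."""
    pvalue = stats.ttest_ind(control, pilot).pvalue  # Student's t-test
    if pvalue >= alpha:
        return "not significant"
    return "significantly better" if pilot.mean() > control.mean() else "significantly worse"


control = np.arange(100, dtype=float)  # synthetic control metric values
for shift in (-50, 0, 50):
    pilot = control + shift  # synthetic pilot values shifted by a constant
    print(shift, classify_effect(control, pilot))
```

In the notebook, the same check would be called once per support cost, with the two groups produced by `get_metric`.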


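```python
import numpy as np
import pandas as pd
import scipy as sp
import scipy.stats  # ensures sp.stats is loaded; older SciPy versions do not import submodules with `import scipy`
```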


```python
def get_metric(
    df: pd.DataFrame,
    support_cost: float
) -> pd.Series:
    """
    Calculates metric values using the formula: M = R - C - support_cost * S

    :param df: A DataFrame with order data containing the following columns:
        - revenue: revenue (R)
        - cost_price: cost of goods sold (C)
        - support: whether the customer contacted support, 0 or 1 (S)

    :param support_cost: Average cost of a single support request.

    :return: A Series containing the computed metric values.
    """

    # M = R - C - support_cost * S, computed column-wise instead of row by row
    return df['revenue'] - df['cost_price'] - support_cost * df['support']
```


```python
# testing function get_metric()
df = pd.DataFrame({
    'revenue': [1500, 1800, 2100],
    'cost_price': [1300, 1200, 1600],
    'support': [1, 0, 0],
})
support_cost = 110

answer = pd.Series([90, 600, 500])
result = get_metric(df, support_cost)
if answer.equals(result):
    print('test case 01: passed')
else:
    print('test case 01: failed')
```


```
test case 01: passed
```
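
`Series.equals` only reports True or False. A slightly more informative check, sketched here with `pd.testing.assert_series_equal` (which raises with a description of the mismatch), can also cover a second support cost. A vectorized equivalent of `get_metric` is redefined so the snippet runs on its own; the expected values for `support_cost = 0` are computed by hand from the same three orders.

```python
import pandas as pd


def get_metric(df: pd.DataFrame, support_cost: float) -> pd.Series:
    # Vectorized M = R - C - support_cost * S
    return df["revenue"] - df["cost_price"] - support_cost * df["support"]


df = pd.DataFrame({
    "revenue": [1500, 1800, 2100],
    "cost_price": [1300, 1200, 1600],
    "support": [1, 0, 0],
})

# Expected metric values per support cost
cases = {
    110: pd.Series([90, 600, 500]),
    0: pd.Series([200, 600, 500]),
}
for support_cost, expected in cases.items():
    # check_names=False: the computed Series has no meaningful name
    pd.testing.assert_series_equal(get_metric(df, support_cost), expected, check_names=False)
    print(f"support_cost={support_cost}: passed")
```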


```python
sample_size = 1000

df_control = pd.DataFrame({
    'revenue': [int(np.sin(x / 12) * 600 + 1200) for x in range(sample_size)],
    'cost_price': [int(np.sin(x / 12) * 400 + 700) for x in range(sample_size)],
    'support': (np.arange(sample_size) < sample_size - 400).astype(int),
})

df_pilot = pd.DataFrame({
    'revenue': [int(np.sin(x / 11 + 1) * 650 + 1250) for x in range(sample_size)],
    'cost_price': [int(np.sin(x / 11 + 1) * 400 + 700) for x in range(sample_size)],
    'support': (np.arange(sample_size) < sample_size - 300).astype(int),
})
```
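
The two groups differ in how often `support` equals 1: only the first `sample_size - 400` control rows and the first `sample_size - 300` pilot rows are flagged. A short check, rebuilding just the `support` columns exactly as above, gives the support rate in each group:

```python
import numpy as np

sample_size = 1000
# Same construction as the support columns of df_control and df_pilot
support_control = (np.arange(sample_size) < sample_size - 400).astype(int)
support_pilot = (np.arange(sample_size) < sample_size - 300).astype(int)

rate_control = support_control.mean()  # share of control orders with a support request
rate_pilot = support_pilot.mean()      # share of pilot orders with a support request
print(rate_control, rate_pilot)
# Amount by which the pilot-minus-control delta shrinks per extra ruble of support cost
print(round(rate_pilot - rate_control, 2))
```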

```python
# scan support costs around the threshold; the p-value reaches 0.05 at cost 317 (see output below)
for cost in range(310, 330):
    # compute control-group profit values
    a_profit = get_metric(df_control, cost)

    # compute experimental-group profit values
    b_profit = get_metric(df_pilot, cost)

    # delta between the experimental and control means
    delta = round(b_profit.mean() - a_profit.mean(), ndigits=2)

    # independent two-sample Student's t-test (equal variances assumed by default)
    pvalue = round(sp.stats.ttest_ind(a_profit, b_profit).pvalue, 3)

    print(f'cost: {cost}; delta: {delta} pvalue: {pvalue}')
```


```
cost: 310; delta: 20.09 pvalue: 0.04
cost: 311; delta: 19.99 pvalue: 0.041
cost: 312; delta: 19.89 pvalue: 0.043
cost: 313; delta: 19.79 pvalue: 0.044
cost: 314; delta: 19.69 pvalue: 0.046
cost: 315; delta: 19.59 pvalue: 0.047
cost: 316; delta: 19.49 pvalue: 0.048
cost: 317; delta: 19.39 pvalue: 0.05
cost: 318; delta: 19.29 pvalue: 0.051
cost: 319; delta: 19.19 pvalue: 0.053
cost: 320; delta: 19.09 pvalue: 0.055
cost: 321; delta: 18.99 pvalue: 0.056
cost: 322; delta: 18.89 pvalue: 0.058
cost: 323; delta: 18.79 pvalue: 0.06
cost: 324; delta: 18.69 pvalue: 0.061
cost: 325; delta: 18.59 pvalue: 0.063
cost: 326; delta: 18.49 pvalue: 0.065
cost: 327; delta: 18.39 pvalue: 0.067
cost: 328; delta: 18.29 pvalue: 0.069
cost: 329; delta: 18.19 pvalue: 0.071
```
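
Instead of reading the threshold off p-values rounded to three decimals, the crossing point can be located with a root finder. A minimal sketch, assuming the same synthetic data and a vectorized `get_metric`; `pvalue_minus_alpha` is a hypothetical helper, and `scipy.optimize.brentq` is swapped in for the grid search. The grid above shows p < 0.05 at cost 310 and p > 0.05 at cost 329, so that interval brackets a sign change.

```python
import numpy as np
import pandas as pd
from scipy import optimize, stats

sample_size = 1000
# Same synthetic data as df_control and df_pilot above
df_control = pd.DataFrame({
    'revenue': [int(np.sin(x / 12) * 600 + 1200) for x in range(sample_size)],
    'cost_price': [int(np.sin(x / 12) * 400 + 700) for x in range(sample_size)],
    'support': (np.arange(sample_size) < sample_size - 400).astype(int),
})
df_pilot = pd.DataFrame({
    'revenue': [int(np.sin(x / 11 + 1) * 650 + 1250) for x in range(sample_size)],
    'cost_price': [int(np.sin(x / 11 + 1) * 400 + 700) for x in range(sample_size)],
    'support': (np.arange(sample_size) < sample_size - 300).astype(int),
})


def get_metric(df: pd.DataFrame, support_cost: float) -> pd.Series:
    # Vectorized M = R - C - support_cost * S
    return df['revenue'] - df['cost_price'] - support_cost * df['support']


def pvalue_minus_alpha(cost: float, alpha: float = 0.05) -> float:
    """Zero exactly where the Student's t-test p-value equals alpha."""
    a = get_metric(df_control, cost)
    b = get_metric(df_pilot, cost)
    return float(stats.ttest_ind(a, b).pvalue) - alpha


# Brent's method on the bracket [310, 329], where the function changes sign
threshold = optimize.brentq(pvalue_minus_alpha, 310, 329)
print(threshold, round(threshold))
```

Rounding `threshold` to the nearest whole number gives a value that can be compared directly with the 317 read off the grid; the root finder also avoids relying on the three-decimal rounding of the printed p-values.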
