# Simpson's Paradox

We tracked the overall approval rate across several products. It was 82% last week but dropped to 79% this week. Yet when we look at the approval rate of each product separately, every rate either stayed the same or went up from last week to this week. What could explain the drop in the overall approval rate? The approval rates by product are as follows:

- Product 1: 84% -> 85%
- Product 2: 77% -> 77%
- Product 3: 81% -> 82%
- Product 4: 88% -> 88%

We are given only the approval rates, not the numerators (approved applications) and denominators (total applications) behind them, and the denominators can change from last week to this week. Suppose every product received the same number of applications last week. Then last week's overall approval rate is the equally weighted average of the four product rates. If applications for Product 2 increased this week, the overall approval rate becomes a weighted average that puts more weight on Product 2, which has the lowest approval rate of the four. In that case, the overall approval rate can decrease even though every individual approval rate stayed the same or increased.

This phenomenon is **Simpson's Paradox**: a trend or pattern that we observe within each group disappears or reverses when the groups are aggregated. An example computation, with counts chosen for illustration, is below.

```python
# Illustrative counts: n = applications, a = approvals.
# Products 1, 3 and 4 keep the same application volume in both weeks.
n1 = 100
a1_before = 84
a1_after = 85

# Product 2 grows from 100 to 1000 applications at an unchanged 77% rate.
n2_before = 100
n2_after = 1000
a2_before = 77
a2_after = 770

n3 = 100
a3_before = 81
a3_after = 82

n4 = 100
a4_before = 88
a4_after = 88

print('Approval rate by individual product')
print(f'Product 1: {a1_before / n1:.0%} -> {a1_after / n1:.0%}')
print(f'Product 2: {a2_before / n2_before:.0%} -> {a2_after / n2_after:.0%}')
print(f'Product 3: {a3_before / n3:.0%} -> {a3_after / n3:.0%}')
print(f'Product 4: {a4_before / n4:.0%} -> {a4_after / n4:.0%}')
print()

# Overall rate = total approvals / total applications across all products.
num_before = a1_before + a2_before + a3_before + a4_before
den_before = n1 + n2_before + n3 + n4
num_after = a1_after + a2_after + a3_after + a4_after
den_after = n1 + n2_after + n3 + n4

print('Overall approval rate')
print(f'From {num_before / den_before:.0%} with numerator: {num_before}, denominator: {den_before}')
print(f'To {num_after / den_after:.0%} with numerator: {num_after}, denominator: {den_after}')
```

```
Approval rate by individual product
Product 1: 84% -> 85%
Product 2: 77% -> 77%
Product 3: 81% -> 82%
Product 4: 88% -> 88%

Overall approval rate
From 82% with numerator: 330, denominator: 400
To 79% with numerator: 1025, denominator: 1300
```
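
The weighted-average argument can also be checked directly. The following is a minimal sketch that reuses the illustrative counts from the example above. The helper names `overall_rate` and `weighted_rate` and the `fixed_mix` counterfactual are introduced here for illustration only and are not part of the original computation. It shows that the pooled rate equals a volume-weighted average of the product rates, and estimates what the overall rate would have been had the application mix stayed at last week's equal weights.

```python
def overall_rate(approved, applied):
    """Pooled approval rate: total approvals over total applications."""
    return sum(approved) / sum(applied)


def weighted_rate(approved, applied):
    """Same quantity, written as a volume-weighted average of product rates."""
    total = sum(applied)
    return sum((n / total) * (a / n) for a, n in zip(approved, applied))


# Illustrative counts from the example above (Products 1 to 4).
applied_before = [100, 100, 100, 100]
approved_before = [84, 77, 81, 88]
applied_after = [100, 1000, 100, 100]
approved_after = [85, 770, 82, 88]

# Weight of each product = its share of all applications in that week.
weights_before = [n / sum(applied_before) for n in applied_before]
weights_after = [n / sum(applied_after) for n in applied_after]

# Counterfactual (an assumption for illustration): this week's product
# rates combined with last week's application mix.
rates_after = [a / n for a, n in zip(approved_after, applied_after)]
fixed_mix = sum(w * r for w, r in zip(weights_before, rates_after))

print('Weights before:', [f'{w:.0%}' for w in weights_before])
print('Weights after: ', [f'{w:.0%}' for w in weights_after])
print(f'Overall before: {weighted_rate(approved_before, applied_before):.1%}')
print(f'Overall after:  {weighted_rate(approved_after, applied_after):.1%}')
print(f'Overall after, holding last week\'s mix: {fixed_mix:.1%}')
```

With the mix held fixed, the overall rate would have risen slightly rather than fallen, so under these assumed counts the drop comes entirely from Product 2's larger share of applications, not from any product approving fewer applications.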
