In [1]:
import wmfdata as wmf

# Data collection

In [30]:
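QUERY = """
-- Outer query: one row per user, summarizing their active days in the window.
-- Note: Spark's FIRST/LAST are non-deterministic (no row order is guaranteed).
SELECT
  FIRST(country) AS country,
  LAST(app_version) AS app_version,
  AVG(pageviews) AS avg_daily_pageviews,
  COUNT(*) AS days_active
FROM (
  -- Inner query: one row per user per day with at least one non-main-page view
  SELECT
    event.user_id AS user_id,
    FIRST(geocoded_data['country']) AS country,
    LAST(event.app_version) AS app_version,
    COUNT(DISTINCT event.pageview_token) AS pageviews
  FROM event.inukapageview
  WHERE
    -- 6-week window: 2021-01-19 <= date < 2021-03-02
    year = 2021
    AND (
      month = 1 AND day >= 19
      OR month = 2
      OR month = 3 AND day < 2
    )
    AND event.client_type = "kaios-app"
    AND NOT event.is_main_page
    -- Exclude India and the United States
    AND geocoded_data['country'] NOT IN ("India", "United States")
  GROUP BY
    event.user_id,
    month, day
) user_days
GROUP BY user_id
"""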

avg_daily_pageviews_per_user = wmf.spark.run(QUERY, session_type="yarn-large")
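The query does a two-level aggregation: first distinct pageviews per user per active day, then the average over each user's active days. Below is a minimal pandas sketch of the same logic on made-up data. It assumes a simplified schema (a single `date` column instead of the `year`/`month`/`day` partitions and the nested `event` fields) and leaves out the country and app-version columns. It illustrates the logic only and does not replace the Spark query.

```python
import pandas as pd

# Hypothetical toy events standing in for rows of event.inukapageview;
# the column names are simplified and are NOT the real table schema.
events = pd.DataFrame({
    "user_id": ["a", "a", "a", "a", "b", "b", "c"],
    "date": pd.to_datetime([
        "2021-01-19", "2021-01-19", "2021-01-20", "2021-03-02",
        "2021-02-10", "2021-02-10", "2021-01-18",
    ]),
    "pageview_token": ["p1", "p2", "p3", "p4", "p5", "p5", "p6"],
})

# Same window as the SQL partition filter: 2021-01-19 <= date < 2021-03-02.
start, end = pd.Timestamp("2021-01-19"), pd.Timestamp("2021-03-02")
in_window = events[(events["date"] >= start) & (events["date"] < end)]

# Inner query: distinct pageviews per user per active day.
user_days = (
    in_window.groupby(["user_id", "date"])["pageview_token"]
    .nunique()
    .rename("pageviews")
    .reset_index()
)

# Outer query: average daily pageviews and number of active days per user.
# "count" matches COUNT(*) here because the pageviews column has no nulls.
per_user = user_days.groupby("user_id").agg(
    avg_daily_pageviews=("pageviews", "mean"),
    days_active=("pageviews", "count"),
)
print(per_user)
print((end - start).days)  # length of the window in days
```

The window spans 42 days, exactly 6 weeks. User `c` and the 2021-03-02 row fall outside it, and the duplicated `p5` token counts once, as `COUNT(DISTINCT ...)` does.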

In [31]:
avg_daily_pageviews_per_user.groupby('app_version').size()

app_version
0.0.0          1309
1.0.0           643
1.0.0.1231       20
1.1.0            63
1.2.0         17853
dtype: int64

In [33]:
avg_daily_pageviews_per_user.groupby("country").size().sort_values(ascending=False).head(20)

country
Uganda          3448
Pakistan        3396
Nigeria         2863
Tanzania        2358
Puerto Rico      802
Portugal         694
Cameroon         570
Canada           406
Egypt            347
Mexico           321
Ivory Coast      293
DR Congo         250
Russia           239
Germany          230
South Africa     218
Madagascar       213
Rwanda           188
Zambia           185
Mali             151
Benin            146
dtype: int64

In [35]:
target_users = (
  avg_daily_pageviews_per_user
  .query(
    # Top 4 countries outside India and the US, latest app version only
    "country in ('Uganda', 'Pakistan', 'Nigeria', 'Tanzania') & "
    "app_version == '1.2.0'"
  )
)

In [38]:
len(target_users)

12033

In [39]:
target_users['avg_daily_pageviews'].mean()

3.81807134387336

In [28]:
target_users['avg_daily_pageviews'].std()

4.256881261397582

# Power analysis
The included users will be split 50-50 into a control group and a trending-articles group. Our main metric will likely be users' average pageviews per active day, and we decided that the smallest change in it that we would consider meaningful is 10%.

Looking at the previous 6 weeks of data (2021-01-19 ≤ t < 2021-03-02), the top 4 countries besides India and the United States had the following numbers of unique users (across all app versions):
```
Uganda          3448
Pakistan        3396
Nigeria         2863
Tanzania        2358
```

(We are interested in emerging markets, which rules out the United States. We also had to abandon plans to run the experiment with Indian users because of our complete inability to get Jio to release a new version of the app.)

Of these, 12,033 were running the most recent version of the app (1.2.0) and would have been included in our test, or roughly 6,016 in each group. These users averaged 3.818 pageviews per active day (PPAD), with a standard deviation of 4.257. A 10% increase in PPAD would give 4.200 in the experimental group.
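The group size and target PPAD above can be reproduced from the cell outputs. The sketch below also converts the 10% change into a standardized effect size (Cohen's d), which is the kind of input G*Power's Mann–Whitney procedure asks for. Computing d with the observed standard deviation for both groups is an assumption made for illustration; the exact value entered into G*Power is not recorded here.

```python
# Summary statistics of the target users, copied from the cell outputs above.
n_users = 12_033
mean_ppad = 3.81807134387336
sd_ppad = 4.256881261397582
min_relative_change = 0.10  # smallest change we consider meaningful

# 50-50 split; with an odd total, one group gets one extra user.
n_per_group = n_users // 2
target_ppad = mean_ppad * (1 + min_relative_change)

# Cohen's d, assuming both groups share the observed standard deviation.
cohens_d = (target_ppad - mean_ppad) / sd_ppad

print(n_per_group)
print(round(target_ppad, 3))
print(round(cohens_d, 4))
```

With these numbers, d comes out at roughly 0.09, which is a small effect. The large sample is what makes it detectable.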

My current plan is to use a [Mann–Whitney U test](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test) to test the difference in PPAD. I used [G\*Power](https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower.html) 3.1 to calculate the power achieved, assuming (1) a two-tailed test, (2) a 0.01 significance level, and (3) the minimum asymptotic relative efficiency ("min ARE") method. The power varies with the distribution of the response variable, but this method assumes the worst-case scenario.

Under these conservative assumptions, we get a power of 0.977. Power is the probability that the test detects a true change of this size, i.e. 1 minus the probability of a false negative.
![2021-03-09 GPower calculation](2021-03-09_GPower_calculation.png)
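As a cross-check on the G*Power figure, here is a sketch of the calculation using SciPy. It rests on several assumptions:

- G*Power's "min ARE" option computes the power of a two-sample t test with both group sizes multiplied by 0.864, which is the minimum asymptotic relative efficiency of the Mann–Whitney test relative to the t test.
- The effect size entered was d = 0.1 × 3.818 / 4.257.
- There were 6,016 users per group.

Under these assumptions, the sketch lands at about 0.977.

```python
from scipy import stats


def wmw_power_min_are(d, n1, n2, alpha=0.01, are=0.864):
    """Approximate two-tailed Mann-Whitney power via the A.R.E. method.

    Assumption: WMW power is approximated by the power of a two-sample
    t test whose group sizes are scaled by the asymptotic relative
    efficiency (0.864 is its minimum for WMW relative to the t test).
    """
    n1_eff, n2_eff = n1 * are, n2 * are
    df = n1_eff + n2_eff - 2
    # Noncentrality parameter of the t statistic under the alternative.
    nc = d * (n1_eff * n2_eff / (n1_eff + n2_eff)) ** 0.5
    # Two-tailed critical value of the central t distribution.
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Probability of falling in either rejection region under the alternative.
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)


# Hypothetical inputs reconstructed from the numbers quoted above.
d = 0.10 * 3.81807134387336 / 4.256881261397582
power = wmw_power_min_are(d, 6016, 6016)
print(round(power, 3))
```

To see how much the worst-case assumption costs, pass `are=1.0`, which corresponds to a plain t test, or use a higher efficiency.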

After we release the app version with the experiment, it should quickly reach our users. It seems that KaiOS devices automatically install app updates every day by default, and indeed we see very few users running an old version of the app (no version 0.0.0 has ever been released, so those users are not real):
```
app_version
0.0.0          1309
1.0.0           643
1.0.0.1231       20
1.1.0            63
1.2.0         17853
```
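To put "very few" into numbers, this quick sketch computes the share of users on the latest version from the counts above. It excludes the spurious 0.0.0 users.

```python
import pandas as pd

# Users per app version, copied from the output above.
version_counts = pd.Series(
    {"0.0.0": 1309, "1.0.0": 643, "1.0.0.1231": 20, "1.1.0": 63, "1.2.0": 17853}
)

# Drop 0.0.0, since no app with that version was ever released.
real_versions = version_counts.drop("0.0.0")
share_latest = real_versions["1.2.0"] / real_versions.sum()
print(f"{share_latest:.1%} of users with a real version are on 1.2.0")
```

This comes to about 96% of the 18,579 users with a real version number.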

I may change the analysis methodology later (e.g. replacing the arbitrary calendar day with our already-existing concept of sessions, or adding more predictor variables beyond group membership), but these changes should only increase the power.

Therefore, the experiment should have sufficient power if we run it for **6 weeks** on app users in **Nigeria, Uganda, Pakistan and Tanzania**.
