# Topic Notification AB Test QA

Date: 8 June 2022

# References
* [Schema Spec](https://docs.google.com/document/d/1kCkU2k82_7NYaIvlY5vlJvOz5pM859seOjz3KSJlriI/edit)
* [Schema](https://schema.wikimedia.org/repositories//secondary/jsonschema/analytics/mediawiki/talk_page_edit/current.yaml)
* [Task](https://phabricator.wikimedia.org/T304029)
* [Bucketing criteria](https://phabricator.wikimedia.org/T304030)


# Summary of Checks Performed
QA to confirm that the buckets appear balanced for the Topic Notification AB test. 

Checks to perform:
* Balanced as expected (CONFIRMED)
* People who are editing at the Wikipedias listed in the ===Candidate wikis  (CONFIRMED)
* People are logged in (read: people who are logged out should be excluded from the A/B test) (ISSUE - SOME events logged)
* Within the Topic Subscriptions A/B test, we are able to distinguish all events logged for the control group and the test group (CONFIRMED)
* People should remain the same group they were bucketed in for the duration of the test, even if they explicitly turn on or off the Enable topic subscription setting within Special:Preferences (CONFIRMED)
* People who have or have NOT used [i] Topic Subscriptions prior to the A/B test beginning. (Confirmed)
* Confirm everything needed to calculate KPI: "For all comments and new topics with a response, the average time duration from "Person A" posting on a talk page and "Person B" posting a response, grouped by experience level".  (CONFIRMED ALL INSTRUMENTATION IS AVAILABLE - Further work on the final query and groupsing still needed)

In [1]:
shhh <- function(expr) suppressPackageStartupMessages(suppressWarnings(suppressMessages(expr)))
shhh({
    library(magrittr); library(zeallot); library(glue); library(tidyverse); library(zoo); library(lubridate)
    library(scales)
})

In [71]:
#collect all test events
query <-
"
SELECT
  date_format(dt, 'yyyy-MM-dd') as attempt_dt,
  event.editing_session_id as edit_attempt_id,
  wiki As wiki,
  event.bucket AS experiment_group,
  event.editor_interface as interface,
  event.integration as integration,
  event.user_id as user_id,
  if(event.anonymous_user_token is NULL, false, true) as user_is_anonymous_bytoken, 
  if(event.page_ns % 2 = 1, true, false) as is_talk_page,
  event.user_id != 0 as user_is_registered, 
  event.platform as platform, 
-- review participating wikis
  IF( wiki IN ('amwiki', 'arzwiki', 'bnwiki', 'eswiki', 'fawiki', 'frwiki', 'hewiki',  'hiwiki',  'idwiki', 'itwiki',  'jawiki',  
 'kowiki', 'nlwiki', 'omwiki', 'plwiki', 'ptwiki',  'thwiki',
    'ukwiki', 'viwiki','zhwiki'), 'TRUE', 'FALSE'
) AS is_AB_test_wiki,
  event.is_oversample AS is_oversample
FROM event.editattemptstep
WHERE
-- since deployment
  Year = 2022
  AND month = 06 
  AND day >= 02
  -- remove bots
  AND useragent.is_bot = false
-- only test events
  AND event.bucket in ('test', 'control')
"

In [72]:
edit_sessions <- wmfdata::query_hive(query)

Don't forget to authenticate with Kerberos using kinit



# Confirm buckets are balanced

## Overall

In [73]:
#check overall user bucket number to confirm if buckets are balanced

sessions_by_bucket <- edit_sessions %>%
  group_by(experiment_group, is_ab_test_wiki) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id))

sessions_by_bucket

`summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)



experiment_group,is_ab_test_wiki,users,attempts
<chr>,<lgl>,<int>,<int>
control,True,7168,31776
test,True,7208,30738


* Confirmed bucketing is balanced. Across all wikis, thre is an expected close to 50/50 split of users between the control and test groups. 7168 users that made an edit attempt in the control and 7208 users that made an edit attempt in the test group. The number of associated edit attempts within each bucket also appear as expected if the buckets are balanced.
* Confirmed that only attempts in the AB test wikis have been logged.

## By Wiki

In [74]:

sessions_by_bucket_wiki <- edit_sessions %>%
  group_by(experiment_group, wiki) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id)) %>%
  arrange(wiki)

sessions_by_bucket_wiki

`summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)



experiment_group,wiki,users,attempts
<chr>,<chr>,<int>,<int>
test,amwiki,2,3
control,arzwiki,4,50
test,arzwiki,7,35
control,bnwiki,59,199
test,bnwiki,66,410
control,eswiki,1132,4246
test,eswiki,1089,3841
control,fawiki,284,1540
test,fawiki,270,1135
control,frwiki,1259,5874


## Are all events on desktop

In [75]:
sessions_by_bucket_platform <- edit_sessions %>%
  group_by(experiment_group, platform) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id))

sessions_by_bucket_platform

`summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)



experiment_group,platform,users,attempts
<chr>,<chr>,<int>,<int>
control,desktop,7168,31776
control,other,1,1
test,desktop,7208,30738
test,other,1,1


There have been just two event logged as coming from the other platform but the majority of attempts are from desktop as expected.

## Across all interface types

In [77]:
sessions_by_bucket_interface <- edit_sessions %>%
  group_by(experiment_group, interface) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id))

sessions_by_bucket_interface

`summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)



experiment_group,interface,users,attempts
<chr>,<chr>,<int>,<int>
control,visualeditor,2462,7157
control,wikitext,5804,21093
control,wikitext-2017,1303,6356
test,visualeditor,2479,6603
test,wikitext,5859,21493
test,wikitext-2017,1204,5169


* Confirmed events occur across all interface types for both control and test groups.

# Only logged-in users are bucketed

In [78]:
# check by user_is_anonymous_byid

any_anon_sessions <- edit_sessions %>%
  filter(user_id == 0) %>% # find anon session
  group_by(experiment_group) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id))

any_anon_sessions

`summarise()` ungrouping output (override with `.groups` argument)



experiment_group,users,attempts
<chr>,<int>,<int>
control,1,5
test,1,2


ISSUE: There are 7 total edit attempts (5 control and 2 test) that are in the AB test but indicate they are from a logged out user (user_id = 0). These have been logged started on June 2nd through June 8th. Just 1 or two events a day or so. All occured on VE with page integration and across several different wikis. 

# People should remain the same group they were bucketed in for the duration of the test

In [80]:
# check for any duplicate user_ids across groups


users_across_groups <- edit_sessions %>%
    group_by(user_id) %>%
    summarise(control_events = sum(experiment_group == 'test'),
              test_events = sum(experiment_group == 'control'))  %>%
    filter(control_events > 0  & test_events > 0)


users_across_groups

`summarise()` ungrouping output (override with `.groups` argument)



user_id,control_events,test_events
<int>,<int>,<int>
0,3,6
2770397,2,28


* Confirmed with the small exception of 6 total logged-out attempts, which is expected since we can't differentiate those users. 

# People who have or have NOT used [i] Topic Subscriptions prior to the A/B test beginning.

Note: This will use [prefUpdate](https://schema.wikimedia.org/repositories//secondary/jsonschema/analytics/legacy/prefupdate/current.yaml) instrument. Topic subscription properties added in https://phabricator.wikimedia.org/T307733. Query below confirms this is working as expected and includes userID so joins can be made. 

In [62]:
# Check prefUpdate

query <-

"SELECT
    event.saveTimestamp,
    event.userID,
    event.property,
    event.value
FROM
  event.prefupdate
WHERE
    year = 2022
    AND month = 06
    AND day >= 02
    AND event.property IN ('discussiontools-topicsubscription','discussiontools-autotopicsub')
"

In [63]:
topic_preferences <- wmfdata::query_hive(query)

Don't forget to authenticate with Kerberos using kinit



In [70]:
# Numbers of users for each selection
topic_preferences_selections <- topic_preferences  %>%
    group_by(property, value) %>%
    summarise(n_users = n_distinct(userid))
              
topic_preferences_selections

`summarise()` regrouping output by 'property' (override with `.groups` argument)



property,value,n_users
<chr>,<chr>,<int>
discussiontools-autotopicsub,0,9
discussiontools-autotopicsub,true,33
discussiontools-topicsubscription,1,2
discussiontools-topicsubscription,false,28
discussiontools-topicsubscription,true,2


**CONFIRMED**: Topic subscription properties are recorded as expected. Need to investigate the meaning for various values (false, true, 0 and 1). 

# Identify comments and topics posted by bucketed users

In [43]:
# rough query to confirm all needed data is logged as expected. Will be refined for analysis
query <-

"SELECT
 date_format(tpe.meta.dt, 'yyyy-MM-dd') as attempt_dt,
  tpe.session_id,
  eas.event.bucket AS test_group,
  eas.event.editing_session_id,
  tpe.performer.user_id,
  eas.event.user_id,
  tpe.component_type AS topic_type,
  tpe.topic_id AS topic_id,
  tpe.comment_id AS comment_id,
  tpe.comment_parent_id As comment_parent_id,
  tpe.performer.user_edit_count AS edit_count
FROM
  event.mediawiki_talk_page_edit tpe
LEFT JOIN 
  event.editattemptstep eas
  ON session_id = eas.event.editing_session_id
  AND eas.year = 2022  
  and eas.month = 06
  AND eas.day >= 02
  WHERE
  tpe.year = 2022
  and tpe.month = 06
  AND tpe.day >= 02
  AND useragent.is_bot = false
  AND event.bucket in ('test', 'control')
  AND tpe.performer.user_id > 0
"


In [44]:
topic_events <- wmfdata::query_hive(query)

Don't forget to authenticate with Kerberos using kinit



## Types of comments posted

In [46]:
sessions_by_comment_post <- topic_events %>%
  group_by(test_group, topic_type) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(editing_session_id))

sessions_by_comment_post

`summarise()` regrouping output by 'test_group' (override with `.groups` argument)



test_group,topic_type,users,attempts
<chr>,<chr>,<int>,<int>
control,comment,53,65
control,response,643,2169
control,topic,646,1543
test,comment,60,83
test,response,617,1819
test,topic,661,1306


* Number of comments, responses and topics within each group seem balanced.

## Confirm instrumentation needed to calculate the KPI is available

For all comments and new topics with a response, the average time duration from "Person A" posting on a talk page and "Person B" posting a response, grouped by experience level". 

Steps to calculate:
- Find all sessions where comment_id is in the comment_parent_id list. These are all topics or new comments that have received a response. FEASIBLE
- Select the user_id and dt and comment_parent_id and timestamp for all these comments. FEASIBLE
- Find the user_id and timestamp for these comments. FEASIBLE
- Final Query is to subtract the two timestamps FEASIBLE

In [50]:
# Find all all topics or new comments that have received a response. 
# Topics with response - commment parent id = topic_id
# Comments with a response - comment_id = comment_parent id
comments_w_response <- topic_events %>%
    filter(comment_id %in% comment_parent_id |
          topic_id %in% comment_parent_id ) %>%  # comments and topics that recieved a response
 group_by(test_group, topic_type) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(editing_session_id))

comments_w_response

`summarise()` regrouping output by 'test_group' (override with `.groups` argument)



In [51]:
head(comments_w_response)

test_group,topic_type,users,attempts
<chr>,<chr>,<int>,<int>
control,comment,41,47
control,response,323,1045
control,topic,646,1543
test,comment,47,64
test,response,294,754
test,topic,661,1306


Number appear as expected when limited to topics and comments that received a response. Note all topics will always have a top line comment that is also added so I might be able to remove that condition.

Need to add a second row with the parent id

In [None]:
# Find time difference
comments_w_response_time <- topic_events %>%
    filter(comment_id %in% comment_parent_id |
          topic_id %in% comment_parent_id ) %>% 
    group_by(comment_id)  %>% 
    mutate(response_time = difftime(attempt_dt, attempt_dt[comment_parent_id == comment_id], units = "mins"))
