# New Discussion Tool AB Test Post-Deployment QA

22 February 2022
[Task](https://phabricator.wikimedia.org/T291308)

The purpose of this post-deployment QA is to confirm that events are being logged as expected and that the data needed to run the AB test is available.

In [7]:
shhh <- function(expr) suppressPackageStartupMessages(suppressWarnings(suppressMessages(expr)))
shhh({
    library(magrittr); library(zeallot); library(glue); library(tidyverse); library(zoo); library(lubridate)
    library(scales)
})

# QA Checks
- Test and control events recorded. Confirmed.
- Buckets balanced: Confirmed for logged-in users. Logged-out users now confirmed after the patches below:
  - Update: Patch backported on 3 Feb to fix bucketing for logged-out users.
  - Update: Patch deployed on 17 Feb to fix a bucketing issue identified for full-text logged-out users. Fixes included providing the anonymous user token and making bucketing work for users that use full-page wikitext editing. https://phabricator.wikimedia.org/T301497
- Only at participating wikis. Confirmed.
- Only desktop events. Confirmed.
- New section events recorded in test groups. Confirmed.


# Review of all edit sessions by AB test bucket

In [23]:
#collect all test events
query <-
"
SELECT
  date_format(dt, 'yyyy-MM-dd') as attempt_dt,
  event.editing_session_id as edit_attempt_id,
  wiki As wiki,
  event.bucket AS experiment_group,
  event.editor_interface as interface,
  event.integration as integration,
  if(event.user_id != 0, concat(wiki, '-', event.user_id), event.anonymous_user_token) as user_id,
  event.user_id = 0 as user_is_anonymous_byid, 
  if(event.anonymous_user_token is NULL, false, true) as user_is_anonymous_bytoken, 
  if(event.page_ns % 2 = 1, true, false) as is_talk_page,
  event.user_id != 0 as user_is_registered, 
  event.platform as platform, 
-- review participating wikis
  IF( wiki IN ('amwiki', 'bnwiki', 'zhwiki', 'nlwiki', 'arzwiki', 'frwiki', 'hewiki', 'hiwiki',
    'idwiki', 'itwiki', 'jawiki', 'kowiki', 'omwiki', 'fawiki', 'plwiki', 'ptwiki', 'eswiki', 'thwiki',
    'ukwiki', 'viwiki'), 'TRUE', 'FALSE'
) AS is_AB_test_wiki,
  event.is_oversample AS is_oversample
FROM event.editattemptstep
WHERE
-- since deployment
  Year = 2022
  AND ((month = 01 and day >= 27) OR (month = 02))
  -- remove bots
  AND useragent.is_bot = false
-- only test events
  AND event.bucket in ('test', 'control')
"

In [24]:
edit_sessions <- wmfdata::query_hive(query)

Don't forget to authenticate with Kerberos using kinit
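
The participating-wiki list is repeated in each query in this notebook. A minimal sketch of defining it once in R and building the `IN` clause from it is shown below. `ab_test_wikis` and `wiki_in_clause` are hypothetical names that are not part of the original analysis, and the queries in this notebook still use the hard-coded list.

```r
# Participating wikis, copied from the query above.
ab_test_wikis <- c(
  "amwiki", "bnwiki", "zhwiki", "nlwiki", "arzwiki", "frwiki", "hewiki", "hiwiki",
  "idwiki", "itwiki", "jawiki", "kowiki", "omwiki", "fawiki", "plwiki", "ptwiki",
  "eswiki", "thwiki", "ukwiki", "viwiki"
)

# Quote each wiki name and join the names into a SQL IN clause
# (hypothetical helper value, could be pasted into the query strings).
wiki_in_clause <- sprintf(
  "wiki IN (%s)",
  paste0("'", ab_test_wikis, "'", collapse = ", ")
)
wiki_in_clause
```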



# Sessions by Buckets Overall

In [92]:
#check overall user bucket number to confirm if buckets are balanced

sessions_by_bucket <- edit_sessions %>%
  filter(is_ab_test_wiki == 'TRUE') %>% # some test and control events recorded on non ab_test_wikis
  group_by(experiment_group) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id))

sessions_by_bucket

`summarise()` ungrouping output (override with `.groups` argument)



experiment_group,users,attempts
<chr>,<int>,<int>
control,3407,15729
test,5301,17826


Across all wikis, only 39.1% of bucketed users (3,407 of 8,708) logged attempts in the control group. This is lower than expected based on a 50/50 split of users.
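
As a rough check of the share quoted above, the sketch below recomputes it from the user counts in the output table and runs an exact binomial test against a 50/50 split. The test is an illustration added here, not part of the original QA. It assumes each user is bucketed independently, and it uses only base R.

```r
# User counts copied from the sessions_by_bucket output above.
users <- c(control = 3407, test = 5301)

# Share of bucketed users that landed in the control group.
control_share <- users[["control"]] / sum(users)
round(100 * control_share, 1)

# Exact binomial test of the observed split against an expected 50/50 split.
srm_test <- binom.test(users[["control"]], sum(users), p = 0.5)
srm_test$p.value
```

A very small p-value here would indicate that the imbalance is unlikely to be due to chance alone, which is consistent with the bucketing issues investigated in the rest of this notebook.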

# Sessions by Buckets and Wiki

In [63]:
#check bucket numbers by wiki

sessions_by_bucket_wiki <- edit_sessions %>%
  filter(is_ab_test_wiki == 'TRUE') %>% # some test and control events recorded non ab_test_wikis
  group_by(experiment_group, wiki) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id)) %>%
  arrange(wiki)

sessions_by_bucket_wiki

`summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)



experiment_group,wiki,users,attempts
<chr>,<chr>,<int>,<int>
test,amwiki,3,7
control,arzwiki,2,2
test,arzwiki,5,47
control,bnwiki,21,38
test,bnwiki,61,92
control,eswiki,445,1899
test,eswiki,670,2159
control,fawiki,88,403
test,fawiki,224,668
control,frwiki,514,2074


No attempts by bucketed users were recorded for omwiki. This is expected, as it is a smaller wiki and the test has only been running for a few days.
No control events were recorded on amwiki.
The control group has fewer events than the test group across all the wikis.
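
One way to spot wikis with a missing bucket, such as amwiki above, is to pivot the per-wiki counts into a wiki-by-bucket matrix and look for empty cells. The sketch below uses only the first few rows of the output table and base R. `by_wiki` and `wikis_missing_bucket` are hypothetical names.

```r
# First rows of the sessions_by_bucket_wiki output above.
by_wiki <- data.frame(
  experiment_group = c("test", "control", "test", "control", "test"),
  wiki = c("amwiki", "arzwiki", "arzwiki", "bnwiki", "bnwiki"),
  users = c(3, 2, 5, 21, 61)
)

# Pivot to a wiki x bucket matrix; combinations with no rows become NA.
wide <- with(by_wiki, tapply(users, list(wiki, experiment_group), sum))
wide

# Wikis where at least one bucket has no recorded users.
wikis_missing_bucket <- rownames(wide)[rowSums(is.na(wide)) > 0]
wikis_missing_bucket
```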

## Are all events recorded on participating wikis?

In [64]:
# check for test and control events recorded on non-AB-test wikis

sessions_by_bucket_notestwikis <- edit_sessions %>%
  filter(is_ab_test_wiki == 'FALSE')  %>%
  group_by(experiment_group) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id))

sessions_by_bucket_notestwikis

`summarise()` ungrouping output (override with `.groups` argument)



experiment_group,users,attempts
<chr>,<int>,<int>
control,5,54
test,4,31


ISSUE: We are recording a few test and control events on non-AB-test wikis (54 control and 31 test attempts). This might be left over from a previous test. This needs investigation, but the volume of these events is too small to impact the test.

## Are all events on desktop?

In [66]:
sessions_by_bucket_platform <- edit_sessions %>%
  group_by(experiment_group, platform) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id))

sessions_by_bucket_platform

`summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)



experiment_group,platform,users,attempts
<chr>,<chr>,<int>,<int>
control,desktop,3388,15641
test,desktop,5264,17655


PASS: Confirmed only desktop users included in AB test

## Are events recorded on all interface types?

In [67]:
sessions_by_bucket_interface <- edit_sessions %>%
  group_by(experiment_group, interface) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id))

sessions_by_bucket_interface

`summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)



experiment_group,interface,users,attempts
<chr>,<chr>,<int>,<int>
control,visualeditor,1001,2108
control,wikitext,2299,11857
control,wikitext-2017,687,2488
test,visualeditor,2264,4045
test,wikitext,2454,10508
test,wikitext-2017,1347,3977


Confirmed that edit attempts in the test are recorded across all interfaces.
For the control group, most activity was recorded from the wikitext interface: about 58% of users (2,299 of 3,987 user-interface pairs) and about 72% of attempts (11,857 of 16,453).
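
The within-group shares by interface can be recomputed from the output table above. The sketch below is illustrative and uses base R only. Note that a user or attempt that appears under more than one interface is counted once per interface, so the denominators are sums of the table rows rather than overall distinct counts.

```r
# Counts copied from the sessions_by_bucket_interface output above.
interface_counts <- data.frame(
  experiment_group = rep(c("control", "test"), each = 3),
  interface = rep(c("visualeditor", "wikitext", "wikitext-2017"), times = 2),
  users = c(1001, 2299, 687, 2264, 2454, 1347),
  attempts = c(2108, 11857, 2488, 4045, 10508, 3977)
)

# Share of each interface within its experiment group.
interface_counts$user_share <- with(
  interface_counts, users / ave(users, experiment_group, FUN = sum)
)
interface_counts$attempt_share <- with(
  interface_counts, attempts / ave(attempts, experiment_group, FUN = sum)
)

interface_counts
```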

## Are buckets balanced for logged-in and logged-out users?

In [25]:
# check by user_is_anonymous_byid

sessions_by_bucket_anon_byid <- edit_sessions %>%
  filter(is_ab_test_wiki == 'TRUE')  %>%
  group_by(experiment_group, user_is_anonymous_byid) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id))

sessions_by_bucket_anon_byid 

`summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)



experiment_group,user_is_anonymous_byid,users,attempts
<chr>,<chr>,<int>,<int>
control,False,14274,99911
control,True,6175,11265
test,False,15113,95935
test,True,11210,18145


In [103]:
# by token

sessions_by_bucket_anon_bytoken <- edit_sessions %>%
  filter(is_ab_test_wiki == 'TRUE')  %>%
  group_by(experiment_group, user_is_anonymous_bytoken) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id))

sessions_by_bucket_anon_bytoken

`summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)



experiment_group,user_is_anonymous_bytoken,users,attempts
<chr>,<chr>,<int>,<int>
control,False,6175,36268
control,True,790,1114
test,False,6860,33905
test,True,3040,4028


In [71]:
# by user is registered
sessions_by_bucket_anon_registered <- edit_sessions %>%
  filter(is_ab_test_wiki == 'TRUE')  %>%
  group_by(experiment_group, user_is_registered) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id))

sessions_by_bucket_anon_registered

`summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)



experiment_group,user_is_registered,users,attempts
<chr>,<chr>,<int>,<int>
control,False,377,555
control,True,3006,15033
test,False,1627,2156
test,True,3633,15468


Confirmed the attempts in the test group appear as expected based on a 50/50 split.
ISSUE: It looks like the imbalance is caused by the logged-out group. Only 19% of bucketed logged-out users (377 of 2,004) are included in the control group across all participating wikis.

Update: Now 40% of users are in the control group
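
The control share for logged-out and logged-in users can be recomputed from the `user_is_registered` output above. This is a small sketch in base R; `control_share` is a hypothetical helper, and the counts are those from the table at the time of the original run, before the update noted above.

```r
# Hypothetical helper: share of users in the control group.
control_share <- function(control, test) control / (control + test)

# User counts copied from the sessions_by_bucket_anon_registered output above.
logged_out <- control_share(control = 377, test = 1627)
logged_in <- control_share(control = 3006, test = 3633)

round(100 * c(logged_out = logged_out, logged_in = logged_in), 1)
```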

In [73]:
# confirm if this is happening on 1 particular wiki

sessions_by_bucket_anon_wiki <- edit_sessions %>%
  filter(is_ab_test_wiki == 'TRUE')  %>%
  group_by(experiment_group, user_is_anonymous_byid, wiki) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id)) %>%
  arrange(wiki)

sessions_by_bucket_anon_wiki

`summarise()` regrouping output by 'experiment_group', 'user_is_anonymous_byid' (override with `.groups` argument)



experiment_group,user_is_anonymous_byid,wiki,users,attempts
<chr>,<chr>,<chr>,<int>,<int>
test,false,amwiki,2,6
test,true,amwiki,1,1
control,false,arzwiki,2,2
test,false,arzwiki,4,46
test,true,arzwiki,1,1
control,false,bnwiki,17,33
control,true,bnwiki,4,5
test,false,bnwiki,22,52
test,true,bnwiki,39,40
control,false,eswiki,404,1837


In [None]:
Discrepancy appears to be happening across all wikis.

# Check Oversampling impact

In [93]:
sessions_by_oversample <- edit_sessions %>%
  filter(is_ab_test_wiki == 'TRUE') %>% # some test and control events recorded non ab_test_wikis
  group_by(experiment_group, is_oversample) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id))

sessions_by_oversample

`summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)



experiment_group,is_oversample,users,attempts
<chr>,<chr>,<int>,<int>
control,False,2485,12190
control,True,1297,3540
test,False,2649,10986
test,True,3172,6840


In [None]:
Fewer oversampled users are in the control group than in the test group (1,297 vs. 3,172). There might be something more to investigate here.
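
To quantify the oversampling difference, the sketch below compares the control share for oversampled and non-oversampled users and runs a chi-squared test of independence. The test is an illustration added here, not part of the original QA. It treats the distinct-user counts in the table as independent observations, which is an assumption, since `is_oversample` is recorded per event and a user could appear in both rows.

```r
# User counts copied from the sessions_by_oversample output above.
oversample_users <- matrix(
  c(2485, 1297,   # control: not oversampled, oversampled
    2649, 3172),  # test:    not oversampled, oversampled
  nrow = 2,
  dimnames = list(
    is_oversample = c("False", "True"),
    experiment_group = c("control", "test")
  )
)

# Share of each bucket within each oversample status (rows sum to 1).
bucket_share <- prop.table(oversample_users, margin = 1)
round(100 * bucket_share, 1)

# Chi-squared test of independence between oversample status and bucket.
oversample_test <- chisq.test(oversample_users)
oversample_test$p.value
```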

## Check attempts by integration

In [None]:
# by integration
sessions_by_bucket_integration <- edit_sessions %>%
  filter(is_ab_test_wiki == 'TRUE',
        is_talk_page == 'true')  %>%
  group_by(experiment_group, integration, user_is_registered) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id))

sessions_by_bucket_integration

Logged-out users only have events recorded with the discussiontools integration; no events with the page integration are recorded for them.

## Check oversampling impact

In [89]:
# by integration, oversampling and registration status
sessions_by_bucket_oversample <- edit_sessions %>%
  filter(is_ab_test_wiki == 'TRUE',
        is_talk_page == 'true')  %>%
  group_by(experiment_group, integration, is_oversample, user_is_registered) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id))

sessions_by_bucket_oversample

`summarise()` regrouping output by 'experiment_group', 'integration', 'is_oversample' (override with `.groups` argument)



experiment_group,integration,is_oversample,user_is_registered,users,attempts
<chr>,<chr>,<chr>,<chr>,<int>,<int>
control,discussiontools,False,False,28,29
control,discussiontools,False,True,92,139
control,discussiontools,True,False,289,423
control,discussiontools,True,True,639,1887
control,page,False,True,277,530
control,page,True,True,33,56
test,discussiontools,False,False,125,134
test,discussiontools,False,True,187,248
test,discussiontools,True,False,1450,1891
test,discussiontools,True,True,1380,3778


# Review Logged-Out User Bucketing

In [10]:
#collect all test events
query <-
"
SELECT
  event.editing_session_id as session_id,
  wiki As wiki,
  event.bucket AS experiment_group,
  event.editor_interface as interface,
  event.integration as integration,
  event.anonymous_user_token as anon_token,
-- check to make sure all anons have token assigned
  if(event.anonymous_user_token is NULL, false, true) as user_is_anonymous_bytoken, 
  event.platform as platform, 
  event.is_oversample AS is_oversample
FROM event.editattemptstep
WHERE
-- since deployment of patch
  Year = 2022
  AND (month = 02 and day >= 18) 
  -- remove bots
  AND useragent.is_bot = false
-- only anon users
  AND event.user_id = 0 
  AND event.user_class = 'IP'
-- only test events
  AND event.bucket in ('test', 'control')
-- only talk pages
  AND event.page_ns % 2 = 1
-- only desktop events
  AND event.platform = 'desktop'
-- need to check bucketing on ready action as WikiEditor's server-side logging doesn't have access to the bucket or anonymous user ID for them
  AND event.action = 'ready'
-- participating wikis
  AND wiki IN ('amwiki', 'bnwiki', 'zhwiki', 'nlwiki', 'arzwiki', 'frwiki', 'hewiki', 'hiwiki',
    'idwiki', 'itwiki', 'jawiki', 'kowiki', 'omwiki', 'fawiki', 'plwiki', 'ptwiki', 'eswiki', 'thwiki',
    'ukwiki', 'viwiki')
"

In [11]:
edit_sessions_anon <- wmfdata::query_hive(query)

Don't forget to authenticate with Kerberos using kinit



## Check that overall buckets are balanced

In [12]:

sessions_by_bucket_anon <- edit_sessions_anon %>%
  group_by(experiment_group) %>%
  summarise(users = n_distinct(anon_token),
        attempts = n_distinct(session_id))

sessions_by_bucket_anon

`summarise()` ungrouping output (override with `.groups` argument)



experiment_group,users,attempts
<chr>,<int>,<int>
control,1195,1302
test,1886,2183


In [13]:
# check bucket numbers by integration, token status and interface

sessions_by_bucket_anon <- edit_sessions_anon %>%
  group_by(experiment_group, integration, user_is_anonymous_bytoken, interface) %>%
  summarise(users = n_distinct(anon_token),
        attempts = n_distinct(session_id))

sessions_by_bucket_anon

`summarise()` regrouping output by 'experiment_group', 'integration', 'user_is_anonymous_bytoken' (override with `.groups` argument)



experiment_group,integration,user_is_anonymous_bytoken,interface,users,attempts
<chr>,<chr>,<chr>,<chr>,<int>,<int>
control,discussiontools,True,visualeditor,128,166
control,discussiontools,True,wikitext-2017,73,102
control,page,True,wikitext,1010,1034
test,discussiontools,True,visualeditor,716,847
test,discussiontools,True,wikitext,104,122
test,discussiontools,True,wikitext-2017,174,252
test,page,True,wikitext,933,962


Update: After deployment of patch on 17 February 2022, I confirmed that the anonymous_user_token is being recorded for all logged-out users.


There are a number of attempts logged by users without a token in both the test and control groups. Do we know where these might be coming from?

## Review by interface

In [30]:
sessions_by_bucket_anon_interface <- edit_sessions_anon %>%
  group_by(experiment_group, interface) %>%
  summarise(users = n_distinct(anon_token),
        attempts = n_distinct(session_id))

sessions_by_bucket_anon_interface

`summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)



experiment_group,interface,users,attempts
<chr>,<chr>,<int>,<int>
control,visualeditor,128,166
control,wikitext,1010,1034
control,wikitext-2017,73,102
test,visualeditor,716,847
test,wikitext,1035,1084
test,wikitext-2017,174,252


Data seems as expected. We'd expect more wikitext attempts in the control group.

# Review New Section Events by users in New Discussion Tool AB Test

In [94]:
# collect all desktop edit attempts to create a new section by bucket/test group
query <-
"
SELECT
  date_format(dt, 'yyyy-MM-dd') as attempt_dt,
  event.editing_session_id as edit_attempt_id,
  event.bucket AS experiment_group,
  wiki As wiki,
  event.integration AS event_type,
if(event.user_id != 0, concat(wiki, '-', event.user_id), event.anonymous_user_token) as user_id,
  event.user_id = 0 as user_is_anonymous_byid, 
  if(event.anonymous_user_token is NULL, false, true) as user_is_anonymous_bytoken, 
-- review participating wikis
  IF( wiki IN ('amwiki', 'bnwiki', 'zhwiki', 'nlwiki', 'arzwiki', 'frwiki', 'hewiki', 'hiwiki',
    'idwiki', 'itwiki', 'jawiki', 'kowiki', 'omwiki', 'fawiki', 'plwiki', 'ptwiki', 'eswiki', 'thwiki',
    'ukwiki', 'viwiki'), 'TRUE', 'FALSE'
) AS is_AB_test_wiki,
  event.is_oversample AS is_oversample,
  event.editor_interface AS editor_interface
FROM event.editattemptstep
WHERE
-- Review data starting a few days prior to the AB test deployment on Feb 11th
-- since deployment
  Year = 2022
  AND ((month = 01 and day >= 27) OR (month = 02))
-- look at only desktop init section events
  AND event.platform = 'desktop'
  AND event.action = 'init'
  AND event.init_type = 'section'
-- only create new section events
  -- remove bots
  AND useragent.is_bot = false
-- only talk page events
  AND event.page_ns % 2 = 1
AND event.bucket in ('test', 'control')
"

#   AND event.init_mechanism IN ('url-new', 'new')

In [95]:
collect_new_section_attempts <- wmfdata::query_hive(query)

Don't forget to authenticate with Kerberos using kinit



# New Section Events By Bucket

In [96]:
#check overall user bucket number to confirm if buckets are balanced

new_sections_by_bucket <- collect_new_section_attempts %>%
  filter(is_ab_test_wiki == 'TRUE') %>% # some test and control events recorded non ab_test_wikis
  group_by(experiment_group) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id))

new_sections_by_bucket

`summarise()` ungrouping output (override with `.groups` argument)



experiment_group,users,attempts
<chr>,<int>,<int>
control,199,402
test,2814,4925


In [97]:
# by logged-in vs. logged-out status (anonymous token present)
new_sections_by_bucket_anon <- collect_new_section_attempts %>%
  filter(is_ab_test_wiki == 'TRUE') %>% # some test and control events recorded non ab_test_wikis
  group_by(experiment_group, user_is_anonymous_bytoken) %>%
  summarise(users = n_distinct(user_id),
        attempts = n_distinct(edit_attempt_id))

new_sections_by_bucket_anon

`summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)



experiment_group,user_is_anonymous_bytoken,users,attempts
<chr>,<chr>,<int>,<int>
control,False,199,402
test,False,1307,3128
test,True,1507,1797


No new section events by logged-out users are currently recorded in the control group. This might just be an artifact of user behavior: are logged-out users less likely to create a new section?

# Review New Section Events by Logged-Out Users Following Patch

# Notes
* AB test details are only logged on events with init_type = section. For DT events, that includes section and reply usage (not sure that's correct). For non-DT events, that only includes section edits, not page edits.

In [16]:
# Collect all desktop ready events following section init event session
query <-
"

WITH init_events AS (

SELECT
  event.editing_session_id as edit_attempt_id,
  event.init_type as init_type,
  wiki As init_wiki,
  COUNT(*) as init_events
FROM
  event.editattemptstep
WHERE
-- following deployment of patch
    YEAR = 2022
    AND month = 02
    AND day >= 18
-- init events on desktop
    AND event.platform = 'desktop'
    AND event.action = 'init'
-- only talk pages
    AND event.page_ns % 2 = 1
--by anon not bot users
    AND useragent.is_bot = false
    AND event.user_id = 0 
   AND event.user_class = 'IP'
--- test wikis
    AND wiki IN ('amwiki', 'bnwiki', 'zhwiki', 'nlwiki', 'arzwiki', 'frwiki', 'hewiki', 'hiwiki',
    'idwiki', 'itwiki', 'jawiki', 'kowiki', 'omwiki', 'fawiki', 'plwiki', 'ptwiki', 'eswiki', 'thwiki',
    'ukwiki', 'viwiki')
GROUP BY
    event.editing_session_id,
    event.init_type,
    wiki 
)

SELECT
  event.editing_session_id as edit_attempt_id,
  event.bucket AS experiment_group,
  wiki As wiki,
  event.integration AS integration,
  init_events.init_type AS init_type,
  event.anonymous_user_token,
  IF (event.anonymous_user_token IS NULL, 'false', 'true') AS user_is_anonymous_bytoken,
  event.editor_interface AS editor_interface,
COUNT(*) as ready_events
FROM event.editattemptstep eas
INNER JOIN init_events
ON eas.event.editing_session_id = init_events.edit_attempt_id
AND eas.wiki = init_events.init_wiki
WHERE
  YEAR = 2022
    AND month = 02
    AND day >= 18
-- look at only desktop ready events
  AND event.platform = 'desktop'
  AND event.action = 'ready'
--by anon not bot users
    AND useragent.is_bot = false
    AND event.user_id = 0 
   AND event.user_class = 'IP'
-- only talk page events
  AND event.page_ns % 2 = 1
-- bucketing applied on ready events
AND event.bucket in ('test', 'control')
-- test wikis
AND wiki IN ('amwiki', 'bnwiki', 'zhwiki', 'nlwiki', 'arzwiki', 'frwiki', 'hewiki', 'hiwiki',
    'idwiki', 'itwiki', 'jawiki', 'kowiki', 'omwiki', 'fawiki', 'plwiki', 'ptwiki', 'eswiki', 'thwiki',
    'ukwiki', 'viwiki')
GROUP BY
event.editing_session_id,
event.bucket,
wiki,
event.integration,
init_events.init_type,
event.anonymous_user_token,
IF (event.anonymous_user_token IS NULL, 'false', 'true'),
  event.editor_interface
"
 

In [17]:
collect_new_section_anon <- wmfdata::query_hive(query)

Don't forget to authenticate with Kerberos using kinit



In [26]:
#check overall user bucket number to confirm if buckets are balanced

new_sections_anon_bucket <- collect_new_section_anon  %>%
  group_by(experiment_group, init_type) %>%
  summarise(
        attempts = n_distinct(edit_attempt_id),
        users = n_distinct(anonymous_user_token) )

new_sections_anon_bucket

`summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)



experiment_group,init_type,attempts,users
<chr>,<chr>,<int>,<int>
control,page,1197,1109
control,section,85,83
test,page,1218,1096
test,section,951,826
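
To read the table above as a balance check, the control share of logged-out users can be computed separately for each `init_type`. The sketch below is illustrative, uses base R only, and takes the user counts directly from the output. `anon_users` is a hypothetical name.

```r
# Logged-out user counts copied from the new_sections_anon_bucket output above.
anon_users <- matrix(
  c(1109, 83,    # control: page, section
    1096, 826),  # test:    page, section
  nrow = 2,
  dimnames = list(
    init_type = c("page", "section"),
    experiment_group = c("control", "test")
  )
)

# Share of logged-out users in the control group, per init_type.
control_share_by_init <- anon_users[, "control"] / rowSums(anon_users)
round(100 * control_share_by_init, 1)
```

With these counts, sessions that start with a page init are split close to evenly between buckets, while sessions that start with a section init come mostly from the test group.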
