# Check scale of Reply Tool Usage

[Task](https://phabricator.wikimedia.org/T263050)

## Overview

The reply tool was deployed as an opt-out preference on Arabic, Czech and Hungarian Wikipedias on 24 September 2020. This task is about determining the scale of Reply Tool usage on those wikis since then, so we can decide whether the data gathered so far is representative enough to begin work on the workflow engagement metrics.

## Metrics
* Number of events: a count of init events grouped by wiki and user edit count (read: experience level); logged out users should be included as well
* Number of people: a count of the unique people who triggered an init event in the Reply Tool, grouped by wiki and user edit count (read: experience level); logged out users should be included as well

Notes: Review MediaWiki tags as well to compare
Questions: Only reviewing Arabic, Czech and Hungarian Wikipedias

In [36]:
shhh <- function(expr) suppressPackageStartupMessages(suppressWarnings(suppressMessages(expr)))
shhh({
    library(magrittr); library(zeallot); library(glue); library(tidyverse); library(zoo); library(lubridate)
    library(scales)
})

In [100]:
## Collect Discussion Tools init events for opt-out wikis since deployment date

query <- "
SELECT
  event.user_editcount AS edit_count,
  event.user_id AS user,
  wiki AS wiki,
  COUNT(*) AS events
FROM event.editattemptstep
WHERE
  year = 2020 
-- events since deployment date
  AND ((month = 09 and day >= 24) OR (month >= 10))
-- review wikis where deployed as opt-out
  AND wiki IN ('arwiki', 'cswiki', 'huwiki')
  AND event.integration= 'discussiontools'
  AND event.action = 'init'
GROUP BY 
  event.user_editcount,
  event.user_id,
  wiki"

In [101]:
collect_events_optout_wikis <- wmfdata::query_hive(query)

Don't forget to authenticate with Kerberos using kinit



In [102]:
# add column with user edit count buckets

optout_events_with_editcount <- collect_events_optout_wikis %>%
    mutate(edit_count_bucket = case_when(
            edit_count == 0 ~ '0 edits',
            edit_count >=1 & edit_count <= 4 ~ '1-4 edits',
            edit_count >=5 & edit_count <= 99 ~ '5-99 edits',
            edit_count >=100 & edit_count <= 999 ~ '100-999 edits',
            edit_count >=1000 ~ '1000+ edits'))


In [103]:
# Order edit counts
optout_events_with_editcount$edit_count_bucket %<>% 
factor(levels= c("0 edits", "1-4 edits", "5-99 edits", "100-999 edits", "1000+ edits"))
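
The bucket boundaries can be spot-checked on a few edge values. The sketch below is not part of the original analysis: it uses base R `cut()` with the same labels as an alternative to `case_when()`, and the `edit_counts` test vector is made up for illustration. `cut()` returns a factor whose levels already follow the label order.

```r
# Hypothetical boundary values, chosen to sit on each bucket edge
edit_counts <- c(0, 1, 4, 5, 99, 100, 999, 1000, 25000)

bucket_labels <- c("0 edits", "1-4 edits", "5-99 edits", "100-999 edits", "1000+ edits")

# right = FALSE makes intervals left-closed: [0,1), [1,5), [5,100), [100,1000), [1000,Inf)
buckets <- cut(
  edit_counts,
  breaks = c(0, 1, 5, 100, 1000, Inf),
  labels = bucket_labels,
  right = FALSE
)

print(data.frame(edit_count = edit_counts, bucket = buckets))
```

For integer edit counts these intervals match the `case_when()` rules above.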


# Number of Events

## Total Number of Events Overall

In [104]:
# Find number of init events by edit count bucket
event_counts_overall <- optout_events_with_editcount %>%
    group_by(edit_count_bucket) %>%
    summarise(num_events = sum(events))

event_counts_overall

`summarise()` ungrouping output (override with `.groups` argument)



edit_count_bucket,num_events
<fct>,<int>
0 edits,58
1-4 edits,80
5-99 edits,75
100-999 edits,42
1000+ edits,364


There have been 619 events to date, most of them (364) by senior contributors (those with 1000 or more edits). We're only seeing 213 total events from editors with fewer than 100 edits.

Note: We do not know the edit count for logged out users. They are all recorded with an edit count of 0 in the data. 
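
The totals quoted above can be reproduced from the printed table. This is a small base R sketch that re-enters the table values by hand (the vector name `bucket_events` is introduced here for illustration only) and expresses each bucket as a share of all init events.

```r
# Event counts copied from the output table above
bucket_events <- c(
  "0 edits" = 58, "1-4 edits" = 80, "5-99 edits" = 75,
  "100-999 edits" = 42, "1000+ edits" = 364
)

total_events <- sum(bucket_events)                       # 619
bucket_share <- round(100 * bucket_events / total_events, 1)

# Events from editors with fewer than 100 edits (includes logged-out events)
under_100 <- sum(bucket_events[c("0 edits", "1-4 edits", "5-99 edits")])  # 213

print(bucket_share)
print(round(100 * under_100 / total_events, 1))
```

On these numbers, the 1000+ bucket accounts for roughly 59% of events and the under-100 buckets together for roughly 34%.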

## Number of Events By Wiki 

In [105]:
# Find number of init events by wiki
event_counts_by_wiki <- optout_events_with_editcount %>%
    group_by(wiki) %>%
    summarise(num_events = sum(events))

event_counts_by_wiki

`summarise()` ungrouping output (override with `.groups` argument)



wiki,num_events
<chr>,<int>
arwiki,330
cswiki,97
huwiki,192


Over half of the events (330 of 619) come from Arabic Wikipedia. Czech Wikipedia has the lowest number of events (only 97).
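
As a quick check of the per-wiki split, the following base R sketch recomputes each wiki's share from the table values above. The vector name `wiki_events` is introduced here for illustration.

```r
# Init event counts per wiki, copied from the output table above
wiki_events <- c(arwiki = 330, cswiki = 97, huwiki = 192)

# Share of all init events per wiki, in percent
wiki_share <- round(100 * wiki_events / sum(wiki_events), 1)

print(wiki_share)
```

This gives roughly 53% for arwiki, 16% for cswiki and 31% for huwiki.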

## Number of Events By Wiki and Edit Count

In [106]:
# Find number of init events by wiki and user edit count
event_counts <- optout_events_with_editcount %>%
    group_by(wiki, edit_count_bucket) %>%
    summarise(num_events = sum(events)) %>%
    arrange(wiki, edit_count_bucket)
 
event_counts


`summarise()` regrouping output by 'wiki' (override with `.groups` argument)



wiki,edit_count_bucket,num_events
<chr>,<fct>,<int>
arwiki,0 edits,19
arwiki,1-4 edits,28
arwiki,5-99 edits,34
arwiki,100-999 edits,22
arwiki,1000+ edits,227
cswiki,0 edits,7
cswiki,1-4 edits,29
cswiki,5-99 edits,17
cswiki,100-999 edits,9
cswiki,1000+ edits,35


We see the same trend on each target Wikipedia: most events come from senior contributors.

# Number of People

Note: We cannot determine the distinct number of logged out users as we do not have a unique identifier for them recorded for discussion tool related events. The below numbers represent the total number of logged in users. 

## Total Number of People By Wiki and User Edit Count

In [120]:
## Total Number of People By Wiki and User Edit Count

user_counts_byeditcount <- optout_events_with_editcount %>%
# unable to identify distinct logged out users so removing from this analysis
    filter(user != 0) %>%
    group_by(wiki, edit_count_bucket) %>%
    summarise(num_users = n_distinct(user))

user_counts_byeditcount

`summarise()` regrouping output by 'wiki' (override with `.groups` argument)



wiki,edit_count_bucket,num_users
<chr>,<fct>,<int>
arwiki,0 edits,5
arwiki,1-4 edits,25
arwiki,5-99 edits,23
arwiki,100-999 edits,13
arwiki,1000+ edits,43
cswiki,0 edits,3
cswiki,1-4 edits,19
cswiki,5-99 edits,11
cswiki,100-999 edits,8
cswiki,1000+ edits,16


There have been 226 distinct users of the reply tool since it was released as opt-out. Arabic Wikipedia accounts for the largest share of these users.


Except for Czech Wikipedia, the largest group of reply tool users in each target wiki is Senior Contributors with 1000 or more cumulative edits. On Czech Wikipedia, users with 1-4 edits slightly outnumber them (19 compared to 16 users).
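
The per-wiki comparison can be checked against the rows printed above. The sketch below is base R and covers only arwiki and cswiki, since those are the only rows shown in the output; the data frame `users` is re-entered by hand for illustration. One caveat, as an assumption about the data: a user whose edit count crossed a bucket boundary during the period would be counted in two buckets, so the per-wiki sums are upper bounds on distinct users.

```r
bucket_labels <- c("0 edits", "1-4 edits", "5-99 edits", "100-999 edits", "1000+ edits")

# Distinct logged-in users per wiki and bucket, copied from the table above
users <- data.frame(
  wiki = rep(c("arwiki", "cswiki"), each = 5),
  bucket = rep(bucket_labels, times = 2),
  num_users = c(5, 25, 23, 13, 43, 3, 19, 11, 8, 16),
  stringsAsFactors = FALSE
)

# Share of each bucket within its wiki, in percent
users$wiki_total <- ave(users$num_users, users$wiki, FUN = sum)
users$share_pct <- round(100 * users$num_users / users$wiki_total, 1)

# Largest bucket per wiki
top_bucket <- sapply(split(users, users$wiki), function(d) d$bucket[which.max(d$num_users)])

print(users)
print(top_bucket)
```

On these rows no single bucket exceeds half of a wiki's users: the 1000+ bucket is the largest on arwiki, and the 1-4 bucket is the largest on cswiki.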

## Logged Out Events

We cannot determine the distinct number of logged out users as we do not have a unique identifier for them; however, we can determine the number of events that come from logged out users.

In [121]:
## Total Number of Logged Out Events

user_events_byloggedout <- optout_events_with_editcount %>%
    filter(user == 0) %>%
    group_by(wiki, edit_count_bucket) %>%
    summarise(num_events = sum(events))

user_events_byloggedout

`summarise()` regrouping output by 'wiki' (override with `.groups` argument)



wiki,edit_count_bucket,num_events
<chr>,<fct>,<int>
arwiki,0 edits,11
cswiki,0 edits,4
huwiki,0 edits,30
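
To put the logged-out counts in context, the sketch below relates them to the per-wiki and "0 edits" totals reported earlier. It is base R, with values copied by hand from the output tables, and it assumes (as the filter above does) that `user == 0` identifies logged-out events. The vector names are introduced here for illustration.

```r
# Values copied from the output tables in this notebook
wiki_events       <- c(arwiki = 330, cswiki = 97, huwiki = 192)  # all init events
logged_out_events <- c(arwiki = 11,  cswiki = 4,  huwiki = 30)   # init events with user == 0

# Logged-out events as a percentage of each wiki's init events
logged_out_share <- 100 * logged_out_events / wiki_events
print(round(logged_out_share, 1))

# Logged-out events are recorded with an edit count of 0, so the remainder of the
# "0 edits" bucket (58 events overall) comes from logged-in users with no edits
zero_edit_events <- 58
logged_in_zero_edit_events <- zero_edit_events - sum(logged_out_events)
print(logged_in_zero_edit_events)
```

On these numbers, logged-out events make up a noticeably larger share of init events on huwiki than on arwiki or cswiki.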
