# Newcomer Edit Types

We are trying to understand to what extent newcomers switch to making other types of edits, primarily from Add a Link to other types of suggested edits.

In order to understand this, we need to define what we're measuring more specifically and establish a baseline.

The easiest way to do this is to look at tagged edits during the period in which all newcomer task types were labelled with edit tags, because then we can distinguish between them. This started in late March 2022. This analysis is done in late July, which means that using MediaWiki history we have data available until the end of June 2022. Working within those constraints, we'll look at article edits within a week after registration, and subsequent article edits within 30 days of the first edit. We'll use all registrations in April 2022 on wikis where Add a Link was deployed, with some of the usual limitations.

## Requiring Additional Sessions

The Phabricator task for this is [T311888](https://phabricator.wikimedia.org/T311888). The description of the task mentions "another session", but I think trying to determine the number of edit sessions a user had and the number of *article* edits they made in those is broadening the scope of this too much. We will instead keep it simple and say they had to make their first article edit within a week of registration, and then we will count all additional article edits over the next 30 days and see whether those were only Add a Link edits or whether they moved on to other tasks.

After running the analysis without requiring a second edit session, I noticed that the proportion of users editing again was very high, and I think that might be because users make additional article edits in quick succession. I therefore went back and changed the definition so that additional edits only count if the user has a *second edit session* (in the query below, a new session starts after at least one hour without an article edit).

## Definition Summary

For all wikis where Add a Link is deployed:

* Limit it to accounts registered on that wiki during April 2022.
* Limit it to non-bot accounts.
* Remove any known test accounts using our usual method.
* A "newcomer" is within one week of registration (this will catch 90–95% of newcomers who edit and is a little bit more relaxed than our definition of activation).
* We will only concern ourselves with newcomers who edited an article within that first week.
* We will tag if the user's first article edit was a suggested edit.
* We will tag if the user's first article edit was an Add a Link edit.
* We will tag if the user's first article edit was a copy edit.

We then count additional edits as follows:

* The set of eligible users is all users identified using the first set of criteria.
* We are only concerning ourselves with article edits.
* The user has to have a second article edit session, and it has to start within 37 days of registration (7 + 30 days)
* Subsequent edits have to happen within 30 days of the first edit (see above).
* We will tag if the edit was an Add a Link edit.
* We will tag if the edit was a suggested copy edit.
* We will tag if the edit was a suggested edit that was not also an Add a Link edit or a copy edit.
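
The session rule is easiest to see on a toy example. The sketch below assumes the one-hour inactivity threshold used in the query further down; the DataFrame, its column names (`user`, `ts`), and the timestamps are made up for illustration and are not part of the analysis.

```python
import pandas as pd

# Hypothetical article edits for two users (timestamps are invented)
edits = pd.DataFrame({
    'user': ['a', 'a', 'a', 'b', 'b'],
    'ts': pd.to_datetime([
        '2022-04-02 10:00', '2022-04-02 10:20', '2022-04-03 09:00',
        '2022-04-05 12:00', '2022-04-05 12:45',
    ]),
}).sort_values(['user', 'ts']).reset_index(drop=True)

# Time since the same user's previous edit (NaT for their first edit)
gap = edits.groupby('user')['ts'].diff()

# An edit starts a new session if it is the user's first edit
# or follows a gap of at least one hour
new_session = gap.isna() | (gap >= pd.Timedelta(hours=1))
edits['session_number'] = new_session.astype(int).groupby(edits['user']).cumsum()

# Only users who reach a second session have their additional edits counted
has_second_session = edits.groupby('user')['session_number'].max() >= 2
print(has_second_session.to_dict())
```

In this toy data, user `a` returns the next day and so has a second session, while user `b` makes two edits 45 minutes apart and does not.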

In [2]:
import datetime as dt

import pandas as pd
import numpy as np

from collections import defaultdict

from wmfdata import mariadb, spark

## Configuration Variables

In [3]:
## First and last date of user registrations, per definition above
start_date = dt.date(2022, 4, 1)
end_date = dt.date(2022, 5, 1)

## List of wikis that have Add a Link deployed during the registration window
wikis = ['arwiki', 'bnwiki', 'cswiki', 'viwiki', 'fawiki',
         'frwiki', 'huwiki', 'plwiki', 'rowiki', 'ruwiki',
         'eswiki']

## The MediaWiki history snapshot we use for any data gathering
mwh_snapshot = '2022-06'

## Lists of known users to ignore (e.g. test accounts and experienced users)
known_users = defaultdict(set)
known_users['cswiki'].update([14, 127629, 303170, 342147, 349875, 44133, 100304, 307410, 439792, 444907,
                              454862, 456272, 454003, 454846, 92295, 387915, 398470, 416764, 44751, 132801,
                              137787, 138342, 268033, 275298, 317739, 320225, 328302, 339583, 341191,
                              357559, 392634, 398626, 404765, 420805, 429109, 443890, 448195, 448438,
                              453220, 453628, 453645, 453662, 453663, 453664, 440694, 427497, 272273,
                              458025, 458487, 458049, 59563, 118067, 188859, 191908, 314640, 390445,
                              451069, 459434, 460802, 460885, 79895, 448735, 453176, 467557, 467745,
                              468502, 468583, 468603, 474052, 475184, 475185, 475187, 475188, 294174,
                              402906, 298011])

known_users['kowiki'].update([303170, 342147, 349875, 189097, 362732, 384066, 416362, 38759, 495265,
                              515553, 537326, 566963, 567409, 416360, 414929, 470932, 472019, 485036,
                              532123, 558423, 571587, 575553, 576758, 360703, 561281, 595100, 595105,
                              595610, 596025, 596651, 596652, 596653, 596654, 596655, 596993, 942,
                              13810, 536529])

known_users['viwiki'].update([451842, 628512, 628513, 680081, 680083, 680084, 680085, 680086, 355424,
                              387563, 443216, 682713, 659235, 700934, 705406, 707272, 707303, 707681, 585762])

known_users['arwiki'].update([237660, 272774, 775023, 1175449, 1186377, 1506091, 1515147, 1538902,
                              1568858, 1681813, 1683215, 1699418, 1699419, 1699425, 1740419, 1759328, 1763990])

## Grab the user IDs of known test accounts so they can be added to the exclusion list

def get_known_users(wiki):
    '''
    Get user IDs of known test accounts and return a set of them.
    '''
    
    username_patterns = ["MMiller", "Zilant", "Roan", "KHarlan", "MWang", "SBtest",
                         "Cloud", "Rho2019", "Test"]

    known_user_query = '''
SELECT user_id
FROM user
WHERE user_name LIKE "{name_pattern}%"
    '''
    
    known_users = set()
    
    for u_pattern in username_patterns:
        new_known = mariadb.run(known_user_query.format(
            name_pattern = u_pattern), wiki)
        known_users = known_users | set(new_known['user_id'])

    return(known_users)
        
for wiki in wikis:
    known_users[wiki] = known_users[wiki] | get_known_users(wiki)

## Helper Functions

In [4]:
def make_known_users_sql(kd, wiki_column, user_column):
    '''
    Based on the dictionary `kd` mapping wiki names to sets of user IDs of known users,
    create a SQL expression to exclude users based on the name of the wiki matching `wiki_column`
    and the user ID not matching `user_column`
    '''
    
    wiki_exp = '''({w_column} = '{wiki}' AND {u_column} NOT IN ({id_list}))'''
    
    expressions = list()

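    ## Iteratively build the expression for each wiki
    for wiki_name, wiki_users in kd.items():
        if not wiki_users:
            ## No known users on this wiki, so keep all of its users
            ## (this also avoids emitting an empty `NOT IN ()` list)
            expressions.append("({w_column} = '{wiki}')".format(
                w_column = wiki_column, wiki = wiki_name))
            continue
        expressions.append(wiki_exp.format(
            w_column = wiki_column,
            wiki = wiki_name,
            u_column = user_column,
            id_list = ','.join([str(u) for u in wiki_users])
        ))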
    
    ## We then join all the expressions with an OR, and we're done.
    return(' OR '.join(expressions))
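
To make the shape of the generated filter concrete, here is a small sketch of the same idea on a toy dictionary. `build_exclusion_sql` is a hypothetical stand-in written for this example (it sorts the IDs so the output is stable), and the user IDs are invented.

```python
def build_exclusion_sql(known, wiki_column, user_column):
    """Hypothetical stand-in for the helper above, with sorted IDs for stable output."""
    parts = []
    for wiki_name, user_ids in known.items():
        id_list = ','.join(str(u) for u in sorted(user_ids))
        parts.append(
            f"({wiki_column} = '{wiki_name}' AND {user_column} NOT IN ({id_list}))"
        )
    # One branch per wiki, joined with OR
    return ' OR '.join(parts)

# Invented IDs, for illustration only
toy_known = {'cswiki': {14, 3}, 'viwiki': {7}}
expr = build_exclusion_sql(toy_known, 'wiki_db', 'event_user_id')
print(expr)
```

Because the branches are joined with `OR`, a row from a wiki that has no entry in the dictionary matches none of them and is filtered out. In this notebook every wiki in `wikis` gets an entry through the loop over `get_known_users`.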
    

## Newcomer Edit Activity Query

In [5]:
edit_query = '''
WITH article_edits AS (
    SELECT
        wiki_db,
        event_user_id,
        event_user_creation_timestamp,
        event_timestamp,
        revision_tags,
        row_number() OVER (PARTITION BY wiki_db, event_user_id ORDER BY event_timestamp) AS edit_number,
        LAG(event_timestamp, 1) OVER (PARTITION BY wiki_db, event_user_id ORDER BY event_timestamp)
            AS prev_edit_timestamp
    FROM wmf.mediawiki_history
    WHERE snapshot = "{wmh_snapshot}"
    AND wiki_db IN ({wiki_list})
    AND event_entity = "revision"
    AND event_type = "create"
    AND event_user_registration_timestamp >= "{start_ts}" -- registration within our time span
    AND event_user_registration_timestamp < "{end_ts}"
    -- one week + 30 days is the max time between edits
    AND unix_timestamp(event_timestamp) - unix_timestamp(event_user_creation_timestamp) < 86400*37
    AND size(event_user_is_bot_by_historical) = 0 -- not a bot
    AND ({known_event_userid_expression})
    AND page_namespace = 0 -- article edits
),
edit_sessions AS(
    SELECT
        wiki_db,
        event_user_id,
        event_timestamp,
        row_number() OVER (PARTITION BY wiki_db, event_user_id ORDER BY event_timestamp) AS session_number
    FROM article_edits
    WHERE (prev_edit_timestamp IS NULL -- first edit is always a new session
           OR unix_timestamp(event_timestamp) - unix_timestamp(prev_edit_timestamp) >= 3600) -- one hour is a new session
),
second_sessions AS (
    SELECT
        *
    FROM edit_sessions
    WHERE session_number = 2
),
first_edits AS (
    SELECT
        article_edits.wiki_db,
        article_edits.event_user_id,
        article_edits.event_timestamp,
        1 AS first_article_edit,
        IF(ARRAY_CONTAINS(revision_tags, "newcomer task"), 1, 0) AS newcomer_task_edit,
        IF(ARRAY_CONTAINS(revision_tags, "newcomer task add link"), 1, 0) AS add_link_edit,
        IF(ARRAY_CONTAINS(revision_tags, "newcomer task copyedit"), 1, 0) AS copyedit_edit
    FROM article_edits
    WHERE edit_number = 1
    -- first edit has to be within a week of registration
    AND unix_timestamp(event_timestamp) - unix_timestamp(event_user_creation_timestamp) < 86400*7 
),
other_edits AS (
    SELECT
        wmh.wiki_db,
        wmh.event_user_id,
        COUNT(1) AS num_additional_article_edits,
        COUNT(IF(ARRAY_CONTAINS(revision_tags, "newcomer task"), 1, NULL)) AS num_additional_newcomer_task_edits,
        COUNT(IF(ARRAY_CONTAINS(revision_tags, "newcomer task add link"), 1, NULL)) AS num_additional_add_link_edits,
        COUNT(IF(ARRAY_CONTAINS(revision_tags, "newcomer task copyedit"), 1, NULL)) AS num_additional_copyedit_edits,
        COUNT(IF(ARRAY_CONTAINS(revision_tags, "newcomer task")
                 AND NOT ARRAY_CONTAINS(revision_tags, "newcomer task add link")
                 AND NOT ARRAY_CONTAINS(revision_tags, "newcomer task copyedit"), 1, NULL)) AS num_additional_other_task_edits
    FROM wmf.mediawiki_history AS wmh
    JOIN first_edits
    ON wmh.wiki_db = first_edits.wiki_db
    AND wmh.event_user_id = first_edits.event_user_id
    JOIN second_sessions
    ON wmh.wiki_db = second_sessions.wiki_db
    AND wmh.event_user_id = second_sessions.event_user_id
    WHERE snapshot = "{wmh_snapshot}"
    AND wmh.wiki_db IN ({wiki_list})
    AND event_entity = "revision"
    AND event_type = "create"
    -- only count edits up to 30 days after the first edit
    AND unix_timestamp(wmh.event_timestamp) - unix_timestamp(first_edits.event_timestamp) < 86400*30
    -- only count edits starting from the second session
    AND wmh.event_timestamp >= second_sessions.event_timestamp
    -- no bots and known users, and only count article edits
    AND size(event_user_is_bot_by_historical) = 0 -- not a bot
    AND ({known_wmh_event_userid_expression})
    AND page_namespace = 0 -- article edits
    GROUP BY wmh.wiki_db, wmh.event_user_id
)
SELECT
    first_edits.*,
    IF(other_edits.event_user_id IS NOT NULL, 1, 0) AS edited_again,
    COALESCE(other_edits.num_additional_article_edits, 0) AS num_additional_article_edits,
    COALESCE(other_edits.num_additional_newcomer_task_edits, 0) AS num_additional_newcomer_task_edits,
    COALESCE(other_edits.num_additional_add_link_edits, 0) AS num_additional_add_link_edits,
    COALESCE(other_edits.num_additional_copyedit_edits, 0) AS num_additional_copyedit_edits,
    COALESCE(other_edits.num_additional_other_task_edits, 0) AS num_additional_other_task_edits
FROM first_edits
LEFT JOIN other_edits
ON first_edits.wiki_db = other_edits.wiki_db
AND first_edits.event_user_id = other_edits.event_user_id
'''

In [6]:
user_edits = spark.run(
    edit_query.format(
        wmh_snapshot = mwh_snapshot,
        start_ts = start_date.strftime('%Y-%m-%d %H:%M:%S'),
        end_ts = end_date.strftime('%Y-%m-%d %H:%M:%S'),
        wiki_list = ','.join(['"{}"'.format(w) for w in wikis]),
        known_event_userid_expression = make_known_users_sql(known_users, 'wiki_db', 'event_user_id'),
        known_wmh_event_userid_expression = make_known_users_sql(known_users, 'wmh.wiki_db', 'wmh.event_user_id')
    ),
    session_type = 'yarn-regular',
    extra_settings = {'spark.executor.memory': '12g'}
)
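
As a quick illustration of what gets substituted into the query, the sketch below renders two of the parameters on their own. The wiki list is shortened for the example; the formatting expressions are the same as in the cell above.

```python
import datetime as dt

start_date = dt.date(2022, 4, 1)
wikis = ['arwiki', 'bnwiki']  # shortened list, for illustration only

# A date has no time component, so the time fields are rendered as zeros
start_ts = start_date.strftime('%Y-%m-%d %H:%M:%S')

# Each wiki name is wrapped in double quotes to form the body of the SQL IN (...) list
wiki_list = ','.join(['"{}"'.format(w) for w in wikis])

print(start_ts)
print(wiki_list)
```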

## Data Quality Check

In [None]:
user_edits.loc[(user_edits['wiki_db'] == 'frwiki') & (user_edits['newcomer_task_edit'] == 1)].head()

I've verified the data by comparing it to MediaWiki's databases, fixed a few bugs in the query, and made an update to the specification. As far as I can tell, the numbers now make sense.
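
One additional kind of consistency check is to assert invariants that the query implies. The sketch below is not the verification described above; it runs on a made-up two-row stand-in for `user_edits` that only borrows the column names, and it assumes that every Add a Link or copy edit also carries the general "newcomer task" tag.

```python
import pandas as pd

# Made-up stand-in for `user_edits`; only the column names come from the query
toy = pd.DataFrame({
    'edited_again': [1, 0],
    'num_additional_article_edits': [5, 0],
    'num_additional_newcomer_task_edits': [3, 0],
    'num_additional_add_link_edits': [2, 0],
    'num_additional_copyedit_edits': [1, 0],
    'num_additional_other_task_edits': [0, 0],
})

def check_invariants(df):
    """Return a dict mapping each invariant name to whether all rows satisfy it."""
    task = df['num_additional_newcomer_task_edits']
    type_cols = ['num_additional_add_link_edits',
                 'num_additional_copyedit_edits',
                 'num_additional_other_task_edits']
    return {
        # Task edits are a subset of article edits
        'task_le_article': bool((task <= df['num_additional_article_edits']).all()),
        # Assumption: each task type is counted among the task edits
        'types_le_task': bool(df[type_cols].le(task, axis=0).all().all()),
        # The flag is set exactly when there are additional edits
        'flag_matches_count': bool(
            ((df['edited_again'] == 1) == (df['num_additional_article_edits'] > 0)).all()
        ),
    }

print(check_invariants(toy))
```

Running `check_invariants(user_edits)` on the real result would flag any rows where a sub-count exceeds its parent count.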

## Baseline Editing

We want to know, for users whose first article edit is not a suggested task edit, what is the probability that they make at least one more article edit that is also not a suggested task?

Number of users whose first article edit is not a suggested edit:

In [8]:
len(
    user_edits.loc[
        (user_edits['newcomer_task_edit'] == 0)
    ]
)

17534

And of those, how many made additional article edits? Note that this count includes additional edits of any type, not only edits that were not suggested edits.

In [28]:
len(
    user_edits.loc[
        (user_edits['newcomer_task_edit'] == 0) &
        (user_edits['num_additional_article_edits'] > 0)
    ]
)

4118

What percentage does that correspond to?

In [29]:
round(100 * 
      len(
          user_edits.loc[
              (user_edits['newcomer_task_edit'] == 0) &
              (user_edits['num_additional_article_edits'] > 0)
          ]
      ) / len(
          user_edits.loc[
              (user_edits['newcomer_task_edit'] == 0)
          ]

      ), 1
)

23.5
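
The same "count divided by count, times 100, rounded to one decimal" pattern recurs throughout the rest of this notebook. A small helper makes the arithmetic explicit; `pct` is a hypothetical name introduced here, and the two numbers are the counts reported above.

```python
def pct(numerator, denominator, digits=1):
    """Percentage of `numerator` in `denominator`, rounded to `digits` decimals."""
    return round(100 * numerator / denominator, digits)

# Counts reported above: 4,118 of the 17,534 users edited again
baseline = pct(4118, 17534)
print(baseline)
```

With a DataFrame, the same helper could be called as `pct(mask.sum(), len(df))` for a boolean `mask`.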

## Newcomer Tasks Editing

In [30]:
len(
    user_edits.loc[
        (user_edits['newcomer_task_edit'] == 1)
    ]
)

1905

How many of those make additional edits?

In [9]:
len(
    user_edits.loc[
        (user_edits['newcomer_task_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0)
    ]
)

483

What proportion is that?

In [32]:
round(100 * len(
    user_edits.loc[
        (user_edits['newcomer_task_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0)
    ]
) / len(
    user_edits.loc[
        (user_edits['newcomer_task_edit'] == 1)
    ]
), 1)      

25.4

That's roughly comparable to the 23.5% baseline. Let's get the total, and also look at users who started with an Add a Link edit.

In [33]:
len(user_edits)

19439

In [35]:
len(
    user_edits.loc[
        (user_edits['num_additional_article_edits'] > 0)
    ]
)

4601

In [34]:
round(100 * 
      len(
    user_edits.loc[
        (user_edits['num_additional_article_edits'] > 0)
    ]
)
      
      / len(user_edits
), 1)

23.7

In [37]:
len(user_edits.loc[(user_edits['add_link_edit'] == 1)])

984

In [38]:
len(user_edits.loc[(user_edits['add_link_edit'] == 1) &
                   (user_edits['num_additional_article_edits'] > 0)])

246

In [39]:
round(100 * 
      len(user_edits.loc[(user_edits['add_link_edit'] == 1) &
                   (user_edits['num_additional_article_edits'] > 0)])
      / len(user_edits.loc[(user_edits['add_link_edit'] == 1)]),
      1)

25.0

What about users who didn't start out with a Newcomer Tasks edit? How many of those did or did not make Newcomer Tasks edits later? Since we didn't split this on treatment/control, users who start out with a different edit might still have access to the Newcomer Homepage and make Newcomer Tasks edits later.

In [90]:
len(user_edits.loc[(user_edits['newcomer_task_edit'] == 0) &
                   (user_edits['num_additional_article_edits'] > 0) &
                   (user_edits['num_additional_newcomer_task_edits'] > 0)])

189

In [91]:
round(100 * 
      len(user_edits.loc[(user_edits['newcomer_task_edit'] == 0) &
                   (user_edits['num_additional_article_edits'] > 0) &
                   (user_edits['num_additional_newcomer_task_edits'] > 0)])
      / len(user_edits.loc[(user_edits['newcomer_task_edit'] == 0) &
                   (user_edits['num_additional_article_edits'] > 0)]),
      1)

4.6

## Add a Link Editing

In [40]:
len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1)
    ]
)

984

In [42]:
len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == 0)
    ]
)

85

In [43]:
round(100 * 
      len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == 0)
    ]
)
      
      / len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0)
    ]
), 1)

34.6

The remaining users made at least some Newcomer Tasks edits. First, those whose additional edits were all Newcomer Tasks edits:

In [57]:
len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_article_edits'] == user_edits['num_additional_newcomer_task_edits'])
    ]
)

69

In [60]:
round(100 * 
      len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_article_edits'] == user_edits['num_additional_newcomer_task_edits'])
    ]
)
      
      / len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0)
    ]
), 1)

28.0

A combination of Newcomer Tasks edits and other article edits:

In [62]:
len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] > 0) &
        (user_edits['num_additional_article_edits'] != user_edits['num_additional_newcomer_task_edits'])
    ]
)

92

In [63]:
round(100 * 
      len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] > 0) &
        (user_edits['num_additional_article_edits'] != user_edits['num_additional_newcomer_task_edits'])
    ]
)
      / len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0)
    ]
), 1)

37.4
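
The three groups in this section are mutually exclusive, so they can also be assigned in a single pass. The sketch below shows one way to do that with `np.select` on made-up rows; the label names are invented, and it assumes every row has at least one additional article edit.

```python
import numpy as np
import pandas as pd

# Made-up rows for users who edited again; column names match `user_edits`
toy = pd.DataFrame({
    'num_additional_article_edits':       [4, 3, 6, 2],
    'num_additional_newcomer_task_edits': [0, 3, 2, 2],
})

total = toy['num_additional_article_edits']
tasks = toy['num_additional_newcomer_task_edits']

# Conditions are checked in order; each user lands in exactly one group
toy['later_edits'] = np.select(
    [tasks == 0, tasks == total],
    ['no_task_edits', 'only_task_edits'],
    default='mixed',
)

print(toy['later_edits'].value_counts().to_dict())

# The three counts reported above partition the 246 Add a Link starters who edited again
assert 85 + 69 + 92 == 246
```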

## Newcomers Who Only Made Newcomer Tasks Edits 

What kind of tasks did they do? Did they only do Add a Link edits? Only copy edits? Only other types of edits? Some combination of these? First, only Add a Link edits:

In [65]:
len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits']) &
        (user_edits['num_additional_article_edits'] == user_edits['num_additional_add_link_edits'])
    ]
)

31

In [66]:
round(100 * 
      len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits']) &
        (user_edits['num_additional_article_edits'] == user_edits['num_additional_add_link_edits'])
    ]
)     
      / len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits'])
    ]
)     , 1)

44.9

Only copy edits:

In [67]:
len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits']) &
        (user_edits['num_additional_article_edits'] == user_edits['num_additional_copyedit_edits'])
    ]
)

5

In [68]:
round(100 * 
      len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits']) &
        (user_edits['num_additional_article_edits'] == user_edits['num_additional_copyedit_edits'])
    ]
)     
      / len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits'])
    ]
)     , 1)

7.2

Only other types of Newcomer Tasks:

In [69]:
len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits']) &
        (user_edits['num_additional_article_edits'] == user_edits['num_additional_other_task_edits'])
    ]
)

1

In [70]:
round(100 * 
      len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits']) &
        (user_edits['num_additional_article_edits'] == user_edits['num_additional_other_task_edits'])
    ]
)     
      / len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits'])
    ]
)     , 1)

1.4

A combination of Add a Link and copy editing, the two default tasks:

In [78]:
len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits']) &
        (user_edits['num_additional_add_link_edits'] > 0) &
        (user_edits['num_additional_article_edits'] != user_edits['num_additional_add_link_edits']) &
        (user_edits['num_additional_article_edits'] == (user_edits['num_additional_add_link_edits'] +
                                                        user_edits['num_additional_copyedit_edits']))
    ]
)

20

In [79]:
round(100 * 
      len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits']) &
        (user_edits['num_additional_add_link_edits'] > 0) &
        (user_edits['num_additional_article_edits'] != user_edits['num_additional_add_link_edits']) &
        (user_edits['num_additional_article_edits'] == (user_edits['num_additional_add_link_edits'] +
                                                        user_edits['num_additional_copyedit_edits']))
    ]
)
      / len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits'])
    ]
)     , 1)

29.0

A combination of Add a Link and other tasks:

In [80]:
len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits']) &
        (user_edits['num_additional_add_link_edits'] > 0) &
        (user_edits['num_additional_article_edits'] != user_edits['num_additional_add_link_edits']) &
        (user_edits['num_additional_article_edits'] == (user_edits['num_additional_add_link_edits'] +
                                                        user_edits['num_additional_other_task_edits']))
    ]
)

5

In [81]:
round(100 * 
      len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits']) &
        (user_edits['num_additional_add_link_edits'] > 0) &
        (user_edits['num_additional_article_edits'] != user_edits['num_additional_add_link_edits']) &
        (user_edits['num_additional_article_edits'] == (user_edits['num_additional_add_link_edits'] +
                                                        user_edits['num_additional_other_task_edits']))
    ]
)
      / len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits'])
    ]
)     , 1)

7.2

A combination of copy editing and other tasks:

In [82]:
len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits']) &
        (user_edits['num_additional_copyedit_edits'] > 0) &
        (user_edits['num_additional_article_edits'] != user_edits['num_additional_copyedit_edits']) &
        (user_edits['num_additional_article_edits'] == (user_edits['num_additional_copyedit_edits'] +
                                                        user_edits['num_additional_other_task_edits']))
    ]
)

0

In [83]:
round(100 * 
      len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits']) &
        (user_edits['num_additional_copyedit_edits'] > 0) &
        (user_edits['num_additional_article_edits'] != user_edits['num_additional_copyedit_edits']) &
        (user_edits['num_additional_article_edits'] == (user_edits['num_additional_copyedit_edits'] +
                                                        user_edits['num_additional_other_task_edits']))
    ]
)
      / len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits'])
    ]
)     , 1)

0.0

A combination of all three:

In [84]:
len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits']) &
        (user_edits['num_additional_add_link_edits'] > 0) &
        (user_edits['num_additional_copyedit_edits'] > 0) &
        (user_edits['num_additional_other_task_edits'] > 0) &
        (user_edits['num_additional_article_edits'] == (user_edits['num_additional_add_link_edits'] +
                                                        user_edits['num_additional_copyedit_edits'] +
                                                        user_edits['num_additional_other_task_edits'])) 
    ]
)

7

In [85]:
round(100 * 
      len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits']) &
        (user_edits['num_additional_add_link_edits'] > 0) &
        (user_edits['num_additional_copyedit_edits'] > 0) &
        (user_edits['num_additional_other_task_edits'] > 0) &
        (user_edits['num_additional_article_edits'] == (user_edits['num_additional_add_link_edits'] +
                                                        user_edits['num_additional_copyedit_edits'] +
                                                        user_edits['num_additional_other_task_edits'])) 
    ]
)
      / len(
    user_edits.loc[
        (user_edits['add_link_edit'] == 1) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == user_edits['num_additional_article_edits'])
    ]
)     , 1)

10.1
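
The seven groups above can also be derived by labelling each user with the set of task types they used. This is a sketch on made-up rows, assuming each Newcomer Tasks edit belongs to exactly one of the three types; `task_mix` and the label names are invented for the example.

```python
import pandas as pd

# Made-up users whose later edits were all Newcomer Tasks edits
toy = pd.DataFrame({
    'num_additional_add_link_edits':   [3, 0, 2, 1],
    'num_additional_copyedit_edits':   [0, 2, 1, 1],
    'num_additional_other_task_edits': [0, 0, 0, 4],
})

labels = {
    'num_additional_add_link_edits': 'add_link',
    'num_additional_copyedit_edits': 'copyedit',
    'num_additional_other_task_edits': 'other',
}

def task_mix(row):
    """Join the names of the task types with at least one edit."""
    return '+'.join(name for col, name in labels.items() if row[col] > 0)

toy['task_mix'] = toy.apply(task_mix, axis=1)
print(toy['task_mix'].tolist())

# The seven groups reported above cover all 69 users in this subset
assert 31 + 5 + 1 + 20 + 5 + 0 + 7 == 69
```

A `value_counts()` on the resulting column would then give all seven group sizes at once.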

## Non-Add a Link Newcomer Tasks Editing

In [10]:
len(
    user_edits.loc[
        (user_edits['newcomer_task_edit'] == 1) &
        (user_edits['add_link_edit'] == 0)
    ]
)

921

In [11]:
len(
    user_edits.loc[
        (user_edits['newcomer_task_edit'] == 1) &
        (user_edits['add_link_edit'] == 0) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == 0)
    ]
)

106

In [12]:
round(100 * 
      len(
    user_edits.loc[
        (user_edits['newcomer_task_edit'] == 1) &
        (user_edits['add_link_edit'] == 0) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] == 0)
    ]
)
      
      / len(
    user_edits.loc[
        (user_edits['newcomer_task_edit'] == 1) &
        (user_edits['add_link_edit'] == 0) &
        (user_edits['num_additional_article_edits'] > 0)
    ]
), 1)

44.7

The remaining users made at least some Newcomer Tasks edits. First, those whose additional edits were all Newcomer Tasks edits:

In [13]:
len(
    user_edits.loc[
        (user_edits['newcomer_task_edit'] == 1) &
        (user_edits['add_link_edit'] == 0) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_article_edits'] == user_edits['num_additional_newcomer_task_edits'])
    ]
)

44

In [14]:
round(100 * 
      len(
    user_edits.loc[
        (user_edits['newcomer_task_edit'] == 1) &
        (user_edits['add_link_edit'] == 0) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_article_edits'] == user_edits['num_additional_newcomer_task_edits'])
    ]
)
      
      / len(
    user_edits.loc[
        (user_edits['newcomer_task_edit'] == 1) &
        (user_edits['add_link_edit'] == 0) &
        (user_edits['num_additional_article_edits'] > 0)
    ]
), 1)

18.6

A combination of Newcomer Tasks edits and other article edits:

In [15]:
len(
    user_edits.loc[
        (user_edits['newcomer_task_edit'] == 1) &
        (user_edits['add_link_edit'] == 0) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] > 0) &
        (user_edits['num_additional_article_edits'] != user_edits['num_additional_newcomer_task_edits'])
    ]
)

87

In [16]:
round(100 * 
      len(
    user_edits.loc[
        (user_edits['newcomer_task_edit'] == 1) &
        (user_edits['add_link_edit'] == 0) &
        (user_edits['num_additional_article_edits'] > 0) &
        (user_edits['num_additional_newcomer_task_edits'] > 0) &
        (user_edits['num_additional_article_edits'] != user_edits['num_additional_newcomer_task_edits'])
    ]
)
      / len(
    user_edits.loc[
        (user_edits['newcomer_task_edit'] == 1) &
        (user_edits['add_link_edit'] == 0) &
        (user_edits['num_additional_article_edits'] > 0)
    ]
), 1)

36.7
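
To compare the two groups of starters, the counts reported in this section and in the Add a Link section can be placed side by side. The sketch below only re-uses those reported counts; the row and column labels are invented for the example.

```python
import pandas as pd

# Counts reported in this notebook for users who edited again, by type of first edit
counts = pd.DataFrame(
    {
        'add_link_first': [85, 69, 92],
        'other_task_first': [106, 44, 87],
    },
    index=['no_task_edits', 'only_task_edits', 'mixed'],
)

# Column-wise shares in percent, rounded to one decimal
shares = (100 * counts / counts.sum()).round(1)
print(shares)
```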