# Re-test of mediawiki_edit_attempt
Following up on the task Instrumentation Data-QA for event.mediawiki_edit_attempt [T320281](https://phabricator.wikimedia.org/T320281).   
The first phase of this testing was conducted in November 2022, when several issues were highlighted and discussed. I re-tested the data in January 2023 and found the following.

On Desktop
- server-side wikitext editor events (all `init` and save events) were not being logged
- only client-side events (`ready`, `loaded`, `firstChange`, `abort`) were being logged

On Mobile
- some mobile wikitext editor events were logged by both the `wt` and `mf` MediaWiki extensions, so the same events were logged twice and double counted. After discussing with @phuedx, this appears to affect both the Metrics Platform edit_attempt instrument and the legacy editattemptstep instrument. It happens when the user navigates to `/wiki/$title?action=edit` on a mobile device. That is the desktop editing URL; on mobile the editing URL looks something like `/wiki/$title#/editor/section#`. The issue therefore affects anyone who edits on a mobile device using a desktop edit link.
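
The two URL shapes described above can be told apart mechanically. The following is a minimal sketch, not part of either instrument: the function name `classify_edit_url`, its return labels, and the `example.org` URLs are hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def classify_edit_url(url: str) -> str:
    """Label a URL by the edit-link shape it uses (labels are illustrative)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # Desktop edit links carry action=edit in the query string.
    if "edit" in query.get("action", []):
        return "desktop_edit_link"
    # Mobile editor links carry the editor route in the URL fragment.
    if parts.fragment.startswith("/editor/"):
        return "mobile_editor"
    return "other"


print(classify_edit_url("https://example.org/wiki/Sandbox?action=edit"))
print(classify_edit_url("https://example.org/wiki/Sandbox#/editor/1"))
```

A check like this could help isolate mobile sessions that started from a desktop edit link, assuming the page URL is available alongside the event.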

## Action items from QA test
- get server-side events logging on Metrics Platform
- enable wikitext logging on Metrics Platform

In [1]:
from wmfdata import hive, spark



## Comparison between editattemptstep and mediawiki_edit_attempt
### 1. Wikitext events

In [11]:
wt_eas_query=spark.run('''
SELECT 
  DISTINCT event.action AS action, 
  event.platform AS platform,
  count(1) AS wikitext_events
  FROM event.editattemptstep
  WHERE event.editor_interface = 'wikitext'
  AND year = 2023
  AND month= 1
  AND wiki = 'testwiki' 
  GROUP BY action, platform
''')

PySpark executors will use /usr/lib/anaconda-wmf/bin/python3.
                                                                                

In [100]:
wt_eas_query.sort_values(by=['wikitext_events','action'],ascending=False)

Unnamed: 0,action,platform,wikitext_events
13,init,desktop,3575
3,saveAttempt,desktop,1541
6,saveFailure,desktop,1073
10,ready,desktop,744
9,loaded,desktop,743
16,saveSuccess,desktop,464
1,firstChange,desktop,337
8,abort,desktop,252
4,init,phone,39
14,ready,phone,38


In [9]:
wt_eas_query.wikitext_events.sum()

8882

In [5]:
wt_mea_query= spark.run('''
SELECT 
  DISTINCT name, 
  count(1) AS wikitext_events
  FROM event.mediawiki_edit_attempt
  WHERE custom_data["editor_interface"].value = "wikitext"
  AND year = 2023
  AND month= 1
  AND mediawiki.database = "testwiki"
  GROUP BY name
''')


PySpark executors will use /usr/lib/anaconda-wmf/bin/python3.
                                                                                

In [6]:
wt_mea_query.sort_values(by=['wikitext_events','name'],ascending=False)

Unnamed: 0,name,wikitext_events
3,eas.wt.ready,744
0,eas.wt.loaded,743
6,eas.wt.first_change,337
7,eas.wt.abort,252
4,eas.mf.init,39
2,eas.mf.ready,38
12,eas.mf.loaded,38
11,eas.mf.abort,12
10,eas.mf.save_attempt,7
8,eas.mf.save_success,6


In [10]:
wt_mea_query.wikitext_events.sum()

2229

`event.mediawiki_edit_attempt` has fewer wikitext events than `event.editattemptstep`. The following action events from the desktop wikitext editor (`eas.wt`) are missing in `mediawiki_edit_attempt`:
- init
- save_attempt
- save_intent
- save_success  

From @cjming and @phuedx: "metrics platform client is only wired in the js for WikiEditor. since the PHP version of the Metrics Platform Client still hasn't been merged, we can't offer perfect parity for EAS"  
From @EChetty: we are blocked by [T281762](https://phabricator.wikimedia.org/T281762) to track the server-side events.  

Also, `eas.mf` events come from the MobileFrontend extension of MediaWiki, and `mediawiki_edit_attempt` is capturing all the mobile events correctly.
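
The comparison above can also be done programmatically. This is a minimal sketch, assuming that `mediawiki_edit_attempt` names follow the `eas.<source>.<snake_case_action>` pattern seen in the output above and that they map onto the camelCase `action` values of `editattemptstep`. The helper `split_event_name` is hypothetical; the counts are the January desktop wikitext rows copied from the outputs above.

```python
import re


def split_event_name(name: str) -> tuple:
    """Split 'eas.wt.first_change' into ('wt', 'firstChange')."""
    _, source, action = name.split(".", 2)
    # Convert snake_case to camelCase so it lines up with editattemptstep actions.
    camel = re.sub(r"_([a-z])", lambda m: m.group(1).upper(), action)
    return source, camel


# Desktop wikitext rows from editattemptstep (action -> count).
eas_counts = {
    "init": 3575, "saveAttempt": 1541, "saveFailure": 1073, "ready": 744,
    "loaded": 743, "saveSuccess": 464, "firstChange": 337, "abort": 252,
}
# eas.wt rows from mediawiki_edit_attempt (name -> count).
mea_counts = {
    "eas.wt.ready": 744, "eas.wt.loaded": 743,
    "eas.wt.first_change": 337, "eas.wt.abort": 252,
}

# Re-key the Metrics Platform rows by their normalized action name.
mea_by_action = {split_event_name(name)[1]: n for name, n in mea_counts.items()}

# Actions present in the legacy table but absent from the Metrics Platform table.
missing = [action for action in eas_counts if action not in mea_by_action]
print(missing)
```

The same normalization could be applied to the `name` column of the DataFrames returned by `spark.run` before merging them on the action.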

### 2. VE events

In [18]:
ve_eas_query=spark.run('''
SELECT 
  DISTINCT event.action AS action, 
  count(1) AS ve_events
  FROM event.editattemptstep
  WHERE event.editor_interface = 'visualeditor'
  AND year = 2023
  AND month= 1
  AND wiki = 'testwiki' 
  GROUP BY action
''')

PySpark executors will use /usr/lib/anaconda-wmf/bin/python3.

In [19]:
ve_eas_query.ve_events.sum()

996

In [23]:
ve_mea_query= spark.run('''
SELECT 
  DISTINCT name, 
  count(1) AS ve_events
  FROM event.mediawiki_edit_attempt
  WHERE custom_data["editor_interface"].value = "visualeditor"
  AND year = 2023
  AND month= 1
  AND mediawiki.database = "testwiki"
  GROUP BY name
''')


PySpark executors will use /usr/lib/anaconda-wmf/bin/python3.
                                                                                

In [24]:
ve_mea_query.ve_events.sum()

997

In [27]:
ve_eas_query.sort_values(by=['ve_events','action'],ascending=False)

Unnamed: 0,action,ve_events
2,init,193
3,ready,187
1,loaded,187
5,abort,100
4,firstChange,89
6,saveAttempt,81
0,saveIntent,79
7,saveSuccess,72
8,saveFailure,8


In [28]:
ve_mea_query.sort_values(by=['ve_events','name'],ascending=False)

Unnamed: 0,name,ve_events
18,eas.ve.init,173
22,eas.ve.ready,167
24,eas.ve.loaded,167
9,eas.ve.abort,92
3,eas.ve.first_change,78
23,eas.ve.save_attempt,71
20,eas.ve.save_intent,68
15,eas.ve.save_success,63
7,eas.dt.init,14
19,eas.dt.ready,11


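Visual Editor events are being logged successfully: the totals are nearly identical (996 in `editattemptstep` vs 997 in `mediawiki_edit_attempt`). Both queries include Discussion Tools events, which `mediawiki_edit_attempt` reports under `eas.dt`.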

### 3. Discussion Tools events

In [30]:
dt_eas_query=spark.run('''
SELECT 
  DISTINCT event.action AS action, 
  count(1) AS dt_events
  FROM event.editattemptstep
  WHERE event.editor_interface = 'visualeditor'
  AND event.integration='discussiontools'
  AND year = 2023
  AND month= 2
  AND wiki = 'testwiki' 
  GROUP BY action
''')

PySpark executors will use /usr/lib/anaconda-wmf/bin/python3.
                                                                                

In [34]:
dt_eas_query.sort_values(by=['dt_events','action'], ascending=False)

Unnamed: 0,action,dt_events
4,init,9
3,ready,7
6,loaded,7
5,firstChange,6
8,saveIntent,4
2,saveAttempt,4
0,abort,4
7,saveSuccess,3
1,saveFailure,1


In [38]:
dt_mea_query= spark.run('''
SELECT 
  DISTINCT name, 
  count(1) AS dt_events
  FROM event.mediawiki_edit_attempt
  WHERE custom_data["editor_interface"].value = "visualeditor"
  AND name like 'eas.dt%'
  AND year = 2023
  AND month= 2
  AND mediawiki.database = "testwiki"
  GROUP BY name
''')

PySpark executors will use /usr/lib/anaconda-wmf/bin/python3.
                                                                                

In [39]:
dt_mea_query.sort_values(by=['dt_events','name'], ascending=False)

Unnamed: 0,name,dt_events
2,eas.dt.init,9
8,eas.dt.ready,7
0,eas.dt.loaded,7
6,eas.dt.first_change,6
7,eas.dt.save_intent,4
1,eas.dt.save_attempt,4
3,eas.dt.abort,4
5,eas.dt.save_success,3
4,eas.dt.save_failure,1


Discussion Tools events are logged successfully: the counts match for every action.

### 4. Mobile events

In [41]:
mf_eas_query=spark.run('''
SELECT 
  DISTINCT event.action AS action, 
  count(1) AS mobile_events
  FROM event.editattemptstep
  WHERE event.platform = 'phone'
  AND year = 2023
  AND month= 2
  AND wiki = 'testwiki' 
  GROUP BY action
''')

PySpark executors will use /usr/lib/anaconda-wmf/bin/python3.
                                                                                

In [44]:
mf_mea_query= spark.run('''
SELECT 
  DISTINCT name, 
  count(1) AS mobile_events
  FROM event.mediawiki_edit_attempt
  WHERE agent.client_platform_family='mobile_browser'
  AND year = 2023
  AND month= 2
  AND mediawiki.database = "testwiki"
  GROUP BY name
''')


PySpark executors will use /usr/lib/anaconda-wmf/bin/python3.
                                                                                

In [46]:
mf_mea_query.sort_values(by=['mobile_events','name'], ascending=False)

Unnamed: 0,name,mobile_events
4,eas.mf.init,76
2,eas.mf.ready,72
11,eas.mf.loaded,71
5,eas.mf.save_intent,42
8,eas.mf.first_change,40
9,eas.mf.save_attempt,36
7,eas.mf.save_success,26
10,eas.mf.abort,22
1,eas.mf.save_failure,6
3,eas.wt.ready,4


In [47]:
mf_eas_query.sort_values(by=['mobile_events','action'], ascending=False)

Unnamed: 0,action,mobile_events
7,init,74
5,ready,69
0,loaded,69
3,saveIntent,42
4,firstChange,40
8,saveAttempt,36
6,saveSuccess,26
1,abort,21
2,saveFailure,6


In [51]:
mf_eas_query.mobile_events.sum()

383

In [101]:
mf_mea_query.mobile_events.sum()

402

Mobile event counts are slightly higher in `mediawiki_edit_attempt` (402) than in `editattemptstep` (383). This is due to wikitext editor events that are logged twice, once from the `wt` extension and once from the `mf` extension. The issue affects both instrument platforms and will need to be fixed.

### 4.1 Mobile wiki editor events

In [56]:
mf_wt_eas_query=spark.run('''
SELECT 
  DISTINCT event.action AS action, 
  count(1) AS mobile_events
  FROM event.editattemptstep
  WHERE event.platform = 'phone'
  AND event.editor_interface='wikitext'
  AND year = 2023
  AND month= 2
  AND wiki = 'testwiki' 
  GROUP BY action
''')

PySpark executors will use /usr/lib/anaconda-wmf/bin/python3.
                                                                                

In [69]:
mf_wt_eas_query.mobile_events.sum()

261

In [70]:
mf_wt_mea_query.mobile_events.sum()

280

In [58]:
mf_wt_mea_query= spark.run('''
SELECT 
  DISTINCT name, 
  count(1) AS mobile_events
  FROM event.mediawiki_edit_attempt
  WHERE agent.client_platform_family='mobile_browser'
  AND custom_data["editor_interface"].value = "wikitext"
  AND year = 2023
  AND month= 2
  AND mediawiki.database = "testwiki"
  GROUP BY name
''')


PySpark executors will use /usr/lib/anaconda-wmf/bin/python3.
                                                                                

In [60]:
mf_wt_mea_query.sort_values(by=['mobile_events','name'], ascending=False)

Unnamed: 0,name,mobile_events
4,eas.mf.init,53
2,eas.mf.ready,51
11,eas.mf.loaded,50
5,eas.mf.save_intent,29
8,eas.mf.first_change,25
9,eas.mf.save_attempt,22
7,eas.mf.save_success,19
10,eas.mf.abort,17
3,eas.wt.ready,4
0,eas.wt.loaded,4


In [71]:
mf_wt_eas_query.sort_values(by=['mobile_events','action'], ascending=False)

Unnamed: 0,action,mobile_events
3,init,51
4,ready,48
8,loaded,48
2,saveIntent,29
7,firstChange,25
0,saveAttempt,22
1,saveSuccess,19
6,abort,16
5,saveFailure,3


On mobile, wikitext editor event counts are slightly higher in the Metrics Platform table `mediawiki_edit_attempt` (280) than in the legacy `editattemptstep` table (261).

Looking further, we observed that some mobile events were logged by both the `wt` and `mf` MediaWiki extensions, so the same events were logged twice and double counted. After discussing with @phuedx, this appears to affect both the Metrics Platform edit_attempt instrument and the legacy editattemptstep instrument. `/wiki/$title?action=edit` is the desktop editing URL, whereas on mobile the editing URL looks something like `/wiki/$title#/editor/section#`, so the issue affects anyone who edits on a mobile device using a desktop edit link.
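
One way to look for the double logging described above is to find editing sessions whose events come from both the `wt` and `mf` sources. The sketch below runs on toy rows, not on real data: the session identifiers are made up, and the assumption that a per-session identifier is available in `mediawiki_edit_attempt` would need to be checked against the schema.

```python
from collections import defaultdict

# Toy rows of (session_id, event name); hypothetical data for illustration only.
rows = [
    ("a", "eas.mf.init"),
    ("a", "eas.mf.ready"),
    ("a", "eas.wt.ready"),
    ("b", "eas.mf.init"),
    ("b", "eas.mf.ready"),
    ("c", "eas.wt.ready"),
]


def sessions_logged_by_both(rows):
    """Return session ids that emitted events from both the wt and mf sources."""
    sources = defaultdict(set)
    for session_id, name in rows:
        # The second dot-separated component of the name is the source.
        sources[session_id].add(name.split(".")[1])
    return sorted(s for s, seen in sources.items() if {"wt", "mf"} <= seen)


double_logged = sessions_logged_by_both(rows)
print(double_logged)
```

In this toy input only session `a` is flagged, because it is the only one with both `eas.mf` and `eas.wt` events.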


### 4.2 Mobile VE events

In [62]:
mf_ve_eas_query=spark.run('''
SELECT 
  DISTINCT event.action AS action, 
  count(1) AS mobile_ve_events
  FROM event.editattemptstep
  WHERE event.platform = 'phone'
  AND event.editor_interface='visualeditor'
  AND event.integration='page' -- restricting to non discussion tools events only
  AND year = 2023
  AND month= 2
  AND wiki = 'testwiki' 
  GROUP BY action
''')

PySpark executors will use /usr/lib/anaconda-wmf/bin/python3.
                                                                                

In [64]:
mf_ve_eas_query.sort_values(by=['mobile_ve_events','action'], ascending=False)

Unnamed: 0,action,mobile_ve_events
4,init,23
1,ready,21
8,loaded,21
6,firstChange,15
3,saveAttempt,14
2,saveIntent,13
0,saveSuccess,7
7,abort,5
5,saveFailure,3


In [67]:
mf_ve_mea_query= spark.run('''
SELECT 
  DISTINCT name, 
  count(1) AS mobile_ve_events
  FROM event.mediawiki_edit_attempt
  WHERE agent.client_platform_family='mobile_browser'
  AND custom_data["editor_interface"].value = "visualeditor"
  --AND name like 'eas.ve%'
  AND year = 2023
  AND month= 2
  AND mediawiki.database = "testwiki"
  GROUP BY name
''')


PySpark executors will use /usr/lib/anaconda-wmf/bin/python3.
                                                                                

In [68]:
mf_ve_mea_query.sort_values(by=['mobile_ve_events','name'], ascending=False)

Unnamed: 0,name,mobile_ve_events
2,eas.mf.init,23
1,eas.mf.ready,21
8,eas.mf.loaded,21
5,eas.mf.first_change,15
6,eas.mf.save_attempt,14
3,eas.mf.save_intent,13
4,eas.mf.save_success,7
7,eas.mf.abort,5
0,eas.mf.save_failure,3


In [72]:
mf_ve_eas_query.mobile_ve_events.sum()

122

In [73]:
mf_ve_mea_query.mobile_ve_events.sum()

122

On mobile, Visual Editor events are consistent between the two tables (this comparison does not include Discussion Tools events).

### 5. Desktop events

In [79]:
desktop_eas_query=spark.run('''
SELECT 
  DISTINCT event.action AS action, 
  count(1) AS desktop_events
  FROM event.editattemptstep
  WHERE event.platform = 'desktop'
  AND year = 2023
  AND month= 2
  AND wiki = 'testwiki' 
  GROUP BY action
''')

PySpark executors will use /usr/lib/anaconda-wmf/bin/python3.
                                                                                

In [80]:
desktop_eas_query.sort_values(by=['desktop_events','action'], ascending=False)

Unnamed: 0,action,desktop_events
4,init,2742
0,saveAttempt,1261
2,ready,1040
3,loaded,1039
5,saveSuccess,725
6,saveFailure,536
1,abort,482
8,firstChange,421
7,saveIntent,122


In [75]:
desktop_mea_query= spark.run('''
SELECT 
  DISTINCT name, 
  count(1) AS desktop_events
  FROM event.mediawiki_edit_attempt
  WHERE agent.client_platform_family='desktop_browser'
  AND year = 2023
  AND month= 2
  AND mediawiki.database = "testwiki"
  GROUP BY name
''')


                                                                                

In [76]:
desktop_mea_query.sort_values(by=['desktop_events','name'], ascending=False)

Unnamed: 0,name,desktop_events
3,eas.wt.ready,801
0,eas.wt.loaded,801
6,eas.wt.abort,344
5,eas.wt.first_change,291
19,eas.ve.ready,216
21,eas.ve.loaded,215
16,eas.ve.init,213
9,eas.ve.abort,123
4,eas.ve.first_change,121
18,eas.ve.save_intent,114


In [77]:
desktop_mea_query.desktop_events.sum()

3552

In [81]:
desktop_eas_query.desktop_events.sum()

8368

As expected, the number of desktop events in `editattemptstep` is much higher (8368 vs 3552 in `mediawiki_edit_attempt`) because Metrics Platform does not log the server-side wikitext editor events (`init` and the save events) on desktop.

### 5.1 Desktop VE events

In [82]:
desktop_ve_eas_query=spark.run('''
SELECT 
  DISTINCT event.action AS action, 
  count(1) AS desktop_ve_events
  FROM event.editattemptstep
  WHERE event.platform = 'desktop'
  AND event.editor_interface='visualeditor'
  AND event.integration='page' -- restricting to non discussion tools events only
  AND year = 2023
  AND month= 2
  AND wiki = 'testwiki' 
  GROUP BY action
''')

PySpark executors will use /usr/lib/anaconda-wmf/bin/python3.
                                                                                

In [83]:
desktop_ve_eas_query.sort_values(by=['desktop_ve_events','action'], ascending=False)

Unnamed: 0,action,desktop_ve_events
4,ready,192
2,loaded,192
1,init,189
5,abort,112
0,firstChange,105
3,saveIntent,96
8,saveAttempt,93
6,saveSuccess,90
7,saveFailure,3


In [84]:
desktop_ve_eas_query.desktop_ve_events.sum()

1072

In [90]:
desktop_ve_mea_query= spark.run('''
SELECT 
  DISTINCT name, 
  count(1) AS desktop_ve_events
  FROM event.mediawiki_edit_attempt
  WHERE agent.client_platform_family='desktop_browser'
  AND custom_data["editor_interface"].value = "visualeditor"
  AND name like 'eas.ve%'
  AND year = 2023
  AND month= 2
  AND mediawiki.database = "testwiki"
  GROUP BY name
''')


PySpark executors will use /usr/lib/anaconda-wmf/bin/python3.
                                                                                

In [91]:
desktop_ve_mea_query.sort_values(by=['desktop_ve_events','name'], ascending=False)

Unnamed: 0,name,desktop_ve_events
6,eas.ve.ready,192
8,eas.ve.loaded,192
4,eas.ve.init,189
2,eas.ve.abort,113
0,eas.ve.first_change,107
5,eas.ve.save_intent,96
7,eas.ve.save_attempt,93
3,eas.ve.save_success,90
1,eas.ve.save_failure,3


In [92]:
desktop_ve_mea_query.desktop_ve_events.sum()

1075

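Visual Editor events on desktop are nearly identical in both instruments: 1072 in `editattemptstep` vs 1075 in `mediawiki_edit_attempt`. The only differences are in `abort` (112 vs 113) and `firstChange` (105 vs 107).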
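
The per-action differences between the two desktop Visual Editor tables can be listed with a short comparison. This is a minimal sketch using the counts copied from the two outputs above, with the `mediawiki_edit_attempt` names already rewritten to the camelCase action names of `editattemptstep` (that mapping is an assumption based on the naming pattern in the outputs).

```python
# Desktop VE counts from editattemptstep (action -> count).
eas = {
    "ready": 192, "loaded": 192, "init": 189, "abort": 112, "firstChange": 105,
    "saveIntent": 96, "saveAttempt": 93, "saveSuccess": 90, "saveFailure": 3,
}
# Desktop VE counts from mediawiki_edit_attempt, keyed by the same action names.
mea = {
    "ready": 192, "loaded": 192, "init": 189, "abort": 113, "firstChange": 107,
    "saveIntent": 96, "saveAttempt": 93, "saveSuccess": 90, "saveFailure": 3,
}

# Keep only the actions whose counts differ (Metrics Platform minus legacy).
diff = {action: mea[action] - eas[action] for action in eas if mea[action] != eas[action]}

print(diff)
print(sum(mea.values()) - sum(eas.values()))
```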

### 5.2 Desktop Wikitext events

In [94]:
desktop_wt_eas_query=spark.run('''
SELECT 
  DISTINCT event.action AS action, 
  count(1) AS desktop_events
  FROM event.editattemptstep
  WHERE event.platform = 'desktop'
  AND event.editor_interface='wikitext'
  AND year = 2023
  AND month= 2
  AND wiki = 'testwiki' 
  GROUP BY action
''')

PySpark executors will use /usr/lib/anaconda-wmf/bin/python3.
                                                                                

In [95]:
desktop_wt_eas_query.sort_values(by=['desktop_events','action'], ascending=False)

Unnamed: 0,action,desktop_events
5,init,2508
6,saveAttempt,1150
3,ready,805
1,loaded,805
0,saveSuccess,618
4,saveFailure,532
7,abort,347
2,firstChange,291


In [96]:
desktop_wt_eas_query.desktop_events.sum()

7056

In [97]:
desktop_wt_mea_query= spark.run('''
SELECT 
  DISTINCT name, 
  count(1) AS desktop_events
  FROM event.mediawiki_edit_attempt
  WHERE agent.client_platform_family='desktop_browser'
  AND custom_data["editor_interface"].value = "wikitext"
  AND year = 2023
  AND month= 2
  AND mediawiki.database = "testwiki"
  GROUP BY name
''')


PySpark executors will use /usr/lib/anaconda-wmf/bin/python3.
                                                                                

In [98]:
desktop_wt_mea_query.sort_values(by=['desktop_events','name'], ascending=False)

Unnamed: 0,name,desktop_events
1,eas.wt.ready,801
0,eas.wt.loaded,801
3,eas.wt.abort,344
2,eas.wt.first_change,291


In [99]:
desktop_wt_mea_query.desktop_events.sum()

2237

As expected, wikitext events on desktop do not match (7056 in `editattemptstep` vs 2237 in `mediawiki_edit_attempt`) because the Metrics Platform client is not yet available for PHP. Tracking the server-side events on Metrics Platform is blocked by [T281762](https://phabricator.wikimedia.org/T281762).
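
The gap between the two desktop wikitext totals can be broken down into a server-side part and a client-side part. The sketch below uses the counts from the two outputs above; treating `init` and the save events as server-side and `ready`, `loaded`, `abort`, `firstChange` as client-side follows the summary at the top of this notebook.

```python
# Desktop wikitext counts from editattemptstep (action -> count).
eas = {
    "init": 2508, "saveAttempt": 1150, "ready": 805, "loaded": 805,
    "saveSuccess": 618, "saveFailure": 532, "abort": 347, "firstChange": 291,
}
# Desktop wikitext counts from mediawiki_edit_attempt, keyed by the same action names.
mea = {"ready": 801, "loaded": 801, "abort": 344, "firstChange": 291}

# Overall difference between the two tables.
total_gap = sum(eas.values()) - sum(mea.values())
# Events for actions that Metrics Platform does not log at all (server-side).
server_side_gap = sum(n for action, n in eas.items() if action not in mea)
# Remaining difference on actions that both tables log (client-side).
client_side_gap = sum(eas[action] - mea[action] for action in mea)

print(total_gap, server_side_gap, client_side_gap)  # 4819 4808 11
```

Almost all of the gap comes from the actions that are absent from `mediawiki_edit_attempt`; the client-side actions differ by only a handful of events.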