## Overall event counts
Several problems are apparent right away from the overall event counts:
* Essentially no events of any kind were logged from phone editors.
* A large number of visual editor (VE) `init` events have platform `other`; these seem to be misclassified `desktop` events.
* A large number of 2017 wikitext editor (2017 WTE) `init` and `ready` events have platform `other`; these are probably misclassified `desktop` events.

The 2010 wikitext editor didn't log any `saveIntent` events, but that's intentional, since it doesn't have a corresponding step in the save workflow.

In [None]:
events.pivot_table("id", index="action", columns=["editor", "platform"], aggfunc=len, fill_value=0).sort_index()

## Phone events
The phone events seem to have stopped in late June or early July 2018.

In [None]:
phone = mariadb.run("""
select
    left(timestamp, 6) as month,
    sum(event_editor = "wikitext") as wikitext_events,
    sum(event_editor = "visualeditor") as visualeditor_events
from log.Edit_17541122
where
    event_platform = "phone"
group by left(timestamp, 6)
""", host = "logs")

In [None]:
phone

We thought this was fixed by [T202786](https://phabricator.wikimedia.org/T202786), whose fix started rolling out on 28 August 2018. However, no phone events have come in since then.

In [None]:
mariadb.run("""
select
    left(timestamp, 8) as day,
    sum(event_platform = "phone") as phone_events,
    sum(event_platform = "desktop") as desktop_events
from log.Edit_17541122
where
    timestamp >= "20180828" 
group by left(timestamp, 8)
""", host="logs")

There have been a few recent validation errors, but not nearly enough to account for the roughly 500 000 missing events every month. So it seems like the mobile events are just not getting sent.

In [None]:
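edit_errors = hive.run("""
-- Use lowercase "yyyy" (calendar year); uppercase "YYYY" is the week-based year
-- and mislabels dates around the turn of the year.
select
    date_format(from_unixtime(timestamp), "yyyy-MM") as month,
    count(*) as errors
from event.eventerror
where
    year = 2018 and
    month >= 5 and
    event.schema = "Edit"
group by date_format(from_unixtime(timestamp), "yyyy-MM")
""")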

edit_errors.sort_values("month").head()
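
To make the comparison between validation errors and the missing events concrete, the monthly error counts can be expressed as a fraction of the roughly 500 000 events expected each month. The following is a minimal sketch: the helper name `error_share_of_shortfall` and the example counts are hypothetical and are not taken from the query output above.

```python
def error_share_of_shortfall(errors_by_month, expected_per_month=500_000):
    """Return, per month, validation errors as a fraction of the expected events."""
    return {
        month: errors / expected_per_month
        for month, errors in sorted(errors_by_month.items())
    }


# Made-up counts for illustration only; not real query results.
example_errors = {"2018-07": 1_200, "2018-08": 800}
for month, share in error_share_of_shortfall(example_errors).items():
    print(f"{month}: {share:.2%} of the expected phone events")
```

In the notebook, something like `dict(zip(edit_errors["month"], edit_errors["errors"]))` could presumably be passed in, assuming those are the column names the query returns. Shares far below 100% would support the conclusion that validation errors do not explain the gap.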

If we go back to before the events dropped off, we can also identify any underlying problems in the mobile edit data.

In [None]:
may_phone_r = mariadb.run("""
select *
from log.Edit_17541122
where
    timestamp between "201805" and "201806" and
    event_platform = "phone"
""", host="logs")

In [None]:
# Output hidden for privacy.
may_phone_r.head()

* No `loaded` or `saveFailure` events recorded for mobile VE.
* No `loaded` events recorded for the mobile WTE.
* There are unusually few `abort` events for both mobile editors. On the desktop editors, the combined number of `abort` and `saveAttempt` events roughly matches the number of `ready` events, but that isn't the case here.

In [None]:
may_phone_r.pivot_table("id", index="event_action", columns="event_editor", aggfunc=len, fill_value=0)
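
The observation about `abort` events can be turned into a single number per editor: the ratio of `abort` plus `saveAttempt` events to `ready` events, which should be close to 1 if the desktop pattern holds. This is a rough sketch; the helper name `abort_balance` and the example counts are hypothetical.

```python
def abort_balance(counts):
    """Ratio of (abort + saveAttempt) events to ready events.

    `counts` maps action names to event counts; anything with a
    dict-style `.get(key, default)` works, including a pandas Series.
    """
    ready = counts.get("ready", 0)
    if ready == 0:
        return None  # No ready events, so the ratio is undefined.
    return (counts.get("abort", 0) + counts.get("saveAttempt", 0)) / ready


# Made-up counts for illustration only.
print(abort_balance({"ready": 1000, "abort": 150, "saveAttempt": 250}))
```

Applied to one editor's column of the pivot table above, a ratio well below 1 would indicate that many sessions reach `ready` without a recorded `abort` or `saveAttempt`.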

## Non-logged fields
A lot of fields have mostly null values. These all seem to be as expected.

In [None]:
null_prop = lambda ser: ser.isnull().sum() / len(ser) 

events.apply(null_prop).sort_values(ascending=False).loc[lambda x: x >= 0.1]

For example, `inittype` and `initmechanism` apply only to `init` events, and they are present for all of those.

In [None]:
inits = events.query("action == 'init'")

In [None]:
null_prop(inits["inittype"])

In [None]:
null_prop(inits["initmechanism"])

Likewise, the proportion of `ready` events without a `readytiming` value is extremely low.

In [None]:
readies = events.query("action == 'ready'")

In [None]:
readytiming_null_prop = null_prop(readies["readytiming"])
"{:,.5f}%".format(readytiming_null_prop * 100)

## Number of events by session
The distribution of events per session is generally as expected, with one exception:

Roughly 3% of sessions have more than 1 `loaded` and `ready` event.

In [None]:
def calc_dist(ser):
    bins = [0, 1, 2, 10, 100, 1000]
    cut_ser = pd.cut(ser, bins, right=False)
    return cut_ser.value_counts(normalize=True).sort_index().apply(
        lambda x:"{:,.2f}%".format(x * 100)
    )

action_names = ["init", "loaded", "ready", "abort", "saveIntent", "saveAttempt", "saveSuccess", "saveFailure"]
dists = [calc_dist(sessions[name + "_count"]) for name in action_names]
pd.concat(dists, axis=1)

It is the same 3% of sessions in both cases; the numbers of `ready` and `loaded` events in a session are extremely well correlated.

In [None]:
plt.scatter(sessions["loaded_count"], sessions["ready_count"], s=1, alpha=0.25);
plt.xlabel("number of loaded events")
plt.ylabel("number of ready events");
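
Because a scatter plot with heavy overplotting can hide a small number of off-diagonal points, a numeric check may be a useful complement: the share of sessions in which the two counts are exactly equal. The sketch below uses a hypothetical helper, `matching_share`, and made-up counts.

```python
def matching_share(loaded_counts, ready_counts):
    """Fraction of sessions whose loaded and ready counts are identical."""
    pairs = list(zip(loaded_counts, ready_counts))
    if not pairs:
        return None  # No sessions to compare.
    return sum(loaded == ready for loaded, ready in pairs) / len(pairs)


# Made-up per-session counts for illustration only.
print(matching_share([1, 1, 2, 0], [1, 1, 2, 1]))
```

The function accepts any two iterables of equal length, so passing `sessions["loaded_count"]` and `sessions["ready_count"]` should work in the notebook.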

The reason seems to be that sessions frequently feature repeated pairs of `loaded` and `ready` events. Three of the top 20 session patterns feature at least two `loaded`–`ready` cycles.

In [None]:
pd.DataFrame(sessions["actions"].value_counts()).head(20)
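
Rather than reading cycles off the top patterns by eye, they could also be counted per session. The sketch below assumes each session's actions are available as an ordered list of action names; how the `actions` column is actually encoded is not shown here, so it might need to be split into a list first. The helper name `count_load_ready_cycles` is hypothetical.

```python
def count_load_ready_cycles(actions):
    """Count how often a `loaded` event is immediately followed by `ready`."""
    return sum(
        1
        for current, following in zip(actions, actions[1:])
        if current == "loaded" and following == "ready"
    )


# Made-up session for illustration only.
example_session = ["init", "loaded", "ready", "loaded", "ready", "abort"]
print(count_load_ready_cycles(example_session))
```

Sessions with a count of two or more would correspond to the repeated-cycle patterns described above.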

## Editor switch sessions
There are very few sessions that involve editor switches, because only switches to or from the 2017 wikitext editor (currently in opt-in beta) are actually recorded within a single session. Switching to or from the 2010 wikitext editor involves a page reload and therefore the initiation of an entirely new session.

In [None]:
sessions["editor"].value_counts()