# Introduction


[Instrumentation Ticket](https://phabricator.wikimedia.org/T307572) | [QA Ticket](https://phabricator.wikimedia.org/T348613)

# Instrumentation note


The Inuka team deployed the instrumentation to track user activity for the Wiki Highlights experiment on November 30, 2023. The resulting events are stored in the `event.inuka_wiki_highlights_experiment` table.

In [1]:
from wmfdata import hive, spark
import wmfdata 

import math
import pandas as pd
import numpy as np

from datetime import datetime, timedelta, date






In [2]:
spark_session = wmfdata.spark.create_session(type='yarn-large')  


## QA on 2023-11-30

### Events by event type

In [3]:
spark.run("""
    SELECT event_type, count(1)
    FROM event.inuka_wiki_highlights_experiment
    GROUP BY event_type
"""
)


| | event_type | count(1) |
|---|---|---|
| 0 | pageLoaded | 64 |
| 1 | pageUnloaded | 42 |


Two event types are captured: pageLoaded (64 events) and pageUnloaded (42 events). In each session, the number of pageLoaded events is greater than or equal to the number of pageUnloaded events.
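The query above only returns totals, so the per-session claim needs a per-session count. A minimal sketch, assuming the events have been pulled into a list of row dicts with `session_id` and `event_type` keys (for example with pandas `DataFrame.to_dict("records")`, assuming `spark.run` returns a pandas DataFrame as the outputs in this notebook suggest). The helper `sessions_with_excess_unloads` is hypothetical.

```python
from collections import Counter, defaultdict


def sessions_with_excess_unloads(rows):
    """Return session IDs that have more pageUnloaded than pageLoaded events.

    `rows` is an iterable of dicts with 'session_id' and 'event_type' keys
    (assumed input shape, e.g. DataFrame.to_dict("records")).
    """
    counts = defaultdict(Counter)  # session_id -> Counter of event types
    for row in rows:
        counts[row["session_id"]][row["event_type"]] += 1
    # A missing key in a Counter reads as 0, so sessions lacking one type still compare correctly.
    return sorted(
        session
        for session, c in counts.items()
        if c["pageUnloaded"] > c["pageLoaded"]
    )


# Hypothetical rows for illustration only (not experiment data).
example = [
    {"session_id": "a", "event_type": "pageLoaded"},
    {"session_id": "a", "event_type": "pageUnloaded"},
    {"session_id": "b", "event_type": "pageLoaded"},
    {"session_id": "c", "event_type": "pageUnloaded"},
]
print(sessions_with_excess_unloads(example))  # ['c']
```

An empty result on the real rows would support the claim for the queried data.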

### Events by experiment group

In [4]:
spark.run("""
    SELECT experiment_group, count(1)
    FROM event.inuka_wiki_highlights_experiment
    GROUP BY experiment_group
"""
)

                                                                                

| | experiment_group | count(1) |
|---|---|---|
| 0 | control | 32 |
| 1 | experiment | 74 |


In [None]:
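# Sessions assigned to more than one experiment group (expected: none).
# `group` is avoided as an alias because GROUP is a reserved word in Spark SQL's ANSI mode.
spark.run("""
    SELECT session_id, COUNT(DISTINCT experiment_group) AS n_groups
    FROM event.inuka_wiki_highlights_experiment
    GROUP BY session_id
    HAVING COUNT(DISTINCT experiment_group) > 1
"""
)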

Two experiment groups are captured: control (32 events) and experiment (74 events). No session is assigned to both groups.
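The counts above are events, not sessions, so they do not directly show how sessions are split between groups. A minimal sketch that counts distinct sessions per group and lists any session seen in more than one group, assuming row dicts with `session_id` and `experiment_group` keys; the helper `group_assignment_summary` is hypothetical.

```python
from collections import defaultdict


def group_assignment_summary(rows):
    """Summarize experiment-group assignment per session.

    Returns (sessions_per_group, conflicting_sessions): the number of distinct
    sessions in each group, and the sorted sessions seen in more than one group.
    """
    groups_by_session = defaultdict(set)  # session_id -> set of groups seen
    for row in rows:
        groups_by_session[row["session_id"]].add(row["experiment_group"])

    sessions_per_group = defaultdict(int)
    for groups in groups_by_session.values():
        for group in groups:
            sessions_per_group[group] += 1

    conflicting = sorted(s for s, g in groups_by_session.items() if len(g) > 1)
    return dict(sessions_per_group), conflicting


# Hypothetical rows for illustration only (not experiment data).
example = [
    {"session_id": "s1", "experiment_group": "control"},
    {"session_id": "s1", "experiment_group": "control"},
    {"session_id": "s2", "experiment_group": "experiment"},
    {"session_id": "s3", "experiment_group": "control"},
    {"session_id": "s3", "experiment_group": "experiment"},
]
per_group, conflicting = group_assignment_summary(example)
print(sorted(per_group.items()), conflicting)  # [('control', 2), ('experiment', 2)] ['s3']
```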

### Events that reached the page bottom

In [7]:
spark.run("""
    SELECT page_bottom_was_visible, event_type, count(1)
    FROM event.inuka_wiki_highlights_experiment
    GROUP BY page_bottom_was_visible, event_type
"""
)

                                                                                

| | page_bottom_was_visible | event_type | count(1) |
|---|---|---|---|
| 0 | True | pageUnloaded | 7 |
| 1 | null | pageLoaded | 64 |
| 2 | False | pageUnloaded | 35 |


`page_bottom_was_visible` is populated only for pageUnloaded events (7 True, 35 False); it is null for all 64 pageLoaded events.
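The same "set only on one event type" rule can be written as a reusable check. A minimal sketch, assuming row dicts where a missing value arrives as `None` or float NaN (how pandas commonly represents nulls); the helpers `is_null` and `presence_violations` are hypothetical. `False` counts as a real value, not a null.

```python
import math


def is_null(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def presence_violations(rows, field, expected_event_type):
    """Count rows breaking the rule '`field` is set only on `expected_event_type`'.

    'missing': rows of expected_event_type where the field is null.
    'unexpected': rows of any other event type where the field is set.
    """
    violations = {"missing": 0, "unexpected": 0}
    for row in rows:
        value = row.get(field)
        if row["event_type"] == expected_event_type:
            if is_null(value):
                violations["missing"] += 1
        elif not is_null(value):
            violations["unexpected"] += 1
    return violations


# Hypothetical rows for illustration only (not experiment data).
example = [
    {"event_type": "pageLoaded", "page_bottom_was_visible": None},
    {"event_type": "pageUnloaded", "page_bottom_was_visible": True},
    {"event_type": "pageUnloaded", "page_bottom_was_visible": False},
    {"event_type": "pageUnloaded", "page_bottom_was_visible": None},
]
print(presence_violations(example, "page_bottom_was_visible", "pageUnloaded"))
# {'missing': 1, 'unexpected': 0}
```

The same helper would apply to `time_length_ms` with `expected_event_type="pageUnloaded"`.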

### Time length

In [12]:
spark.run("""
    SELECT event_type, time_length_ms, count(1)
    FROM event.inuka_wiki_highlights_experiment
    GROUP BY event_type, time_length_ms
"""
)

                                                                                

| | event_type | time_length_ms | count(1) |
|---|---|---|---|
| 0 | pageUnloaded | 1833.0 | 1 |
| 1 | pageUnloaded | 5268.0 | 1 |
| 2 | pageUnloaded | 6060.0 | 1 |
| 3 | pageUnloaded | 12353.0 | 1 |
| 4 | pageUnloaded | 694.0 | 1 |
| 5 | pageLoaded | null | 64 |
| 6 | pageUnloaded | 73863.0 | 1 |
| 7 | pageUnloaded | 35401.0 | 1 |
| 8 | pageUnloaded | 38676.0 | 1 |
| 9 | pageUnloaded | 3234.0 | 1 |


`time_length_ms` is captured and is populated only for pageUnloaded events; it is null for all 64 pageLoaded events.
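Grouping by every distinct `time_length_ms` value yields one row per value; a distribution summary can be easier to scan. A minimal sketch, assuming row dicts with `event_type` and `time_length_ms` keys and nulls as `None` or NaN; the helper `summarize_time_length` is hypothetical, and the input values below are made up.

```python
import math
import statistics


def is_null(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def summarize_time_length(rows):
    """Summarize non-null time_length_ms of pageUnloaded events, in seconds.

    Also counts non-positive durations, which would be worth investigating.
    """
    seconds = [
        row["time_length_ms"] / 1000
        for row in rows
        if row["event_type"] == "pageUnloaded" and not is_null(row["time_length_ms"])
    ]
    return {
        "n": len(seconds),
        "min_s": min(seconds, default=None),
        "median_s": statistics.median(seconds) if seconds else None,
        "max_s": max(seconds, default=None),
        "non_positive": sum(1 for s in seconds if s <= 0),
    }


# Hypothetical rows for illustration only (not experiment data).
example = [
    {"event_type": "pageUnloaded", "time_length_ms": 1000.0},
    {"event_type": "pageUnloaded", "time_length_ms": 3000.0},
    {"event_type": "pageUnloaded", "time_length_ms": 2000.0},
    {"event_type": "pageLoaded", "time_length_ms": None},
]
print(summarize_time_length(example))
# {'n': 3, 'min_s': 1.0, 'median_s': 2.0, 'max_s': 3.0, 'non_positive': 0}
```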

### Events by page name

In [11]:
spark.run("""
    SELECT event_type, session_id, page_name, count(1)
    FROM event.inuka_wiki_highlights_experiment
    GROUP BY event_type, session_id, page_name
    ORDER BY session_id
"""
)

                                                                                

| | event_type | session_id | page_name | count(1) |
|---|---|---|---|---|
| 0 | pageLoaded | 0db1318037934b888fc89d4cf0d62fc0 | categories_highlights | 1 |
| 1 | pageLoaded | 0fbde894ecc74ec5ace571a14df75c02 | categories_articles | 1 |
| 2 | pageUnloaded | 0fbde894ecc74ec5ace571a14df75c02 | Nelson Mandela | 1 |
| 3 | pageLoaded | 0fbde894ecc74ec5ace571a14df75c02 | Elephant | 1 |
| 4 | pageLoaded | 0fbde894ecc74ec5ace571a14df75c02 | Nelson Mandela | 1 |
| ... | ... | ... | ... | ... |
| 71 | pageUnloaded | feb63ae75d8545b6877809e6cf6aa511 | categories_highlights | 1 |
| 72 | pageLoaded | feb63ae75d8545b6877809e6cf6aa511 | categories_highlights | 1 |
| 73 | pageUnloaded | feb63ae75d8545b6877809e6cf6aa511 | Comics | 1 |
| 74 | pageLoaded | feb63ae75d8545b6877809e6cf6aa511 | Baseball | 1 |


Page names are captured. Each pageUnloaded event has a corresponding pageLoaded event with the same page name in the same session.
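With 75 (session, page, event type) rows, the pairing claim is easier to confirm programmatically than by eye. A minimal sketch, assuming row dicts with `session_id`, `page_name` and `event_type` keys; the helper `unmatched_unloads` is hypothetical. It compares counts only and does not check that each load precedes its unload, which would need the event timestamps (not queried here).

```python
from collections import Counter


def unmatched_unloads(rows):
    """Return (session_id, page_name) pairs with more pageUnloaded than pageLoaded events."""
    loaded = Counter()
    unloaded = Counter()
    for row in rows:
        key = (row["session_id"], row["page_name"])
        if row["event_type"] == "pageLoaded":
            loaded[key] += 1
        elif row["event_type"] == "pageUnloaded":
            unloaded[key] += 1
    # Counter returns 0 for pages that were never loaded in that session.
    return sorted(key for key, n in unloaded.items() if n > loaded[key])


# Hypothetical rows for illustration only (not experiment data).
example = [
    {"session_id": "s1", "page_name": "Elephant", "event_type": "pageLoaded"},
    {"session_id": "s1", "page_name": "Elephant", "event_type": "pageUnloaded"},
    {"session_id": "s1", "page_name": "Comics", "event_type": "pageUnloaded"},
    {"session_id": "s2", "page_name": "Elephant", "event_type": "pageLoaded"},
]
print(unmatched_unloads(example))  # [('s1', 'Comics')]
```

An empty result on the real rows would confirm the claim at the count level.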