Data Analysis Task
Discovery (and other teams within the Foundation) relies on event logging (EL) to track a variety of performance and usage metrics that help us make decisions. Specifically, Discovery is interested in:
- clickthrough rate: the proportion of search sessions where the user clicked on one of the results displayed
- zero results rate: the proportion of searches that yielded 0 results
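The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: it assumes events have already been loaded as dictionaries keyed by the dataset's column names, and the function names and toy events are made up for the example. Note the different denominators: clickthrough rate is computed per session, while zero results rate is computed per search.

```python
import math

def clickthrough_rate(events):
    """Proportion of search sessions with at least one click on a result."""
    searched = {e["session_id"] for e in events if e["action"] == "searchResultPage"}
    clicked = {e["session_id"] for e in events if e["action"] == "visitPage"}
    return len(searched & clicked) / len(searched) if searched else math.nan

def zero_results_rate(events):
    """Proportion of searches (searchResultPage events) that returned 0 results."""
    n_results = [e["n_results"] for e in events if e["action"] == "searchResultPage"]
    return sum(n == 0 for n in n_results) / len(n_results) if n_results else math.nan

# Toy events, invented for illustration only (not taken from the dataset).
toy_events = [
    {"session_id": "s1", "action": "searchResultPage", "n_results": 7},
    {"session_id": "s1", "action": "visitPage", "n_results": None},
    {"session_id": "s2", "action": "searchResultPage", "n_results": 0},
    {"session_id": "s3", "action": "searchResultPage", "n_results": 3},
    {"session_id": "s3", "action": "searchResultPage", "n_results": 0},
]

print(clickthrough_rate(toy_events))  # 1 of 3 sessions had a click
print(zero_results_rate(toy_events))  # 2 of 4 searches returned nothing
```

To get daily or per-group rates, the same functions could be applied to the events of each day or group separately.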
You must create a reproducible report* answering the following questions:
- What is our daily overall clickthrough rate? How does it vary between the groups?
- Which results do people tend to try first? How does it change day-to-day?
- What is our daily overall zero results rate? How does it vary between the groups?
- Let session length be approximately the time between the first event and the last event in a session. Choose a variable from the dataset and describe its relationship to session length. Visualize the relationship.
- Summarize your findings in an executive summary.
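As a starting point for the questions about first clicks and session length, here is one possible sketch. It assumes events are dictionaries keyed by the dataset's column names and that timestamps follow the YYYYMMDDhhmmss format described in the data section. The helper names and toy events are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def parse_ts(ts):
    """Parse a YYYYMMDDhhmmss timestamp (int or str) into a datetime."""
    return datetime.strptime(str(ts), "%Y%m%d%H%M%S")

def session_lengths(events):
    """Seconds between the first and last event of each session."""
    times = defaultdict(list)
    for e in events:
        times[e["session_id"]].append(parse_ts(e["timestamp"]))
    return {sid: (max(ts) - min(ts)).total_seconds() for sid, ts in times.items()}

def first_click_positions(events):
    """result_position of the earliest visitPage event in each session."""
    first = {}
    for e in events:
        if e["action"] != "visitPage":
            continue
        t = parse_ts(e["timestamp"])
        sid = e["session_id"]
        if sid not in first or t < first[sid][0]:
            first[sid] = (t, e["result_position"])
    return {sid: pos for sid, (_, pos) in first.items()}

# Toy events, invented for illustration only (not taken from the dataset).
toy_events = [
    {"session_id": "s1", "timestamp": 20240101104000, "action": "visitPage", "result_position": 1},
    {"session_id": "s1", "timestamp": 20240101103842, "action": "searchResultPage", "result_position": None},
    {"session_id": "s1", "timestamp": 20240101103850, "action": "visitPage", "result_position": 2},
    {"session_id": "s2", "timestamp": 20240102000000, "action": "searchResultPage", "result_position": None},
]

print(session_lengths(toy_events))
print(first_click_positions(toy_events))
```

A session with a single event gets a length of 0 under this definition, so it may be worth deciding up front how to treat such sessions in the report.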
* Given dependencies and other instructions, we should be able to re-run your source code with the dataset in the same directory and obtain the same results and figures. Popular formats for this include RMarkdown and Jupyter Notebook (formerly IPython).
Note: if you submit your report as a Jupyter/IPython notebook on Greenhouse, please upload a copy to GitHub and include the link when you submit it on Greenhouse.
The dataset comes from a tracking schema that we use for assessing user satisfaction. Desktop users are randomly sampled to be anonymously tracked by this schema, which uses an "I'm alive" pinging system that lets us estimate how long users stay on the pages they visit. The dataset contains a little more than a week of EL data.
| Column | Type | Description |
|---|---|---|
| uuid | string | Universally unique identifier (UUID) for backend event handling. |
| timestamp | integer | The date and time (UTC) of the event, formatted as YYYYMMDDhhmmss. |
| session_id | string | A unique ID identifying individual sessions. |
| group | string | A label ("a" or "b"). |
| action | string | Identifies the context in which the event was created. See below. |
| checkin | integer | How many seconds the page has been open for. |
| page_id | string | A unique identifier for correlating page visits and check-ins. |
| n_results | integer | Number of hits returned to the user. Only shown for searchResultPage events. |
| result_position | integer | The position of the visited page's link on the search engine results page (SERP). |
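Several of these columns apply only to certain event types, so a loader has to cope with missing values. The sketch below shows one way to cast the integer columns while reading. It rests on assumptions that this description does not confirm: that the dataset is a comma-separated file with a header row, and that missing cells are blank or written as "NA". The function names and sample rows are invented for illustration.

```python
import csv
import io

# Columns documented as integers in the table above.
INT_COLUMNS = ("timestamp", "checkin", "n_results", "result_position")

def to_int_or_none(value):
    """Cast a CSV cell to int; treat blank or 'NA' cells as missing (an assumption)."""
    value = value.strip()
    return None if value in ("", "NA") else int(value)

def read_events(handle):
    """Yield one dict per event, with the integer columns cast to int or None."""
    for row in csv.DictReader(handle):
        for col in INT_COLUMNS:
            row[col] = to_int_or_none(row[col])
        yield row

# In-memory sample standing in for the real file; the rows are made up.
sample = io.StringIO(
    "uuid,timestamp,session_id,group,action,checkin,page_id,n_results,result_position\n"
    "u1,20240101103842,s1,a,searchResultPage,NA,p1,7,NA\n"
    "u2,20240101103850,s1,a,visitPage,NA,p2,NA,1\n"
)
events = list(read_events(sample))
print(events[0]["n_results"], events[1]["result_position"])
```

With the real dataset, `sample` would be replaced by an open file handle; the missing-value markers should be checked against the actual file before relying on this.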
The following are possible values for an event's action field:
- searchResultPage: when a new search is performed and the user is shown a SERP.
- visitPage: when the user clicks a link in the results.
- checkin: when the user has remained on the page for a pre-specified amount of time.
For example, suppose a user's search query returned 7 results, they clicked on the first result, and the last check-in recorded for that page was at 40 seconds. That user stayed on the page between 40 and 50 seconds. (The next check-in would have happened at 50s.)
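The reasoning about check-ins can be sketched as follows: the largest check-in value recorded for a page is a lower bound on how long the user stayed there. The upper bound depends on the check-in schedule, which this description does not list, so the sketch reports only the lower bound. The function name, the toy events, and the 10-second spacing in them are assumptions made for illustration.

```python
def dwell_lower_bounds(events):
    """Largest check-in value per page_id: a lower bound on seconds spent on the page."""
    bounds = {}
    for e in events:
        if e["action"] == "checkin":
            page = e["page_id"]
            bounds[page] = max(bounds.get(page, 0), e["checkin"])
    return bounds

# Toy events, invented for illustration only (not taken from the dataset).
toy_events = [
    {"action": "visitPage", "page_id": "p1", "checkin": None},
    {"action": "checkin", "page_id": "p1", "checkin": 10},
    {"action": "checkin", "page_id": "p1", "checkin": 20},
    {"action": "checkin", "page_id": "p1", "checkin": 30},
    {"action": "checkin", "page_id": "p1", "checkin": 40},
    {"action": "visitPage", "page_id": "p2", "checkin": None},
]

print(dwell_lower_bounds(toy_events))
```

Pages with a visit but no check-in, such as `p2` here, do not appear in the result; the user left before the first check-in fired.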