This notebook computes a quarterly report on core metrics and content gap metrics.

In [6]:
import numpy as np
import pandas as pd

import src.content as content
import src.utils as utils

In [7]:
metrics = utils.load_all_metric_files()

In pandas, "Q-JUN" denotes quarters in a fiscal year ending in June, so "2023Q4" means Q4 of the 2022-23 fiscal year (April to June 2023), _not_ of the 2023-24 fiscal year. This is consistent with the guidance on [officewiki:Quarters](https://office.wikimedia.org/wiki/Quarters).
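
As a quick check of this convention, the sketch below builds a "Q-JUN" period and converts a month into its fiscal quarter. It uses only standard pandas calls; the variable names are illustrative and not part of this notebook.

```python
import pandas as pd

# "2023Q4" with a June year end is the last quarter of fiscal year 2022-23.
q4 = pd.Period("2023Q4", freq="Q-JUN")
print(q4.start_time.date())  # 2023-04-01
print(q4.end_time.date())    # 2023-06-30

# July 2023 is the first month of fiscal year 2023-24, so it falls in 2024Q1.
july = pd.Period("2023-07", freq="M")
july_quarter = july.asfreq("Q-JUN")
print(july_quarter)          # 2024Q1
```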

In [4]:
# Pad the index with any missing months at either end so that it covers only full quarters.
# This way, when we resample to quarterly averages, quarters in which some months have
# no data come out as null.
first_quarter = metrics.index[0].asfreq("Q-JUN")
last_quarter = metrics.index[-1].asfreq("Q-JUN")
new_index = pd.period_range(first_quarter.start_time, last_quarter.end_time, freq="M")

quarterly_averages = (
    metrics
    .reindex(new_index)
    .resample("Q-JUN")
    .aggregate(
        # We need the lambda function because a plain "mean" would be translated
        # into PeriodIndexResampler.mean, which has no option to retain NaNs.
        # Retaining them lets us report NaN rather than a misleading quarterly
        # value based on partial data.
        lambda x: x.mean(skipna=False)
    )
)

# This automatically picks the latest quarter with at least a month of data as the
# reporting period, but you can replace the line below to manually specify any period.
# For example: `quarter_to_report = pd.Period("2023Q4", freq="Q-JUN")`
quarter_to_report = quarterly_averages.index[-1]

If the table is missing values, it's likely that some data is missing (such as the data for the last month in the quarter). Check the data files in the "data" directory to investigate.
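
To see why a single missing month blanks out a whole quarter, here is a minimal sketch on hypothetical toy data. It pads the monthly index to full quarters the same way as the cell above, but groups by the quarter label with `groupby` instead of calling `resample`; that substitution is an assumption made to keep the example independent of how `resample` treats a `PeriodIndex` in a given pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: five months, one month short of two full quarters.
idx = pd.period_range("2023-01", "2023-05", freq="M")
toy = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=idx, name="toy_metric")

# Pad the index so it covers whole quarters; the missing June becomes NaN.
first_q = toy.index[0].asfreq("Q-JUN")
last_q = toy.index[-1].asfreq("Q-JUN")
full_index = pd.period_range(first_q.start_time, last_q.end_time, freq="M")
padded = toy.reindex(full_index)

# Label each month with its fiscal quarter and average within each quarter.
quarters = padded.index.asfreq("Q-JUN")
strict = padded.groupby(quarters).agg(lambda x: x.mean(skipna=False))
lenient = padded.groupby(quarters).mean()  # skips NaN by default

print(strict)   # the incomplete quarter is NaN
print(lenient)  # the incomplete quarter is averaged over two months only
```

The `strict` result is the behavior this report relies on: a quarter with any missing month is reported as null instead of as a partial average.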

In [17]:
core_metrics = [
    "unique_devices",
    "south_asia_unique_devices",
    "latin_america_caribbean_unique_devices",
    "north_america_unique_devices",
    "northern_western_europe_unique_devices"
]

(
    quarterly_averages
    .reindex(core_metrics, axis="columns")
    .apply(utils.calc_rpt, reporting_period=quarter_to_report)
    .transpose()
    .pipe(utils.format_report, metrics_type="core", reporting_period=quarter_to_report)
)

,2024Q1 core metrics,2024Q1 core metrics,2024Q1 core metrics
,value,year_over_year_change,naive_forecast
unique_devices,1.53 B,-0.1%,1.62 B
south_asia_unique_devices,160.0 M,5.8%,156.0 M
latin_america_caribbean_unique_devices,136.0 M,-8.6%,145.0 M
north_america_unique_devices,309.0 M,5.5%,328.0 M
northern_western_europe_unique_devices,357.0 M,1.3%,385.0 M
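
The `year_over_year_change` column compares a quarter with the same quarter one year earlier. How `utils.calc_rpt` computes it is not shown in this notebook, so the following is only a sketch of the usual definition, using hypothetical numbers that are not taken from the table above.

```python
import pandas as pd

# Hypothetical quarterly values, indexed by fiscal quarter (year ending in June).
idx = pd.period_range("2023Q1", periods=5, freq="Q-JUN")
values = pd.Series([200.0, 210.0, 220.0, 230.0, 250.0], index=idx)

# Relative change versus the value four quarters earlier.
# The first four quarters have no comparison point and come out as NaN.
yoy_change = values / values.shift(4) - 1

latest = yoy_change.iloc[-1]
print(f"{latest:.1%}")  # 25.0%
```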


In [5]:
# The quarterly content gap metrics are calculated as the average month-over-month
# change in new articles for each category in the regional and gender data, as a
# proportion of the average number of new articles.
# The naive forecast is calculated as the rate of change from the same quarter last
# year to the subsequent quarter, multiplied by the current quarter's metric.
# If data is unavailable for any month of a quarter, the average of the available
# months is used.

minorities = [
    "underrepresented_regions_net_new_articles_sum", 
    "gender_minorities_net_new_articles_sum"
]

totals = [
    "all_regions_net_new_articles_sum",
    "all_genders_net_new_articles_sum"
]

index_names = [
    "% of new articles about underrepresented regions",
    "% of new articles about gender minorities"
]

# Check whether data for the final month of the current quarter is missing, so that anyone
# running this notebook knows the average was calculated from the available months only.
# Re-run the notebook once that month's data becomes available to get the average for
# the complete quarter.
quarterly_averages = content.check_for_incomplete_quarterly_data(metrics, new_index)

(
    quarterly_averages
    .pipe(content.calc_content_rpt, quarter_to_report, minorities, totals, index_names)
    .transpose()
    .pipe(utils.format_report, metrics_type="content gap", reporting_period=quarter_to_report)
)

,2024Q1 content gap metrics,2024Q1 content gap metrics
,value,naive forecast
% of new articles about underrepresented regions,34.3%,36.0%
% of new articles about gender minorities,24.2%,20.8%
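
The comment in the cell above describes the naive forecast as the rate of change from the same quarter last year to the subsequent quarter, multiplied by the current quarter's metric. One possible reading of that description is sketched below; the function name, its parameters, and the numbers are hypothetical, and the actual logic in `content.calc_content_rpt` may differ.

```python
def naive_forecast(current, same_quarter_last_year, following_quarter_last_year):
    """Scale the current value by last year's quarter-to-quarter rate of change.

    Assumption: "rate of change" is read here as the ratio between the two
    quarters from last year, not as a percentage difference.
    """
    rate_of_change = following_quarter_last_year / same_quarter_last_year
    return current * rate_of_change


# Hypothetical shares of new articles, expressed as fractions.
forecast = naive_forecast(
    current=0.30,
    same_quarter_last_year=0.25,
    following_quarter_last_year=0.26,
)
print(f"{forecast:.1%}")  # 31.2%
```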
