v0.2.0 - Stata helpers, R toolkit expansion, CSO rebrand, workflow diagrams

@jpazvd jpazvd released this 24 May 22:06
· 228 commits to main since this release
b78b819

First substantive release beyond the v0.1.0-rc1 scaffold. Promotes
everything that landed on develop today (PRs #1, #2, #3) to main
via release PR #4.

R helpers (r/R/)

Seven new files:

  • aggregate_data.R: original aggregate_data() (mean / weighted_mean, optional global aggregate, population + country coverage).
  • aggregate_data_v2.R: aggregate_data_v2() with weighted_mean / mean / sum / proportion, coverage threshold, metadata columns. Ships generate_agg_footnote() and apply_time_window().
  • generate_markdown_report.R: generate_markdown_report() + process_all_csv_files() build descriptive-stats Markdown reports from CSV files.
  • create_sector_script.R: create_sector_script(sector_name, sector_code, base_dir, ...) scaffolds a sector run-script template; DW-Production convenience wrapper create_dw_sector_script().
  • profile_helpers.R: create_profile(repo_name, ...) scaffolds a profile_<repo>.R with the standard CSO building blocks; review_profile(path, ...) audits an existing profile.
  • test_scripts.R: test_scripts(path, ...) recursively scans .R scripts and flags direct calls to raw IO / API commands wrapped by dw_io.R / dw_api.R (16 built-in rules across io / api families). Per-line escape hatch via # cso-allow: <rule-id>; CI mode via error_on_violation = TRUE.
  • dw_nestweight.R: dw_nestweight() redistributes survey weights from missing nested observations so per-stratum totals are preserved. R port of edukit_nestweight (Diana Goldemberg).
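
To make the test_scripts() idea concrete, here is a minimal base-R sketch of a rule-based line scanner with a per-line waiver. It is not the toolkit's implementation: scan_lines(), the two rule ids and their regular expressions are hypothetical stand-ins for the 16 built-in rules, and the sketch works on a character vector of lines rather than walking a directory. Only the # cso-allow: <rule-id> comment syntax and the error_on_violation argument name come from the notes above.

```r
# Hypothetical rule table: rule id -> regex for a raw call that a wrapper
# is meant to replace. Illustrative only, not the toolkit's built-in rules.
rules <- c(
  "io-read-csv"  = "\\bread\\.csv\\s*\\(",
  "api-download" = "\\bdownload\\.file\\s*\\("
)

# Scan a character vector of source lines and return one row per violation.
scan_lines <- function(lines, rules, error_on_violation = FALSE) {
  hits <- list()
  for (i in seq_along(lines)) {
    line <- lines[[i]]

    # Rule id waived on this line via "# cso-allow: <rule-id>", if any.
    allowed <- character(0)
    m <- regmatches(line, regexpr("#\\s*cso-allow:\\s*[A-Za-z0-9_-]+", line))
    if (length(m) == 1) allowed <- sub("^#\\s*cso-allow:\\s*", "", m)

    # Naive comment stripping: everything after the first "#" is dropped,
    # so a "#" inside a string literal would also cut the line short.
    code <- sub("#.*$", "", line)

    for (id in names(rules)) {
      if (grepl(rules[[id]], code, perl = TRUE) && !(id %in% allowed)) {
        hits[[length(hits) + 1]] <-
          data.frame(line = i, rule = id, stringsAsFactors = FALSE)
      }
    }
  }

  out <- if (length(hits) > 0) {
    do.call(rbind, hits)
  } else {
    data.frame(line = integer(0), rule = character(0),
               stringsAsFactors = FALSE)
  }

  # CI mode: fail loudly instead of returning the table.
  if (error_on_violation && nrow(out) > 0) {
    stop(sprintf("%d violation(s) found", nrow(out)), call. = FALSE)
  }
  out
}

src <- c(
  'x <- read.csv("a.csv")',
  'y <- read.csv("b.csv")  # cso-allow: io-read-csv',
  '# download.file(url, dest)',
  'download.file(url, dest)'
)
res <- scan_lines(src, rules)
print(res)  # flags line 1 (io-read-csv) and line 4 (api-download)
```

A recursive scan could feed this from readLines() over list.files(path, pattern = "\\.R$", recursive = TRUE, full.names = TRUE); that wiring is left out to keep the sketch short.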

Stata helpers (stata/src/)

First three Stata helpers — fills the v0.2 placeholder:

  • dw_save.ado (+ .sthlp) — Stata sibling of R dw_save(). isid + compress + save + sibling .provenance.json sidecar matching the R-side shape (JSON-escaped). Honours producer / reviewer mode via $dw_mode; canonical writes blocked in reviewer mode unless allow_canonical_write is passed. Content hash via Stata-native datasignature (no shell-out / AppLocker issue).
  • dw_compare.ado (+ .sthlp) — Stata sibling of R dw_compare(). Merges two .dta files on idvars and classifies each value column as identical / numerically-equivalent (within tol()) / different; optional Markdown report.
  • dw_mkdir.ado (+ .sthlp) — recursive mkdir (Stata's built-in is non-recursive). Idempotent.
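
The three-way classification that dw_compare describes (identical, numerically-equivalent within a tolerance, different) can be sketched in base R as follows. This is an illustration of the idea only, not the code of either the Stata or the R helper: classify_column(), compare_frames() and the default tolerance are hypothetical, and the sketch compares in-memory data frames rather than .dta files.

```r
# Classify one pair of value columns (hypothetical helper).
classify_column <- function(a, b, tol = 1e-8) {
  # Same type, same values, same NA positions.
  if (identical(a, b)) return("identical")

  if (is.numeric(a) && is.numeric(b)) {
    same_na <- identical(is.na(a), is.na(b))
    ok <- !is.na(a) & !is.na(b)
    # Equal up to tol wherever both sides are observed.
    if (same_na && all(abs(a[ok] - b[ok]) <= tol)) {
      return("numerically-equivalent")
    }
  }
  "different"
}

# Merge two data frames on idvars and classify every shared value column.
# merge() performs an inner join here, so rows present on only one side
# are dropped before comparison.
compare_frames <- function(x, y, idvars, tol = 1e-8) {
  value_cols <- setdiff(intersect(names(x), names(y)), idvars)
  m <- merge(x, y, by = idvars, suffixes = c(".x", ".y"))
  vapply(
    value_cols,
    function(v) classify_column(m[[paste0(v, ".x")]], m[[paste0(v, ".y")]], tol),
    character(1)
  )
}

x <- data.frame(id = 1:3, a = c(1, 2, 3), b = c(0.1, 0.2, 0.3),
                c = c("u", "v", "w"), stringsAsFactors = FALSE)
y <- data.frame(id = 3:1, a = c(3, 2, 1), b = c(0.3 + 1e-12, 0.2, 0.1),
                c = c("w", "v", "z"), stringsAsFactors = FALSE)

result <- compare_frames(x, y, idvars = "id")
print(result)
# a: "identical", b: "numerically-equivalent", c: "different"
```

Because the rows are matched on idvars before comparison, row order in the two inputs does not affect the verdict.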

Docs (docs/)

  • dw_io_reference.md — per-function reference for dw_io.R.
  • dw_api_reference.md — per-function reference for dw_api.R.
  • git_workflow.md — gitflow + branch-protection contract reference (main / develop / feature; admin bypass on develop for hotfixes; full enforce on main).
  • roles_and_workflow.md — extended with Mermaid data-flow diagram (PRODUCER / REVIEWER / INGESTOR boundaries, colour-coded by role) + role-vs-action matrix.

Branding + meta

  • Top-level README.md rebranded as UNICEF Chief Statistician Office toolkit with a new Objective and motivation section spelling out the reproducibility-and-scalability mission for the D&A Section in OSE.
  • NEWS.md documents every addition.
  • r/R/README.md and stata/src/README.md updated to the live helper inventory; the stata/src "placeholder" line is gone.

Lineage credits

The three Stata helpers and dw_nestweight are ports from the
World Bank EduAnalyticsToolkit.
Each ported file credits the original author in its header:

| cso-toolkit | EduAnalyticsToolkit ancestor | Original author |
| --- | --- | --- |
| dw_save.ado | edukit_save / savemetadata | Diana Goldemberg |
| dw_compare.ado | comparefiles / edukit_comparefiles | Kristoffer Bjärkefur |
| dw_mkdir.ado | rmkdir / edukit_rmkdir | Kristoffer Bjärkefur |
| dw_nestweight.R | nestweight / edukit_nestweight | Diana Goldemberg |

Review fixes folded in

Every PR landed with all Copilot review comments addressed: 11 fixes
on PR #1 (multi-col by correctness, n_distinct NA-counting,
namespace hygiene, removed source-time side effects, %||%
collision, generated-template defaults), 2 on PR #2 (Mermaid paths
matching documented layout, pushed-commit recovery recipe), 5 on PR
#3 (capture isid vs quietly isid, JSON-escaped sidecar values,
DRY compare logic, non-numeric guard).

Diff stats

+3,047 / −30 across 21 files, 23 commits from main to develop.

Known limitations (carrying into v0.3)

  • No Stata equivalent of dw_use() yet. Reading is unconstrained on
    the Stata side; the reviewer-mode no-API guard does not exist.
  • No Stata dw_require_no_api.do helper.
  • No Stata dw_load_config.do (each project profile still wires up
    $dw_mode itself).
  • See the dedicated tracking issue for the v0.3 Stata gaps.

Install

Vendored, not installed. R helpers ship in r/R/; Stata helpers in
stata/src/. See docs/toolkit_strategy.md
for the vendoring rationale and the cso_toolkit_pull() workflow
for refreshing a downstream consumer.