v0.2.0 - Stata helpers, R toolkit expansion, CSO rebrand, workflow diagrams
First substantive release beyond the v0.1.0-rc1 scaffold. Promotes
everything that landed on develop today (PRs #1, #2, #3) to main
via release PR #4.
R helpers (r/R/)
Seven new files:
- `aggregate_data.R`: original `aggregate_data()` (mean / weighted_mean, optional global aggregate, population + country coverage).
- `aggregate_data_v2.R`: `aggregate_data_v2()` with `weighted_mean` / `mean` / `sum` / `proportion`, coverage threshold, metadata columns. Ships `generate_agg_footnote()` and `apply_time_window()`.
- `generate_markdown_report.R`: `generate_markdown_report()` + `process_all_csv_files()`, which produce descriptive-stats Markdown reports from CSV files.
- `create_sector_script.R`: `create_sector_script(sector_name, sector_code, base_dir, ...)` scaffolds a sector run-script template; DW-Production convenience wrapper `create_dw_sector_script()`.
- `profile_helpers.R`: `create_profile(repo_name, ...)` scaffolds a `profile_<repo>.R` with the standard CSO building blocks; `review_profile(path, ...)` audits an existing profile.
- `test_scripts.R`: `test_scripts(path, ...)` recursively scans `.R` scripts and flags direct calls to raw IO / API commands wrapped by `dw_io.R` / `dw_api.R` (16 built-in rules across `io` / `api` families). Per-line escape hatch via `# cso-allow: <rule-id>`; CI mode via `error_on_violation = TRUE`.
- `dw_nestweight.R`: `dw_nestweight()` redistributes survey weights from missing nested observations so per-stratum totals are preserved. R port of `edukit_nestweight` (Diana Goldemberg).
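To illustrate the per-line escape hatch described for `test_scripts()`, here is a minimal sketch in base R. It is not the toolkit's implementation: the function name `scan_lines()`, the rule ids, and the regular expressions are all hypothetical, chosen only to show how a `# cso-allow: <rule-id>` comment could suppress one rule on one line and how a CI mode could turn findings into an error.

```r
# Hypothetical rule table: these ids and regexes are illustrative only,
# not the 16 rules built into test_scripts().
sketch_rules <- c(
  "io-read-csv"  = "\\bread\\.csv\\s*\\(",
  "api-download" = "\\bdownload\\.file\\s*\\("
)

# Flag every line that matches a rule, unless the same line carries
# "# cso-allow: <rule-id>" for that specific rule.
scan_lines <- function(lines, rules, error_on_violation = FALSE) {
  hits <- list()
  for (id in names(rules)) {
    matched <- grepl(rules[[id]], lines, perl = TRUE)
    allowed <- grepl(paste0("#\\s*cso-allow:\\s*", id, "(\\s|$)"),
                     lines, perl = TRUE)
    idx <- which(matched & !allowed)
    if (length(idx) > 0) {
      hits[[length(hits) + 1]] <- data.frame(
        line = idx, rule = id, stringsAsFactors = FALSE
      )
    }
  }
  if (length(hits) == 0) {
    out <- data.frame(line = integer(0), rule = character(0),
                      stringsAsFactors = FALSE)
  } else {
    out <- do.call(rbind, hits)
    out <- out[order(out$line), , drop = FALSE]
  }
  # CI mode (assumed behaviour): fail loudly when anything is flagged.
  if (error_on_violation && nrow(out) > 0) {
    stop(sprintf("%d violation(s) found", nrow(out)))
  }
  out
}

example_lines <- c(
  'df <- read.csv("raw.csv")',
  'df2 <- read.csv("ok.csv")  # cso-allow: io-read-csv',
  'download.file(url, "x.zip")',
  'x <- 1'
)
violations <- scan_lines(example_lines, sketch_rules)
print(violations)
```

In this sketch the second line is skipped because its allow comment names the rule that would otherwise fire; an allow comment naming a different rule would not suppress it.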
Stata helpers (stata/src/)
First three Stata helpers — fills the v0.2 placeholder:
- `dw_save.ado` (+ `.sthlp`): Stata sibling of R `dw_save()`. `isid` + `compress` + `save` + sibling `.provenance.json` sidecar matching the R-side shape (JSON-escaped). Honours producer / reviewer mode via `$dw_mode`; canonical writes blocked in reviewer mode unless `allow_canonical_write` is passed. Content hash via Stata-native `datasignature` (no shell-out / AppLocker issue).
- `dw_compare.ado` (+ `.sthlp`): Stata sibling of R `dw_compare()`. Merges two `.dta` files on `idvars` and classifies each value column as identical / numerically-equivalent (within `tol()`) / different; optional Markdown report.
- `dw_mkdir.ado` (+ `.sthlp`): recursive `mkdir` (Stata's built-in is non-recursive). Idempotent.
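The three-way classification that `dw_compare` performs can be sketched in base R as follows. This is an illustration under stated assumptions, not the code of either `dw_compare.ado` or the R `dw_compare()`: the function name `classify_columns()`, the default tolerance, and the choice to compare only rows present in both inputs are all hypothetical.

```r
# Hypothetical helper: classify each shared value column of two data frames
# as "identical", "numerically-equivalent" (within tol), or "different".
classify_columns <- function(a, b, idvars, tol = 1e-8) {
  # Inner join on the id variables; rows found on only one side are dropped
  # (an assumption of this sketch). Shared value columns get .a / .b suffixes.
  merged <- merge(a, b, by = idvars, suffixes = c(".a", ".b"))
  value_cols <- setdiff(intersect(names(a), names(b)), idvars)

  vapply(value_cols, function(col) {
    x <- merged[[paste0(col, ".a")]]
    y <- merged[[paste0(col, ".b")]]
    if (identical(x, y)) {
      return("identical")
    }
    if (is.numeric(x) && is.numeric(y)) {
      # Missing values must line up; non-missing pairs must agree within tol.
      same_na <- identical(is.na(x), is.na(y))
      close <- abs(x - y) <= tol
      if (same_na && all(close, na.rm = TRUE)) {
        return("numerically-equivalent")
      }
    }
    "different"
  }, character(1))
}

left <- data.frame(id = 1:3, x = c(1, 2, 3), y = c(0.1, 0.2, 0.3),
                   z = c("a", "b", "c"), stringsAsFactors = FALSE)
right <- data.frame(id = 1:3, x = c(1, 2, 3), y = c(0.1, 0.2, 0.3 + 1e-10),
                    z = c("a", "b", "d"), stringsAsFactors = FALSE)
comparison <- classify_columns(left, right, idvars = "id")
print(comparison)
```

Checking exact identity first keeps the tolerance test from masking type or value differences in non-numeric columns, which fall straight through to "different".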
Docs (docs/)
- `dw_io_reference.md`: per-function reference for `dw_io.R`.
- `dw_api_reference.md`: per-function reference for `dw_api.R`.
- `git_workflow.md`: gitflow + branch-protection contract reference (main / develop / feature; admin bypass on develop for hotfixes; full enforce on main).
- `roles_and_workflow.md`: extended with Mermaid data-flow diagram (PRODUCER / REVIEWER / INGESTOR boundaries, colour-coded by role) + role-vs-action matrix.
Branding + meta
- Top-level README.md rebranded as UNICEF Chief Statistician Office toolkit with a new Objective and motivation section spelling out the reproducibility-and-scalability mission for the D&A Section in OSE.
- NEWS.md documents every addition.
- `r/R/README.md` and `stata/src/README.md` updated to the live helper inventory; the `stata/src` "placeholder" line is gone.
Lineage credits
The three Stata helpers and dw_nestweight are ports from the
World Bank EduAnalyticsToolkit.
Each ported file credits the original author in its header:
| cso-toolkit | EduAnalyticsToolkit ancestor | Original author |
|---|---|---|
| `dw_save.ado` | `edukit_save` / `savemetadata` | Diana Goldemberg |
| `dw_compare.ado` | `comparefiles` / `edukit_comparefiles` | Kristoffer Bjärkefur |
| `dw_mkdir.ado` | `rmkdir` / `edukit_rmkdir` | Kristoffer Bjärkefur |
| `dw_nestweight.R` | `nestweight` / `edukit_nestweight` | Diana Goldemberg |
Review fixes folded in
Every PR landed with all Copilot review comments addressed: 11 fixes
on PR #1 (multi-col by correctness, n_distinct NA-counting,
namespace hygiene, removed source-time side effects, %||%
collision, generated-template defaults), 2 on PR #2 (Mermaid paths
matching documented layout, pushed-commit recovery recipe), 5 on PR
#3 (capture isid vs quietly isid, JSON-escaped sidecar values,
DRY compare logic, non-numeric guard).
Diff stats
+3,047 / −30 across 21 files, 23 commits from main to develop.
Known limitations (carrying into v0.3)
- No Stata equivalent of `dw_use()` yet. Reading is unconstrained on the Stata side; the reviewer-mode no-API guard does not exist.
- No Stata `dw_require_no_api.do` helper.
- No Stata `dw_load_config.do` (each project profile still wires up `$dw_mode` itself).
- See the dedicated tracking issue for the v0.3 Stata gaps.
Install
Vendored, not installed. R helpers ship in r/R/; Stata helpers in
stata/src/. See docs/toolkit_strategy.md
for the vendoring rationale and the cso_toolkit_pull() workflow
for refreshing a downstream consumer.