Releases: unicef-drp/cso-toolkit
v0.4.6 — mode-lock fix + dw_root wrapper + dashboard
v0.4.6 (2026-05-30)
Quality release. Four issues land in one cycle — one HIGH-severity
canonical-recognition fix that would have allowed reviewer-mode
overwrites of canonical Teams artefacts, plus three cleanups
(re-exported dw_root() wrapper, uniform .cso_require envelope on
standalone-source %>% gate, and an r/.gitattributes pin so the R
subtree checks out with LF endings on Windows). No public API
breaks.
dw_is_canonical recognises OneDrive-mounted Teams Documents path (issue #54) — HIGH severity
Pre-fix, dw_is_canonical() matched canonical paths against
teamsFolderCanonical and the Z: drive root only. On UNICEF laptops
where the Teams "Documents" library is mounted via OneDrive
(C:/Users/<user>/UNICEF/<team> - Documents/...), the canonical
prefix the helper saw at runtime did not match the literal path the
sector profile assembled when writing — so dw_is_canonical()
returned FALSE for paths that were, in fact, canonical Teams
deposits.
Combined with the v0.4.0 reviewer-mode write-refusal contract
(dw_save refuses canonical writes in reviewer mode unless
allow_canonical_write = TRUE), the false-negative meant a reviewer-
mode dw_save() call would have silently overwritten the canonical
Teams artefact instead of hard-stopping.
Surfaced empirically by the 2026-05-30 HVA + ED reviewer-mode
fanout runs (DW-Production): the canonical path through the
OneDrive-mounted Documents folder passed the
!dw_is_canonical(path) precondition and would have written to the
canonical artefact had the runs not been halted by the audit. The
fix extends dw_is_canonical() to also recognise the OneDrive-
mounted Documents form, so the canonical-write refusal contract
fires correctly on UNICEF-laptop reviewer sessions.
Tests in r/tests/testthat/test-dw-is-canonical.R extended to cover
the OneDrive-mounted form alongside the existing Z: and
teamsFolderCanonical cases.
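The recognition logic described above can be sketched as a normalised prefix match. This is a minimal illustration only: is_canonical_sketch() and the example roots are hypothetical, and the toolkit's dw_is_canonical() may be implemented differently.

```r
# Hypothetical illustration; not the toolkit's dw_is_canonical().
is_canonical_sketch <- function(path, canonical_roots) {
  # Normalise: backslashes to forward slashes, drop trailing slashes,
  # lower-case (Windows paths are case-insensitive).
  norm <- function(p) tolower(sub("/+$", "", gsub("\\\\", "/", p)))
  p <- norm(path)
  roots <- norm(canonical_roots)
  # A path is canonical when it equals a root or sits beneath one.
  any(p == roots | startsWith(p, paste0(roots, "/")))
}

# Assumed example roots: a Z: drive root and a OneDrive-mounted Documents form.
roots <- c(
  "Z:/DW",
  "C:/Users/jdoe/UNICEF/Team - Documents/DW"
)

onedrive_hit <- is_canonical_sketch(
  "C:\\Users\\jdoe\\UNICEF\\Team - Documents\\DW\\out.parquet", roots
)
sandbox_hit <- is_canonical_sketch("C:/Users/jdoe/sandbox/out.parquet", roots)
```

Appending "/" to each root before the prefix test keeps a sibling folder such as Z:/DWX from matching the Z:/DW root.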
dw_root() public wrapper re-exported (issue #53)
Several DW-Production sector scripts (carried forward from the v0.3
era) call dw_root() directly to derive the sector-folder anchor
for relative path resolution. dw_root() was never an exported entry
in the v0.4.x NAMESPACE — sector scripts that vendored v0.4.x copies
of the toolkit hit Error: could not find function "dw_root" the
first time they sourced the profile.
The internal helper is re-exported as a public wrapper in v0.4.6;
the existing implementation is unchanged. NAMESPACE gains an
export(dw_root) entry and man/dw_root.Rd is generated. The
helper joins Other io: in the family cross-references so it
surfaces in pkgdown alongside dw_save, dw_use, and friends.
Uniform .cso_require envelope on %>% standalone-source gate (issue #51)
v0.4.5's #46 fix added a local %>% binding gated by exists(). The
gate worked, but it raised a bare base-R error if magrittr itself
wasn't installed — without the [cso_toolkit.<func>] WHAT / Why / Fix envelope the rest of the toolkit follows.
v0.4.6 wraps the local-binding fallback in .cso_require("magrittr")
so the envelope shape is uniform. Consumers who source the file
standalone without magrittr installed now see the same actionable
three-part message as every other toolkit raise.
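A minimal sketch of what a three-part envelope raise could look like. cso_require_sketch() is a hypothetical stand-in and the message wording is assumed; only the [cso_toolkit.<func>] WHAT / Why / Fix shape comes from the notes above.

```r
# Hypothetical stand-in for the toolkit's .cso_require(); message text is assumed.
cso_require_sketch <- function(pkg, func = "example") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      sprintf(
        paste0(
          "[cso_toolkit.%s] Package '%s' is required but not installed.\n",
          "Why: this helper calls functions from '%s'.\n",
          "Fix: run install.packages(\"%s\") and retry."
        ),
        func, pkg, pkg, pkg
      ),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Capture the message raised for a package that does not exist.
msg <- tryCatch(
  cso_require_sketch("notARealPackage123", func = "aggregate_data_v2"),
  error = function(e) conditionMessage(e)
)
```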
r/.gitattributes pins LF endings on the R subtree (issue #52)
On Windows checkouts under default git autocrlf settings, R source
files in the package subtree (*.R, *.Rd, NAMESPACE,
DESCRIPTION, *.yml, *.yaml, *.md) were rewritten with CRLF
endings on checkout. That made local working-tree SHA-256 hashes
diverge from the LF-computed Git blob hashes — a recurring source
of spurious "drift" complaints from Windows consumers comparing
working-tree hashes against manifest entries computed on Linux.
A new r/.gitattributes pins LF endings on the R subtree's source
files so Windows checkouts of the package subtree are byte-identical
to Linux/macOS. This is scoped to r/ — the rest of the repo
inherits the platform default. Per-file hash drift detection in
cso_toolkit_check() itself is not part of this release; today
the function only compares the pinned manifest version against the
upstream latest tag. The richer drift logic is planned for the
cso_toolkit_diff() / cso_toolkit_pull() work (stubbed in v0.4.6).
Docs: third role renamed INGESTOR → PUBLISHER (DBM / DBR / DBP)
The third data-warehouse role is now PUBLISHER (was INGESTOR),
aligning the role label with the verb the code already uses
(dw_publish(); there is no dw_ingest()). Docs-only — no code,
no exported-API change, and no mode-contract value changed: the
contract still exposes producer / reviewer, and wiring a
publisher mode remains future scope (issue #15). The role
acronyms are now spelled out across the docs — DBM = Database
Manager (PRODUCER), DBR = Database Reviewer (REVIEWER), DBP =
Database Publisher (PUBLISHER). Touches docs/roles_and_workflow.md,
README.md, r/DESCRIPTION, templates/dbm_submission_template.md,
docs/git_workflow.md, docs/toolkit_strategy.md.
v0.4.5 — standalone-source pipe binding + dw_ aliases
Closes the v0.4.5 milestone with two deliverables: a standalone-source %>% binding (issue #46, PR #47) and 8 new dw_-prefixed aliases (issue #42, PR #48). A companion declaration of magrittr as a first-class Import (with importFrom in NAMESPACE) makes the dependency surface honest. No public API breaks; the un-prefixed and dw_-prefixed names both remain exported throughout v0.4.x.
Standalone-source %>% binding (#46)
v0.4.4's #36 fix made aggregate_data_v2.R partially safe to source standalone by adding a local .cso_require() fallback. But the file still used the unqualified %>% operator without a binding — .cso_require() calls requireNamespace(), which loads but does NOT attach package exports, so source("aggregate_data_v2.R") errored at the first pipe call with Error: could not find function "%>%" even when magrittr was installed.
Same one-line gate pattern resolves it. Applied to both files that use %>% standalone:

    if (!exists("%>%", mode = "function", inherits = TRUE)) {
      `%>%` <- magrittr::`%>%`
    }

In the installed-package context the local binding is a no-op (NAMESPACE's importFrom(magrittr, "%>%") wins). In standalone-source mode the local binding provides the same %>% symbol via magrittr's namespace, so consumers don't have to library(magrittr) first.
To make this fully honest: magrittr is now declared in DESCRIPTION::Imports (previously it came in transitively via dplyr), and zzz.R::.cso_require carries an @importFrom magrittr %>% roxygen tag so NAMESPACE gains the explicit importFrom(magrittr, "%>%") entry. Both belong to the principled fix: the package's dependency declaration now matches the comments and the test claims about the installed-package no-op path.
Also corrected a misleading comment in zzz.R::globalVariables that referred to a non-existent apply_time_window.R file (apply_time_window() is defined inside aggregate_data_v2.R).
Surfaced empirically by Copilot review of DW-Production PR #144 (WS v0.4.4 install) on 2026-05-29; Copilot review of cso-toolkit PR #47 then flagged that the comment claims about NAMESPACE were aspirational, prompting the principled magrittr import declaration.
dw_-prefixed canonical aliases for the remaining un-prefixed exports (#42)
v0.4.4 added dw_ aliases for aggregate_data_v2 and create_sector_script (the two exports touched by #36). v0.4.5 extends the program to the remaining 8 un-prefixed exports so the toolkit's public surface consistently uses the dw_ namespace:
| Un-prefixed | dw_-prefixed alias |
|---|---|
| aggregate_data | dw_aggregate_data |
| generate_agg_footnote | dw_generate_agg_footnote |
| apply_time_window | dw_apply_time_window |
| generate_markdown_report | dw_generate_markdown_report |
| process_all_csv_files | dw_process_all_csv_files |
| create_profile | dw_create_profile |
| review_profile | dw_review_profile |
| test_scripts | dw_test_scripts |
Both names continue to work and share the same \link{} man page via roxygen @rdname. No breaking change: consumers using the un-prefixed names see no behaviour change. The un-prefixed names remain exported and will continue to work indefinitely in v0.4.x; a future v0.5.x cycle may emit a lifecycle::deprecate_soft() warning on the un-prefixed forms to nudge migration, but only after sector leads confirm the migration is complete.
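A minimal sketch of the aliasing pattern, assuming the alias is a plain assignment documented under the original's man page via @rdname. toy_summary() and dw_toy_summary() are hypothetical names, not toolkit exports.

```r
# Hypothetical toy functions; names are illustrative, not toolkit exports.

#' Mean of a numeric vector (toy example)
#' @param x Numeric vector.
#' @export
toy_summary <- function(x) mean(x, na.rm = TRUE)

#' @rdname toy_summary
#' @export
dw_toy_summary <- toy_summary

# Both names are bound to the same function object.
same_function <- identical(dw_toy_summary, toy_summary)
```

Because the alias is the same function object, behaviour cannot drift between the two names.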
create_dw_sector_script was intentionally NOT aliased: it already carries a dw infix (it's a DW-Production-specific wrapper of create_sector_script), and create_sector_script got dw_create_sector_script in v0.4.4 anyway. The dual-naming is historical and won't be cleaned up under #42.
The test_scripts → dw_test_scripts alias also got a roxygen note: a future v0.5.x rename to dw_audit_scripts (to surface the audit intent and avoid the testthat-name collision) is on the design horizon. For now the prefix-only alias keeps the cleanup additive.
Forward look
v0.5.0 will land the live dw_publish() submission branch (issue #15) and the dw_regions() API redesign against the Country-and-Region-Metadata-API package (issue #40) once sector leads finalise the Helix endpoint contract and the regions API output schema.
DW-Production propagation
Once this release lands, the 4 active DW-Production sector branches (NT, IM, WS, HVA on their review/sector-*-2026-05-18 branches) will receive install PRs bumping their vendored toolkit from v0.4.4 to v0.4.5. The diff vs v0.4.4 affects 4 files (aggregate_data.R, aggregate_data_v2.R, zzz.R, DESCRIPTION). The remaining 7 vendored files are byte-identical to v0.4.4.
Full changelog: v0.4.4...v0.4.5
v0.4.4 — Quality release (DW-Production v0.4.3.1 fanout findings)
v0.4.4 (2026-05-29)
Quality release. Three v0.4.4 milestone issues land in one cycle (PRs
#39, #41, #43). All three surfaced empirically during the
DW-Production v0.4.3.1 fanout audit (IM / WS / HVA install +
reviewer-mode runs on 2026-05-28). No public API breaks; the
bumped-default behaviour stays backwards-compatible for v0.4.3.1
consumers.
Follow-up issue #42
tracks the remaining un-prefixed exports for a single naming-cleanup
PR (proposed v0.4.5).
Three carry-forward bugs + dw_-prefix aliases (issue #36)
Closes two of the three sub-fixes flagged on #36 during the HVA
scaffold-install Copilot review on 2026-05-28. The third (a value-arg
propagation bug in dw_regions.R) is moot — dw_regions is being
redesigned to consume the new unicef-drp/Country-and-Region-Metadata-API
package in #40 (v0.5.0); the affected code path is removed.
Sub-fix 1: aggregate_data_v2.R is now safe to source standalone
Pre-fix, aggregate_data_v2.R called .cso_require() from zzz.R.
Sourcing aggregate_data_v2.R directly (without sourcing zzz.R
first) left .cso_require undefined; the first call to
aggregate_data_v2(...) errored with could not find function .cso_require.
The file now defines a local fallback for .cso_require() at source
time, gated by exists(".cso_require", mode = "function", inherits = TRUE).
When zzz.R has already been sourced into .GlobalEnv, the shared
helper wins and nothing is redefined; when only aggregate_data_v2.R
is sourced, the local fallback provides the same behaviour.
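The source-time gate can be sketched as follows. The fallback body is an assumption for illustration; only the exists() guard is taken from the notes above.

```r
# Sketch of the source-time gate; the fallback body is an assumption.
if (!exists(".cso_require", mode = "function", inherits = TRUE)) {
  .cso_require <- function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop(sprintf("Package '%s' is required but not installed.", pkg),
           call. = FALSE)
    }
    invisible(TRUE)
  }
}

# Running the same gate again leaves the existing definition untouched.
first_definition <- .cso_require
if (!exists(".cso_require", mode = "function", inherits = TRUE)) {
  .cso_require <- function(pkg) stop("never reached")
}
unchanged <- identical(first_definition, .cso_require)
```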
Sub-fix 2: create_sector_script() profile sentinel check relaxed (and aligned with create_profile())
Pre-fix, the generated 00_run_<sector>.R template checked
isTRUE(profile_DW_Production). The DW-Production profile
(profile_DW-Production.R) does not set profile_DW_Production, so
the generated script errored at the sentinel check even after the
profile was sourced successfully.
The check is now relaxed from isTRUE(<name>) to !is.null(<name>),
which accepts any non-null value — character paths, numeric values,
or the boolean sentinel that create_profile("DW-Production") emits
(profile_DW_Production <- TRUE). The default profile_name stays at
"profile_DW_Production" so the documented scaffold flow
(create_profile() → create_dw_sector_script()) works out of the
box without additional configuration.
The new error message names the missing variable so future
profile-vs-template mismatches surface a concrete fix. The roxygen
@param doc also clarifies that the generated template uses
projectFolder directly for input/output paths, so the profile MUST
set projectFolder for the runner to do useful work — the sentinel
check only confirms the profile was sourced.
For DW-Production consumers (whose existing profile_DW-Production.R
doesn't set the sentinel), the one-line profile_DW_Production <- TRUE
must be added to the profile. Tracked as a separate DW-Production-side
follow-up PR.
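To illustrate the difference between the two checks, here is a minimal sketch. check_profile_sourced() is a hypothetical helper, not the code emitted into the generated template, and the error wording is assumed.

```r
# Hypothetical helper mirroring the relaxed check; not the generated template.
check_profile_sourced <- function(profile_name = "profile_DW_Production",
                                  envir = globalenv()) {
  value <- get0(profile_name, envir = envir, inherits = FALSE)
  if (is.null(value)) {
    stop(
      sprintf("Profile variable '%s' is not set; source the profile first.",
              profile_name),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

env <- new.env()
# The sentinel holds a path here rather than TRUE.
assign("profile_DW_Production", "C:/some/project", envir = env)

old_check <- isTRUE(get("profile_DW_Production", envir = env))  # old check
new_check <- check_profile_sourced(envir = env)                 # relaxed check
```

The old isTRUE() check rejects the character value, while the relaxed check accepts any non-null value and only fails when the variable is absent.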
dw_-prefixed canonical aliases for the two touched exports
Toolkit-export naming consolidates around the dw_ prefix in v0.4.x;
the non-prefixed names predate that convention. While in this PR's
files anyway, added:
- dw_aggregate_data_v2 (alias for aggregate_data_v2)
- dw_create_sector_script (alias for create_sector_script)

Both names point to the same function and share the same \code{\link{}} man page (via roxygen @rdname). The non-prefixed names continue to work, so there is no breaking change. Follow-up issue tracks the rest of the un-prefixed exports (aggregate_data, generate_markdown_report, apply_time_window, generate_agg_footnote, create_profile, review_profile, test_scripts, create_dw_sector_script) as a single cleanup PR.
dw_default_unicef_allowlist() helper for consumers (issue #37)
New exported helper returns a character vector of ^...-anchored
regex patterns covering UNICEF DRP GitHub-raw and repository URLs.
Consumers seed dw_url_allowlist from this constant instead of
re-deriving the patterns per project:

    # In profile_<consumer>.R
    dw_url_allowlist <- c(
      dw_default_unicef_allowlist(),
      # Project-specific extras:
      "^https://yourorg\\.github\\.io/"
    )

Surfaced empirically by the DW-Production reviewer-mode audit on
2026-05-28 (IM 01_immunization.R): every URL-using sector script
hand-wrote the same ^https://raw\\.githubusercontent\\.com/unicef-drp/
pattern. The helper consolidates the duplication and lets future
UNICEF-DRP additions land in one place upstream rather than in each
consumer's profile.
Purely additive — consumers must opt in by composing the helper into
their dw_url_allowlist. The URL-freeze safety contract is unchanged
(no URL is fetchable without explicit ratification).
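How an anchored allowlist gates a URL can be sketched as below. is_allowlisted_sketch() is hypothetical; the toolkit's internal matcher may differ, and the patterns are written out by hand rather than taken from dw_default_unicef_allowlist().

```r
# Hypothetical matcher; the toolkit's internal allowlist check may differ.
is_allowlisted_sketch <- function(url, allowlist) {
  if (length(allowlist) == 0L) return(FALSE)  # empty allowlist: nothing passes
  any(vapply(allowlist, function(p) grepl(p, url, perl = TRUE), logical(1)))
}

# Patterns written by hand for the example; a real profile would compose
# dw_default_unicef_allowlist() with project-specific extras.
allowlist <- c(
  "^https://raw\\.githubusercontent\\.com/unicef-drp/",
  "^https://yourorg\\.github\\.io/"
)

ok <- is_allowlisted_sketch(
  "https://raw.githubusercontent.com/unicef-drp/repo/main/x.csv", allowlist
)
bad <- is_allowlisted_sketch("https://example.com/x.csv", allowlist)
```

The leading ^ anchor matters: without it, a URL that merely contains an allowlisted prefix somewhere in its query string would pass.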
.dw_frozen_root() resolution is now discoverable (issue #38)
.dw_frozen_root() falls through a 3-tier resolution chain when
locating the URL-freeze cache root:
1. dw_frozen_root global (opt-in; preferred)
2. <githubFolder>/_frozen (fallback)
3. <getwd()>/_frozen (last-resort fallback)
Pre-v0.4.4 the helper resolved silently — consumers whose project
layout didn't match the fallback heuristic had to grep dw_io.R to
discover why dw_use("https://...") couldn't find their frozen file.
v0.4.4 adds two discoverability improvements:
- A new internal helper .dw_frozen_root_resolved() returns a
  (path, source) pair so downstream callers can surface the chosen
  tier in messages and error envelopes.
- .dw_frozen_root_notify_once() emits a session-scoped notice the
  first time the helper falls back beyond tier #1:
  message() for tier #2 (<githubFolder>/_frozen),
  warning() for tier #3 (<getwd()>/_frozen).
  Consumers that explicitly set dw_frozen_root get no notice.
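The three tiers and the once-per-session notice can be sketched as follows. Both functions are hypothetical re-statements: they take arguments where the real helpers are described as reading globals, and the source labels ("githubFolder", "getwd") are assumed names.

```r
# Hypothetical re-statement of the three tiers; arguments replace the globals.
frozen_root_resolved_sketch <- function(dw_frozen_root = NULL,
                                        githubFolder = NULL,
                                        wd = getwd()) {
  if (!is.null(dw_frozen_root)) {
    return(list(path = dw_frozen_root, source = "dw_frozen_root"))
  }
  if (!is.null(githubFolder)) {
    return(list(path = file.path(githubFolder, "_frozen"),
                source = "githubFolder"))
  }
  list(path = file.path(wd, "_frozen"), source = "getwd")
}

# Closure keeps a session-scoped flag so the notice fires at most once.
notify_once_sketch <- local({
  notified <- FALSE
  function(resolved) {
    if (notified || resolved$source == "dw_frozen_root") {
      return(invisible(FALSE))
    }
    notified <<- TRUE
    txt <- sprintf("Frozen root resolved to %s (%s)",
                   resolved$path, resolved$source)
    if (resolved$source == "githubFolder") message(txt) else warning(txt, call. = FALSE)
    invisible(TRUE)
  }
})

tier2  <- frozen_root_resolved_sketch(githubFolder = "C:/GitHub")
first  <- suppressMessages(notify_once_sketch(tier2))  # notice emitted
second <- suppressMessages(notify_once_sketch(tier2))  # already notified
```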
The missing-frozen-copy error envelope in .resolve_remote_url()
now includes the resolution tier so consumers see which fallback
fired (or that the explicitly set global points at the wrong path):
[cso_toolkit.dw_use:remote] Reviewer mode forbids fetching from the network.
Missing frozen copy: <path>
URL: <url>
Frozen-root resolution: <chosen-root> (<tier-name>)
Fix:
1. If the path above is wrong, set `dw_frozen_root <- '<your-canonical-frozen-path>'` in your profile.
2. Otherwise, a producer must call dw_use(...) once and commit the frozen file + sidecar.
Surfaced empirically by the DW-Production IM reviewer-mode audit on
2026-05-28: the fallback resolved to <githubFolder>/_frozen instead
of DW-Production's convention of <projectFolder>/01_dw_prep/011_rawdata/_frozen/.
Three runs (75+ min of slow Teams network) were needed to diagnose
what a single message could have surfaced at session start.
No public-API changes. The internal .dw_frozen_root() (path-only)
is preserved for backward compatibility with v0.4.3.1 callers.
v0.4.3.1 - Version stamp drift fix
v0.4.3.1 (2026-05-28)
Patch release. v0.4.3 (cut earlier today) bumped DESCRIPTION::Version
and NEWS.md but missed three version stamps inside r/R/dw_io.R:
- header banner (# Toolkit version: 0.4.2)
- dw_toolkit_version() docstring (Currently "0.4.2")
- dw_toolkit_version() return value ("0.4.2")
Caught by Copilot review on DW-Production install PRs (#134 + #136):
dw_toolkit_version() was returning "0.4.2" while consumers had
manifest::pulled_version = "v0.4.3" — an inconsistency that would
have polluted dw_publish() provenance sidecars with the wrong
toolkit version. This patch bumps all three stamps to 0.4.3.1 so
they agree with DESCRIPTION and the manifest pin.
The release-cut checklist for the next minor (v0.5.0) should include
a grep -rn '0\.[0-9]\.[0-9]' r/R/ step to catch any future stamp
drift before tagging.
No behavioural changes; safe to install as a drop-in replacement for
v0.4.3.
v0.4.3 — Integrity release
v0.4.3 (2026-05-28)
Integrity release. Two dw_use() fixes (issues #30 + #31) ported from
the DW-Production NT reviewer-mode reproducibility audit on 2026-05-27
(PR #33; DW-Production PR #133). Both landed first on the
DW-Production vendored copy as local_edits; this release lets the next
cso_toolkit_pull(target_version = "v0.4.3") drop those local edits.
Issue #32 (provenance sidecars) is the design-foundation companion in
the same milestone; implementation ships in a follow-up PR.
dw_use() — parquet / dta col_select = NULL conditional dispatch (issue #30)
The v0.4.2 parquet branch unconditionally passed col_select = cols
to arrow::read_parquet(). When a caller invoked dw_use(path) without
explicit columns, cols defaulted to NULL, and
arrow::read_parquet(path, col_select = NULL) returned a zero-column
schema rather than all columns. The same pattern affected
haven::read_dta(col_select = NULL). Both branches now use conditional
dispatch: pass col_select only when cols is non-NULL.
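The conditional dispatch can be sketched without depending on arrow or haven. read_with_optional_cols() and toy_reader() are hypothetical; the toy reader only stands in for arrow::read_parquet() or haven::read_dta() so the example runs anywhere.

```r
# Hypothetical wrapper: `reader` stands in for a real reader function.
read_with_optional_cols <- function(reader, path, cols = NULL) {
  args <- list(path)
  # Only forward col_select when the caller asked for specific columns.
  if (!is.null(cols)) args$col_select <- cols
  do.call(reader, args)
}

# Toy reader that reports whether col_select was supplied at all.
toy_reader <- function(file, col_select) {
  if (missing(col_select)) "all columns" else paste(col_select, collapse = ",")
}

all_cols  <- read_with_optional_cols(toy_reader, "data.parquet")
some_cols <- read_with_optional_cols(toy_reader, "data.parquet",
                                     cols = c("a", "b"))
```

Building the argument list and omitting col_select entirely avoids passing an explicit NULL to the reader.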
Surfaced empirically by DW-Production Run #6 (NT pipeline): 13+ stages
downstream of 1b_cmrs_series_import.R failed with "object 'COLUMN'
not found" because the upstream dw_use(out_dw_nut_*.parquet) returned
an empty tibble. After the fix (Run #7): 24/25 stages OK.
dw_use(cols_lenient = FALSE) — new flag for any_of()-style schema intersect (issue #31)
Sector scripts that wanted "select these columns if present, ignore
the absent" semantics passed dw_use(cols = dplyr::any_of(c(...))).
any_of() errors fatally outside a tidyselect selecting context
(tidyselect >= 1.2.0); R evaluates the helper before the call to
dw_use, so no lazy-eval trick inside the toolkit can save it.
New cols_lenient = FALSE parameter (default off for backwards compat).
When TRUE, dw_use introspects the file schema cheaply (parquet
metadata, csv / tsv / xlsx zero-row read, dta header) and intersects
the requested cols with the actual columns before the data read.
Empty intersection → warning + read all columns (forward-progress
guarantee). New internal helper .dw_schema_cols(path, fmt) performs
the schema-only read.
Migration:
- dw_use(cols = dplyr::any_of(c(...))) → dw_use(cols = c(...), cols_lenient = TRUE)
- dw_use(cols = dplyr::all_of(c(...))) → dw_use(cols = c(...)) (strict; all_of() at top level is deprecated in tidyselect 1.2.0 anyway)
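The intersect-then-read behaviour behind cols_lenient = TRUE can be sketched as below. lenient_cols_sketch() is hypothetical, and its available argument stands in for the result of the schema-only read; the warning text is assumed.

```r
# Hypothetical helper; `available` stands in for the schema-only read.
lenient_cols_sketch <- function(requested, available) {
  keep <- intersect(requested, available)
  if (length(keep) == 0L) {
    warning("None of the requested columns are present; reading all columns.",
            call. = FALSE)
    return(NULL)  # NULL is used here to mean "no column restriction"
  }
  keep
}

schema <- c("REF_AREA", "TIME_PERIOD", "OBS_VALUE")
picked <- lenient_cols_sketch(c("REF_AREA", "SEX"), schema)
none   <- suppressWarnings(lenient_cols_sketch(c("SEX", "AGE"), schema))
```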
Companion issue: provenance sidecars (issue #32)
Issue #32 sketches the producer → reviewer → ingestor integrity chain
that .write_remote_provenance (v0.4.0, URL-freeze sidecars) is the
seed of. Foundational design captured; implementation deferred to a
follow-up PR within the v0.4.3 milestone.
v0.4.0 -- Mode contract + Stata parity + demographics + dw_publish STUB
cso-toolkit v0.4.0 (2026-05-26)
Full notes also in NEWS.md.
Issue #15 — dw_publish() STUB (dry-run only)
Ships r/R/dw_publish.R as a deliberate STUB so DW-Production sector
scripts can wire the canonical call site today and have the live
branch light up automatically when v0.5.0 lands.
What's in:
- Public signature matching the final v0.5.0 contract:
  dw_publish(path, indicator, vintage, sector, endpoint = "helix", dry_run = TRUE, ...).
- Producer-only mode contract: reviewer-mode calls raise
  BEFORE any I/O via the same envelope shape as dw_api_fetch().
- Argument validation: empty / missing path / indicator /
  vintage / sector raises the envelope; path must exist on
  disk and not be a directory; endpoint must be "helix" (the
  only recognised value in v0.4.0).
- dry_run = TRUE returns a validated payload with sha256,
  bytes, built_at, built_by, and the toolkit-version stamp.
  Caller scripts can assert the payload shape today without ever
  hitting the network.
What's deliberately deferred:
- Live submission (dry_run = FALSE) raises with the
  envelope-shaped "Live submission not yet implemented" message
  and a pointer to GitHub issue #15. Real Helix endpoint
  integration ships in v0.5.0 once sector leads (@karavan88,
  @sbrar29, @laurenfrancis1202) finalise the submission contract.
Scope boundary -- folded into the helper's roxygen + docstring so
the long-running DW-Production confusion is finally resolved:
- dw_save(): filesystem (Teams + Z: drive mirror).
- dw_publish(): API (Helix submission).
Tested: 6 new asserts in
r/tests/testthat/test-dw_publish.R (cover the mode lockout,
argument validation, missing path, endpoint allowlist, dry-run
payload shape, and the v0.5.0-not-yet envelope). Total R test
suite is now 235 / 0; devtools::check() remains 0 / 0 / 0.
Issues #17 + #18 — dw_pop() and dw_regions() (R only)
Two convenience wrappers that almost every sector pipeline needs but
that v0.3.0 made users write themselves. Both ship R-only in v0.4.0;
Python and Stata parity are tracked at the same GitHub issues for a
future minor.
- r/R/dw_pop.R: dw_pop() wraps dw_api_fetch(api = "wb")
  for the World Bank total-population indicator (SP.POP.TOTL) and
  returns a tidy (REF_AREA, TIME_PERIOD, OBS_VALUE) tibble. When
  year is NULL (default), only the latest available year per
  country is returned; pass a year (or vector of years) to subset.
  Optional countries filter, refresh to force a live fetch, and
  cache_key override.
- r/R/dw_regions.R: dw_regions() fetches the UNICEF region
  taxonomy from unicef-drp/Country-and-Region-Metadata
  (default UNICEF_REP_REG_GLOBAL.csv) via
  dw_api_fetch(api = "github_raw"), joins the country -> region map
  into the caller's tibble, calls aggregate_data_v2() per region
  with the supplied value + by + method, and appends the
  regional rows to the original. When weight = "population" (the
  default), denominators come from dw_pop() and are merged in on
  REF_AREA + TIME_PERIOD; otherwise the named column is used
  directly.
New pkgdown reference section: Demographics. Both helpers are
registered with @family demographics and exported.
Tested: 11 new asserts for dw_pop + 19 for dw_regions (total R
suite now 191 / 0); devtools::check() stays at 0 / 0 / 0.
Issue #5 — Stata helpers reaching mode-contract parity
Ships the three Stata helpers that completed the v0.4.0 producer /
reviewer contract on the Stata side, closing the gaps surfaced when
v0.3.0 landed Stata-as-a-supported-target with read + API parity still
deferred:
- stata/src/dw_use.ado + .sthlp: uniform Stata read wrapper
  with auto-dispatch on .dta / .csv / .xlsx. Implements the v0.4.0
  mode-branched resolver (producer = local-first, reviewer =
  network-first), parses sibling .provenance.json for the recorded
  datasignature, and runs a non-blocking Z: drive integrity check
  (size by default; datasignature deep check via
  verify_z(sha256)).
- stata/src/dw_require_no_api.ado + .sthlp: preflight gate
  that aborts (Stata error 459) when $dw_mode == "reviewer". Mirrors
  the R r/R/profile_helpers.R::dw_require_no_api shape.
- stata/src/dw_load_config.ado + .sthlp: hand-rolled YAML
  reader for ~/.config/user_config.yml. No external dependency
  (AppLocker-safe). Populates $dw_mode + the teams* and
  sandboxRoot globals; hard-stops with the envelope-shaped error
  when dw_mode is missing or set to anything other than
  producer|reviewer.
Stata-side dw_api_fetch and Parquet / RDS read support remain out of
scope by design (route through R or Python and dw_save to .dta).
Issue #14 — Producer / reviewer mode contract tightening (BREAKING)
Refines the producer / reviewer split so that producer outputs are
provably redundant and reviewer reads are provably canonical. R +
Python siblings ship in lock-step.
- Producer-mode writes are now redundant. Every primary write
  fans out to BOTH the Teams canonical mirror AND the Z: drive mirror
  (whichever are available). dw_save hard-stops with the standard
  envelope when neither mirror is configured / reachable, because
  producer outputs cannot live only on the producer's laptop.
- Reviewer-mode writes broadened. In addition to refusing writes
  under canonical (v0.3.0), dw_save now also refuses writes under
  the configured Z: drive root. Bypass with
  allow_canonical_write = TRUE for deliberate DBM bootstraps as before.
- Reviewer-mode reads are network-first. dw_use now tries
  Teams → Z: → repo-local in reviewer sessions. When the network
  mirrors are unavailable and a local copy exists, the read still
  succeeds but emits an envelope-shaped warning flagging the
  provenance gap. Hard-stops when the file is missing everywhere.
  Producer-mode read order is unchanged (local-first; v0.3.0
  preserved).
- overwrite default flipped TRUE → FALSE. This is the only
  source-incompatible change in v0.4.0. The overwrite check now
  examines ALL three destinations (primary, Teams, Z:); the helper
  refuses if any of them already exists. Pass overwrite = TRUE
  explicitly to restore v0.3.0 behaviour. (Python: same flip on the
  overwrite: bool argument; the legacy mirror_to_z keyword is
  silently dropped with a DeprecationWarning.)
- New dw_toolkit_version() (R + Python). Returns the toolkit
  semver as a single string ("0.4.0"). Useful for stamping logs
  and asserting minimum-version requirements in consumer profiles.
Migration guide. Existing producer-mode callers that relied on
the v0.3.0 silent re-write semantics must either:
- Set overwrite = TRUE explicitly when overwriting an existing
  deposit. This is the common path for daily re-runs of the same
  vintage.
- Or sequence the write under a fresh vintage subfolder so no prior
  deposit collides. This is the recommended pattern for archival
  work.
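The all-destinations overwrite check can be sketched as below. refuse_if_any_exists() is a hypothetical helper and the temporary files only stand in for the primary, Teams, and Z: copies; the real dw_save() logic is not shown in these notes.

```r
# Hypothetical pre-write check; destination names are illustrative.
refuse_if_any_exists <- function(destinations, overwrite = FALSE) {
  existing <- destinations[file.exists(destinations)]
  if (!overwrite && length(existing) > 0L) {
    stop(
      sprintf(
        "Refusing to overwrite existing deposit(s): %s. Pass overwrite = TRUE to replace.",
        paste(existing, collapse = ", ")
      ),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Demo with temporary files standing in for primary / Teams / Z: copies.
dests <- file.path(tempdir(), c("primary.csv", "teams.csv", "z.csv"))
unlink(dests)
writeLines("x", dests[2])  # only the "Teams" copy exists

blocked <- inherits(try(refuse_if_any_exists(dests), silent = TRUE), "try-error")
allowed <- refuse_if_any_exists(dests, overwrite = TRUE)
unlink(dests)
```

A single existing copy at any destination is enough to block the write unless overwrite = TRUE is passed.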
Reviewer-mode callers do not need changes — the new network-first
read order is transparent when Teams/Z: are reachable; the new
warning surfaces when they are not (which used to be a silent
provenance gap).
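The mode-branched read order can be sketched as follows. resolve_read_sketch() is hypothetical; the reviewer order (Teams, Z:, local) comes from the notes above, while the producer order after "local" is an assumption.

```r
# Hypothetical resolver; `candidates` is a named character vector with
# elements "teams", "z", and "local". Producer order after "local" is assumed.
resolve_read_sketch <- function(candidates, mode = c("producer", "reviewer")) {
  mode <- match.arg(mode)
  read_order <- if (mode == "reviewer") {
    c("teams", "z", "local")
  } else {
    c("local", "teams", "z")
  }
  for (src in read_order) {
    p <- candidates[[src]]
    if (!is.na(p) && file.exists(p)) {
      if (mode == "reviewer" && src == "local") {
        warning("Network mirrors unavailable; read from local copy (provenance gap).",
                call. = FALSE)
      }
      return(list(path = p, source = src))
    }
  }
  stop("File not found in any location.", call. = FALSE)
}

# Only a local copy exists in this demo.
local_copy <- tempfile(fileext = ".csv")
writeLines("x", local_copy)
cands <- c(teams = NA_character_, z = NA_character_, local = local_copy)

producer_hit <- resolve_read_sketch(cands, "producer")
reviewer_hit <- suppressWarnings(resolve_read_sketch(cands, "reviewer"))
```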
Regression coverage. 9 new testthat assertions in
r/tests/testthat/test-dw_io-mode-contract.R (161 total R asserts;
0/0/0 from devtools::check) and 9 new smoke checks in
python/tests/manual/smoke_test.py (34 total). Error-envelope test
file extended to keep [cso_toolkit.<func>] WHAT / Why / Fix shape
on every new raise.
DW-Production backports (v0.3.0.9000 development line)
Landed via PR adopting the four undeclared local edits found by
docs/dw-production-alignment-2026-05-25.md:
- B1 (new feature): dw_use("https://...") is now a first-class
  call site. R: new .is_allowlisted_url(), .dw_frozen_root(),
  .url_to_frozen_path(), .write_remote_provenance(),
  .download_and_freeze(), .resolve_remote_url() in r/R/dw_io.R,
  with dw_use()'s read resolver dispatching on
  ^https?://. Python: same shape in python/src/dw_io.py.
  Allowlist is empty by default so the toolkit ships
  consumer-neutral; the consumer's profile populates
  dw_url_allowlist (R global) / _state.dw_url_allowlist (Python).
  Reviewer mode refuses to fetch new URLs; producer mode downloads
  once and writes a .provenance.json with sha256 + bytes +
  fetched_at + fetched_by + dw_mode. Three new _state keys:
  dw_url_allowlist, dw_frozen_root, githubFolder.
- B2 (QoL fix): dw_save auto-detects gzip when the path already
  ends in .gz (was previously a foot-gun: passing
  compress = FALSE and a .gz path would write the file
  uncompressed under the misleading name).
- B3 (robustness fix): .provenance.json sidecar write is now
  wrapped in tryCatch (R) / try/except (Python) so a
  non-serialisable metadata value warns rather than rolling back the
  primary file. Sidecars are metadata; the asset is what matters.
- B4 (bug fix): dw_api.R UIS-fetcher URL-encodes query keys +
  values via utils::URLencode(reserved = TRUE); previously param
  values containing & / = / spaces / non-ASCII would corrupt the
  query. Default cache extension for http and github_raw APIs
  bumped from csv to rds (R) / pkl (Python) so text and binary
  payloads round-trip correctly.
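The B4 encoding step can be sketched with base R alone. build_query_sketch() is a hypothetical helper, not the toolkit's fetcher; it only shows utils::URLencode(reserved = TRUE) applied to both keys and values.

```r
# Hypothetical query builder; not the toolkit's fetcher.
build_query_sketch <- function(params) {
  # reserved = TRUE also escapes characters such as & and = inside values.
  enc <- function(x) utils::URLencode(as.character(x), reserved = TRUE)
  keys <- vapply(names(params), enc, character(1), USE.NAMES = FALSE)
  vals <- vapply(params, enc, character(1), USE.NAMES = FALSE)
  paste(paste0(keys, "=", vals), collapse = "&")
}

query <- build_query_sketch(list(q = "a&b=c d", page = 2))
```

Without the encoding, the & and = inside the first value would be read as extra parameter separators.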
Regression tests: python/tests/manual/smoke_test.py now exercises
5 new B1–B4 invariants (20 total). R CMD check remains 0/0/0.
v0.3.0 — Python parity + Roxygen-complete R + graceful errors
First release with full Python parity for every R helper, plus the
R Roxygen-complete reference (NAMESPACE + 26 Rd files + pkgdown
config), and a three-part error envelope ([cso_toolkit.<func>] WHAT / Why / Fix) standardised across R and Python.
Highlights
- Full Python port at python/src/: 10 modules, 26 public entries,
  same behaviour contract as R (mode-aware path routing, Z: drive
  mirror, provenance sidecars, version-drift detection).
- Roxygen-complete R reference: NAMESPACE + man/ (26 .Rd
  files) + _pkgdown.yml config with 8 grouped families
  (io, api, sync, aggregate, survey-weights, reporting,
  scaffolding, audit).
- Graceful error envelopes across R + Python with WHAT / Why / Fix
  shape and grep-friendly [cso_toolkit.<func>] prefix.
- Secrets redaction for dw_api_fetch kwargs persisted to the
  .provenance.json sidecar.
- Per-language top-level READMEs (r/, python/, stata/).
- R CMD check: 0 errors / 0 warnings / 0 notes.
- Python smoke test: 15/15 invariants.
- Python error-envelope test: 30/30 raise paths.
Version metadata
- r/DESCRIPTION → Version: 0.3.0
- python/pyproject.toml → version = "0.3.0"
- python/src/__init__.py → __version__ = "0.3.0"
- templates/.toolkit_manifest.yml → pulled_version: "v0.3.0"
See NEWS.md for the consolidated changelog.
Released via develop → main merge (PR #9) on 2026-05-25.
v0.2.0 - Stata helpers, R toolkit expansion, CSO rebrand, workflow diagrams
First substantive release beyond the v0.1.0-rc1 scaffold. Promotes
everything that landed on develop today (PRs #1, #2, #3) to main
via release PR #4.
R helpers (r/R/)
Seven new files:
- aggregate_data.R: original aggregate_data() (mean / weighted_mean, optional global aggregate, population + country coverage).
- aggregate_data_v2.R: aggregate_data_v2() with weighted_mean / mean / sum / proportion, coverage threshold, metadata columns. Ships generate_agg_footnote() and apply_time_window().
- generate_markdown_report.R: generate_markdown_report() + process_all_csv_files(), which build descriptive-stats Markdown reports from CSV files.
- create_sector_script.R: create_sector_script(sector_name, sector_code, base_dir, ...) scaffolds a sector run-script template; DW-Production convenience wrapper create_dw_sector_script().
- profile_helpers.R: create_profile(repo_name, ...) scaffolds a profile_<repo>.R with the standard CSO building blocks; review_profile(path, ...) audits an existing profile.
- test_scripts.R: test_scripts(path, ...) recursively scans .R scripts and flags direct calls to raw IO / API commands wrapped by dw_io.R / dw_api.R (16 built-in rules across io / api families). Per-line escape hatch via # cso-allow: <rule-id>; CI mode via error_on_violation = TRUE.
- dw_nestweight.R: dw_nestweight() redistributes survey weights from missing nested observations so per-stratum totals are preserved. R port of edukit_nestweight (Diana Goldemberg).
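The per-line escape hatch described for test_scripts() can be sketched as a single-rule scanner. flag_raw_calls_sketch() is hypothetical, and the rule id "io-read-csv" and its pattern are assumptions, not one of the 16 built-in rules.

```r
# Hypothetical single-rule scanner; rule id and pattern are assumptions.
flag_raw_calls_sketch <- function(lines,
                                  rule_id = "io-read-csv",
                                  pattern = "\\bread\\.csv\\s*\\(") {
  hit <- grepl(pattern, lines, perl = TRUE)
  # A trailing "# cso-allow: <rule-id>" comment exempts that line only.
  allowed <- grepl(paste0("#\\s*cso-allow:\\s*", rule_id, "\\b"),
                   lines, perl = TRUE)
  which(hit & !allowed)
}

script <- c(
  'x <- read.csv("a.csv")',
  'y <- read.csv("b.csv")  # cso-allow: io-read-csv',
  'z <- dw_use("c.csv")'
)
violations <- flag_raw_calls_sketch(script)
```

Only the first line is flagged: the second carries the allow comment and the third already goes through the wrapper.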
Stata helpers (stata/src/)
First three Stata helpers — fills the v0.2 placeholder:
- dw_save.ado (+ .sthlp): Stata sibling of R dw_save(). isid + compress + save + sibling .provenance.json sidecar matching the R-side shape (JSON-escaped). Honours producer / reviewer mode via $dw_mode; canonical writes blocked in reviewer mode unless allow_canonical_write is passed. Content hash via Stata-native datasignature (no shell-out / AppLocker issue).
- dw_compare.ado (+ .sthlp): Stata sibling of R dw_compare(). Merges two .dta files on idvars and classifies each value column as identical / numerically-equivalent (within tol()) / different; optional Markdown report.
- dw_mkdir.ado (+ .sthlp): recursive mkdir (Stata's built-in is non-recursive). Idempotent.
Docs (docs/)
- dw_io_reference.md: per-function reference for dw_io.R.
- dw_api_reference.md: per-function reference for dw_api.R.
- git_workflow.md: gitflow + branch-protection contract reference (main / develop / feature; admin bypass on develop for hotfixes; full enforce on main).
- roles_and_workflow.md: extended with Mermaid data-flow diagram (PRODUCER / REVIEWER / INGESTOR boundaries, colour-coded by role) + role-vs-action matrix.
Branding + meta
- Top-level README.md rebranded as UNICEF Chief Statistician Office toolkit with a new Objective and motivation section spelling out the reproducibility-and-scalability mission for the D&A Section in OSE.
- NEWS.md documents every addition.
- r/R/README.md and stata/src/README.md updated to the live helper inventory; the stata/src "placeholder" line is gone.
Lineage credits
The three Stata helpers and dw_nestweight are ports from the
World Bank EduAnalyticsToolkit.
Each ported file credits the original author in its header:
| cso-toolkit | EduAnalyticsToolkit ancestor | Original author |
|---|---|---|
| dw_save.ado | edukit_save / savemetadata | Diana Goldemberg |
| dw_compare.ado | comparefiles / edukit_comparefiles | Kristoffer Bjärkefur |
| dw_mkdir.ado | rmkdir / edukit_rmkdir | Kristoffer Bjärkefur |
| dw_nestweight.R | nestweight / edukit_nestweight | Diana Goldemberg |
Review fixes folded in
Every PR landed with all Copilot review comments addressed: 11 fixes
on PR #1 (multi-col by correctness, n_distinct NA-counting,
namespace hygiene, removed source-time side effects, %||%
collision, generated-template defaults), 2 on PR #2 (Mermaid paths
matching documented layout, pushed-commit recovery recipe), 5 on PR
#3 (capture isid vs quietly isid, JSON-escaped sidecar values,
DRY compare logic, non-numeric guard).
Diff stats
+3,047 / −30 across 21 files, 23 commits from main to develop.
Known limitations (carrying into v0.3)
- No Stata equivalent of dw_use() yet. Reading is unconstrained on
  the Stata side; the reviewer-mode no-API guard does not exist.
- No Stata dw_require_no_api.do helper.
- No Stata dw_load_config.do (each project profile still wires up
  $dw_mode itself).
- See the dedicated tracking issue for the v0.3 Stata gaps.
Install
Vendored, not installed. R helpers ship in r/R/; Stata helpers in
stata/src/. See docs/toolkit_strategy.md
for the vendoring rationale and the cso_toolkit_pull() workflow
for refreshing a downstream consumer.
v0.1.0-rc1 — first release candidate
First release candidate. Extracted from unicef-drp/DW-Production PR #89.
Highlights
- R helpers feature-complete under r/R/: dw_io, dw_api, cso_toolkit_sync.
- Mode contract enforced in dw_api.R via dw_require_no_api(): reviewers cannot call upstream APIs.
- .provenance.json sidecars on every dw_save() (sha256, schema, user, timestamp, metadata).
- Vendoring model: copy into consumer's 00_functions/, pin via .toolkit_manifest.yml. See docs/toolkit_strategy.md.
- Three-role model documented (docs/roles_and_workflow.md): PRODUCER / REVIEWER / INGESTOR.
- DBM pre-deposit template (templates/dbm_submission_template.md): 8-section checklist with the ed pilot as a worked example.
What's not in this RC
- Stata helpers (stata/src/): scaffolded only; ship in v0.2.
- Python helpers (python/src/): scaffolded only; ship in v0.2.
- HIV / WASH Stata save calls still write to canonical paths.
Cutting v1.0.0
v1.0.0 will be cut after the ed sector pilot lands (DW-Production PR #64) and a second sector vendors the helpers without modification.