Releases: unicef-drp/cso-toolkit

v0.4.6 — mode-lock fix + dw_root wrapper + dashboard

30 May 16:17
005bd11

v0.4.6 (2026-05-30)

Quality release. Four issues land in one cycle — one HIGH-severity
canonical-recognition fix that would have allowed reviewer-mode
overwrites of canonical Teams artefacts, plus three cleanups
(re-exported dw_root() wrapper, uniform .cso_require envelope on
standalone-source %>% gate, and an r/.gitattributes pin so the R
subtree checks out with LF endings on Windows). No public API
breaks.

dw_is_canonical recognises OneDrive-mounted Teams Documents path (issue #54) — HIGH severity

Pre-fix, dw_is_canonical() matched canonical paths against
teamsFolderCanonical and the Z: drive root only. On UNICEF laptops
where the Teams "Documents" library is mounted via OneDrive
(C:/Users/<user>/UNICEF/<team> - Documents/...), the canonical
prefix the helper saw at runtime did not match the literal path the
sector profile assembled when writing — so dw_is_canonical()
returned FALSE for paths that were, in fact, canonical Teams
deposits.

Combined with the v0.4.0 reviewer-mode write-refusal contract
(dw_save refuses canonical writes in reviewer mode unless
allow_canonical_write = TRUE), the false-negative meant a reviewer-
mode dw_save() call would have silently overwritten the canonical
Teams artefact instead of hard-stopping.

Surfaced empirically by the 2026-05-30 HVA + ED reviewer-mode
fanout runs (DW-Production): the canonical path through the
OneDrive-mounted Documents folder passed the
!dw_is_canonical(path) precondition and would have written to the
canonical artefact had the runs not been halted by the audit. The
fix extends dw_is_canonical() to also recognise the OneDrive-
mounted Documents form, so the canonical-write refusal contract
fires correctly on UNICEF-laptop reviewer sessions.

Tests in r/tests/testthat/test-dw-is-canonical.R extended to cover
the OneDrive-mounted form alongside the existing Z: and
teamsFolderCanonical cases.
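
A minimal sketch of the prefix check described above, not the toolkit's implementation. The function name is_canonical_sketch(), the example roots, and the case-insensitive comparison are all assumptions made for illustration; the real dw_is_canonical() takes its roots from the profile.

```r
# Hypothetical sketch: a path counts as canonical when it sits under any
# known canonical root, including the OneDrive-mounted Documents form.
is_canonical_sketch <- function(path, canonical_roots) {
  # Normalise separators, trailing slashes and case so that
  # "C:\\Users" and "c:/users/" compare equal (assumed Windows semantics).
  norm <- function(p) tolower(sub("/+$", "", gsub("\\\\", "/", p)))
  p <- norm(path)
  roots <- norm(canonical_roots)
  # Canonical if the path equals a root or starts with "<root>/".
  any(p == roots | startsWith(p, paste0(roots, "/")))
}

roots <- c(
  "Z:/",                                     # Z: drive root (illustrative)
  "C:/Users/jdoe/UNICEF/Team X - Documents"  # OneDrive-mounted form (illustrative)
)
in_onedrive <- is_canonical_sketch(
  "C:\\Users\\jdoe\\UNICEF\\Team X - Documents\\out\\a.parquet", roots
)
in_sandbox <- is_canonical_sketch("C:/Users/jdoe/sandbox/a.parquet", roots)
```

Matching on "<root>/" rather than the bare root keeps a sibling folder such as "Team X - Documents-old" from being mistaken for the canonical one.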

dw_root() public wrapper re-exported (issue #53)

Several DW-Production sector scripts (carried forward from the v0.3
era) call dw_root() directly to derive the sector-folder anchor
for relative path resolution. dw_root() was never an exported entry
in the v0.4.x NAMESPACE — sector scripts that vendored v0.4.x copies
of the toolkit hit Error: could not find function "dw_root" the
first time they sourced the profile.

The internal helper is re-exported as a public wrapper in v0.4.6;
the existing implementation is unchanged. NAMESPACE gains an
export(dw_root) entry and man/dw_root.Rd is generated. The
helper joins Other io: in the family cross-references so it
surfaces in pkgdown alongside dw_save, dw_use, and friends.

Uniform .cso_require envelope on %>% standalone-source gate (issue #51)

v0.4.5's #46 fix added a local %>% binding gated by exists(). The
gate worked, but it raised a bare base-R error if magrittr itself
wasn't installed — without the [cso_toolkit.<func>] WHAT / Why / Fix envelope the rest of the toolkit follows.

v0.4.6 wraps the local-binding fallback in .cso_require("magrittr")
so the envelope shape is uniform. Consumers who source the file
standalone without magrittr installed now see the same actionable
three-part message as every other toolkit raise.
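
To illustrate the envelope shape, here is a hedged sketch of a package check that raises a three-part message. The function name cso_require_sketch(), its func argument, and the exact wording of each part are assumptions; only the [cso_toolkit.<func>] WHAT / Why / Fix shape comes from the notes.

```r
# Hypothetical stand-in for .cso_require(): succeed quietly when the package
# is installed, otherwise stop with a WHAT / Why / Fix message.
cso_require_sketch <- function(pkg, func = "unknown") {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(TRUE))
  }
  stop(
    sprintf("[cso_toolkit.%s] Package '%s' is not installed.\n", func, pkg),
    sprintf(" Why: %s needs '%s' at run time.\n", func, pkg),
    sprintf(" Fix: install.packages(\"%s\")", pkg),
    call. = FALSE
  )
}

# Capture the message for a package name that should not exist anywhere.
msg <- tryCatch(
  cso_require_sketch("notARealPackage123", func = "demo"),
  error = function(e) conditionMessage(e)
)
cat(msg, "\n")
```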

r/.gitattributes pins LF endings on the R subtree (issue #52)

On Windows checkouts under default git autocrlf settings, R source
files in the package subtree (*.R, *.Rd, NAMESPACE,
DESCRIPTION, *.yml, *.yaml, *.md) were rewritten with CRLF
endings on checkout. That made local working-tree SHA-256 hashes
diverge from the LF-computed Git blob hashes — a recurring source
of spurious "drift" complaints from Windows consumers comparing
working-tree hashes against manifest entries computed on Linux.

A new r/.gitattributes pins LF endings on the R subtree's source
files so Windows checkouts of the package subtree are byte-identical
to Linux/macOS. This is scoped to r/ — the rest of the repo
inherits the platform default. Per-file hash drift detection in
cso_toolkit_check() itself is not part of this release; today
the function only compares the pinned manifest version against the
upstream latest tag. The richer drift logic is planned for the
cso_toolkit_diff() / cso_toolkit_pull() work (stubbed in v0.4.6).

Docs: third role renamed INGESTOR → PUBLISHER (DBM / DBR / DBP)

The third data-warehouse role is now PUBLISHER (was INGESTOR),
aligning the role label with the verb the code already uses
(dw_publish(); there is no dw_ingest()). Docs-only — no code,
no exported-API change, and no mode-contract value changed: the
contract still exposes producer / reviewer, and wiring a
publisher mode remains future scope (issue #15). The role
acronyms are now spelled out across the docs — DBM = Database
Manager (PRODUCER), DBR = Database Reviewer (REVIEWER), DBP =
Database Publisher (PUBLISHER). Touches docs/roles_and_workflow.md,
README.md, r/DESCRIPTION, templates/dbm_submission_template.md,
docs/git_workflow.md, docs/toolkit_strategy.md.

v0.4.5 — standalone-source pipe binding + dw_ aliases

29 May 17:10
cbcda83

Closes the v0.4.5 milestone with two deliverables: a standalone-source %>% binding (issue #46, PR #47) and 8 new dw_-prefixed aliases (issue #42, PR #48). A companion declaration of magrittr as a first-class Import (with importFrom in NAMESPACE) makes the dependency surface honest. No public API breaks; the un-prefixed and dw_-prefixed names both remain exported throughout v0.4.x.

Standalone-source %>% binding (#46)

v0.4.4's #36 fix made aggregate_data_v2.R partially safe to source standalone by adding a local .cso_require() fallback. But the file still used the unqualified %>% operator without a binding — .cso_require() calls requireNamespace(), which loads but does NOT attach package exports, so source("aggregate_data_v2.R") errored at the first pipe call with Error: could not find function "%>%" even when magrittr was installed.

The same exists()-gated pattern used for .cso_require() in #36 resolves it. It is applied to both files that use %>% standalone:

if (!exists("%>%", mode = "function", inherits = TRUE)) {
  `%>%` <- magrittr::`%>%`
}

In the installed-package context the local binding is a no-op (NAMESPACE's importFrom(magrittr, "%>%") wins). In standalone-source mode the local binding provides the same %>% symbol via magrittr's namespace, so consumers don't have to library(magrittr) first.

To make this fully honest: magrittr is now declared in DESCRIPTION::Imports (previously it came in transitively via dplyr), and zzz.R::.cso_require carries an @importFrom magrittr %>% roxygen tag so NAMESPACE gains the explicit importFrom(magrittr, "%>%") entry. Both belong to the principled fix: the package's dependency declaration now matches the comments and the test claims about the installed-package no-op path.

Also corrected a misleading comment in zzz.R::globalVariables that referred to a non-existent apply_time_window.R file (apply_time_window() is defined inside aggregate_data_v2.R).

Surfaced empirically by Copilot review of DW-Production PR #144 (WS v0.4.4 install) on 2026-05-29; Copilot review of cso-toolkit PR #47 then flagged that the comment claims about NAMESPACE were aspirational, prompting the principled magrittr import declaration.

dw_-prefixed canonical aliases for the remaining un-prefixed exports (#42)

v0.4.4 added dw_ aliases for aggregate_data_v2 and create_sector_script (the two exports touched by #36). v0.4.5 extends the program to the remaining 8 un-prefixed exports so the toolkit's public surface consistently uses the dw_ namespace:

| Un-prefixed | dw_-prefixed alias |
| --- | --- |
| aggregate_data | dw_aggregate_data |
| generate_agg_footnote | dw_generate_agg_footnote |
| apply_time_window | dw_apply_time_window |
| generate_markdown_report | dw_generate_markdown_report |
| process_all_csv_files | dw_process_all_csv_files |
| create_profile | dw_create_profile |
| review_profile | dw_review_profile |
| test_scripts | dw_test_scripts |

Both names continue to work and share the same \link{} man page via roxygen @rdname. No breaking change: consumers using the un-prefixed names see no behaviour change. The un-prefixed names remain exported and will continue to work indefinitely in v0.4.x; a future v0.5.x cycle may emit a lifecycle::deprecate_soft() warning on the un-prefixed forms to nudge migration, but only after sector leads confirm the migration is complete.
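
A minimal sketch of the prefix-only alias pattern, assuming the usual roxygen layout. summarise_demo() is a made-up stand-in, not a toolkit export; the point is that the alias is the same function object and @rdname folds both names into one help page.

```r
# Hypothetical example function standing in for an un-prefixed export.

#' Summarise a numeric vector (illustrative only)
#' @param x Numeric vector.
#' @export
summarise_demo <- function(x) mean(x, na.rm = TRUE)

# The alias reuses the same function object and the same man page.
#' @rdname summarise_demo
#' @export
dw_summarise_demo <- summarise_demo

same_object <- identical(dw_summarise_demo, summarise_demo)
alias_value <- dw_summarise_demo(c(1, 2, NA))
```

Because the alias is a plain assignment, a later change to the original's body has to be followed by re-running the assignment, which happens automatically when the package is rebuilt.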

create_dw_sector_script was intentionally NOT aliased: it already carries a dw infix (it's a DW-Production-specific wrapper of create_sector_script), and create_sector_script got dw_create_sector_script in v0.4.4 anyway. The dual-naming is historical and won't be cleaned up under #42.

The test_scripts → dw_test_scripts alias also got a roxygen note: a future v0.5.x rename to dw_audit_scripts (to surface the audit intent and avoid the testthat-name collision) is on the design horizon. For now the prefix-only alias keeps the cleanup additive.

Forward look

v0.5.0 will land the live dw_publish() submission branch (issue #15) and the dw_regions() API redesign against the Country-and-Region-Metadata-API package (issue #40) once sector leads finalise the Helix endpoint contract and the regions API output schema.

DW-Production propagation

Once this release lands, the 4 active DW-Production sector branches (NT, IM, WS, HVA on their review/sector-*-2026-05-18 branches) will receive install PRs bumping their vendored toolkit from v0.4.4 to v0.4.5. The diff vs v0.4.4 affects 4 files (aggregate_data.R, aggregate_data_v2.R, zzz.R, DESCRIPTION). The remaining 7 vendored files are byte-identical to v0.4.4.


Full changelog: v0.4.4...v0.4.5

v0.4.4 — Quality release (DW-Production v0.4.3.1 fanout findings)

29 May 11:58
99d672d

v0.4.4 (2026-05-29)

Quality release. Three v0.4.4 milestone issues land in one cycle (PRs
#39, #41, #43). All three
surfaced empirically during the DW-Production v0.4.3.1 fanout audit
(IM / WS / HVA install + reviewer-mode runs on 2026-05-28). No public
API breaks; the bumped-default behaviour stays backwards-compatible
for v0.4.3.1 consumers.

Follow-up issue #42 tracks the remaining un-prefixed exports for a
single naming-cleanup PR (proposed v0.4.5).

Three carry-forward bugs + dw_-prefix aliases (issue #36)

Closes two of the three sub-fixes flagged on #36 during the HVA
scaffold-install Copilot review on 2026-05-28. The third (a value-arg
propagation bug in dw_regions.R) is moot — dw_regions is being
redesigned to consume the new unicef-drp/Country-and-Region-Metadata-API
package in #40 (v0.5.0); the affected code path is removed.

Sub-fix 1: aggregate_data_v2.R is now safe to source standalone

Pre-fix, aggregate_data_v2.R called .cso_require() from zzz.R.
Sourcing aggregate_data_v2.R directly (without sourcing zzz.R
first) left .cso_require undefined; the first call to
aggregate_data_v2(...) errored with could not find function ".cso_require".

The file now defines a local fallback for .cso_require() at source
time, gated by exists(".cso_require", mode = "function", inherits = TRUE).
When zzz.R has already been sourced into .GlobalEnv, the shared
helper wins and nothing is redefined; when only aggregate_data_v2.R
is sourced, the local fallback provides the same behaviour.
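
The gating behaviour can be sketched with explicit environments. Everything here is illustrative: .helper and gate_demo() are hypothetical names, and the environments stand in for "zzz.R already sourced" versus "file sourced alone".

```r
# Hypothetical sketch: define a fallback only when no helper is visible
# from `env` (directly or through its parent environments).
gate_demo <- function(env) {
  if (!exists(".helper", envir = env, mode = "function", inherits = TRUE)) {
    assign(".helper", function() "local fallback", envir = env)
  }
  get(".helper", envir = env)()
}

shared <- new.env(parent = emptyenv())
assign(".helper", function() "shared helper", envir = shared)

child <- new.env(parent = shared)      # sees the shared helper via inheritance
alone <- new.env(parent = emptyenv())  # nothing defined anywhere up the chain

from_child <- gate_demo(child)
from_alone <- gate_demo(alone)
```

In the first case nothing is redefined and the shared helper answers; in the second the fallback is created on the spot.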

Sub-fix 2: create_sector_script() profile sentinel check relaxed (and aligned with create_profile())

Pre-fix, the generated 00_run_<sector>.R template checked
isTRUE(profile_DW_Production). The DW-Production profile
(profile_DW-Production.R) does not set profile_DW_Production, so
the generated script errored at the sentinel check even after the
profile was sourced successfully.

The check is now relaxed from isTRUE(<name>) to !is.null(<name>),
which accepts any non-null value — character paths, numeric values,
or the boolean sentinel that create_profile("DW-Production") emits
(profile_DW_Production <- TRUE). The default profile_name stays at
"profile_DW_Production" so the documented scaffold flow
(create_profile() → create_dw_sector_script()) works out of the
box without additional configuration.

The new error message names the missing variable so future
profile-vs-template mismatches surface a concrete fix. The roxygen
@param doc also clarifies that the generated template uses
projectFolder directly for input/output paths, so the profile MUST
set projectFolder for the runner to do useful work — the sentinel
check only confirms the profile was sourced.
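
A hedged sketch of the relaxed sentinel check. The function name check_profile_sourced() and the error wording are assumptions; the generated template's actual code is not shown in these notes.

```r
# Hypothetical sketch of the relaxed check: any non-NULL value passes.
check_profile_sourced <- function(profile_name = "profile_DW_Production",
                                  envir = globalenv()) {
  sentinel <- get0(profile_name, envir = envir, inherits = TRUE)
  if (is.null(sentinel)) {
    stop(
      sprintf("Profile not sourced: variable '%s' is not set.", profile_name),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

env <- new.env(parent = emptyenv())
assign("profile_DW_Production", "C:/some/profile/path", envir = env)

passes_relaxed <- check_profile_sourced(envir = env)
# The old isTRUE() rule would have rejected this character value.
passes_strict <- isTRUE(get("profile_DW_Production", envir = env))
```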

For DW-Production consumers (whose existing profile_DW-Production.R
doesn't set the sentinel), the one-line profile_DW_Production <- TRUE
must be added to the profile. Tracked as a separate DW-Production-side
follow-up PR.

dw_-prefixed canonical aliases for the two touched exports

Toolkit-export naming consolidates around the dw_ prefix in v0.4.x;
the non-prefixed names predate that convention. Since this PR already
touched these files, it also adds:

  • dw_aggregate_data_v2 (alias for aggregate_data_v2)
  • dw_create_sector_script (alias for create_sector_script)

Both names point to the same function and share the same \code{\link{}} man page (via roxygen @rdname). The non-prefixed names continue to work, so nothing breaks. Follow-up issue #42 tracks the rest of the un-prefixed exports (aggregate_data, generate_markdown_report, apply_time_window, generate_agg_footnote, create_profile, review_profile, test_scripts, create_dw_sector_script) as a single cleanup PR.

dw_default_unicef_allowlist() helper for consumers (issue #37)

New exported helper returns a character vector of ^...-anchored
regex patterns covering UNICEF DRP GitHub-raw and repository URLs.
Consumers seed dw_url_allowlist from this constant instead of
re-deriving the patterns per project:

# In profile_<consumer>.R
dw_url_allowlist <- c(
  dw_default_unicef_allowlist(),
  # Project-specific extras:
  "^https://yourorg\\.github\\.io/"
)

Surfaced empirically by the DW-Production reviewer-mode audit on
2026-05-28 (IM 01_immunization.R): every URL-using sector script
hand-wrote the same ^https://raw\\.githubusercontent\\.com/unicef-drp/
pattern. The helper consolidates the duplication and lets future
UNICEF-DRP additions land in one place upstream rather than in each
consumer's profile.

Purely additive — consumers must opt in by composing the helper into
their dw_url_allowlist. The URL-freeze safety contract is unchanged
(no URL is fetchable without explicit ratification).
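
For orientation, this is roughly how an anchored allowlist could be applied to a URL. url_is_allowlisted() is a hypothetical helper written for this note, and the toolkit's own matcher may differ; the first pattern is the one quoted above.

```r
# Hypothetical matcher: TRUE when the URL matches at least one pattern.
# An empty allowlist matches nothing, mirroring the opt-in contract.
url_is_allowlisted <- function(url, allowlist) {
  length(allowlist) > 0 &&
    any(vapply(allowlist, function(p) grepl(p, url, perl = TRUE), logical(1)))
}

allowlist <- c(
  "^https://raw\\.githubusercontent\\.com/unicef-drp/",  # pattern quoted in the notes
  "^https://yourorg\\.github\\.io/"                       # project-specific extra
)

ok_raw   <- url_is_allowlisted(
  "https://raw.githubusercontent.com/unicef-drp/repo/main/x.csv", allowlist
)
ok_other <- url_is_allowlisted("https://example.com/unicef-drp/x.csv", allowlist)
ok_empty <- url_is_allowlisted("https://yourorg.github.io/data.csv", character(0))
```

The ^ anchor matters: without it, the second URL would match because "unicef-drp/" appears later in the path.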

.dw_frozen_root() resolution is now discoverable (issue #38)

.dw_frozen_root() falls through a 3-tier resolution chain when
locating the URL-freeze cache root:

  1. dw_frozen_root global (opt-in; preferred)
  2. <githubFolder>/_frozen (fallback)
  3. <getwd()>/_frozen (last-resort fallback)

Pre-v0.4.4 the helper resolved silently — consumers whose project
layout didn't match the fallback heuristic had to grep dw_io.R to
discover why dw_use("https://...") couldn't find their frozen file.

v0.4.4 adds two discoverability improvements:

  • A new internal helper .dw_frozen_root_resolved() returns a
    (path, source) pair so downstream callers can surface the chosen
    tier in messages and error envelopes.
  • .dw_frozen_root_notify_once() emits a session-scoped notice the
    first time the helper falls back beyond tier #1:
    message() for tier #2 (<githubFolder>/_frozen),
    warning() for tier #3 (<getwd()>/_frozen).
    Consumers that explicitly set dw_frozen_root get no notice.
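
The resolution chain and its notices can be sketched as follows. This is an illustration under stated assumptions: resolve_frozen_root_sketch() is a hypothetical name, the inputs are passed as arguments rather than read from globals, the tier labels are invented, and the once-per-session behaviour is left out.

```r
# Hypothetical sketch of the 3-tier chain returning a (path, source) pair.
resolve_frozen_root_sketch <- function(dw_frozen_root = NULL,
                                       githubFolder = NULL,
                                       wd = getwd()) {
  # Tier 1: explicit opt-in, no notice.
  if (!is.null(dw_frozen_root)) {
    return(list(path = dw_frozen_root, source = "dw_frozen_root"))
  }
  # Tier 2: fallback under githubFolder, announced with message().
  if (!is.null(githubFolder)) {
    message("Frozen root falling back to <githubFolder>/_frozen")
    return(list(path = file.path(githubFolder, "_frozen"), source = "githubFolder"))
  }
  # Tier 3: last resort under the working directory, announced with warning().
  warning("Frozen root falling back to <getwd()>/_frozen", call. = FALSE)
  list(path = file.path(wd, "_frozen"), source = "getwd")
}

tier1 <- resolve_frozen_root_sketch(dw_frozen_root = "/data/_frozen")
tier2 <- suppressMessages(resolve_frozen_root_sketch(githubFolder = "/gh"))
tier3 <- suppressWarnings(resolve_frozen_root_sketch(wd = "/work"))
```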

The missing-frozen-copy error envelope in .resolve_remote_url()
now includes the resolution tier so consumers see which fallback
fired (or that the explicit global picked the path that's wrong):

[cso_toolkit.dw_use:remote] Reviewer mode forbids fetching from the network.
 Missing frozen copy: <path>
 URL: <url>
 Frozen-root resolution: <chosen-root> (<tier-name>)
 Fix:
   1. If the path above is wrong, set `dw_frozen_root <- '<your-canonical-frozen-path>'` in your profile.
   2. Otherwise, a producer must call dw_use(...) once and commit the frozen file + sidecar.

Surfaced empirically by the DW-Production IM reviewer-mode audit on
2026-05-28: the fallback resolved to <githubFolder>/_frozen instead
of DW-Production's convention of <projectFolder>/01_dw_prep/011_rawdata/_frozen/.
Three runs (75+ min of slow Teams network) were needed to diagnose
what a single message could have surfaced at session start.

No public-API changes. The internal .dw_frozen_root() (path-only)
is preserved for backward compatibility with v0.4.3.1 callers.

v0.4.3.1 - Version stamp drift fix

28 May 04:31
70beede

v0.4.3.1 (2026-05-28)

Patch release. v0.4.3 (cut earlier today) bumped DESCRIPTION::Version
and NEWS.md but missed three version stamps inside r/R/dw_io.R:

  • header banner (# Toolkit version: 0.4.2)
  • dw_toolkit_version() docstring (Currently "0.4.2")
  • dw_toolkit_version() return value ("0.4.2")

Caught by Copilot review on DW-Production install PRs (#134 + #136):
dw_toolkit_version() was returning "0.4.2" while consumers had
manifest::pulled_version = "v0.4.3" — an inconsistency that would
have polluted dw_publish() provenance sidecars with the wrong
toolkit version. This patch bumps all three stamps to 0.4.3.1 so
they agree with DESCRIPTION and the manifest pin.

The release-cut checklist for the next minor (v0.5.0) should include
a grep -rn '0\.[0-9]\.[0-9]' r/R/ step to catch any future stamp
drift before tagging.

No behavioural changes; safe to install as a drop-in replacement for
v0.4.3.

v0.4.3 — Integrity release

28 May 03:53
7d33278

v0.4.3 (2026-05-28)

Integrity release. Two dw_use() fixes (issues #30 + #31) ported from
the DW-Production NT reviewer-mode reproducibility audit on 2026-05-27
(PR #33; DW-Production PR #133). Both
landed first on the DW-Production vendored copy as local_edits; this
release lets the next cso_toolkit_pull(target_version = "v0.4.3") drop
those local edits.

Issue #32 (provenance sidecars) is the design-foundation companion in
the same milestone; implementation ships in a follow-up PR.

dw_use() — parquet / dta col_select = NULL conditional dispatch (issue #30)

The v0.4.2 parquet branch unconditionally passed col_select = cols
to arrow::read_parquet(). When a caller invoked dw_use(path) without
explicit columns, cols defaulted to NULL, and
arrow::read_parquet(path, col_select = NULL) returned a zero-column
schema rather than all columns. The same pattern affected
haven::read_dta(col_select = NULL). Both branches now use conditional
dispatch: pass col_select only when cols is non-NULL.
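
A minimal sketch of the conditional dispatch, assuming a generic reader. read_with_optional_cols() and fake_reader() are hypothetical; the fake reader stands in for arrow::read_parquet() or haven::read_dta() so the example runs without those packages.

```r
# Hypothetical sketch: forward col_select only when the caller supplied columns.
read_with_optional_cols <- function(reader, path, cols = NULL) {
  args <- list(path)
  if (!is.null(cols)) {
    args$col_select <- cols
  }
  do.call(reader, args)
}

# Fake reader that reports the names of the extra arguments it received.
fake_reader <- function(file, ...) names(list(...))

without_cols <- read_with_optional_cols(fake_reader, "x.parquet")
with_cols    <- read_with_optional_cols(fake_reader, "x.parquet", cols = c("a", "b"))
```

Building the argument list first keeps a single call site per format while still never passing an explicit NULL.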

Surfaced empirically by DW-Production Run #6 (NT pipeline): 13+ stages
downstream of 1b_cmrs_series_import.R failed with "object 'COLUMN'
not found" because the upstream dw_use(out_dw_nut_*.parquet) returned
an empty tibble. After the fix (Run #7): 24/25 stages OK.

dw_use(cols_lenient = FALSE) — new flag for any_of()-style schema intersect (issue #31)

Sector scripts that wanted "select these columns if present, ignore
the absent" semantics passed dw_use(cols = dplyr::any_of(c(...))).
any_of() errors fatally outside a tidyselect selecting context
(tidyselect >= 1.2.0); R evaluates the helper before the call to
dw_use, so no lazy-eval trick inside the toolkit can save it.

New cols_lenient = FALSE parameter (default off for backwards compat).
When TRUE, dw_use introspects the file schema cheaply (parquet
metadata, csv / tsv / xlsx zero-row read, dta header) and intersects
the requested cols with the actual columns before the data read.
Empty intersection → warning + read all columns (forward-progress
guarantee). New internal helper .dw_schema_cols(path, fmt) performs
the schema-only read.
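
The lenient intersect can be illustrated for the csv case with base R only. lenient_cols_sketch() is a hypothetical function, not the toolkit's .dw_schema_cols(), and it re-reads the file in full for simplicity.

```r
# Hypothetical sketch: intersect requested columns with the file's header.
lenient_cols_sketch <- function(path, cols) {
  # Zero-row read: only the header is parsed to learn the column names.
  schema <- names(utils::read.csv(path, nrows = 0, check.names = FALSE))
  keep <- intersect(cols, schema)
  if (length(keep) == 0) {
    # Forward-progress guarantee: warn, then fall back to all columns.
    warning("None of the requested columns exist; reading all columns.", call. = FALSE)
    return(utils::read.csv(path, check.names = FALSE))
  }
  utils::read.csv(path, check.names = FALSE)[, keep, drop = FALSE]
}

tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(REF_AREA = c("AFG", "BRA"), OBS_VALUE = c(1.5, 2.5)),
  tmp, row.names = FALSE
)
partial <- lenient_cols_sketch(tmp, c("REF_AREA", "NOT_THERE"))
fallback <- suppressWarnings(lenient_cols_sketch(tmp, "NOT_THERE"))
```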

Migration:

  • dw_use(cols = dplyr::any_of(c(...))) → dw_use(cols = c(...), cols_lenient = TRUE)
  • dw_use(cols = dplyr::all_of(c(...))) → dw_use(cols = c(...)) (strict; all_of() at top level is deprecated in tidyselect 1.2.0 anyway)

Companion issue: provenance sidecars (issue #32)

Issue #32 sketches the producer → reviewer → ingestor integrity chain
that .write_remote_provenance (v0.4.0, URL-freeze sidecars) is the
seed of. Foundational design captured; implementation deferred to a
follow-up PR within the v0.4.3 milestone.

v0.4.0 -- Mode contract + Stata parity + demographics + dw_publish STUB

27 May 01:52
507b867

cso-toolkit v0.4.0 (2026-05-26)

Full notes also in NEWS.md.

Issue #15 -- dw_publish() STUB (dry-run only)

Ships r/R/dw_publish.R as a deliberate STUB so DW-Production sector
scripts can wire the canonical call site today and have the live
branch light up automatically when v0.5.0 lands.

What's in:

  • Public signature matching the final v0.5.0 contract:
    dw_publish(path, indicator, vintage, sector, endpoint = "helix", dry_run = TRUE, ...).
  • Producer-only mode contract -- reviewer-mode calls raise
    BEFORE any I/O via the same envelope shape as dw_api_fetch().
  • Argument validation -- empty / missing path / indicator /
    vintage / sector raises the envelope; path must exist on
    disk and not be a directory; endpoint must be "helix" (the
    only recognised value in v0.4.0).
  • dry_run = TRUE returns a validated payload with sha256,
    bytes, built_at, built_by, and the toolkit-version stamp.
    Caller scripts can assert the payload shape today without ever
    hitting the network.

What's deliberately deferred:

  • Live submission (dry_run = FALSE) raises with the
    envelope-shaped "Live submission not yet implemented" message
    and a pointer to GitHub issue #15. Real Helix endpoint
    integration ships in v0.5.0 once sector leads (@karavan88,
    @sbrar29, @laurenfrancis1202) finalise the submission contract.

Scope boundary -- folded into the helper's roxygen + docstring so
the long-running DW-Production confusion is finally resolved:

  • dw_save() -- filesystem (Teams + Z: drive mirror).
  • dw_publish() -- API (Helix submission).

Tested: 6 new asserts in
r/tests/testthat/test-dw_publish.R (cover the mode lockout,
argument validation, missing path, endpoint allowlist, dry-run
payload shape, and the v0.5.0-not-yet envelope). Total R test
suite is now 235 / 0; devtools::check() remains 0 / 0 / 0.

Issues #17 + #18 -- dw_pop() and dw_regions() (R only)

Two convenience wrappers that almost every sector pipeline needs but
that v0.3.0 made users write themselves. Both ship R-only in v0.4.0;
Python and Stata parity are tracked at the same GitHub issues for a
future minor.

  • r/R/dw_pop.R -- dw_pop() wraps dw_api_fetch(api = "wb")
    for the World Bank total-population indicator (SP.POP.TOTL) and
    returns a tidy (REF_AREA, TIME_PERIOD, OBS_VALUE) tibble. When
    year is NULL (default), only the latest available year per
    country is returned; pass a year (or vector of years) to subset.
    Optional countries filter, refresh to force a live fetch, and
    cache_key override.
  • r/R/dw_regions.R -- dw_regions() fetches the UNICEF region
    taxonomy from unicef-drp/Country-and-Region-Metadata
    (default UNICEF_REP_REG_GLOBAL.csv) via
    dw_api_fetch(api = "github_raw"), joins the country -> region map
    into the caller's tibble, calls aggregate_data_v2() per region
    with the supplied value + by + method, and appends the
    regional rows to the original. When weight = "population" (the
    default), denominators come from dw_pop() and are merged in on
    REF_AREA + TIME_PERIOD; otherwise the named column is used
    directly.
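
The "latest available year per country" default of dw_pop() can be sketched in base R. latest_per_country() is a hypothetical helper and the population figures are made up; the real function fetches its data through dw_api_fetch().

```r
# Hypothetical sketch: keep only the most recent TIME_PERIOD per REF_AREA.
latest_per_country <- function(df) {
  latest <- ave(df$TIME_PERIOD, df$REF_AREA, FUN = max)
  out <- df[df$TIME_PERIOD == latest, , drop = FALSE]
  rownames(out) <- NULL
  out
}

pop <- data.frame(
  REF_AREA    = c("AFG", "AFG", "BRA", "BRA"),
  TIME_PERIOD = c(2021, 2022, 2020, 2022),
  OBS_VALUE   = c(40.1e6, 41.1e6, 213.2e6, 215.3e6)  # made-up values
)
latest <- latest_per_country(pop)
```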

New pkgdown reference section: Demographics. Both helpers are
registered with @family demographics and exported.

Tested: 11 new asserts for dw_pop + 19 for dw_regions (total R
suite now 191 / 0); devtools::check() stays at 0 / 0 / 0.

Issue #5 — Stata helpers reaching mode-contract parity

Ships the three Stata helpers that completed the v0.4.0 producer /
reviewer contract on the Stata side, closing the gaps surfaced when
v0.3.0 landed Stata-as-a-supported-target with read + API parity still
deferred:

  • stata/src/dw_use.ado + .sthlp — uniform Stata read wrapper
    with auto-dispatch on .dta / .csv / .xlsx. Implements the v0.4.0
    mode-branched resolver (producer = local-first, reviewer =
    network-first), parses sibling .provenance.json for the recorded
    datasignature, and runs a non-blocking Z: drive integrity check
    (size by default; datasignature deep check via
    verify_z(sha256)).
  • stata/src/dw_require_no_api.ado + .sthlp — preflight gate
    that aborts (Stata error 459) when $dw_mode == "reviewer". Mirrors
    the R r/R/profile_helpers.R::dw_require_no_api shape.
  • stata/src/dw_load_config.ado + .sthlp — hand-rolled YAML
    reader for ~/.config/user_config.yml. No external dependency
    (AppLocker-safe). Populates $dw_mode + the teams* and
    sandboxRoot globals; hard-stops with the envelope-shaped error
    when dw_mode is missing or set to anything other than
    producer | reviewer.

Stata-side dw_api_fetch and Parquet / RDS read support remain out of
scope by design (route through R or Python and dw_save to .dta).

Issue #14 — Producer / reviewer mode contract tightening (BREAKING)

Refines the producer / reviewer split so that producer outputs are
provably redundant and reviewer reads are provably canonical. R +
Python siblings ship in lock-step.

  • Producer-mode writes are now redundant. Every primary write
    fans out to BOTH the Teams canonical mirror AND the Z: drive mirror
    (whichever are available). dw_save hard-stops with the standard
    envelope when neither mirror is configured / reachable — producer
    outputs cannot live only on the producer's laptop.
  • Reviewer-mode writes broadened. In addition to refusing writes
    under canonical (v0.3.0), dw_save now also refuses writes under
    the configured Z: drive root. Bypass with allow_canonical_write = TRUE for deliberate DBM bootstraps as before.
  • Reviewer-mode reads are network-first. dw_use now tries
    Teams → Z: → repo-local in reviewer sessions. When the network
    mirrors are unavailable and a local copy exists, the read still
    succeeds but emits an envelope-shaped warning flagging the
    provenance gap. Hard-stops when the file is missing everywhere.
    Producer-mode read order is unchanged (local-first; v0.3.0
    preserved).
  • overwrite default flipped TRUE → FALSE. This is the only
    source-incompatible change in v0.4.0. The overwrite check now
    examines ALL three destinations (primary, Teams, Z:); the helper
    refuses if any of them already exists. Pass overwrite = TRUE
    explicitly to restore v0.3.0 behaviour. (Python: same flip on the
    overwrite: bool argument; the legacy mirror_to_z keyword is
    accepted but ignored, and emits a DeprecationWarning.)
  • New dw_toolkit_version() (R + Python). Returns the toolkit
    semver as a single string ("0.4.0"). Useful for stamping logs
    and asserting minimum-version requirements in consumer profiles.

Migration guide. Existing producer-mode callers that relied on
the v0.3.0 silent re-write semantics must either:

  1. Set overwrite = TRUE explicitly when overwriting an existing
    deposit. This is the common path for daily re-runs of the same
    vintage.
  2. Or sequence the write under a fresh vintage subfolder so no prior
    deposit collides. This is the recommended pattern for archival
    work.
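
A hedged sketch of the all-destinations overwrite check. check_overwrite_sketch() and its message wording are assumptions; the toolkit's dw_save() performs this check internally over its own primary, Teams and Z: paths.

```r
# Hypothetical sketch: refuse when ANY destination exists and overwrite is FALSE.
check_overwrite_sketch <- function(destinations, overwrite = FALSE) {
  existing <- destinations[file.exists(destinations)]
  if (length(existing) > 0 && !overwrite) {
    stop(
      "[cso_toolkit.dw_save] Destination already exists: ",
      paste(existing, collapse = ", "),
      "\n Fix: pass overwrite = TRUE or write under a fresh vintage folder.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

existing_file <- tempfile(fileext = ".parquet")
file.create(existing_file)
fresh_file <- tempfile(fileext = ".parquet")

refused <- tryCatch(
  check_overwrite_sketch(c(fresh_file, existing_file)),
  error = function(e) "refused"
)
allowed <- check_overwrite_sketch(c(fresh_file, existing_file), overwrite = TRUE)
```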

Reviewer-mode callers do not need changes — the new network-first
read order is transparent when Teams/Z: are reachable; the new
warning surfaces when they are not (which used to be a silent
provenance gap).

Regression coverage. 9 new testthat assertions in
r/tests/testthat/test-dw_io-mode-contract.R (161 total R asserts;
0/0/0 from devtools::check) and 9 new smoke checks in
python/tests/manual/smoke_test.py (34 total). Error-envelope test
file extended to keep [cso_toolkit.<func>] WHAT / Why / Fix shape
on every new raise.

DW-Production backports (v0.3.0.9000 development line)

Landed via PR adopting the four undeclared local edits found by
docs/dw-production-alignment-2026-05-25.md:

  • B1 (new feature): dw_use("https://...") is now a first-class
    call site. R: new .is_allowlisted_url(), .dw_frozen_root(),
    .url_to_frozen_path(), .write_remote_provenance(),
    .download_and_freeze(), .resolve_remote_url() in r/R/dw_io.R,
    with dw_use()'s read resolver dispatching on
    ^https?://. Python: same shape in python/src/dw_io.py.
    Allowlist is empty by default so the toolkit ships
    consumer-neutral; the consumer's profile populates
    dw_url_allowlist (R global) / _state.dw_url_allowlist (Python).
    Reviewer mode refuses to fetch new URLs; producer mode downloads
    once and writes a .provenance.json with sha256 + bytes +
    fetched_at + fetched_by + dw_mode. Three new _state keys:
    dw_url_allowlist, dw_frozen_root, githubFolder.

  • B2 (QoL fix): dw_save auto-detects gzip when the path already
    ends in .gz (was previously a foot-gun: passing
    compress = FALSE and a .gz path would write the file
    uncompressed under the misleading name).

  • B3 (robustness fix): .provenance.json sidecar write is now
    wrapped in tryCatch (R) / try/except (Python) so a
    non-serialisable metadata value warns rather than rolling back the
    primary file. Sidecars are metadata; the asset is what matters.

  • B4 (bug fix): dw_api.R UIS-fetcher URL-encodes query keys +
    values via utils::URLencode(reserved = TRUE); previously param
    values containing & / = / spaces / non-ASCII would corrupt the
    query. Default cache extension for http and github_raw APIs
    bumped from csv to rds (R) / pkl (Python) so text and binary
    payloads round-trip correctly.
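
The B4 encoding step can be illustrated with utils::URLencode(reserved = TRUE). build_query_sketch() is a hypothetical helper, not the code in dw_api.R.

```r
# Hypothetical sketch: percent-encode keys and values before joining them,
# so "&", "=" and spaces inside a value cannot split or corrupt the query.
build_query_sketch <- function(params) {
  enc <- function(x) {
    vapply(as.character(x), utils::URLencode, character(1),
           reserved = TRUE, USE.NAMES = FALSE)
  }
  paste0(enc(names(params)), "=", enc(unlist(params)), collapse = "&")
}

query <- build_query_sketch(list(q = "a&b=c d", lang = "en"))
```

Here the value "a&b=c d" becomes "a%26b%3Dc%20d", so the only literal "&" left in the query is the separator between the two parameters.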

Regression tests: python/tests/manual/smoke_test.py now exercises
5 new B1–B4 invariants (20 total). R CMD check remains 0/0/0.

v0.3.0 — Python parity + Roxygen-complete R + graceful errors

25 May 20:12
e4f9b07

First release with full Python parity for every R helper, plus the
R Roxygen-complete reference (NAMESPACE + 26 Rd files + pkgdown
config), and a three-part error envelope ([cso_toolkit.<func>] WHAT / Why / Fix) standardised across R and Python.

Highlights

  • Full Python port at python/src/ — 10 modules, 26 public entries,
    same behaviour contract as R (mode-aware path routing, Z: drive
    mirror, provenance sidecars, version-drift detection).
  • Roxygen-complete R reference: NAMESPACE + man/ (26 .Rd
    files) + _pkgdown.yml config with 8 grouped families
    (io, api, sync, aggregate, survey-weights, reporting,
    scaffolding, audit).
  • Graceful error envelopes across R + Python with WHAT / Why / Fix
    shape and grep-friendly [cso_toolkit.<func>] prefix.
  • Secrets redaction for dw_api_fetch kwargs persisted to the
    .provenance.json sidecar.
  • Per-language top-level READMEs (r/, python/, stata/).
  • R CMD check — 0 errors / 0 warnings / 0 notes.
  • Python smoke test — 15/15 invariants.
  • Python error-envelope test — 30/30 raise paths.

Version metadata

  • r/DESCRIPTION → Version: 0.3.0
  • python/pyproject.toml → version = "0.3.0"
  • python/src/__init__.py → __version__ = "0.3.0"
  • templates/.toolkit_manifest.yml → pulled_version: "v0.3.0"

See NEWS.md for the consolidated changelog.

Released via develop → main merge (PR #9) on 2026-05-25.

v0.2.0 - Stata helpers, R toolkit expansion, CSO rebrand, workflow diagrams

24 May 22:06
b78b819

First substantive release beyond the v0.1.0-rc1 scaffold. Promotes
everything that landed on develop today (PRs #1, #2, #3) to main
via release PR #4.

R helpers (r/R/)

Seven new files:

  • aggregate_data.R: original aggregate_data() (mean / weighted_mean, optional global aggregate, population + country coverage).
  • aggregate_data_v2.R: aggregate_data_v2() with weighted_mean / mean / sum / proportion, coverage threshold, metadata columns. Ships generate_agg_footnote() and apply_time_window().
  • generate_markdown_report.R: generate_markdown_report() + process_all_csv_files(), which build descriptive-stats Markdown reports from CSV files.
  • create_sector_script.R: create_sector_script(sector_name, sector_code, base_dir, ...) scaffolds a sector run-script template; DW-Production convenience wrapper create_dw_sector_script().
  • profile_helpers.R: create_profile(repo_name, ...) scaffolds a profile_<repo>.R with the standard CSO building blocks; review_profile(path, ...) audits an existing profile.
  • test_scripts.R: test_scripts(path, ...) recursively scans .R scripts and flags direct calls to raw IO / API commands wrapped by dw_io.R / dw_api.R (16 built-in rules across io / api families). Per-line escape hatch via # cso-allow: <rule-id>; CI mode via error_on_violation = TRUE.
  • dw_nestweight.R: dw_nestweight() redistributes survey weights from missing nested observations so per-stratum totals are preserved. R port of edukit_nestweight (Diana Goldemberg).
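
The scanning behaviour described for test_scripts() (rule matching, a per-line # cso-allow: <rule-id> escape hatch, and an error-on-violation CI mode) can be illustrated with a short Python sketch. It is not a port of the R helper: the three rule ids and patterns below are invented for the example (the real helper ships 16 rules that are not listed here), and the comment handling is deliberately naive.

```python
from __future__ import annotations

import re
from typing import NamedTuple

# Hypothetical rules, for illustration only.
RULES = {
    "io-read-csv": re.compile(r"\bread\.csv\s*\("),
    "io-saverds": re.compile(r"\bsaveRDS\s*\("),
    "api-httr-get": re.compile(r"\bhttr::GET\s*\("),
}
ALLOW = re.compile(r"#\s*cso-allow:\s*([\w-]+)")


class Violation(NamedTuple):
    line_no: int
    rule_id: str
    text: str


def scan_source(source: str) -> list[Violation]:
    """Flag raw IO / API calls, skipping lines that opt out of a rule."""
    found = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        allowed = set(ALLOW.findall(line))
        # Naive comment stripping: a '#' inside a string literal is not handled.
        code = line.split("#", 1)[0]
        for rule_id, pattern in RULES.items():
            if rule_id not in allowed and pattern.search(code):
                found.append(Violation(line_no, rule_id, line.strip()))
    return found


def check(source: str, error_on_violation: bool = False) -> list[Violation]:
    """Return violations; raise instead when running in CI mode."""
    violations = scan_source(source)
    if violations and error_on_violation:
        raise ValueError(f"{len(violations)} raw IO / API call(s) found")
    return violations
```

Scoping the escape hatch to a rule id, rather than silencing the whole line, keeps an allowed read.csv() from also hiding an unrelated API call on the same line.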

Stata helpers (stata/src/)

First three Stata helpers, filling the v0.2 placeholder:

  • dw_save.ado (+ .sthlp) — Stata sibling of R dw_save(). isid + compress + save + sibling .provenance.json sidecar matching the R-side shape (JSON-escaped). Honours producer / reviewer mode via $dw_mode; canonical writes blocked in reviewer mode unless allow_canonical_write is passed. Content hash via Stata-native datasignature (no shell-out / AppLocker issue).
  • dw_compare.ado (+ .sthlp) — Stata sibling of R dw_compare(). Merges two .dta files on idvars and classifies each value column as identical / numerically-equivalent (within tol()) / different; optional Markdown report.
  • dw_mkdir.ado (+ .sthlp) — recursive mkdir (Stata's built-in is non-recursive). Idempotent.
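
The three-way classification that dw_compare performs can be sketched in Python as follows. This is an illustration under stated assumptions, not the .ado logic: classify_columns is a hypothetical name, rows are plain dicts, the tolerance is treated as absolute (these notes do not say whether tol() is absolute or relative), and only ids present in both inputs are compared.

```python
import math


def classify_columns(left, right, idvars, tol=1e-8):
    """Label each shared value column by comparing rows matched on idvars."""

    def key(row):
        return tuple(row[v] for v in idvars)

    left_by_id = {key(r): r for r in left}
    right_by_id = {key(r): r for r in right}
    shared_ids = left_by_id.keys() & right_by_id.keys()
    if not left or not right:
        return {}
    columns = (set(left[0]) & set(right[0])) - set(idvars)

    labels = {}
    for col in sorted(columns):
        label = "identical"
        for row_id in shared_ids:
            a, b = left_by_id[row_id][col], right_by_id[row_id][col]
            if a == b:
                continue
            # Non-numeric guard: tolerance only applies to real numbers.
            numeric = all(
                isinstance(v, (int, float)) and not isinstance(v, bool)
                for v in (a, b)
            )
            if numeric and math.isclose(a, b, rel_tol=0.0, abs_tol=tol):
                label = "numerically-equivalent"
            else:
                label = "different"
                break
        labels[col] = label
    return labels


if __name__ == "__main__":
    left = [{"iso3": "AAA", "year": 2020, "value": 1.0, "note": "x"}]
    right = [{"iso3": "AAA", "year": 2020, "value": 1.0 + 1e-12, "note": "x"}]
    print(classify_columns(left, right, idvars=["iso3", "year"]))
```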

Docs (docs/)

  • dw_io_reference.md — per-function reference for dw_io.R.
  • dw_api_reference.md — per-function reference for dw_api.R.
  • git_workflow.md — gitflow + branch-protection contract reference (main / develop / feature; admin bypass on develop for hotfixes; full enforce on main).
  • roles_and_workflow.md — extended with Mermaid data-flow diagram (PRODUCER / REVIEWER / INGESTOR boundaries, colour-coded by role) + role-vs-action matrix.

Branding + meta

  • Top-level README.md rebranded as UNICEF Chief Statistician Office toolkit with a new Objective and motivation section spelling out the reproducibility-and-scalability mission for the D&A Section in OSE.
  • NEWS.md documents every addition.
  • r/R/README.md and stata/src/README.md updated to the live helper inventory; the stata/src "placeholder" line is gone.

Lineage credits

The three Stata helpers and dw_nestweight are ports from the
World Bank EduAnalyticsToolkit.
Each ported file credits the original author in its header:

| cso-toolkit     | EduAnalyticsToolkit ancestor       | Original author      |
|-----------------|------------------------------------|----------------------|
| dw_save.ado     | edukit_save / savemetadata         | Diana Goldemberg     |
| dw_compare.ado  | comparefiles / edukit_comparefiles | Kristoffer Bjärkefur |
| dw_mkdir.ado    | rmkdir / edukit_rmkdir             | Kristoffer Bjärkefur |
| dw_nestweight.R | nestweight / edukit_nestweight     | Diana Goldemberg     |

Review fixes folded in

Every PR landed with all Copilot review comments addressed: 11 fixes
on PR #1 (multi-col by correctness, n_distinct NA-counting,
namespace hygiene, removed source-time side effects, %||%
collision, generated-template defaults), 2 on PR #2 (Mermaid paths
matching documented layout, pushed-commit recovery recipe), 5 on PR
#3 (capture isid vs quietly isid, JSON-escaped sidecar values,
DRY compare logic, non-numeric guard).

Diff stats

+3,047 / −30 across 21 files, 23 commits from main to develop.

Known limitations (carrying into v0.3)

  • No Stata equivalent of dw_use() yet. Reading is unconstrained on
    the Stata side; the reviewer-mode no-API guard does not exist.
  • No Stata dw_require_no_api.do helper.
  • No Stata dw_load_config.do (each project profile still wires up
    $dw_mode itself).
  • See the dedicated tracking issue for the v0.3 Stata gaps.

Install

Vendored, not installed. R helpers ship in r/R/; Stata helpers in
stata/src/. See docs/toolkit_strategy.md
for the vendoring rationale and the cso_toolkit_pull() workflow
for refreshing a downstream consumer.

v0.1.0-rc1 — first release candidate

24 May 14:04

Pre-release

First release candidate. Extracted from unicef-drp/DW-Production PR #89.

Highlights

  • R helpers feature-complete under r/R/: dw_io, dw_api, cso_toolkit_sync.
  • Mode contract enforced in dw_api.R via dw_require_no_api() — reviewers cannot call upstream APIs.
  • .provenance.json sidecars on every dw_save() (sha256, schema, user, timestamp, metadata).
  • Vendoring model — copy into consumer's 00_functions/, pin via .toolkit_manifest.yml. See docs/toolkit_strategy.md.
  • Three-role model documented (docs/roles_and_workflow.md) — PRODUCER / REVIEWER / INGESTOR.
  • DBM pre-deposit template (templates/dbm_submission_template.md) — 8-section checklist with the ed pilot as a worked example.
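
For readers unfamiliar with the sidecar idea, the following Python sketch writes a .provenance.json file next to a data file with the five fields named above (sha256, schema, user, timestamp, metadata). It is an assumption-based illustration, not the behaviour of dw_save(): the function name write_sidecar, the sidecar naming rule (data file name plus ".provenance.json"), and the JSON layout are all hypothetical.

```python
import getpass
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def _current_user() -> str:
    # getpass.getuser() can fail in minimal environments; fall back quietly.
    try:
        return getpass.getuser()
    except Exception:
        return "unknown"


def write_sidecar(data_path, schema, metadata=None) -> Path:
    """Write <file>.provenance.json beside data_path and return its path."""
    data_path = Path(data_path)
    record = {
        "sha256": hashlib.sha256(data_path.read_bytes()).hexdigest(),
        "schema": schema,
        "user": _current_user(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "metadata": metadata or {},
    }
    sidecar = data_path.with_name(data_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Hashing the bytes actually written lets a reviewer later confirm that the file on disk is still the one the sidecar describes.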

What's not in this RC

  • Stata helpers (stata/src/) — scaffolded only; ship in v0.2.
  • Python helpers (python/src/) — scaffolded only; ship in v0.2.
  • HIV / WASH Stata save calls still write to canonical paths.

Cutting v1.0.0

v1.0.0 will be cut after the ed sector pilot lands (DW-Production PR #64) and a second sector vendors the helpers without modification.