Schema/database-level in-line masking and table subset filters — implements [discussion #16](#16).
Until now, in-line masking (masking applied while data streams from source to target) was only available per table. Cloning a whole schema or database meant masking afterwards — either a slow mass UPDATE (followed by re-analyze/vacuum) or masked views that aren't transparent to end users. v4.4.0 removes that gap, and adds regex-based table selection on top.
No SQL signature changes: both features are new keys in the existing JSON options argument of pgclone.schema(), pgclone.database(), and pgclone.database_create().
In-line masking for schema/database clones — "masks"
SELECT pgclone.schema(
'host=source dbname=prod user=postgres password=...',
'hr', true,
'{"masks": {
"employees": {"email": "email", "ssn": "null"},
"candidates": {"full_name": "name", "phone": "phone"}
}}'
);- Keys are table names; values use the exact same format as the single-table
"mask"option — all 8 masking strategies supported. - Masking happens inside the COPY stream: the mask expressions run on the source as part of the per-table
COPY (SELECT ...), so unmasked data never reaches the target. No post-cloneUPDATE, no re-vacuum, no view layer. - For database clones, keys may be schema-qualified (
"hr.employees") to disambiguate identically named tables; qualified keys win over bare names. - Tables not listed are cloned unmasked.
Table subset filters — "tables" / "exclude_tables"
-- Only order_* partitions plus customers, minus anything archived
SELECT pgclone.schema(
'host=source dbname=prod user=postgres password=...',
'sales', true,
'{"tables": ["order_[0-9]+", "customers"],
"exclude_tables": [".*_archive"]}'
);- Entries are POSIX regular expressions, anchored as
^(pattern)$(whole-name match), evaluated by the source server againstpg_tables.tablename. Excludes apply after includes. - The filtered table list flows through the FK-retry and deferred-trigger passes automatically. Sequences, views, materialized views, and functions of the schema are still cloned in full.
- Patterns are sent as quoted literals — no regex code in the extension and no injection surface; an invalid regex fails the clone with the source's
invalid regular expressionerror.
Both options combine freely, and pgclone.database_create() forwards them verbatim into the target database's clone run.
Internals
- The hand-rolled option parser gained string-aware JSON scanning (
pgclone_json_balanced_end,pgclone_json_string_end,pgclone_json_unescape,pgclone_parse_pattern_array), so regex character classes like[0-9]inside option strings no longer terminate arrays early. - Per-table mask objects are captured as raw JSON during option parsing and re-emitted verbatim as the
"mask"option of each matching per-table sub-call — the existing single-table masking pipeline does all the real work, unchanged. pgclone.database()propagatesmasks/tables/exclude_tablesverbatim to its per-schema sub-calls.
Tests & docs
- pgTAP test group 25 (10 new tests, plan 77 → 87) with a new
filter_testseed schema: include/exclude regex filtering, in-line email/null masking during a schema clone, per-table mask scoping, unfiltered data integrity. docs/USAGE.md: new sections Clone a subset of tables and In-line masking during schema/database clone, plus JSON options reference rows.COMMENT ON FUNCTIONdocumentation forpgclone.schema(TEXT, TEXT, BOOLEAN, TEXT)andpgclone.database(TEXT, BOOLEAN, TEXT).
Limitations
- Synchronous paths only:
pgclone.schema_async()and the parallel worker pool do not carry JSON options through shared memory and ignore these keys (as they already do for"mask"). - Bare-name mask keys apply in every schema with a matching table during a database clone; use schema-qualified keys to scope them.
Upgrade
ALTER EXTENSION pgclone UPDATE TO '4.4.0';sql/pgclone--4.3.2--4.4.0.sql only updates function comments — no catalog changes. Identical behavior on PostgreSQL 14–18.
Full changelog: v4.3.2...v4.4.0