Skip to content

pgclone v4.4.0 -Schema/database-level in-line masking and table subset filters

Latest

Choose a tag to compare

@valehdba valehdba released this 11 Jun 19:41
· 1 commit to main since this release

Schema/database-level in-line masking and table subset filters — implements [discussion #16](#16).

Until now, in-line masking (masking applied while data streams from source to target) was only available per table. Cloning a whole schema or database meant masking afterwards — either a slow mass UPDATE (followed by re-analyze/vacuum) or masked views that aren't transparent to end users. v4.4.0 removes that gap, and adds regex-based table selection on top.

No SQL signature changes: both features are new keys in the existing JSON options argument of pgclone.schema(), pgclone.database(), and pgclone.database_create().

In-line masking for schema/database clones — "masks"

SELECT pgclone.schema(
    'host=source dbname=prod user=postgres password=...',
    'hr', true,
    '{"masks": {
        "employees":  {"email": "email", "ssn": "null"},
        "candidates": {"full_name": "name", "phone": "phone"}
     }}'
);
  • Keys are table names; values use the exact same format as the single-table "mask" option — all 8 masking strategies supported.
  • Masking happens inside the COPY stream: the mask expressions run on the source as part of the per-table COPY (SELECT ...), so unmasked data never reaches the target. No post-clone UPDATE, no re-vacuum, no view layer.
  • For database clones, keys may be schema-qualified ("hr.employees") to disambiguate identically named tables; qualified keys win over bare names.
  • Tables not listed are cloned unmasked.

Table subset filters — "tables" / "exclude_tables"

-- Only order_* partitions plus customers, minus anything archived
SELECT pgclone.schema(
    'host=source dbname=prod user=postgres password=...',
    'sales', true,
    '{"tables": ["order_[0-9]+", "customers"],
      "exclude_tables": [".*_archive"]}'
);
  • Entries are POSIX regular expressions, anchored as ^(pattern)$ (whole-name match), evaluated by the source server against pg_tables.tablename. Excludes apply after includes.
  • The filtered table list flows through the FK-retry and deferred-trigger passes automatically. Sequences, views, materialized views, and functions of the schema are still cloned in full.
  • Patterns are sent as quoted literals — no regex code in the extension and no injection surface; an invalid regex fails the clone with the source's invalid regular expression error.

Both options combine freely, and pgclone.database_create() forwards them verbatim into the target database's clone run.

Internals

  • The hand-rolled option parser gained string-aware JSON scanning (pgclone_json_balanced_end, pgclone_json_string_end, pgclone_json_unescape, pgclone_parse_pattern_array), so regex character classes like [0-9] inside option strings no longer terminate arrays early.
  • Per-table mask objects are captured as raw JSON during option parsing and re-emitted verbatim as the "mask" option of each matching per-table sub-call — the existing single-table masking pipeline does all the real work, unchanged.
  • pgclone.database() propagates masks/tables/exclude_tables verbatim to its per-schema sub-calls.

Tests & docs

  • pgTAP test group 25 (10 new tests, plan 77 → 87) with a new filter_test seed schema: include/exclude regex filtering, in-line email/null masking during a schema clone, per-table mask scoping, unfiltered data integrity.
  • docs/USAGE.md: new sections Clone a subset of tables and In-line masking during schema/database clone, plus JSON options reference rows.
  • COMMENT ON FUNCTION documentation for pgclone.schema(TEXT, TEXT, BOOLEAN, TEXT) and pgclone.database(TEXT, BOOLEAN, TEXT).

Limitations

  • Synchronous paths only: pgclone.schema_async() and the parallel worker pool do not carry JSON options through shared memory and ignore these keys (as they already do for "mask").
  • Bare-name mask keys apply in every schema with a matching table during a database clone; use schema-qualified keys to scope them.

Upgrade

ALTER EXTENSION pgclone UPDATE TO '4.4.0';

sql/pgclone--4.3.2--4.4.0.sql only updates function comments — no catalog changes. Identical behavior on PostgreSQL 14–18.

Full changelog: v4.3.2...v4.4.0