Pull request overview
This PR implements a major architectural shift from schema log table-based DDL replication to a stateless approach using DDL events emitted directly into the WAL stream. Instead of maintaining a separate pgstream.schema_log table, the system now captures DDL events as logical messages in the PostgreSQL WAL, computes table metadata on-the-fly, and processes schema diffs directly from DDL events.
Changes:
- Removed schema log dependency from all components and replaced with WAL-based DDL event processing
- Split migrations into core (basic DDL replication) and injector (metadata injection with table_ids) for reduced database footprint
- Updated initialization flow, status checker, and CLI to support the new multi-migration structure
- Fixed OpenSearch compatibility issues and removed internal column version from WAL data
Reviewed changes
Copilot reviewed 132 out of 149 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pkg/wal/processor/search/store/search_adapter.go | Updated to use versioned index names and removed schema log conversion method |
| pkg/wal/processor/search/store.go | Changed interface from schema log entries to WAL schema diffs |
| pkg/wal/processor/search/search_store_retrier.go | Updated method signature to use schema diffs instead of log entries |
| pkg/wal/processor/search/search_msg_batch.go | Replaced schema change field with schema diff field |
| pkg/wal/processor/search/search_batch_indexer.go | Updated to process schema diffs and removed schema log specific logic |
| pkg/wal/processor/search/errors.go | Removed version-related error constant |
| pkg/wal/processor/postgres/postgres_writer.go | Removed schema log store dependency from writer initialization |
| pkg/wal/processor/postgres/postgres_wal_adapter.go | Updated to use DDL events instead of schema log events |
| pkg/wal/processor/postgres/postgres_schema_observer.go | Refactored to process DDL events instead of schema log entries |
| pkg/wal/processor/postgres/postgres_bulk_ingest_writer.go | Added flag to ignore DDL events |
| pkg/wal/processor/postgres/postgres_batch_writer.go | Removed schema log store dependency and added feature not supported error handling |
| pkg/wal/processor/postgres/config.go | Removed schema log store configuration |
| pkg/wal/processor/kafka/wal_kafka_batch_writer.go | Updated message key extraction to use DDL events |
| pkg/wal/processor/filter/wal_filter.go | Updated filtering logic to work with DDL events |
| pkg/stream/stream_status.go | Changed migration status from singular to array to support multiple migrations |
| pkg/stream/stream_init.go | Refactored to use new migrator library with support for multiple migration sets |
| pkg/stream/config.go | Added helper methods for init configuration with injector migration flag |
| internal/searchstore/search_api.go | Added JSON tags to mapping structs |
| internal/searchstore/opensearch/opensearch_client.go | Fixed index mapping retrieval to work with aliases |
| internal/postgres/errors.go | Added feature not supported error type |
| internal/migrator/migrator.go | New internal library for managing multiple migration sets |
| migrations/postgres/core/* | New core migrations for basic DDL replication functionality |
| migrations/postgres/injector/* | New injector migrations for metadata injection functionality |
| docs/* | Updated documentation to reflect stateless DDL replication approach |
| cmd/* | Updated CLI commands to support injector migration flags |
Merging this branch changes the coverage (9 decrease, 8 increase)
Description
This branch implements a major architectural shift from the existing schema log table-based DDL replication to a stateless approach in which DDL events are emitted directly into the WAL.
DDL tracking
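Instead of maintaining a separate schema log store/table to track DDL changes, the system now:
- Captures DDL events as logical messages in the PostgreSQL WAL
- Computes table metadata on the fly
- Processes schema diffs directly from DDL events
This means schema changes are sent through the same replication stream as data changes, which preserves their order relative to data events and makes the entire system stateless, since there is no external schema log to maintain or query.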
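As a rough illustration of why a single stream preserves ordering, the sketch below consumes DDL and data events from one ordered slice. The walEvent and ddlEvent types, the "ddl"/"data" labels, and the JSON payload shape are all assumptions made for this example, not the actual pgstream types or wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// walEvent is a hypothetical, simplified view of one entry in the
// replication stream. Kind is "data" or "ddl" (assumed labels).
type walEvent struct {
	LSN     uint64
	Kind    string
	Payload []byte
}

// ddlEvent is an assumed JSON shape for the content of a DDL logical message.
type ddlEvent struct {
	Schema  string `json:"schema"`
	Table   string `json:"table"`
	Command string `json:"command"`
}

// applyInOrder walks the stream once, in the order received, and returns a
// log of what was handled. Because DDL and data events share the stream,
// no external lookup is needed to know which schema a data event belongs to.
func applyInOrder(events []walEvent) ([]string, error) {
	applied := make([]string, 0, len(events))
	for _, ev := range events {
		switch ev.Kind {
		case "ddl":
			var ddl ddlEvent
			if err := json.Unmarshal(ev.Payload, &ddl); err != nil {
				return nil, fmt.Errorf("decoding DDL event at LSN %d: %w", ev.LSN, err)
			}
			applied = append(applied, fmt.Sprintf("ddl:%s.%s:%s", ddl.Schema, ddl.Table, ddl.Command))
		case "data":
			applied = append(applied, fmt.Sprintf("data:%s", ev.Payload))
		default:
			return nil, fmt.Errorf("unknown event kind %q at LSN %d", ev.Kind, ev.LSN)
		}
	}
	return applied, nil
}

func main() {
	events := []walEvent{
		{LSN: 1, Kind: "data", Payload: []byte("insert into users (id)")},
		{LSN: 2, Kind: "ddl", Payload: []byte(`{"schema":"public","table":"users","command":"ALTER TABLE"}`)},
		{LSN: 3, Kind: "data", Payload: []byte("insert into users (id, email)")},
	}
	applied, err := applyInOrder(events)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, line := range applied {
		fmt.Println(line)
	}
}
```

In this sketch the second insert is only seen after the ALTER TABLE that precedes it in the stream, which is the ordering property described above.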
Injector
The injector relied on the pgstream.table_ids table to get the pgstream ids that would then be used by processors such as the search indexer. Before, it offered the option to select a version column for the event while using the LSN by default. The refactor simplifies this part by using the LSN as the version for all events, removing the added complexity of configuring the version.
The identity columns are still selected in the same way (primary keys or unique not null columns), and are still required for replication.
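A minimal sketch of the two rules described above, under stated assumptions: the column type and both helper functions are hypothetical, and the precedence (primary keys first, unique not null columns as the fallback) is one plausible reading of the description rather than the confirmed pgstream behaviour. The LSN parsing relies only on PostgreSQL's textual pg_lsn format, two hexadecimal numbers separated by a slash.

```go
package main

import "fmt"

// column is a hypothetical, simplified description of a table column.
type column struct {
	Name       string
	PrimaryKey bool
	Unique     bool
	NotNull    bool
}

// identityColumns returns the primary key columns when there are any, and
// otherwise the unique not null columns. The precedence is an assumption.
func identityColumns(cols []column) []string {
	var pks, uniques []string
	for _, c := range cols {
		switch {
		case c.PrimaryKey:
			pks = append(pks, c.Name)
		case c.Unique && c.NotNull:
			uniques = append(uniques, c.Name)
		}
	}
	if len(pks) > 0 {
		return pks
	}
	return uniques
}

// parseLSN converts a textual LSN such as "16/B374D848" into a 64-bit number
// that grows with WAL position, which makes it usable as an event version.
func parseLSN(s string) (uint64, error) {
	var hi, lo uint32
	if _, err := fmt.Sscanf(s, "%X/%X", &hi, &lo); err != nil {
		return 0, fmt.Errorf("parsing LSN %q: %w", s, err)
	}
	return uint64(hi)<<32 | uint64(lo), nil
}

func main() {
	cols := []column{
		{Name: "id", PrimaryKey: true, NotNull: true},
		{Name: "email", Unique: true, NotNull: true},
		{Name: "nickname", Unique: true},
	}
	fmt.Println("identity columns:", identityColumns(cols))

	version, err := parseLSN("16/B374D848")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("version:", version)
}
```

A table for which identityColumns returns nothing has no usable identity, which matches the note that identity columns are required for replication.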
Migrations
The migrations have been simplified and are now split by functionality:
- core: basic DDL replication functionality.
- injector: metadata injection (the table_ids table), required for search targets to work.
This way, only the relevant migrations are applied for each specific use case, reducing the footprint on the source database.
The initialization flow and the status checker have been updated accordingly to manage the new migrations structure.
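The selection logic this implies can be sketched as follows. The function names, the migrationStatus type, and the boolean flag are hypothetical stand-ins for this example; only the set names core and injector come from the migration directories in this PR.

```go
package main

import "fmt"

// migrationStatus reports whether one migration set has been applied.
// A slice of these mirrors the move from a single status to an array.
type migrationStatus struct {
	Set     string
	Applied bool
}

// requiredSets returns the migration sets needed for a given setup: core is
// always required, injector only when metadata injection is enabled.
func requiredSets(withInjector bool) []string {
	sets := []string{"core"}
	if withInjector {
		sets = append(sets, "injector")
	}
	return sets
}

// checkStatus compares the required sets against what has been applied and
// returns one status entry per required set, plus an error if any is missing.
func checkStatus(required []string, applied map[string]bool) ([]migrationStatus, error) {
	statuses := make([]migrationStatus, 0, len(required))
	var missing []string
	for _, set := range required {
		ok := applied[set]
		statuses = append(statuses, migrationStatus{Set: set, Applied: ok})
		if !ok {
			missing = append(missing, set)
		}
	}
	if len(missing) > 0 {
		return statuses, fmt.Errorf("migration sets not applied: %v", missing)
	}
	return statuses, nil
}

func main() {
	// A search target needs the injector set, but only core was applied.
	statuses, err := checkStatus(requiredSets(true), map[string]bool{"core": true})
	fmt.Println(statuses, err)
}
```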
Snapshots
The schema snapshot generator was simplified by replacing the dedicated schema log generator with a restoreToWAL function that parses DDL statements from pg_dump output and converts them directly into WAL DDL events. These snapshot DDL events now flow through the same processor pipeline as runtime DDL events, eliminating the need for the schema log table and creating a single unified processing path for all schema changes.
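To make the idea concrete, here is a deliberately naive sketch of turning plain-text dump output into a sequence of DDL events. It is not the restoreToWAL implementation: the ddlEvent type and ddlStatements function are hypothetical, and splitting on semicolons breaks on function bodies, quoted strings, and COPY data, which a real parser has to handle.

```go
package main

import (
	"fmt"
	"strings"
)

// ddlEvent is a hypothetical stand-in for a WAL DDL event built from a dump.
type ddlEvent struct {
	Seq       int
	Statement string
}

// String renders the event for logging.
func (e ddlEvent) String() string {
	return fmt.Sprintf("[%d] %s", e.Seq, e.Statement)
}

// ddlStatements extracts CREATE, ALTER, and DROP statements from dump text,
// in order. Comment lines are dropped and each statement is flattened onto a
// single line. The naive split on ";" is an assumption made for brevity.
func ddlStatements(dump string) []ddlEvent {
	var events []ddlEvent
	for _, raw := range strings.Split(dump, ";") {
		var lines []string
		for _, line := range strings.Split(raw, "\n") {
			trimmed := strings.TrimSpace(line)
			if trimmed == "" || strings.HasPrefix(trimmed, "--") {
				continue
			}
			lines = append(lines, trimmed)
		}
		stmt := strings.Join(lines, " ")
		upper := strings.ToUpper(stmt)
		for _, prefix := range []string{"CREATE ", "ALTER ", "DROP "} {
			if strings.HasPrefix(upper, prefix) {
				events = append(events, ddlEvent{Seq: len(events) + 1, Statement: stmt})
				break
			}
		}
	}
	return events
}

func main() {
	dump := `--
-- Name: users; Type: TABLE
--
SET search_path = public;
CREATE TABLE users (
    id integer NOT NULL
);
ALTER TABLE users ADD CONSTRAINT users_pkey PRIMARY KEY (id);
`
	for _, ev := range ddlStatements(dump) {
		fmt.Println(ev)
	}
}
```

The point of the sketch is the output type: once snapshot DDL is expressed as the same kind of event as runtime DDL, the downstream processors need no snapshot-specific path.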
Search processor
With the new schemalog-less approach, the search processor had to be refactored. Before, it maintained a pgstream index with schema log entries, which was used to map the internal pgstream IDs used as field names to the table and column names. Now there is no schema log tracking in the search store; instead, aliases are created in the index mapping to map names to pgstream IDs.
This makes the search integration much more user-friendly while maintaining the stability benefits of using immutable internal IDs for storage.
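One way to picture such a mapping is sketched below, assuming the aliases are expressed with the alias field type that OpenSearch and Elasticsearch provide (a field of type alias whose path points at the real field). The column and fieldMapping types, the ID format, and both functions are hypothetical, not the pgstream search store code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// column pairs an immutable internal ID (used as the stored field name) with
// the current column name. The ID format used in main is invented.
type column struct {
	ID   string
	Name string
	Type string // search field type, e.g. "keyword" or "long"
}

// fieldMapping is one entry under "properties" in an index mapping.
type fieldMapping struct {
	Type string `json:"type"`
	Path string `json:"path,omitempty"`
}

// buildProperties emits two entries per column: the concrete field keyed by
// its internal ID, and an alias keyed by the column name pointing at it.
func buildProperties(cols []column) (map[string]fieldMapping, error) {
	props := make(map[string]fieldMapping, 2*len(cols))
	for _, c := range cols {
		if c.ID == c.Name {
			return nil, fmt.Errorf("column %q: ID and name must differ", c.Name)
		}
		for _, key := range []string{c.ID, c.Name} {
			if _, dup := props[key]; dup {
				return nil, fmt.Errorf("duplicate field name %q in mapping", key)
			}
		}
		props[c.ID] = fieldMapping{Type: c.Type}
		props[c.Name] = fieldMapping{Type: "alias", Path: c.ID}
	}
	return props, nil
}

// mappingJSON wraps the properties in a "mappings" document.
func mappingJSON(cols []column) ([]byte, error) {
	props, err := buildProperties(cols)
	if err != nil {
		return nil, err
	}
	doc := map[string]any{"mappings": map[string]any{"properties": props}}
	return json.MarshalIndent(doc, "", "  ")
}

func main() {
	out, err := mappingJSON([]column{
		{ID: "col-1", Name: "email", Type: "keyword"},
		{ID: "col-2", Name: "age", Type: "long"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

With a mapping of this shape, queries can refer to email while documents are stored under the internal ID.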
Commits have been split for ease of reviewing. Each commit can be reviewed independently to simplify the review process.
Additional Notes
This PR will be part of a new major pgstream version, v1.0.0, since it contains structural changes that break backwards compatibility (removal of the schema_log and all previous pgstream state in the source database).
Functionally the overall behaviour should remain the same, but the underlying implementation is inherently different, so releasing a new major version ensures enough care is put into the migration.