Backup and restore overview in Babelfish
On this page:
- Introduction
- File system level backups
- SQL-level backups over TDS connection
- Dump backups over PostgreSQL connection
- Multiple logical DBs
- Babelfish-flavoured pg_dump
- Restoring DB under different name
- GUI Backup tool
Backing up an RDBMS is an essential task to ensure that data recovery can be performed in case of hardware or other failures. Backups need to be done regularly and restore tests need to be run on these backups to check that all required data is there and that it can actually be restored when needed.
With Babelfish, some of its implementation details (multiple T-SQL DBs inside a single PostgreSQL DB, Babelfish metadata catalogs) make backups less straightforward than with plain Postgres.
In this article we will mostly discuss "full DB" backups, where DB data is backed up along with the DB schema so the whole DB can be restored on another machine. Other kinds of backups include: partial (only some DB objects), incremental (only changed data; not yet supported in stable Postgres, coming in PostgreSQL 17) and data-only (data export/import).
Each PostgreSQL instance stores all DB files and settings in a dedicated DB cluster file system directory. This directory is self-contained and, when copied to another machine, can be used as-is with the same version of Postgres.
WiltonDB on Windows stores default DB cluster files inside the <install_dir>/data directory (see Changing DB data directory to use a different directory). On Linux the DB cluster is stored under /var/lib/pgsql/data or /var/lib/postgresql/15/main depending on the distro.
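If you are not sure which directory a running instance uses, you can ask the server itself. A minimal sketch; the user and port below are assumptions for a default local install:

```shell
# Print the DB cluster directory of a running Postgres instance;
# -At produces unadorned output, user/port are placeholders.
psql -U postgres -p 5432 -At -c "SHOW data_directory;"
```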
Copying the whole DB cluster is a supported backup approach, but it comes with a big caveat: the Postgres instance must be completely stopped before copying files. Accidentally copying DB cluster files of a running instance will produce a corrupted backup, even if the instance is idle.
Besides that, it is not possible to back up only a subset of DB files - all cluster files need to be copied to get a consistent backup.
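A minimal sketch of a file-system-level backup on Linux; the systemd service name and paths below are assumptions for a typical install and should be adjusted to yours:

```shell
# Stop the instance first - copying files of a running server
# produces a corrupted backup.
sudo systemctl stop postgresql
# Archive the whole cluster directory; partial copies are not consistent.
sudo tar -czf /backups/pgdata_$(date +%F).tar.gz -C /var/lib/pgsql data
sudo systemctl start postgresql
```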
Babelfish allows accessing the same DB data over both Postgres (default port 5432) and TDS (default port 1433) connections. All data can be accessed over a TDS connection using ordinary SQL queries. And the definitions of tables and other DB objects can in most cases be obtained from metadata introspection views (information_schema views and others).
For example, if we create some DB object:
create view myview1 as select 42
We can obtain the original definition of this object from information_schema:
select view_definition from information_schema.views where table_name = 'myview1'
So, in theory, using only a TDS connection, it is possible to enumerate all DB tables, get column descriptions for these tables (to generate DDL definitions) and fetch all data from these tables to generate insert statements. Such tools exist for MSSQL and some of them may be compatible with WiltonDB.
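As an illustration, such enumeration can be scripted over a TDS connection with the standard sqlcmd client; the host, credentials and DB name below are placeholders:

```shell
# List user tables of a logical DB over TDS using information_schema;
# server address, user, password and DB name are assumptions.
sqlcmd -S localhost,1433 -U sa -P secret -d mydb1 \
  -Q "select table_schema, table_name from information_schema.tables where table_type = 'BASE TABLE'"
```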
For example, the DBeaver tool is known to work correctly with WiltonDB (using default MSSQL JDBC drivers); see the DBeaver wiki for details.
Using this approach for "full DB" backups may be problematic for the following reasons:
- Besides tables and views there are many other DB object kinds in T-SQL (namespaces, triggers, domains, extended properties etc.). We are not aware of any tool that can get definitions of all DB objects through SQL introspection with WiltonDB.
- To get a consistent backup it is necessary to have the selected logical T-SQL DB in a completely idle state, where the introspection/fetch tool connection is the only connection to this DB. This may not be easy to achieve on live DB instances.
While the points above make SQL-level backups infeasible as "full DB" backups, this approach can still be valuable for backing up particular tables and for moving data between MSSQL and WiltonDB; see also Data import with bcp.
Most RDBMSes do not require client tools to perform manual introspection for backups. Usually first-party standard tools exist that are specific to a particular RDBMS. In Postgres such tools are pg_dump and pg_dumpall.
In general, pg_dump does more or less the same as the custom T-SQL tools from the previous section - it performs introspection of DB objects and writes the definitions of these objects and all table data to a file system backup file that can be restored using the psql or pg_restore utilities. But unlike third-party SQL-level tools, pg_dump uses internal APIs for introspection and, being part of core Postgres, is guaranteed to correctly enumerate and dump all kinds of DB objects.
pg_dump also allows creating consistent backups on live DB instances without any additional setup - it relies on MVCC features to take a consistent snapshot at the beginning of a backup run. It supports many options allowing backups in different formats, with compression, or for a subset of DB objects.
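For plain Postgres DBs a typical invocation looks like the following sketch; DB and file names are placeholders. The custom format enables compression and selective restore:

```shell
# Create a compressed custom-format backup of a single Postgres DB.
pg_dump --format=custom --compress=5 --file=mydb.dump mydb
# Restore it into another (already created) DB with pg_restore.
pg_restore --dbname=mydb_restored mydb.dump
```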
pg_dump is the go-to tool for plain Postgres DBs (though there are also other popular backup tools like pgBackRest), but things get more complicated with Babelfish-flavoured Postgres. Two related features of Babelfish require additional care when using pg_dump - multiple "logical" DBs and Babelfish system catalogs.
Babelfish stores all its data in a single "physical" Postgres DB, named wilton in WiltonDB and babelfish_db in AWS Aurora. While Babelfish supports both single-db and multi-db modes, in this article we assume that multi-db mode is used. In this case, when a new logical T-SQL DB is created using create database mydb1, under the hood a new DB schema mydb1_dbo is created for it inside the physical DB.
pg_dump treats the physical DB as the target for a "full DB" backup. But with multiple T-SQL DBs we want to back up only a single selected logical DB, not all logical DBs at once. The default unmodified pg_dump cannot do this.
Babelfish includes a modified version of pg_dump that supports the --bbf-database-name option, which allows specifying the name of a logical DB.
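A sketch of backing up a single logical DB with the modified pg_dump. The physical DB name wilton matches the WiltonDB default mentioned above; the logical DB name and output path are placeholders, and the directory format is just one of the available output formats:

```shell
# Dump only the logical T-SQL DB "mydb1" from the physical DB "wilton".
pg_dump --bbf-database-name=mydb1 --format=directory --file=mydb1_backup wilton
```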
This modified pg_dump also contains special logic for backing up the Babelfish system catalogs that store metadata for T-SQL objects.
One common scenario we've observed in practice is restoring a DB backup under a different DB name. For example, taking a mydb1 backup from a live server and restoring it on a dev machine under the name mydb1_mycustomer1_date1, alongside other versions of the same DB that already exist on the dev machine. This scenario works well with MSSQL, where DBs can be easily renamed.
Unfortunately, in Babelfish (as of version 3.3) logical DBs cannot be renamed. And in Postgres, pg_dump writes fully qualified names into the backup file, including the DB schema name that is used to store the logical DB name.
To work around this limitation WiltonDB provides the pgdump_toc_rewrite library (and a command line tool) that can be used to rewrite the logical DB name in backup files. When a backup file is being restored into an empty Babelfish instance, the server does not have any information about the DB name except the name encoded in the backup files. pgdump_toc_rewrite rewrites the DB name both in the TOC (table of contents) file and in the backed-up data files of Babelfish catalogs.
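The command line usage may look roughly like the sketch below. The option names here are assumptions for illustration only, not the actual CLI; consult the WiltonDB documentation for the real syntax:

```shell
# HYPOTHETICAL invocation - option names are illustrative placeholders.
# Rewrites the logical DB name inside an existing pg_dump backup directory.
pgdump_toc_rewrite --backup-dir=mydb1_backup --new-db-name=mydb1_mycustomer1_date1
```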
Besides the catalog table entries, which can be adjusted with pgdump_toc_rewrite, Babelfish also requires a number of DB server roles for every logical DB. These roles are global to the whole DB cluster, thus they are not included in pg_dump output, which is produced for a particular (in our case, always a Babelfish one) Postgres DB.
Global roles for a logical DB can be backed up in SQL form using the pg_dumpall utility and then adjusted as needed before restore. They can also be created manually; see Creating server roles for new DB name for details.
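A sketch of dumping the global roles with pg_dumpall; the output file name and user are placeholders:

```shell
# Dump only role definitions (CREATE ROLE statements) for the whole cluster.
pg_dumpall --roles-only --file=roles.sql
# Review/adjust roles.sql as needed, then apply it on the target instance.
psql -U postgres -f roles.sql
```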
WiltonDB provides a GUI Backup tool that uses the pg_dump and pg_restore utilities and the pgdump_toc_rewrite library under the hood.
See WiltonDB Backup GUI tool for details.
See also: Scripting backup and restore operations.
See an error or have a question: open an issue or ask in Discord chat.