Tutorial

This section of the pgcopydb documentation walks through classic pgcopydb use cases. For details about the commands and their options, see the manual page for each command at pgcopydb.

Copy Postgres Database to a new server

The simplest way to use pgcopydb is to run the pgcopydb clone command, as in the following example.

$ export PGCOPYDB_SOURCE_PGURI="dbname=pagila"
$ export PGCOPYDB_TARGET_PGURI="postgres://user@target:5432/pagila"

$ pgcopydb clone

Note that the options --source and --target can also be used to set the Postgres connection strings to the databases; however, using environment variables is particularly useful when using Docker containers.

You might also notice here that both the source and target Postgres databases must already exist for pgcopydb to operate.
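
The same copy can also be written with the --source and --target options instead of environment variables. The following is a minimal sketch: the function name clone_with_options is hypothetical and only wraps the command for illustration, and the connection strings are the ones from the example above.

```shell
# Hypothetical wrapper: pass the connection strings as options instead of
# relying on PGCOPYDB_SOURCE_PGURI and PGCOPYDB_TARGET_PGURI.
clone_with_options() {
    source_uri=$1
    target_uri=$2
    pgcopydb clone --source "$source_uri" --target "$target_uri"
}
```

It would be called as clone_with_options "dbname=pagila" "postgres://user@target:5432/pagila". Quoting the two arguments keeps connection strings that contain spaces intact.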

Copy Postgres users and extensions

To copy Postgres users, a privileged connection to the target database must be set up, and to include passwords, a privileged connection to the source database must be set up as well. If these privileged connections need to be kept to a minimum, the following approach may be used:

$ coproc ( pgcopydb snapshot --source ... )

# first two commands would use a superuser role
$ pgcopydb copy roles --source ... --target ...
$ pgcopydb copy extensions --source ... --target ...

# now it's possible to use a non-superuser role
$ pgcopydb clone --skip-extensions --source ... --target ...

$ kill -TERM ${COPROC_PID}
$ wait ${COPROC_PID}
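
When the steps above are scripted rather than typed interactively, a failing step should not leave the snapshot process running. The following is a minimal sketch of one way to handle that, assuming the connection strings come from the environment variables. The function name with_snapshot and the SNAPSHOT_CMD variable are hypothetical, and the sketch makes no attempt to wait until the snapshot has been exported before starting the command.

```shell
# Run a command while a snapshot holder runs in the background, and always
# stop the holder afterwards, whether the command succeeded or not.
# SNAPSHOT_CMD is a hypothetical override, handy to try the pattern
# without a database; it defaults to "pgcopydb snapshot".
with_snapshot() {
    ${SNAPSHOT_CMD:-pgcopydb snapshot} &
    holder_pid=$!

    status=0
    "$@" || status=$?

    # Terminate the holder and reap it; ignore its exit status.
    kill -TERM "$holder_pid" 2>/dev/null || true
    wait "$holder_pid" 2>/dev/null || true

    return "$status"
}
```

For instance, with_snapshot pgcopydb copy roles runs one step and returns that step's exit status once the background process has been stopped.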

How to edit the schema when copying a database?

It is possible to split pgcopydb operations and to run them one at a time.

Note, however, that running the steps one at a time limits concurrency, and with it performance, compared to the main pgcopydb clone command, where the different sections run concurrently with one another.

Still, in some cases it is necessary to run the operations with more control over the individual steps. One interesting use case is injecting schema changes before copying the data over:

#
# pgcopydb uses the environment variables
#
$ export PGCOPYDB_SOURCE_PGURI=...
$ export PGCOPYDB_TARGET_PGURI=...

#
# we need to export a snapshot, and keep it while the individual steps are
# running, one at a time
#
$ coproc ( pgcopydb snapshot )

$ pgcopydb dump schema --resume
$ pgcopydb restore pre-data --resume

#
# Here you can implement your own SQL commands on the target database.
#
$ psql -d "${PGCOPYDB_TARGET_PGURI}" -f schema-changes.sql

# Now get back to copying the table-data, indexes, constraints, sequences
$ pgcopydb copy data --resume
$ pgcopydb restore post-data --resume

$ kill -TERM ${COPROC_PID}
$ wait ${COPROC_PID}

$ pgcopydb list progress --summary

Note that to ensure consistency of operations, the pgcopydb snapshot command has been used. See Resuming Operations (snapshots) for details.
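
As an illustration of what the schema-changes.sql file used above might contain, here is a minimal sketch that writes one from the shell. The table and column names are hypothetical (they are borrowed from the pagila sample schema) and the change itself is only an example; replace it with statements that fit your own schema.

```shell
# Write a hypothetical schema-changes.sql in the current directory.
# The object names below are illustrative only.
cat > schema-changes.sql <<'SQL'
BEGIN;
-- Example only: widen a key column on the target before the data is copied.
ALTER TABLE public.rental ALTER COLUMN rental_id TYPE bigint;
COMMIT;
SQL
```

Wrapping the statements in a transaction means that a failing statement leaves the target schema as it was after the pre-data restore.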

Follow mode, or Change Data Capture

When implementing Change Data Capture, more sync points are needed between pgcopydb and the application in order to implement a clean cutover.

Start with the initial copy and the replication setup:

$ export PGCOPYDB_SOURCE_PGURI="dbname=pagila"
$ export PGCOPYDB_TARGET_PGURI="postgres://user@target:5432/pagila"

$ pgcopydb clone --follow

While the command is running, check the replication progress made by pgcopydb with the Postgres pg_stat_replication view.
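
One way to look at that view is a query run with psql against the source database. This is a minimal sketch: the names LAG_QUERY and show_replication_lag are hypothetical, and which LSN columns are meaningful depends on the feedback that the replication client sends, so treat the lag_bytes column as an approximation.

```shell
# Query text; LAG_QUERY and show_replication_lag are hypothetical names.
LAG_QUERY="
select application_name, state, sent_lsn, flush_lsn, replay_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) as lag_bytes
  from pg_stat_replication;"

# Run the query on the source database; -X skips any local psqlrc file.
show_replication_lag() {
    psql -d "${PGCOPYDB_SOURCE_PGURI}" -X -c "$LAG_QUERY"
}
```

Calling show_replication_lag every so often while pgcopydb clone --follow is running gives a rough idea of how far behind the target is.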

When the lag is small enough to fit your maintenance window, it’s time to disconnect applications from the source database, finish the migration off, and re-connect your applications to the target database:

$ pgcopydb stream sentinel set endpos --current

This command must be run within the same --dir as the main pgcopydb clone --follow command, in order to share the same internal catalogs with the running processes.
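
A simple way to guarantee that both commands use the same directory is to keep its path in a single variable. The sketch below assumes the connection strings are set in the environment; the WORKDIR variable, its default path, and the two function names are hypothetical.

```shell
# WORKDIR is a hypothetical location; any directory works as long as both
# commands are given the same one.
WORKDIR=${WORKDIR:-/var/tmp/pgcopydb-pagila}

# Initial copy plus replication setup, as in the example above.
start_clone() {
    pgcopydb clone --follow --dir "$WORKDIR"
}

# Run from another terminal at cutover time, with the same WORKDIR.
set_endpos() {
    pgcopydb stream sentinel set endpos --current --dir "$WORKDIR"
}
```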

When the migration is completed, clean up the resources created for the Change Data Capture with the following command:

$ pgcopydb stream cleanup

See also Change Data Capture using Postgres Logical Decoding for more details and other modes of operation.

How to validate schema and data migration?

The command pgcopydb compare schema is currently limited to comparing the metadata that pgcopydb grabs about the Postgres schema. This covers the list of tables, their attributes, their indexes and constraints, and the sequence values.

The command pgcopydb compare data runs an SQL query that computes a checksum of the data on each Postgres instance (i.e. source and target) for each table, and then compares only the checksums. This is not a full comparison of the data set, and it produces a false positive in cases where the checksums are the same but the data is different.

$ pgcopydb compare schema
$ pgcopydb compare data