.. _resuming_operations:

Resuming Operations (snaphots)
==============================

An important aspect of pgcopydb design is detailed in the documentation
section :ref:`pgcopydb_concurrency` and has to do with using many concurrent
worker processes to implement parallelism.

Even when using multiple worker processes, it is important that pgcopydb
operations are consistent. It is essential to guarantee that the same source
schema and data set are used by every single worker process throughout the
operations.

Consistency with multiple Postgres sessions is achieved thanks to Postgres'
ability to export and import snapshots. As per Postgres docs about `Snapshot
Synchronization Functions`__:

__ https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION

.. admonition:: From the PostgreSQL documentation

   PostgreSQL allows database sessions to synchronize their snapshots. A
   snapshot determines which data is visible to the transaction that is
   using the snapshot. Synchronized snapshots are necessary when two or more
   sessions need to see identical content in the database. If two sessions
   just start their transactions independently, there is always a
   possibility that some third transaction commits between the executions of
   the two START TRANSACTION commands, so that one session sees the effects
   of that transaction and the other does not.

   To solve this problem, PostgreSQL allows a transaction to export the
   snapshot it is using. As long as the exporting transaction remains open,
   other transactions can import its snapshot, and thereby be guaranteed
   that they see exactly the same view of the database that the first
   transaction sees. But note that any database changes made by any one of
   these transactions remain invisible to the other transactions, as is
   usual for changes made by uncommitted transactions. So the transactions
   are synchronized with respect to pre-existing data, but act normally for
   changes they make themselves.

   Snapshots are exported with the pg_export_snapshot function, shown in
   `Table 9.94`__, and imported with the `SET TRANSACTION`__ command.

   __ https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION-TABLE
   __ https://www.postgresql.org/docs/current/sql-set-transaction.html

Using these Postgres APIs allows pgcopydb to implement consistent operations
even when using multiple worker processes.

Bypassing consistency issues
----------------------------

If you can ensure that no writes happen on the source database for the whole
duration of the pgcopydb operations, which means no schema change (DDL) and
no data change (DML), then consistency issues can't happen: that's because
the database is *static* for our context, probably within a *maintenance
window* setup where the applications are disconnected from the source
database service.

Note that pgcopydb offers the ``--not-consistent`` option that allows
bypassing all the complexity of sharing a snapshot throughout the
operations. In particular, resuming operations after a crash or even
implementing multi-steps operations is made easier when bypassing
consistency aspects altogether.

When you are able to work within a maintenance window where the database is
isolated from any application traffic, consider using ``--not-consistent``.
   
Consistency and concurrency: Postgres snapshots
-----------------------------------------------

As seen above, Postgres offers different APIs to export and import a
snapshot:

 1. Function ``pg_export_snapshot()`` exports the current snapshot.
 2. SQL command ``SET TRANSACTION SNAPSHOT`` imports the given snapshot.
 3. `Replication protocol`__ command ``CREATE_REPLICATION_SLOT`` allows
    exporting its snapshot.

    __ https://www.postgresql.org/docs/16/protocol-replication.html

Exporting a Postgres snapshot can be done either at the *create replication
slot* time, or from a non-replication connection using the SQL function
``pg_export_snapshot()``. This is an either/or situation, Postgres does not
allow mixing these two approaches.

Also remember that a single snapshot must be used throughout pgcopydb
operations, both the initial COPY of the schema and data and also the Change
Data Capture aspects in order to achieve consistency (no data loss, no
duplicates in the data change stream).

To be able to implement multiple worker processes in pgcopydb and have a
consistent view of the same database (schema, data) in every single process,
pgcopydb needs to first export a single common snapshot and then have every
worker process import that snapshot when connecting to the source database.

When implementing Change-Data-Capture thanks to the ``--follow`` option,
then it is also required that there is no gap between the initial snapshot
and the first change received, and also that no changes are sent that were
part of the initial copy. Postgres knows how to offer that guarantee via its
snapshot exporting facility in the ``CREATE_REPLICATION_SLOT`` replication
command.

As a result, the ``pgcopydb snapshot --follow`` command is required by the
Postgres API to also create the pgcopydb replication slot, and exports the
replication slot snapshot.

The ``pgcopydb snapshot`` command, when the ``--follow`` option is not used,
simply connects to the Postgres source database using the normal query
protocol and run the command ``select pg_export_snapshot()`` to grab a
snapshot that can be shared by all the worker processes.

Moreover the Postgres API for exporting a snapshot has the following
limitation:

.. admonition:: From the PostgreSQL docs

   The snapshot is available for import only until the end of the
   transaction that exported it.

This means that the ``pgcopydb snapshot`` command must be kept running for
the whole pgcopydb initial copy operations. The replication client only uses
the replication slot to ensure consistency, so when only the follow worker
processes are running, holding on to the snapshot is not required anymore.
  
Resumability of pgcopydb operations
-----------------------------------

The ability to resume operations when using pgcopydb faces three quite
different contexts. Depending on the context and when the previous operation
has been interrupted, then running the same pgcopydb command again with the
``--resume`` option might just work, or might error out because the
situation does not allow a consistent resuming of the operation that was
interrupted.

Bypassing consistency issues
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When the ``--resume --not-consistent`` options are used, then there is no
restrictions around snapshot re-use when trying to resume interrupted
operations.

Consistent copy of the data
^^^^^^^^^^^^^^^^^^^^^^^^^^^

When using ``pgcopydb clone --resume`` the snapshot used in the previous
attempts is going to be re-used. For Postgres to be able to import that
snapshot again, the transaction that exported the snapshot must still be
running on the source database system.

Single pgcopydb command
  When using ``pgcopydb clone`` the snapshot holding process is part of that
  single process tree, and any interruption of this command (signal, C-c,
  crash) also terminates the snapshot holding sub-process and the snapshot
  is then lost.

Separate pgcopydb snapshot command
  That's why the ``pgcopydb snapshot`` command can be used separately. Then
  the main ``pgcopydb clone`` command re-uses the snapshot automatically and
  can be left holding the snapshot even in case of an interruption of the
  ``pgcopydb clone`` command.

External snapshot
  It is also possible to use another command or software to export and
  maintain the snapshot that pgcopydb uses and then use ``pgcopydb
  clone --snapshot ...`` to share the snapshot with pgcopydb.

Consistent copy of the data with CDC
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When using Change Data Capture with the ``--follow`` option resuming
operations consistently requires the following situation:

  1. The initial COPY of the data needs to still have access to the exported
     snapshot.

     Even when the snapshot has been exported with a replication protocol
     command, Postgres still requires the session to be maintained opened
     here.
     
  2. The logical replication on the client side is not concerned with the
     snapshot operations, that's done server-side when creating the
     replication slot; from there on, all the client has to do is consume
     from the replication slot.

     
Snapshot and catalogs (cache invalidation)
------------------------------------------

The source catalog table ``setup`` registers information about the current
pgcopydb command. The information is checked at start-up in order to avoid
re-using data in a different context.

The information registered is the following, and also contains the
*snapshot* information. In case of a mismatch, consider using ``--resume
--not-consistent`` when that's relevant to your operations.

Here's how to inspect the current ``setup`` information that pgcopydb
maintains in its local catalog cache:

::

   $ sqlite3 /tmp/pgcopydb/schema/source.db
   sqlite> .mode line
   sqlite> select * from setup;
                         id = 1
              source_pg_uri = postgres:///pagila
              target_pg_uri = postgres:///plop
                   snapshot = 00000003-00000048-1
   split_tables_larger_than = 0
                    filters = {"type":"SOURCE_FILTER_TYPE_NONE"}
                     plugin =
                  slot_name =

The source and target connection strings only contain the Postgres servers
hostname, port, database name and connecting role name. In particular,
authentication credentials are not stored in the catalogs.