pgcopydb stream

pgcopydb stream - Stream changes from source database

Warning

This mode of operations has been designed for unit testing only.

Consider using the pgcopydb clone (with the --follow option) or the pgcopydb follow command instead.

Note

Some pgcopydb stream commands are still designed for normal operations, rather than unit testing only.

The pgcopydb stream sentinel set startpos, pgcopydb stream sentinel set endpos, pgcopydb stream sentinel set apply, and pgcopydb stream sentinel set prefetch commands are necessary to communicate with the main pgcopydb clone --follow or pgcopydb follow process. See Change Data Capture Example 1 for a detailed example using pgcopydb stream sentinel set endpos.

By default these commands read/write the sentinel directly in the source SQLite catalog, which requires running where the catalog files live. Pass --host (and optionally --port) to instead talk to the running follow process over TCP, with no shared catalog files. See Sentinel Control.

Also the commands pgcopydb stream setup and pgcopydb stream cleanup might be used directly in normal operations. See Change Data Capture Example 2 for a detailed example.

This command prefixes the following sub-commands:

pgcopydb stream: Stream changes from the source database

Available commands:
  pgcopydb stream
    setup     Setup source and target systems for logical decoding
    cleanup   Cleanup source and target systems for logical decoding
    prune     Remove already-applied CDC files from disk to reclaim disk space
    prefetch  Stream changes from the source database into the SQLite CDC store
    catchup   Transform and apply prefetched changes from the SQLite CDC store to the target
    replay    Replay changes from the source to the target database, live
  + sentinel  Maintain a sentinel table
    receive   Stream changes from the source database
    apply     Apply changes from the replayDB to the target database, or stdout

pgcopydb stream sentinel: Maintain a sentinel table

Available commands:
  pgcopydb stream sentinel
    setup  Setup the sentinel table
    get    Get the sentinel table values
  + set    Set the sentinel table values

pgcopydb stream sentinel set: Set the sentinel table values

Available commands:
  pgcopydb stream sentinel set
    startpos  Set the sentinel start position LSN
    endpos    Set the sentinel end position LSN
    apply     Set the sentinel apply mode
    prefetch  Set the sentinel prefetch mode

pgcopydb stream sentinel setup: Setup the sentinel table
usage: pgcopydb stream sentinel setup <start lsn> <end lsn>

Those commands implement a part of the whole database replay operation as detailed in section pgcopydb follow. Only use those commands to debug a specific part, or because you know that you just want to implement that step.

Note

The sub-commands stream setup then stream prefetch and stream catchup are higher level commands that use internal information to track their progress in the SQLite Change Data Capture store.

The sub-commands stream receive and stream apply are lower level interfaces to the same SQLite store: receive writes decoded changes to the CDC output database, and apply performs the inline transform and writes the resulting statements to the CDC replay database before applying them to the target.

pgcopydb stream setup

pgcopydb stream setup - Setup source and target systems for logical decoding

The command pgcopydb stream setup connects to the target database and creates a replication origin positioned at the LSN position of the logical decoding replication slot that must have been created already. See pgcopydb snapshot to create the replication slot and export a snapshot.

pgcopydb stream setup: Setup source and target systems for logical decoding
usage: pgcopydb stream setup

  --source                      Postgres URI to the source database
  --target                      Postgres URI to the target database
  --dir                         Work directory to use
  --restart                     Allow restarting when temp files exist already
  --resume                      Allow resuming operations after a failure
  --not-consistent              Allow taking a new snapshot on the source database
  --snapshot                    Use snapshot obtained with pg_export_snapshot
  --plugin                      Output plugin to use (pgoutput, test_decoding, wal2json)
  --wal2json-numeric-as-string  Print numeric data type as string when using wal2json output plugin
  --replay-no-op-updates        Replay UPDATE statements even when no column values changed
  --slot-name                   Stream changes recorded by this slot
  --origin                      Name of the Postgres replication origin

pgcopydb stream cleanup

pgcopydb stream cleanup - cleanup source and target systems for logical decoding

The command pgcopydb stream cleanup connects to the source and target databases to delete the objects created in the pgcopydb stream setup step.

pgcopydb stream cleanup: Cleanup source and target systems for logical decoding
usage: pgcopydb stream cleanup

  --source         Postgres URI to the source database
  --target         Postgres URI to the target database
  --restart        Allow restarting when temp files exist already
  --resume         Allow resuming operations after a failure
  --not-consistent Allow taking a new snapshot on the source database
  --snapshot       Use snapshot obtained with pg_export_snapshot
  --slot-name      Stream changes recorded by this slot
  --origin         Name of the Postgres replication origin

pgcopydb stream prune

pgcopydb stream prune - Remove already-applied CDC files from disk to reclaim disk space

The command pgcopydb stream prune removes CDC output.db / replay.db file pairs that have already been fully applied to the target and are no longer needed for resume or restart.

A file pair is safe to delete when its endpos is strictly less than sentinel.replay_lsn (the LSN acknowledged back to the PostgreSQL replication slot as confirmed_flush) and its done_time_epoch is set (the receive process has closed it). Those transactions have been durably committed on the target; the slot will never re-deliver them.

The deleted entries are removed from the cdc_files catalog table in source.db. The freed SQLite freelist pages are recycled automatically when the receive process inserts new cdc_files rows, so no explicit VACUUM is needed.

The command runs in two modes:

Direct mode (--dir, no --host): opens source.db in the work directory directly. Suitable when the follow process is not running or the caller shares the filesystem.
Coordinator mode (--host / --port): sends an IPC message to a running pgcopydb follow (or pgcopydb clone --follow) process. The follow process owns the SQLite catalog exclusively and performs the cleanup safely without requiring shared filesystem access.

The follow process also runs stream prune automatically every 300 seconds (matching PostgreSQL’s default checkpoint_timeout), so manual invocation is optional during a live follow operation.

pgcopydb stream prune: Remove already-applied CDC files from disk to reclaim disk space
usage: pgcopydb stream prune  [ --dir ] [ --dry-run ] [ --host ] [ --port ]

  --dir            Work directory to use
  --dry-run        List files that would be removed without deleting them
  --host           Host of the running pgcopydb follow process to connect to
  --port           Port of the running pgcopydb follow process to connect to

pgcopydb stream prefetch

pgcopydb stream prefetch - Stream changes from the source database into the SQLite CDC store

The command pgcopydb stream prefetch connects to the source database using the logical replication protocol and the given replication slot.

The prefetch command receives the changes from the source database in a streaming fashion and writes them to the SQLite Change Data Capture output database (named <timeline>-<startlsn>-output.db). The transform into replay statements happens later, inside the stream catchup (apply) step.

pgcopydb stream prefetch: Stream changes from the source database into the SQLite CDC store
usage: pgcopydb stream prefetch

  --source         Postgres URI to the source database
  --dir            Work directory to use
  --restart        Allow restarting when temp files exist already
  --resume         Allow resuming operations after a failure
  --not-consistent Allow taking a new snapshot on the source database
  --slot-name           Stream changes recorded by this slot
  --endpos              LSN position where to stop receiving changes
  --max-replaydb-size   Rotate CDC files at this size (default 1GB)

pgcopydb stream catchup

pgcopydb stream catchup - Transform and apply prefetched changes to the target database

The command pgcopydb stream catchup connects to the target database, transforms the changes prefetched into the CDC output database into replay statements (stored in the CDC replay database), and applies them to the target. Progress is tracked using the Postgres replication origin.

pgcopydb stream catchup: Transform and apply prefetched changes from the SQLite CDC store to the target
usage: pgcopydb stream catchup

  --source              Postgres URI to the source database
  --target              Postgres URI to the target database
  --dir                 Work directory to use
  --restart             Allow restarting when temp files exist already
  --resume              Allow resuming operations after a failure
  --not-consistent      Allow taking a new snapshot on the source database
  --slot-name           Stream changes recorded by this slot
  --endpos              LSN position where to stop receiving changes
  --max-replaydb-size   Rotate CDC files at this size (default 1GB)
  --origin              Name of the Postgres replication origin

pgcopydb stream replay

pgcopydb stream replay - Replay changes from the source to the target database, live

The command pgcopydb stream replay connects to the source database and streams changes using the logical decoding protocol, running the receive and apply stages together as a single live pipeline that applies changes to the target database as they arrive.

pgcopydb stream replay: Replay changes from the source to the target database, live
usage: pgcopydb stream replay

  --source              Postgres URI to the source database
  --target              Postgres URI to the target database
  --dir                 Work directory to use
  --restart             Allow restarting when temp files exist already
  --resume              Allow resuming operations after a failure
  --not-consistent      Allow taking a new snapshot on the source database
  --slot-name           Stream changes recorded by this slot
  --endpos              LSN position where to stop receiving changes
  --max-replaydb-size   Rotate CDC files at this size (default 1GB)
  --origin              Name of the Postgres replication origin

This command is equivalent to running pgcopydb stream receive and pgcopydb stream apply together: the apply stage performs the inline transform from the CDC output database to the CDC replay database before applying the changes to the target.

pgcopydb stream sentinel get

pgcopydb stream sentinel get - Get the sentinel table values

pgcopydb stream sentinel get: Get the sentinel table values
usage: pgcopydb stream sentinel get

  --json           Format the output using JSON
  --startpos       Get only the startpos value
  --endpos         Get only the endpos value
  --apply          Get only the apply value
  --write-lsn      Get only the write LSN value
  --flush-lsn      Get only the flush LSN value
  --transform-lsn  Get only the tranform LSN value
  --replay-lsn     Get only the replay LSN value
  --host           Reach the follow coordinator over TCP at this host
  --port           Follow coordinator TCP port (default 5442)

pgcopydb stream sentinel set startpos

pgcopydb stream sentinel set startpos - Set the sentinel start position LSN

pgcopydb stream sentinel set startpos: Set the sentinel start position LSN
usage: pgcopydb stream sentinel set startpos <start lsn>

  --host        Reach the follow coordinator over TCP at this host
  --port        Follow coordinator TCP port (default 5442)

This is an advanced API used for unit-testing and debugging, the operation is automatically covered in normal pgcopydb operations.

Logical replication target system registers progress by assigning a current LSN to the --origin node name. When creating an origin on the target database system, it is required to provide the current LSN from the source database system, in order to properly bootstrap pgcopydb logical decoding.

pgcopydb stream sentinel set endpos

pgcopydb stream sentinel set endpos - Set the sentinel end position LSN

pgcopydb stream sentinel set endpos: Set the sentinel end position LSN
usage: pgcopydb stream sentinel set endpos [ --source ... ] [ <end lsn> | --current ]

  --source      Postgres URI to the source database
  --current     Use pg_current_wal_flush_lsn() as the endpos
  --host        Reach the follow coordinator over TCP at this host
  --port        Follow coordinator TCP port (default 5442)

Logical replication target LSN to use. Automatically stop replication and exit with normal exit status 0 when receiving reaches the specified LSN. If there’s a record with LSN exactly equal to lsn, the record will be output.

The --endpos option is not aware of transaction boundaries and may truncate output partway through a transaction. Any partially output transaction will not be consumed and will be replayed again when the slot is next read from. Individual messages are never truncated.

See also documentation for pg_recvlogical.

pgcopydb stream sentinel set apply

pgcopydb stream sentinel set apply - Set the sentinel apply mode

pgcopydb stream sentinel set apply: Set the sentinel apply mode
usage: pgcopydb stream sentinel set apply

  --host        Reach the follow coordinator over TCP at this host
  --port        Follow coordinator TCP port (default 5442)

pgcopydb stream sentinel set prefetch

pgcopydb stream sentinel set prefetch - Set the sentinel prefetch mode

pgcopydb stream sentinel set prefetch: Set the sentinel prefetch mode
usage: pgcopydb stream sentinel set prefetch

  --host        Reach the follow coordinator over TCP at this host
  --port        Follow coordinator TCP port (default 5442)

pgcopydb stream receive

pgcopydb stream receive - Stream changes from the source database

The command pgcopydb stream receive connects to the source database using the logical replication protocol and the given replication slot.

The receive command receives the changes from the source database in a streaming fashion and writes them to the SQLite Change Data Capture output database. It can also stream raw decoded messages to standard output with --to-stdout.

Note

The separate pgcopydb stream transform command has been removed. In the SQLite-based pipeline the transform from the output database to the replay database is performed inline by the apply stage (see pgcopydb stream apply).

pgcopydb stream receive: Stream changes from the source database
usage: pgcopydb stream receive

  --source              Postgres URI to the source database
  --dir                 Work directory to use
  --to-stdout           Stream logical decoding messages to stdout
  --restart             Allow restarting when temp files exist already
  --resume              Allow resuming operations after a failure
  --not-consistent      Allow taking a new snapshot on the source database
  --slot-name           Stream changes recorded by this slot
  --endpos              LSN position where to stop receiving changes
  --max-replaydb-size   Rotate CDC files at this size (default 1GB)

pgcopydb stream apply

pgcopydb stream apply - Apply changes from the CDC store to the target database

The command pgcopydb stream apply transforms the changes received into the CDC output database into replay statements (stored in the CDC replay database) and applies them to the target database. The apply process tracks progress thanks to the Postgres API for Replication Progress Tracking.

pgcopydb stream apply: Apply changes from the replayDB to the target database, or stdout
usage: pgcopydb stream apply

  --target         Postgres URI to the target database
                   Use '-' to emit SQL to stdout without connecting
  --dir            Work directory to use
  --restart        Allow restarting when temp files exist already
  --resume         Allow resuming operations after a failure
  --not-consistent Allow taking a new snapshot on the source database
  --origin         Name of the Postgres replication origin

Use --target - to emit the SQL to standard output instead of connecting to a target database (used by the unit tests).

Options

The following options are available to pgcopydb stream sub-commands:

--source

Connection string to the source Postgres instance. See the Postgres documentation for connection strings for the details. In short both the quoted form "host=... dbname=..." and the URI form postgres://user@host:5432/dbname are supported.

--target

Connection string to the target Postgres instance.

--dir

During its normal operations pgcopydb creates a lot of temporary files to track sub-processes progress. Temporary files are created in the directory specified by this option, or defaults to ${TMPDIR}/pgcopydb when the environment variable is set, or otherwise to /tmp/pgcopydb.

Change Data Capture files are stored in the cdc sub-directory of the --dir option when provided, otherwise see XDG_DATA_HOME environment variable below.

--restart

When running the pgcopydb command again, if the work directory already contains information from a previous run, then the command refuses to proceed and delete information that might be used for diagnostics and forensics.

In that case, the --restart option can be used to allow pgcopydb to delete traces from a previous run.

--resume

When the pgcopydb command was terminated before completion, either by an interrupt signal (such as C-c or SIGTERM) or because it crashed, it is possible to resume the database migration.

To be able to resume a streaming operation in a consistent way, all that’s required is re-using the same replication slot as in previous run(s).

--plugin

Logical decoding output plugin to use. The default is pgoutput which is built into PostgreSQL core (since Postgres 10) and does not require any extension installation on the source server. Using pgoutput avoids the need for superuser to install an extension and works with any PostgreSQL publication-based setup.

It is also possible to use test_decoding (ships with Postgres core but does not support all data types as well as pgoutput) or wal2json (an external extension). Both are still supported for backwards compatibility.

--wal2json-numeric-as-string

When using the wal2json output plugin, it is possible to use the --wal2json-numeric-as-string option to instruct wal2json to output numeric values as strings and thus prevent some precision loss.

You need to have a wal2json plugin version on source database that supports --numeric-data-types-as-string option to use this option.

See also the documentation for wal2json regarding this option for details.

--replay-no-op-updates

When set, pgcopydb replays UPDATE statements even when no column values changed between the old and new tuple. By default, such no-op UPDATEs are skipped as an optimisation (the default is appropriate for migration workloads where target triggers do not need to fire for every UPDATE).

Enable this option when the target database has UPDATE triggers that must fire for every UPDATE, including those that do not change any column values. PostgreSQL’s own logical replication always replays all UPDATEs; this option restores that behaviour.

No-op UPDATEs arise most often with REPLICA IDENTITY FULL tables, where both the old and new full row are available for comparison.

--slot-name

Logical decoding slot name to use.

--origin

Logical replication target system needs to track the transactions that have been applied already, so that in case we get disconnected or need to resume operations we can skip already replayed transaction.

Postgres uses a notion of an origin node name as documented in Replication Progress Tracking. This option allows to pick your own node name and defaults to “pgcopydb”. Picking a different name is useful in some advanced scenarios like migrating several sources in the same target, where each source should have their own unique origin node name.

--host

Optional follow-coordinator TCP endpoint host.

On clone --follow / follow / stream replay (the server side), when set the follow process starts a TCP coordinator listening on --host:--port (use 0.0.0.0 to accept connections from other hosts/containers). On stream sentinel get/set (the client side), when set the command talks to that coordinator over TCP instead of opening the SQLite catalog directly — so it works without sharing the catalog files. See Sentinel Control.

On the server side this may also be provided via the PGCOPYDB_HOST environment variable.

--port

TCP port for the follow-coordinator endpoint (see --host). Defaults to 5442. On the server side this may also be provided via the PGCOPYDB_PORT environment variable.

--verbose

Increase current verbosity. The default level of verbosity is INFO. In ascending order pgcopydb knows about the following verbosity levels: FATAL, ERROR, WARN, INFO, NOTICE, DEBUG, TRACE.

--debug

Set current verbosity to DEBUG level.

--trace

Set current verbosity to TRACE level.

--quiet

Set current verbosity to ERROR level.

--dry-run

Used with pgcopydb stream prune. Lists the CDC file pairs that would be deleted without actually removing them from disk or from the catalog. Useful to preview how much disk space would be reclaimed before committing to the deletion.

--max-replaydb-size

Maximum on-disk size of a single SQLite output.db file before it is rotated to a new file pair. Accepts human-readable units: 1kB, 500MB, 1GB, etc. Defaults to 1GB.

A transaction larger than the threshold is always kept intact in one file — rotation only happens at a transaction commit boundary. This means a single oversized transaction causes the file to exceed the threshold, which is expected and safe.

Environment

PGCOPYDB_SOURCE_PGURI

Connection string to the source Postgres instance. When --source is ommitted from the command line, then this environment variable is used.

PGCOPYDB_TARGET_PGURI

Connection string to the target Postgres instance. When --target is ommitted from the command line, then this environment variable is used.

PGCOPYDB_OUTPUT_PLUGIN

Logical decoding output plugin to use. When --plugin is omitted from the command line, then this environment variable is used.

PGCOPYDB_WAL2JSON_NUMERIC_AS_STRING

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb uses the wal2json option --numeric-data-types-as-string when using the wal2json output plugin.

When --wal2json-numeric-as-string is ommitted from the command line then this environment variable is used.

TMPDIR

The pgcopydb command creates all its work files and directories in ${TMPDIR}/pgcopydb, and defaults to /tmp/pgcopydb.

XDG_DATA_HOME

The pgcopydb command creates Change Data Capture files in the standard place XDG_DATA_HOME, which defaults to ~/.local/share. See the XDG Base Directory Specification.

Examples

Note

The captured output below predates the SQLite-based Change Data Capture store and is kept for illustration of the logical decoding messages. The current implementation stores received changes in the CDC output SQLite database and the transformed statements in the CDC replay SQLite database, rather than in .json and .sql files on disk.

As an example here is the output generated from running the cdc test case, where a replication slot is created before the initial copy of the data, and then the following INSERT statement is executed:

 begin;

 with r as
  (
    insert into rental(rental_date, inventory_id, customer_id, staff_id, last_update)
         select '2022-06-01', 371, 291, 1, '2022-06-01'
      returning rental_id, customer_id, staff_id
  )
  insert into payment(customer_id, staff_id, rental_id, amount, payment_date)
       select customer_id, staff_id, rental_id, 5.99, '2020-06-01'
         from r;

 commit;

The command then looks like the following, where the --endpos has been extracted by calling the pg_current_wal_lsn() SQL function:

$ pgcopydb stream receive --slot-name test_slot --restart --endpos 0/236D668 -vv
01:57 157 INFO  Running pgcopydb version 0.7 from "/usr/local/bin/pgcopydb"
01:57 157 DEBUG copydb.c:406 Change Data Capture data is managed at "/var/lib/postgres/.local/share/pgcopydb"
01:57 157 INFO  copydb.c:73 Using work dir "/tmp/pgcopydb"
01:57 157 DEBUG pidfile.c:143 Failed to signal pid 34: No such process
01:57 157 DEBUG pidfile.c:146 Found a stale pidfile at "/tmp/pgcopydb/pgcopydb.pid"
01:57 157 INFO  pidfile.c:147 Removing the stale pid file "/tmp/pgcopydb/pgcopydb.pid"
01:57 157 INFO  copydb.c:254 Work directory "/tmp/pgcopydb" already exists
01:57 157 INFO  copydb.c:258 A previous run has run through completion
01:57 157 INFO  copydb.c:151 Removing directory "/tmp/pgcopydb"
01:57 157 DEBUG copydb.c:445 rm -rf "/tmp/pgcopydb" && mkdir -p "/tmp/pgcopydb"
01:57 157 DEBUG copydb.c:445 rm -rf "/tmp/pgcopydb/schema" && mkdir -p "/tmp/pgcopydb/schema"
01:57 157 DEBUG copydb.c:445 rm -rf "/tmp/pgcopydb/run" && mkdir -p "/tmp/pgcopydb/run"
01:57 157 DEBUG copydb.c:445 rm -rf "/tmp/pgcopydb/run/tables" && mkdir -p "/tmp/pgcopydb/run/tables"
01:57 157 DEBUG copydb.c:445 rm -rf "/tmp/pgcopydb/run/indexes" && mkdir -p "/tmp/pgcopydb/run/indexes"
01:57 157 DEBUG copydb.c:445 rm -rf "/var/lib/postgres/.local/share/pgcopydb" && mkdir -p "/var/lib/postgres/.local/share/pgcopydb"
01:57 157 DEBUG pgsql.c:2476 starting log streaming at 0/0 (slot test_slot)
01:57 157 DEBUG pgsql.c:485 Connecting to [source] "postgres://postgres@source:/postgres?password=****&replication=database"
01:57 157 DEBUG pgsql.c:2009 IDENTIFY_SYSTEM: timeline 1, xlogpos 0/236D668, systemid 7104302452422938663
01:57 157 DEBUG pgsql.c:3188 RetrieveWalSegSize: 16777216
01:57 157 DEBUG pgsql.c:2547 streaming initiated
01:57 157 INFO  stream.c:237 Now streaming changes to "/var/lib/postgres/.local/share/pgcopydb/000000010000000000000002.json"
01:57 157 DEBUG stream.c:341 Received action B for XID 488 in LSN 0/236D638
01:57 157 DEBUG stream.c:341 Received action I for XID 488 in LSN 0/236D178
01:57 157 DEBUG stream.c:341 Received action I for XID 488 in LSN 0/236D308
01:57 157 DEBUG stream.c:341 Received action C for XID 488 in LSN 0/236D638
01:57 157 DEBUG pgsql.c:2867 pgsql_stream_logical: endpos reached at 0/236D668
01:57 157 DEBUG stream.c:382 Flushed up to 0/236D668 in file "/var/lib/postgres/.local/share/pgcopydb/000000010000000000000002.json"
01:57 157 INFO  pgsql.c:3030 Report write_lsn 0/236D668, flush_lsn 0/236D668
01:57 157 DEBUG pgsql.c:3107 end position 0/236D668 reached by WAL record at 0/236D668
01:57 157 DEBUG pgsql.c:408 Disconnecting from [source] "postgres://postgres@source:/postgres?password=****&replication=database"
01:57 157 DEBUG stream.c:414 streamClose: closing file "/var/lib/postgres/.local/share/pgcopydb/000000010000000000000002.json"
01:57 157 INFO  stream.c:171 Streaming is now finished after processing 4 messages

The JSON file then contains the following content, from the wal2json logical replication plugin. Note that you’re seeing diffent LSNs here because each run produces different ones, and the captures have not all been made from the same run.

$ cat /var/lib/postgres/.local/share/pgcopydb/000000010000000000000002.json
{"action":"B","xid":489,"timestamp":"2022-06-27 13:24:31.460822+00","lsn":"0/236F5A8","nextlsn":"0/236F5D8"}
{"action":"I","xid":489,"timestamp":"2022-06-27 13:24:31.460822+00","lsn":"0/236F0E8","schema":"public","table":"rental","columns":[{"name":"rental_id","type":"integer","value":16050},{"name":"rental_date","type":"timestamp with time zone","value":"2022-06-01 00:00:00+00"},{"name":"inventory_id","type":"integer","value":371},{"name":"customer_id","type":"integer","value":291},{"name":"return_date","type":"timestamp with time zone","value":null},{"name":"staff_id","type":"integer","value":1},{"name":"last_update","type":"timestamp with time zone","value":"2022-06-01 00:00:00+00"}]}
{"action":"I","xid":489,"timestamp":"2022-06-27 13:24:31.460822+00","lsn":"0/236F278","schema":"public","table":"payment_p2020_06","columns":[{"name":"payment_id","type":"integer","value":32099},{"name":"customer_id","type":"integer","value":291},{"name":"staff_id","type":"integer","value":1},{"name":"rental_id","type":"integer","value":16050},{"name":"amount","type":"numeric(5,2)","value":5.99},{"name":"payment_date","type":"timestamp with time zone","value":"2020-06-01 00:00:00+00"}]}
{"action":"C","xid":489,"timestamp":"2022-06-27 13:24:31.460822+00","lsn":"0/236F5A8","nextlsn":"0/236F5D8"}

In the SQLite-based pipeline the equivalent transform is performed inline by the pgcopydb stream catchup (apply) step, which reads the changes from the CDC output database, writes parameterised statements to the CDC replay database, and applies them to the target. The resulting replay statements are equivalent to:

BEGIN; -- {"xid":489,"lsn":"0/236F5A8"}
INSERT INTO "public"."rental" (rental_id, rental_date, inventory_id, customer_id, return_date, staff_id, last_update) VALUES (16050, '2022-06-01 00:00:00+00', 371, 291, NULL, 1, '2022-06-01 00:00:00+00');
INSERT INTO "public"."payment_p2020_06" (payment_id, customer_id, staff_id, rental_id, amount, payment_date) VALUES (32099, 291, 1, 16050, 5.99, '2020-06-01 00:00:00+00');
COMMIT; -- {"xid": 489,"lsn":"0/236F5A8"}