Sentinel Control — Design Considerations
This document describes how the pgcopydb stream sentinel commands
communicate with a running pgcopydb clone --follow / pgcopydb follow
process, and why pgcopydb offers an optional TCP transport for that control
channel in addition to the default SQLite catalog.
This is the external control channel. The follow pipeline also has an
internal coordination signal between its receive and apply workers,
documented separately under The receive→apply lifecycle pipe.
—
The sentinel and how it is read/written
The sentinel is a single row stored in the source SQLite catalog
(schema/source.db). It carries the user-controllable streaming knobs —
startpos, endpos, the apply flag — and the pipeline progress LSNs
(write_lsn, flush_lsn, replay_lsn). A running follow process polls
the sentinel to learn when to enable apply and where to stop; an operator (or a
test harness) updates it with pgcopydb stream sentinel set ….
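The polling pattern described above can be sketched with Python's `sqlite3` module. This is only an illustration: the table name `sentinel_sketch`, its column layout, the helper names, and the convention that an endpos of `0/0` means "not set" are all assumptions made for the sketch, not pgcopydb's actual catalog schema.

```python
import sqlite3

# Hypothetical single-row table. The column names mirror the fields listed
# above, but the real pgcopydb catalog schema may differ.
# The CHECK constraint keeps the table to one row (id = 1).
SCHEMA = """
CREATE TABLE sentinel_sketch (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    startpos TEXT, endpos TEXT, apply INTEGER,
    write_lsn TEXT, flush_lsn TEXT, replay_lsn TEXT
)
"""


def open_sketch_db():
    """Create an in-memory database holding one sentinel-like row."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.execute(
        "INSERT INTO sentinel_sketch VALUES "
        "(1, '0/0', '0/0', 0, '0/0', '0/0', '0/1000')"
    )
    return db


def lsn_to_int(lsn):
    """Convert a Postgres LSN such as '0/16B3748' to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def poll_sentinel(db):
    """Return (apply_enabled, endpos_reached), re-reading the row each time."""
    apply_flag, endpos, replay_lsn = db.execute(
        "SELECT apply, endpos, replay_lsn FROM sentinel_sketch WHERE id = 1"
    ).fetchone()
    # Assumption: an endpos of 0/0 means "no stop position set".
    end = lsn_to_int(endpos)
    reached = end != 0 and lsn_to_int(replay_lsn) >= end
    return bool(apply_flag), reached
```

An operator-side update is then a plain `UPDATE` of that row, which the next polling pass picks up.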
Two transports are available:
- SQLite (default). The stream sentinel command opens the source catalog directly and reads/writes the sentinel row. This requires the command to run where the catalog files live (same host/container, or a shared filesystem / docker volume).
- TCP (opt-in, --host/--port). The command connects to the follow process’s coordinator over TCP; the follow side performs the SQLite update on its behalf. The client never opens the catalog files.
—
Why a TCP transport: the SQLite write-locking constraint
pgcopydb serialises concurrent SQLite writers with a System V semaphore created
via semget(IPC_PRIVATE, …) (see lock_utils.c). IPC_PRIVATE produces
a semaphore that is identified only by the returned semId and is therefore
shareable only by processes that inherit it across fork(). An
independently launched process — for example a separately invoked
pgcopydb stream sentinel set endpos — calls semget(IPC_PRIVATE, …) again
and gets a different semaphore.
Consequences:
- The follow supervisor and its forked receive/apply children all share the same write semaphore and coordinate cleanly.
- An independently invoked stream sentinel CLI does not share that semaphore. With the SQLite transport it relies solely on SQLite’s own WAL file locking, which requires the catalog to be on a shared filesystem and is prone to SQLITE_BUSY under contention. Sharing the catalog across containers (a named docker volume) is the usual workaround.
The TCP transport removes that requirement: the CLI sends a request over the network and the follow process applies the sentinel change using the same shared semaphore as the rest of the pipeline. No catalog files are shared.
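The contention that an uncoordinated writer runs into can be reproduced with Python's `sqlite3` module. The sketch below is an illustration only: it uses two connections in one process rather than two pgcopydb processes, and the file name and function name are made up for the example. The second connection uses `timeout=0` so that it fails at once instead of retrying.

```python
import os
import sqlite3
import tempfile


def second_writer_is_busy():
    """Return True if a second writer is rejected while the first holds the lock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sketch.db")
        # isolation_level=None: transactions are opened and closed explicitly.
        first = sqlite3.connect(path, isolation_level=None)
        first.execute("PRAGMA journal_mode=WAL")
        first.execute("CREATE TABLE t (v TEXT)")
        # timeout=0: report SQLITE_BUSY immediately rather than waiting.
        second = sqlite3.connect(path, timeout=0, isolation_level=None)
        first.execute("BEGIN IMMEDIATE")  # the first writer takes the write lock
        try:
            second.execute("BEGIN IMMEDIATE")  # the second writer competes for it
        except sqlite3.OperationalError as exc:
            # sqlite3 reports SQLITE_BUSY as "database is locked".
            return "locked" in str(exc)
        finally:
            first.execute("ROLLBACK")
            first.close()
            second.close()
        return False
```

WAL mode allows concurrent readers but still only one writer at a time, which is why writers that do not share the semaphore can collide here.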
—
Where the coordinator runs: in the follow supervisor
The coordinator must live inside the follow process group to participate in the
IPC_PRIVATE semaphore. It runs in-process, inside the follow supervisor
(the process that already holds the source catalog open for
follow_reached_endpos and that forks/monitors receive and apply):
- it reuses the supervisor’s already-open sourceDB handle and its semId, adding no new SQLite connection or lock participant;
- it serves requests with short, non-blocking timeouts (a 100 ms accept folded into the supervisor’s monitoring loop), so the endpos / child-exit detection stays responsive.
A dedicated coordinator subprocess would also work (it would inherit the
semId across fork) but would open a second SQLite connection on the same
file for no real benefit; in-process is preferred.
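The idea of folding a short accept into an existing monitoring loop can be sketched as follows. This is a Python illustration of the pattern, not pgcopydb's C implementation; `make_listener`, `supervise_once`, and the two callbacks are hypothetical names.

```python
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a listening socket whose accept() waits at most 100 ms."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen()
    srv.settimeout(0.1)  # the 100 ms accept window
    return srv


def supervise_once(srv, check_children, handle_request):
    """Run one pass of the loop; return True if a client request was served."""
    served = False
    try:
        conn, _addr = srv.accept()
    except socket.timeout:
        conn = None  # nobody connected within the window
    if conn is not None:
        with conn:
            conn.settimeout(0.1)  # keep request handling short as well
            handle_request(conn)
        served = True
    # Child-exit / endpos checks run on every pass, request or not.
    check_children()
    return served
```

Because each pass is bounded by the accept timeout, the monitoring checks run about every 100 ms even when no client ever connects.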
The coordinator is optional: it is started only when a listen endpoint is
configured, via --host / --port on clone --follow / follow /
stream replay, or via the PGCOPYDB_HOST / PGCOPYDB_PORT environment
variables (a convenience for docker-compose).
—
Wire protocol
A minimal request/response protocol (ld_ipc.h). Each message is
[version:1][type:1][payload_len:2][payload:N].
| Message | Direction | Payload |
|---|---|---|
| | both | none (liveness check; used by clients to wait for the coordinator) |
| | CLI → coord | |
| | CLI → coord | |
| | CLI → coord | |
| | CLI → coord | none; reply |
| | coord → CLI | request accepted, or an error string |
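The `[version:1][type:1][payload_len:2][payload:N]` framing can be sketched with Python's `struct` module. Several details here are assumptions, since the text above does not state them: network byte order for the length field, the version value `1`, and the numeric type codes used in the usage example. The authoritative definitions are in ld_ipc.h.

```python
import struct

# version:1, type:1, payload_len:2. Big-endian ("!") is an assumption.
HEADER = struct.Struct("!BBH")
PROTOCOL_VERSION = 1  # hypothetical value


def encode_message(msg_type, payload=b""):
    """Build one frame: 4-byte header followed by the payload bytes."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload does not fit in the 2-byte length field")
    return HEADER.pack(PROTOCOL_VERSION, msg_type, len(payload)) + payload


def decode_message(buf):
    """Return (version, type, payload, rest), or None if buf is incomplete."""
    if len(buf) < HEADER.size:
        return None
    version, msg_type, length = HEADER.unpack_from(buf)
    end = HEADER.size + length
    if len(buf) < end:
        return None  # header seen, payload still arriving
    return version, msg_type, buf[HEADER.size:end], buf[end:]
```

For example, `encode_message(3, b"0/16B3748")` yields a 13-byte frame: a 4-byte header and a 9-byte payload. The type code `3` is made up for the illustration.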
The coordinator answers every request by reading/writing SQLite
(sentinel_get / sentinel_update_startpos / …_endpos /
…_apply) — it does not trust an in-memory copy, because the live
write/flush/replay_lsn values are maintained by the receive / apply
children and are stale in the supervisor.
—
CLI client behaviour
pgcopydb stream sentinel get and set startpos|endpos|apply|prefetch
choose the transport explicitly:
- with --host (and optional --port, default 5442): connect to the coordinator over TCP; the catalog is not opened. If the coordinator is unreachable the command fails (no silent fallback).
- without --host: open the source SQLite catalog directly (the default).
pgcopydb stream sentinel setup always uses SQLite — it bootstraps the
sentinel table itself.
Hostnames are resolved with getaddrinfo, so --host accepts a
docker-compose service name (e.g. --host test) as well as a numeric address.
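The transport choice and name resolution described in this section can be sketched as below. This is a Python illustration, not the pgcopydb client code; the function name, the return convention, and the 2-second connect timeout are assumptions.

```python
import socket

DEFAULT_PORT = 5442  # default port stated above


def open_control_channel(host=None, port=DEFAULT_PORT, timeout=2.0):
    """Return ("sqlite", None) without a host, else ("tcp", connected socket)."""
    if host is None:
        # No --host: the caller opens the SQLite catalog directly.
        return "sqlite", None
    last_error = None
    # getaddrinfo accepts service names as well as numeric addresses.
    for family, socktype, proto, _name, addr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(addr)
            return "tcp", sock
        except OSError as exc:
            sock.close()
            last_error = exc
    # No silent fallback to SQLite: an unreachable coordinator is an error.
    raise ConnectionError(f"coordinator at {host}:{port} is unreachable") from last_error
```

Raising instead of falling back keeps the two transports distinct: a command given a host never touches the catalog files, even on failure.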
—