A look into database replication at scale.. but cheapish. With Spilo

Craig Nielsen

November 20, 2025

An overview of how and why the PostgreSQL database cluster works, why to choose it, and how it can save money and lives.

How Spilo/Patroni Configures PostgreSQL and WAL-G

Overview

Spilo is a Docker image that packages PostgreSQL + Patroni + WAL-G/WAL-E together. When a Spilo container starts, it goes through a specific bootstrap process that reads environment variables and generates configuration files.

The Configuration Flow

Container Start
     ↓
Environment Variables (from ConfigMap)
     ↓
Spilo Bootstrap Script (/launch.sh)
     ↓
├── Generates Patroni Config (/run/etc/patroni.yml)
│   ├── Contains PostgreSQL settings
│   ├── Contains WAL-G settings
│   └── Contains backup schedules
│
├── Starts Patroni
│   └── Patroni manages PostgreSQL
│       ├── Writes postgresql.conf
│       ├── Sets archive_command
│       └── Sets restore_command
│
└── Sets up Cron Jobs (if backups enabled)
    └── Schedules WAL-G backups

Step-by-Step: What Actually Happens

1. Container Starts - Bootstrap Script Runs

The Spilo container runs /launch.sh (or similar), which is the main bootstrap script.

Location: Usually /scripts/configure_spilo.py or /launch.sh

What it does:

  • Reads ALL environment variables
  • Validates required settings
  • Makes decisions based on these env vars
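
The three steps above can be sketched roughly as follows. This is an illustration only, not Spilo's actual code: the function name read_settings, the REQUIRED tuple, and the choice of SCOPE as the only mandatory variable are assumptions made for the example. USE_WALG_BACKUP and BACKUP_SCHEDULE are the variables discussed later in this post.

```python
import os

# Hypothetical: the variables this sketch treats as mandatory.
REQUIRED = ("SCOPE",)


def read_settings(environ=None):
    """Read settings from the environment, validate them, derive decisions."""
    env = dict(os.environ if environ is None else environ)

    # Validate required settings before anything else is generated.
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))

    # Decisions derived from the raw variables.
    return {
        "scope": env["SCOPE"],
        "use_walg_backup": env.get("USE_WALG_BACKUP", "").lower() == "true",
        "backup_schedule": env.get("BACKUP_SCHEDULE"),
    }
```

Passing a plain dict instead of the real environment, for example read_settings({"SCOPE": "acid-minimal-cluster", "USE_WALG_BACKUP": "true"}), makes the logic easy to try outside a container.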

2. Generates Patroni Configuration

The bootstrap script creates /run/etc/patroni.yml based on environment variables.

Example of what gets generated:

# /run/etc/patroni.yml
scope: acid-minimal-cluster
name: pod-name

postgresql:
  listen: 0.0.0.0:5432
  connect_address: pod-ip:5432
  data_dir: /home/postgres/pgdata/pgroot/data

  # These come from environment variables!
  parameters:
    archive_mode: "on"  # Set if USE_WALG_BACKUP=true
    archive_timeout: 60s
    wal_level: replica
    # The critical one - archive_command is generated based on env vars:
    archive_command: "envdir /run/etc/wal-e.d/env wal-g wal-push %p"

  # Restore command for recovery
  recovery_conf:
    restore_command: "envdir /run/etc/wal-e.d/env wal-g wal-fetch %f %p"

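  # Backup/WAL-G configuration (still inside the postgresql section above;
  # a second top-level postgresql: key would be a duplicate key in YAML)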
  create_replica_methods:
    - basebackup_fast_xlog
  basebackup_fast_xlog:
    command: /scripts/basebackup.sh
    retries: 2
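
As a rough sketch of how a structure like the one above could be assembled from a handful of inputs, the snippet below builds it as a plain dict. The function name build_patroni_config and the pod IP are hypothetical, and this is not Spilo's real generator. The result is printed as JSON only to keep the example free of third-party dependencies; the real file is YAML.

```python
import json


def build_patroni_config(scope, name, pod_ip, archive_command, restore_command):
    """Assemble the example configuration as a nested dict."""
    return {
        "scope": scope,
        "name": name,
        "postgresql": {
            "listen": "0.0.0.0:5432",
            "connect_address": f"{pod_ip}:5432",
            "data_dir": "/home/postgres/pgdata/pgroot/data",
            "parameters": {
                "archive_mode": "on",
                "archive_timeout": "60s",
                "wal_level": "replica",
                "archive_command": archive_command,
            },
            "recovery_conf": {"restore_command": restore_command},
            "create_replica_methods": ["basebackup_fast_xlog"],
            "basebackup_fast_xlog": {
                "command": "/scripts/basebackup.sh",
                "retries": 2,
            },
        },
    }


config = build_patroni_config(
    scope="acid-minimal-cluster",
    name="pod-name",
    pod_ip="10.0.0.5",  # hypothetical address, for illustration only
    archive_command="envdir /run/etc/wal-e.d/env wal-g wal-push %p",
    restore_command="envdir /run/etc/wal-e.d/env wal-g wal-fetch %f %p",
)
print(json.dumps(config, indent=2))
```

Building the whole tree in one place keeps every PostgreSQL-related key under a single postgresql entry.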

3. Generates WAL-G Environment Files

Spilo creates a directory with WAL-G configuration:

/run/etc/wal-e.d/env/

What gets created:

/run/etc/wal-e.d/env/
├── AWS_ACCESS_KEY_ID          # Contains: minio
├── AWS_SECRET_ACCESS_KEY      # Contains: minio123
├── AWS_ENDPOINT               # Contains: http://mys3-hl.mys3:9000
├── AWS_REGION                 # Contains: de01
├── AWS_S3_FORCE_PATH_STYLE    # Contains: true
├── WALG_DISABLE_S3_SSE        # Contains: true
├── WALG_S3_PREFIX             # Contains: s3://postgresql/spilo/cluster-name/scope
└── WALE_S3_PREFIX             # Contains: s3://postgresql/spilo/cluster-name/scope (for older WAL-E)

This is critical: the WALG_S3_PREFIX or WAL_S3_BUCKET setting determines where backups go.
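
The one-file-per-variable layout above is the format that envdir reads: the file name is the variable name and the first line of the file is its value. Below is a minimal sketch of writing and reading such a directory. The helper names write_envdir and read_envdir are hypothetical, the envdir semantics are simplified (empty files and embedded NUL bytes are not handled), and this is not Spilo's own code.

```python
import os


def write_envdir(directory, variables):
    """Write one file per variable: file name = name, content = value."""
    os.makedirs(directory, exist_ok=True)
    for name, value in variables.items():
        with open(os.path.join(directory, name), "w") as handle:
            handle.write(value + "\n")


def read_envdir(directory):
    """Read the directory back, taking the first line of each file as the value."""
    result = {}
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name)) as handle:
            # Drop the newline, then trailing spaces and tabs.
            result[name] = handle.readline().rstrip("\n").rstrip(" \t")
    return result
```

Reading the directory back this way is a quick check that the prefix really points at the bucket and path you expect before any backup runs.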

4. Constructs the Archive Command

Based on the env vars, Spilo generates the PostgreSQL archive_command:

If USE_WALG_BACKUP=true:

archive_command = 'envdir /run/etc/wal-e.d/env wal-g wal-push %p'

What this does:

  • envdir /run/etc/wal-e.d/env - Loads all the env vars from that directory
  • wal-g wal-push %p - Pushes the WAL file to S3/MinIO
  • %p - PostgreSQL replaces this with the actual WAL file path

If USE_WALG_BACKUP=false:

archive_command = '/bin/true'  # Does nothing, just returns success
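
The two branches above amount to a single conditional. A minimal sketch, where the function name select_archive_command is hypothetical and the command strings are copied from this post rather than from Spilo's source:

```python
ENVDIR = "/run/etc/wal-e.d/env"


def select_archive_command(use_walg_backup):
    """Return the archive_command for the given USE_WALG_BACKUP value."""
    if use_walg_backup:
        # Load the WAL-G settings, then push the finished WAL segment.
        return f"envdir {ENVDIR} wal-g wal-push %p"
    # Archiving disabled: report success without doing anything.
    return "/bin/true"
```

Returning /bin/true rather than an empty string matters: PostgreSQL treats a zero exit status as "archived", so it can recycle WAL segments instead of retrying forever.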

5. Sets Up Cron Jobs for Full Backups

If USE_WALG_BACKUP=true and BACKUP_SCHEDULE is set, Spilo creates a cron job:

Example cron entry:

*/10 * * * * envdir /run/etc/wal-e.d/env wal-g backup-push /home/postgres/pgdata/pgroot/data

Breakdown:

  • */10 * * * * - Every 10 minutes (from your BACKUP_SCHEDULE)
  • envdir /run/etc/wal-e.d/env - Load environment
  • wal-g backup-push - Create a full backup
  • /home/postgres/pgdata/pgroot/data - PostgreSQL data directory
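
The four parts in the breakdown above can be joined into a crontab line like this. It is a sketch under the assumption that BACKUP_SCHEDULE is a standard five-field cron expression; the function name backup_cron_line is hypothetical and the real entry Spilo writes may differ.

```python
def backup_cron_line(schedule, data_dir, envdir="/run/etc/wal-e.d/env"):
    """Build a crontab entry that runs a full WAL-G backup on the schedule."""
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}: {schedule!r}")
    return f"{' '.join(fields)} envdir {envdir} wal-g backup-push {data_dir}"
```

Calling backup_cron_line("*/10 * * * *", "/home/postgres/pgdata/pgroot/data") reproduces the example entry shown earlier in this step.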

6. Starts Patroni

Patroni then:

  1. Reads /run/etc/patroni.yml
  2. Starts PostgreSQL with the generated configuration
  3. Continuously manages PostgreSQL (HA, failover, etc.)
  4. Updates configuration in response to changes

7. PostgreSQL Runs with Generated Config

PostgreSQL is now running with:

  • archive_mode = on (if backups enabled)
  • archive_command = 'envdir /run/etc/wal-e.d/env wal-g wal-push %p'
  • Every time a WAL file is complete, PostgreSQL calls this command
  • The command pushes the WAL to MinIO at the configured endpoint
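
To make the placeholder handling concrete, here is a small sketch of the substitution PostgreSQL performs before running archive_command: %p becomes the path of the WAL file, %f becomes the file name alone, and %% becomes a literal percent sign. The function name expand_archive_command and the sample segment name are hypothetical; PostgreSQL does this internally in C, not in Python.

```python
import posixpath
import re


def expand_archive_command(template, wal_path):
    """Expand %p, %f and %% the way archive_command placeholders are expanded."""
    replacements = {
        "%p": wal_path,                      # path of the file to archive
        "%f": posixpath.basename(wal_path),  # file name only
        "%%": "%",                           # literal percent sign
    }
    # A single left-to-right pass, so "%%p" yields "%p" rather than a path.
    return re.sub(r"%[pf%]", lambda m: replacements[m.group(0)], template)
```

For a segment at pg_wal/000000010000000000000001, the command from this post expands so that wal-g wal-push receives that relative path as its argument.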

What Spilo Actually Does:

  1. ✅ Reads environment variables at startup
  2. ✅ Generates configuration files (patroni.yml, WAL-G env files)
  3. ✅ Sets PostgreSQL's archive_command based on USE_WALG_BACKUP
  4. ✅ Creates cron jobs for full backups if enabled
  5. ✅ Starts Patroni, which starts PostgreSQL
  6. ❌ Does NOT dynamically reload ConfigMap changes (requires pod restart)

WAL File Compression Formats

Spilo/WAL-G can use different compression formats:

  • .lzo - LZO (fast; what I'm using)
  • .lz4 - LZ4 (very fast)
  • .zst - Zstandard (better compression)
  • .br - Brotli (high compression)

I'm currently using LZO compression, which is a good balance of speed and compression ratio.
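
When browsing the bucket, the extension is the quickest way to tell which format a stored file uses. A small sketch of that lookup, with the mapping copied from the list above; the function name compression_of and the sample object names are hypothetical.

```python
import os

# Extension-to-format mapping, taken from the list in this section.
COMPRESSION_BY_EXTENSION = {
    ".lzo": "LZO",
    ".lz4": "LZ4",
    ".zst": "Zstandard",
    ".br": "Brotli",
}


def compression_of(object_name):
    """Return the compression format implied by the extension, or None."""
    _, extension = os.path.splitext(object_name)
    return COMPRESSION_BY_EXTENSION.get(extension.lower())
```

For example, compression_of("000000010000000000000001.lzo") returns "LZO", and a name without a known extension returns None.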