Cross-Cloud Database Resilience: PostgreSQL Streaming Replication with Dual-Cert SSL

Stop running your database naked! A guide to setting up a secure PostgreSQL standby across public networks using Docker, mTLS, and custom firewall strategies.

For an indie developer, “Single Point of Failure” is the stuff of nightmares. While Oracle Cloud offers a generous Free Tier, relying on a single provider for your production data is risky. To ensure data sovereignty and high availability, I recently implemented a cross-cloud PostgreSQL streaming replication setup.

This isn’t just a simple cron-job for backups—it’s a real-time “Shadow Database” architecture.

1. The Need for Real-Time Replication

Relying solely on daily snapshots had two major drawbacks:

  • RPO (Recovery Point Objective): With a nightly 2 AM backup, a crash at 11 PM meant losing almost a full day of data.
  • RTO (Recovery Time Objective): Restoring gigabytes of data from S3 into a fresh container in the middle of an outage takes far too long.

My Goal: Maintain a read-only replica on a separate VPS, ready to fail over within minutes.

2. The Solution: mTLS + Streaming Replication

Since data travels over the public internet, security was non-negotiable.

  • Encryption: TLS is mandatory to prevent sniffing.
  • Authentication: Instead of passwords, I used Mutual TLS (mTLS). Only the server holding a client certificate signed by my private CA can connect to the primary.
  • Isolation: Custom ports (e.g., 8765) combined with strict IP whitelisting via iptables.
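The private CA and both certificates can be generated with openssl. A minimal sketch, assuming a throwaway directory under /tmp/pg-certs and illustrative CNs; one detail matters for the cert auth method used later: the client certificate's CN must match the replication role name (replication_user).

```shell
# Scratch directory; in a real deployment, keep the CA key offline.
mkdir -p /tmp/pg-certs && cd /tmp/pg-certs

# 1. Private CA that both sides will trust
openssl req -new -x509 -days 3650 -nodes -subj "/CN=my-private-ca" \
  -keyout ca.key -out ca.crt

# 2. Server certificate for the primary
openssl req -new -nodes -subj "/CN=pg-primary" \
  -keyout server.key -out server.csr
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -days 825 -out server.crt

# 3. Client certificate for the standby; with cert auth the CN must
#    equal the database role name
openssl req -new -nodes -subj "/CN=replication_user" \
  -keyout client.key -out client.csr
openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -days 825 -out client.crt
```

The primary gets server.crt/server.key plus ca.crt; the standby gets client.crt/client.key plus ca.crt.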

3. Hard-Won Lessons (The Pitfalls)

Pitfall #1: Docker vs. System Firewall

Docker publishes container ports by inserting its own iptables rules, which often bypass standard ufw rules entirely.

  • Fix: Explicitly insert rules into the INPUT chain for the specific standby IP and persist them using iptables-persistent.
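A sketch of that fix (the standby address 203.0.113.10 is a placeholder, and iptables-persistent assumes Debian/Ubuntu):

```shell
STANDBY_IP=203.0.113.10   # placeholder for the standby's public IP

# Allow only the standby to reach the replication port on the host,
# then drop everyone else
iptables -I INPUT -p tcp -s "$STANDBY_IP" --dport 8765 -j ACCEPT
iptables -A INPUT -p tcp --dport 8765 -j DROP

# For ports published by Docker, traffic traverses FORWARD rather than
# INPUT; Docker's documented hook for custom filtering there is the
# DOCKER-USER chain (matching the original port via conntrack, since
# DNAT has already rewritten the destination):
iptables -I DOCKER-USER -p tcp -m conntrack --ctorigdstport 8765 \
  ! -s "$STANDBY_IP" -j DROP

# Persist across reboots
apt-get install -y iptables-persistent
netfilter-persistent save
```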

Pitfall #2: The pg_basebackup Path Ghost

When running pg_basebackup from a temporary container, the ephemeral certificate paths from its connection string (like /temp_certs) get baked into the generated postgresql.auto.conf. Once that container is gone, the standby can no longer find its certificates.

  • Fix: Manually edit the config post-clone to point to the permanent volume paths (e.g., /var/lib/postgresql/data/certs/).
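The post-clone edit is a one-liner. Demonstrated here on a scratch file (the host address 198.51.100.7 is a placeholder; the real target is postgresql.auto.conf inside the data volume):

```shell
# Scratch stand-in for $PGDATA/postgresql.auto.conf
CONF=/tmp/postgresql.auto.conf
cat > "$CONF" <<'EOF'
primary_conninfo = 'user=replication_user host=198.51.100.7 port=8765 sslmode=verify-ca sslcert=/temp_certs/client.crt sslkey=/temp_certs/client.key sslrootcert=/temp_certs/ca.crt'
EOF

# Rewrite the ephemeral path baked in by pg_basebackup to the permanent
# in-volume location, then restart the standby to pick it up
sed -i 's|/temp_certs|/var/lib/postgresql/data/certs|g' "$CONF"
cat "$CONF"
```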

Pitfall #3: Permissions (The 999 Rule)

PostgreSQL is notoriously picky about certificate permissions.

  • Error: could not open file "server.key": Permission denied.
  • Fix: Run chown -R 999:999 on the host data directory. The files may look fine as root on the host, but the postgres process inside the container runs as UID 999 and must own them.
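The key-file half of the fix, sketched on a scratch copy (the real path would be the bind-mounted data directory, e.g. /opt/pgsql/data, where the chown needs root):

```shell
# Emulate the permission fix on a scratch directory
mkdir -p /tmp/pgdata/certs
touch /tmp/pgdata/certs/server.key

# PostgreSQL refuses private keys with permissions broader than 0600
chmod 600 /tmp/pgdata/certs/server.key
stat -c '%a' /tmp/pgdata/certs/server.key    # → 600

# On the real host, additionally (as root), since the container's
# postgres user is UID/GID 999:
#   chown -R 999:999 /opt/pgsql/data
```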

4. Key Configurations

Primary pg_hba.conf

# Only allow replication_user via certificate-based auth
hostssl replication replication_user <Standby_IP>/32 cert

Standby postgresql.auto.conf

primary_conninfo = 'user=replication_user host=<Primary_IP> port=8765 sslmode=verify-ca sslcert=/path/to/client.crt sslkey=/path/to/client.key sslrootcert=/path/to/ca.crt'
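For completeness, the server side of this handshake lives in the primary's postgresql.conf. A sketch, assuming the certificates sit in a certs/ directory inside the data volume:

```
# Primary postgresql.conf (excerpt)
port = 8765
ssl = on
ssl_cert_file = '/var/lib/postgresql/data/certs/server.crt'
ssl_key_file  = '/var/lib/postgresql/data/certs/server.key'
ssl_ca_file   = '/var/lib/postgresql/data/certs/ca.crt'   # required to verify client certificates
```

On the standby itself (PostgreSQL 12+), an empty standby.signal file in the data directory is what actually puts the server into standby mode.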

5. Maintenance: Why I Switched to Bind Mounts

I moved away from Docker Named Volumes to Bind Mounts.

  • Reason: When migrating the standby to a new server, a simple tar of /opt/pgsql/data captures everything. No more hunting for opaque Docker volume hashes under /var/lib/docker/volumes, and migrations became dramatically faster.
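The compose side of that switch looks like this (service name, image tag, and the host path are illustrative):

```
# docker-compose.yml (excerpt)
services:
  postgres:
    image: postgres:16
    volumes:
      - /opt/pgsql/data:/var/lib/postgresql/data   # bind mount, not a named volume
```

Migration then reduces to stopping the container, tar-ing /opt/pgsql/data, extracting it on the new host, and re-running the chown -R 999:999 from Pitfall #3.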

Conclusion

High Availability isn’t about showing off; it’s about sleeping better at night. This setup currently holds near-zero replication lag, verified by Uptime Kuma monitoring.