As an indie developer, “Single Point of Failure” is the stuff of nightmares. While Oracle Cloud offers a generous Free Tier, relying on a single provider for your production data is risky. To ensure data sovereignty and high availability, I recently implemented a cross-cloud PostgreSQL streaming replication setup.
This isn’t just a simple cron-job for backups—it’s a real-time “Shadow Database” architecture.
1. The Need for Real-Time Replication
Relying solely on daily snapshots had two major drawbacks:
- RPO (Recovery Point Objective): A crash at 11 PM meant losing a whole day of data if the last backup was at 2 AM.
- RTO (Recovery Time Objective): Restoring Giga-bytes of data from S3 to a new container during an outage takes far too long.
My Goal: Maintain a read-only replica on a separate VPS, ready for a minute-level failover.
2. The Solution: mTLS + Streaming Replication
Since data travels over the public internet, security was non-negotiable.
- Encryption: TLS is mandatory to prevent sniffing.
- Authentication: Instead of passwords, I used Mutual TLS (mTLS). Only the server holding a client certificate signed by my private CA can connect to the primary.
- Isolation: Custom ports (e.g.,
8765) combined with strict IP whitelisting viaiptables.
3. Hard-Won Lessons (The Pitfalls)
Pitfall #1: Docker vs. System Firewall
Docker often bypasses standard ufw rules by manipulating iptables directly.
- Fix: Explicitly insert rules into the
INPUTchain for the specific standby IP and persist them usingiptables-persistent.
Pitfall #2: The pg_basebackup Path Ghost
When running pg_basebackup in a temporary container, it hardcodes the ephemeral paths (like /temp_certs) into postgresql.auto.conf.
- Fix: Manually edit the config post-clone to point to the permanent volume paths (e.g.,
/var/lib/postgresql/data/certs/).
Pitfall #3: Permissions (The 999 Rule)
PostgreSQL is notoriously picky about certificate permissions.
- Error:
could not open file "server.key": Permission denied. - Fix: Run
chown -R 999:999on the host data directory. Even if it looks likerooton the host, the container process must own it as UID999.
4. Key Configurations
Primary pg_hba.conf
# Only allow replication_user via certificate-based auth
hostssl replication replication_user <Standby_IP>/32 cert
Standby postgresql.auto.conf
primary_conninfo = 'user=replication_user host=<Primary_IP> port=8765 sslmode=verify-ca sslcert=/path/to/client.crt sslkey=/path/to/client.key sslrootcert=/path/to/ca.crt'
5. Maintenance: Why I Switched to Bind Mounts
I moved away from Docker Named Volumes to Bind Mounts.
- Reason: When migrating the standby to a new server, a simple
tarof/opt/pgsql/datais transparent. No more hunting for obscure Docker volume hashes. It makes migration 200% faster.
Conclusion
High Availability isn’t about showing off; it’s about sleeping better at night. Currently, this setup maintains near-zero lag under Uptime Kuma monitoring.