When failover isn’t safe: Building high-availability PostgreSQL on Kubernetes
Datadog | The Monitor blog

Summary

During a simulated "gameday" failure, Datadog discovered that network latency pushed PostgreSQL replication lag past its safety threshold, so the system could not safely promote a standby node. To resolve this, the team rearchitected its Kubernetes-based clusters to use synchronous replication for failover candidates, so that automated failover can proceed without risking data loss.
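
The promotion decision described above can be sketched as a small check. This is only an illustration, not Datadog's implementation: the `Standby` type, the `pick_failover_candidate` function, and the lag threshold are all hypothetical names and values chosen for the example. It assumes a synchronous standby has already confirmed every committed transaction, while an asynchronous standby is only safe to promote if its lag is within the threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Standby:
    # Hypothetical model of a standby node; not taken from the article.
    name: str
    lag_bytes: int      # replication lag behind the primary, in bytes
    synchronous: bool   # True if the primary waits for this node on commit


def pick_failover_candidate(
    standbys: List[Standby], max_lag_bytes: int
) -> Optional[Standby]:
    """Return a standby that is safe to promote, or None if there is none.

    Assumption: a synchronous standby is always safe, and an asynchronous
    standby is safe only while its lag stays within max_lag_bytes.
    """
    safe = [s for s in standbys if s.synchronous or s.lag_bytes <= max_lag_bytes]
    if not safe:
        return None
    # Prefer synchronous standbys, then the one with the least lag.
    return min(safe, key=lambda s: (not s.synchronous, s.lag_bytes))
```

With only asynchronous standbys, a latency spike can push every node past the threshold and leave no safe candidate, which is the situation the gameday exposed. Marking a failover candidate as synchronous keeps at least one node eligible.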

This article originally appeared on Datadog | The Monitor blog.
