Summary
This article argues that perfectly detecting anomalies in complex systems is impossible, but practical anomaly detection is achievable through custom rules built with tools like Prometheus. It demonstrates building a Prometheus query to identify outlier server latency, progressively refining it to reduce false positives by adding conditions based on average latency and traffic volume. Ultimately, the author advocates for using these alerts to trigger automated remediation actions, freeing up engineers to focus on more impactful issues.