Monitoring checklist

Nikolay takes us through a checklist of important things to monitor, while Michael tries to keep up.
Monitoring checklist (dashboard 1):
  1. TPS and (optional but also desired) QPS
  2. Latency (query duration) — at least average. Better: histogram, percentiles
  3. Connections (sessions) — stacked graph of session counts by state (first of all: active and idle-in-transaction; also interesting: idle, others) and how far the sum is from max_connections (plus pool size for PgBouncer).
  4. Longest transactions (max transaction age or top-n transactions by age), excluding autovacuum activity
  5. Commits vs rollbacks — how many transactions are rolled back
  6. Transactions left till transaction ID wraparound
  7. Replication lags / bytes in replication slot / unused replication slots
  8. Count of WALs waiting to be archived (archiving lag)
  9. WAL generation rates
  10. Locks and deadlocks
  11. Basic query analysis graph (top-n by total_time or by mean_time?)
  12. Basic wait event analysis (a.k.a. “active session analysis” or “performance insights”)
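
As a rough illustration of items 1, 3, 5 and 6, here is a minimal Python sketch of how those numbers might be derived from Postgres's cumulative statistics. The views and columns in the SQL strings (pg_stat_database, pg_stat_activity, pg_database, age(datfrozenxid)) are standard Postgres; the constant names, the XactSnapshot type and the helper functions are hypothetical names made up for this example. It also assumes a wraparound horizon of about 2^31 transactions, which slightly overstates the real headroom because Postgres stops accepting writes a little before that point.

```python
from dataclasses import dataclass
from typing import Mapping

# Queries a collector could run periodically (not executed in this sketch).
XACT_COUNTERS_SQL = """
SELECT sum(xact_commit) AS commits, sum(xact_rollback) AS rollbacks
FROM pg_stat_database
"""
SESSIONS_BY_STATE_SQL = """
SELECT coalesce(state, 'other') AS state, count(*) AS sessions
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY 1
"""
OLDEST_XID_AGE_SQL = "SELECT max(age(datfrozenxid)) FROM pg_database"

# Assumption: treat 2^31 as the wraparound horizon (an approximation).
XID_WRAPAROUND_HORIZON = 2**31


@dataclass(frozen=True)
class XactSnapshot:
    """One sample of the cumulative commit/rollback counters."""
    taken_at: float  # seconds, e.g. from time.time()
    commits: int
    rollbacks: int


def tps(prev: XactSnapshot, curr: XactSnapshot) -> float:
    """Transactions per second between two samples (commits + rollbacks)."""
    elapsed = curr.taken_at - prev.taken_at
    if elapsed <= 0:
        raise ValueError("snapshots must be taken at increasing times")
    finished = (curr.commits - prev.commits) + (curr.rollbacks - prev.rollbacks)
    return finished / elapsed


def rollback_ratio(prev: XactSnapshot, curr: XactSnapshot) -> float:
    """Share of transactions in the interval that were rolled back."""
    rollbacks = curr.rollbacks - prev.rollbacks
    total = (curr.commits - prev.commits) + rollbacks
    return rollbacks / total if total else 0.0


def xids_left(oldest_xid_age: int) -> int:
    """Approximate transactions left before transaction ID wraparound."""
    return max(XID_WRAPAROUND_HORIZON - oldest_xid_age, 0)


def connection_headroom(sessions_by_state: Mapping[str, int],
                        max_connections: int) -> int:
    """How far the sum of sessions is from max_connections."""
    return max_connections - sum(sessions_by_state.values())


prev = XactSnapshot(taken_at=0.0, commits=1000, rollbacks=10)
curr = XactSnapshot(taken_at=10.0, commits=1900, rollbacks=110)
print(tps(prev, curr))                    # 100.0
print(rollback_ratio(prev, curr))         # 0.1
print(xids_left(200_000_000))             # 1947483648
print(connection_headroom({"active": 12, "idle in transaction": 3, "idle": 40}, 100))  # 45
```

The counters in pg_stat_database are cumulative, so rates only make sense as deltas between two samples; a statistics reset between samples would produce negative deltas, which this sketch does not handle.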

What did you like or not like? What should we discuss next time? Let us know by tweeting us on @samokhvalov and @michristofides
