Monitoring checklist (dashboard 1):
- TPS and (optional but also desired) QPS
- Latency (query duration) — at least average. Better: histogram, percentiles
- Connections (sessions): stacked graph of session counts by state (most importantly active and idle-in-transaction; idle and others are also of interest), and how far the total is from max_connections (plus pool size for PgBouncer).
- Longest transactions (max transaction age or top-n transactions by age), excluding autovacuum activity
- Commits vs rollbacks — how many transactions are rolled back
- Transactions left till transaction ID wraparound
- Replication lags / bytes in replication slot / unused replication slots
- Count of WALs waiting to be archived (archiving lag)
- WAL generation rates
- Locks and deadlocks
- Basic query analysis graph (top-n by total_time or by mean_time?)
- Basic wait event analysis (a.k.a. “active session analysis” or “performance insights”)
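
A few of the items above (TPS, commits vs rollbacks, transactions left till wraparound, connection headroom) boil down to simple arithmetic on counters. Here is a minimal sketch of that arithmetic, not a definitive implementation. Assumptions: the counters come from `pg_stat_database` (`xact_commit`, `xact_rollback`) and `age(datfrozenxid)`; the `Snapshot` class and all function names are hypothetical, made up for illustration; the wraparound limit is approximated as 2^31; reserved connections and statistics resets are ignored. The SQL strings are shown for context only and are not executed.

```python
from dataclasses import dataclass

# Illustrative queries (not executed in this sketch).
COUNTERS_SQL = """
SELECT sum(xact_commit) AS commits, sum(xact_rollback) AS rollbacks
FROM pg_stat_database
"""
XID_AGE_SQL = "SELECT max(age(datfrozenxid)) FROM pg_database"

# Assumption: approximate XID space before wraparound (about 2 billion).
# Postgres stops accepting writes somewhat earlier than this.
XID_WRAPAROUND_LIMIT = 2**31


@dataclass
class Snapshot:
    """Hypothetical container for one reading of the cumulative counters."""
    taken_at: float  # seconds, e.g. from time.time()
    commits: int
    rollbacks: int


def tps(prev: Snapshot, curr: Snapshot) -> float:
    """Transactions per second (commits + rollbacks) between two snapshots."""
    elapsed = curr.taken_at - prev.taken_at
    if elapsed <= 0:
        raise ValueError("snapshots must be taken in increasing time order")
    delta = (curr.commits - prev.commits) + (curr.rollbacks - prev.rollbacks)
    return delta / elapsed


def rollback_ratio(prev: Snapshot, curr: Snapshot) -> float:
    """Share of transactions rolled back between two snapshots (0.0 to 1.0)."""
    commits = curr.commits - prev.commits
    rollbacks = curr.rollbacks - prev.rollbacks
    total = commits + rollbacks
    return rollbacks / total if total else 0.0


def xids_left(max_xid_age: int) -> int:
    """Rough number of transaction IDs left before wraparound."""
    return XID_WRAPAROUND_LIMIT - max_xid_age


def connection_headroom(sessions_by_state: dict, max_connections: int) -> int:
    """How far the sum of sessions across all states is from max_connections."""
    return max_connections - sum(sessions_by_state.values())
```

In a real dashboard the monitoring system would usually compute these rates itself from the raw counters; the point here is only to show which numbers go into each graph.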
And links to a few things we mentioned:
------------------------
What did you like or not like? What should we discuss next time? Let us know by tweeting us on @samokhvalov and @michristofides