Postgres has been a staple of IT infrastructures for decades. It offers a diverse feature set, scales well to handle enterprise data sets, and is actively maintained by a community of experienced developers. It’s also fully extensible, allowing you to create new data types or functions to meet your specific business needs.
In today’s data-driven landscape, the success of applications is increasingly dependent on the performance of the underlying database. A poorly optimized database can lead to latency, bottlenecks, unexpected downtime, and even data loss. This is why it’s important to actively monitor Postgres.
In this article, we will share a detailed guide to Postgres monitoring. We’ll touch upon its architecture, provide a list of its most important performance metrics, and discuss the tools that you can use to monitor its performance in real time.
Postgres, short for PostgreSQL, is an advanced open-source relational database whose origins trace back to the POSTGRES project at the University of California, Berkeley, in 1986. It conforms to most of the features required by the SQL standard, which ensures compatibility and ease of migration from other SQL-based databases.
Postgres follows a client/server architecture. The server process is responsible for opening and maintaining client connections, managing the database files, executing queries, and reading and writing data to the disk.
The client is the application that interacts with the server to perform operations on the database. It can be a command-line utility, a web application, or any other tool capable of opening a TCP connection with the database server.
A Postgres server can maintain multiple concurrent connections with its clients. To achieve this, the server spawns an independent backend process for each new connection, which allows it to serve many clients at once without one connection blocking another.
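You can observe this process-per-connection model directly from SQL. The sketch below assumes PostgreSQL 10 or later (for the backend_type column) and a role with enough visibility, such as a superuser or a member of pg_monitor:

-- The PID of the backend process serving the current connection
SELECT pg_backend_pid();

-- One row per server process; client connections appear as 'client backend'
SELECT pid, usename, client_addr, state
FROM pg_stat_activity
WHERE backend_type = 'client backend';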
Postgres offers a wide range of features that make it a great choice for businesses of all sizes and industries. Some key highlights include:
Owing to its versatility and flexibility, Postgres can be used as the primary database for a wide range of applications, including:
PostgreSQL is a powerful and resilient database, but it still needs to be monitored to ensure it performs optimally. Monitoring PostgreSQL can help you to:
As the primary database for applications, Postgres acts as the backbone of an infrastructure. Data-driven applications rely on Postgres for timely execution of read, write, and update queries. Any latency in Postgres can create a detrimental ripple effect in performance across the entire system. For example, let's say application A executes a query before sending a response to application B. If the query times out, the response to B may get delayed, leading to a chain reaction of delays and disruptions.
Monitoring performance metrics such as query execution time, slow-running queries, and memory utilization allows administrators to gauge performance in real time and take any corrective action if necessary.
Even a well-built database like Postgres can encounter bottlenecks due to factors such as suboptimal query execution plans, missing indexes, or insufficient resources. For example, if a slow-running query locks multiple tables for writes, it can create a system-wide bottleneck that prevents applications from writing data to the database.
Monitoring helps to pinpoint bottlenecks and identify improvement avenues. For example, a monitoring tool may issue an alert if memory utilization spikes. Looking at the dashboard, the administrator may find that an unknown client is executing too many concurrent requests. Force-terminating the client connection can reduce memory usage significantly and restore optimal performance.
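As a sketch of that remediation (the PID and client address below are hypothetical), you can identify the offending backends in pg_stat_activity and terminate them; pg_blocking_pids() is also handy for spotting sessions that are holding up others:

-- Sessions that are blocked, and the PIDs blocking them
SELECT pid, pg_blocking_pids(pid) AS blocked_by, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;

-- Terminate a single runaway backend (hypothetical PID)
SELECT pg_terminate_backend(12345);

-- Terminate every connection from a hypothetical client address
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE client_addr = '10.0.0.42';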
Unplanned downtime can be costly and disruptive. Your customers may be unable to access your applications, or your valuable transactional data may not get logged. For example, if a database server goes down before an ongoing transaction gets committed, the transaction data may be lost forever.
Regular monitoring enables administrators to quickly detect and contextualize issues and proactively address them to prevent system failure and minimize downtime. This approach ensures high availability and a consistent user experience.
Postgres provides a wide range of configuration options, and the optimal settings vary based on specific infrastructure requirements. For example, the memory needs of Postgres in a small-scale web application would differ significantly from those of a carrier-grade enterprise billing system.
Monitoring the database's performance under different workloads allows you to fine-tune configuration parameters, maximizing resource utilization and throughput.
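For example, assuming superuser access, you can inspect and adjust memory-related parameters directly from SQL. The value below is purely illustrative and should be sized for your own workload:

-- Inspect current memory settings
SHOW shared_buffers;
SHOW work_mem;

-- Persist a new value (written to postgresql.auto.conf) and reload the configuration
ALTER SYSTEM SET work_mem = '64MB';
SELECT pg_reload_conf();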
Postgres, like any other database, is a security-critical resource. If a cybercriminal gains unauthorized access to a Postgres server, they can steal confidential data or cause downtime. For example, they may leverage privilege escalation to truncate critical database tables.
Regular monitoring helps detect suspicious activities that might indicate security breaches or unauthorized access attempts. For example, if someone tries to drop a database table on a production instance, the monitoring tool can issue a critical alert to the administrators who can then investigate and mitigate the threat.
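One way to make such activity visible is to log all data-definition statements (DROP, ALTER, and so on) so that a monitoring or log-management tool can alert on them. A minimal sketch, assuming superuser access:

-- Log every DDL statement (DROP TABLE, ALTER ROLE, etc.) to the server log
ALTER SYSTEM SET log_statement = 'ddl';
SELECT pg_reload_conf();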
The most important performance monitoring metrics for Postgres can be divided into the following broad categories:
Database-wide metrics provide an overall view of the database’s health and performance. Let’s look at a few examples (a sample query that retrieves them follows the table):
Metric | Description |
---|---|
Current state | The current state of this instance. For example, active, idle, or disabled. |
Committed transactions | The total number of committed transactions in this database. |
Rolled back transactions | The total number of transactions in this database that were rolled back. |
Read blocks | The total number of disk blocks that this database has read. |
Hit blocks | The total number of times a disk block was found in the buffer cache and therefore didn’t require a disk read. |
Total inserted rows | The total number of rows that have been inserted into this database via queries. |
Total deleted rows | The total number of rows that have been deleted from this database via queries. |
Total updated rows | The total number of rows that have been updated in this database via queries. |
Total fetched rows | The total number of rows that were retrieved via index scans from this database. Track this metric’s value to ensure that your indexes are being properly used. |
Temporary file count | The total number of temporary files that were created during query execution on this database. |
Total deadlocks | A cumulative count of deadlocks that have occurred in this database. Strive to maintain a zero value for this metric. |
Temporary file bytes | The total amount of data (in bytes) written to temporary files during query execution on this database. |
Last checksum failure | The timestamp at which the last checksum failure was detected in this database. Ideally, this metric should have a NULL value (indicating that no checksum failures have occurred). |
Total checksum failures | The total number of checksum failures detected in this database. This metric should ideally have a zero value. |
Total block read time | The total time (in milliseconds) spent on reading data blocks by this database. |
Total block write time | The total time (in milliseconds) spent on writing data blocks by this database. |
Total session time | The cumulative time (in milliseconds) of all the sessions created by this database. |
Total active time | The cumulative time (in milliseconds) that was spent on query execution by this database. |
Total sessions | The total number of sessions created by this database. |
Total sessions fatal | The total number of sessions that were terminated due to fatal errors. Non-zero values of this metric should be investigated. |
Total sessions killed | The total number of sessions that had to be manually terminated. |
Idle sessions | The number of currently idle sessions in the database. Investigate unexpectedly high values of this metric. (The definition of high differs based on operational requirements and SLAs.) |
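Most of these metrics come from the pg_stat_database view. The query below is a minimal sketch that pulls several of them for the current database; note that the checksum columns require PostgreSQL 12 or later, the session columns require PostgreSQL 14 or later, and the block timing columns stay at zero unless track_io_timing is enabled:

SELECT datname,
       xact_commit,                               -- committed transactions
       xact_rollback,                             -- rolled back transactions
       blks_read,                                 -- read blocks
       blks_hit,                                  -- hit blocks (served from the buffer cache)
       tup_inserted, tup_updated, tup_deleted, tup_fetched,
       temp_files, temp_bytes,
       deadlocks,
       checksum_failures, checksum_last_failure,
       blk_read_time, blk_write_time,
       session_time, active_time,
       sessions, sessions_fatal, sessions_killed
FROM pg_stat_database
WHERE datname = current_database();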
These metrics allow you to track resource usage and make sound decisions regarding capacity planning. Keep a special eye on the following metrics (a sample sizing query follows the table):
Metric | Description |
---|---|
Table pages | The total number of pages associated with a particular database table. A page is typically 8 kilobytes. This metric can be used to determine the disk usage of a table. |
Largest table | The database table that’s taking up the most disk space. |
Index pages | The total number of pages associated with a particular database index. A page is typically 8 kilobytes. This metric can be used to determine the disk usage of an index. |
Current memory utilization | The amount of memory (in bytes) that’s currently being used by this instance. |
Available memory | The amount of memory (in bytes) that’s available for use to this instance. |
Max memory utilization | The maximum amount of memory (in bytes) used by this instance since it started. |
Current disk usage | The amount of disk space that’s currently being used by this instance. |
Max disk usage | The maximum amount of disk space used by this instance since it started. |
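As a sketch of how to gather the disk-related figures, the built-in sizing functions report per-database and per-table usage (memory metrics are typically collected at the OS level or by a monitoring agent rather than from SQL):

-- Total size of the current database
SELECT pg_size_pretty(pg_database_size(current_database()));

-- The ten largest tables, including their indexes and TOAST data
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;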
Monitoring metrics related to tables and indexes helps you structure the database for better performance and scalability. Here are a few examples (a sample query follows the table):
Metric | Description |
---|---|
Total sequential scans | The total number of sequential scans performed on this table. |
Total index scans | The total number of index scans performed on this table. |
Sequential rows read | The total number of rows retrieved via sequential scans performed on this table. |
Index rows read | The total number of rows retrieved via index scans performed on this table. |
Total updated rows | The total number of table rows that have been updated since the table was created. |
Total inserted rows | The total number of rows that have been inserted into this table since it was created. |
Total hot updated rows | The total number of table rows that have been HOT (Heap-Only Tuples) updated. |
Total dead rows | The total number of dead rows in the table. |
Total live rows | The total number of live rows in the table. |
Inserts since last vacuum | The total number of rows inserted into this table since it was last vacuumed. |
Last vacuum time | The timestamp at which this table was last vacuumed manually. |
Last auto-vacuum time | The timestamp at which this table was last vacuumed by the auto-vacuum daemon. |
Last auto-analyze time | The timestamp at which this table was last analyzed by the auto-vacuum daemon. |
Vacuum count | The total number of times this table has been manually vacuumed. |
Auto vacuum count | The total number of times this table has been vacuumed automatically by the auto-vacuum daemon. |
Total table indexes | The total number of indexes in a database table. |
Returned index entries | The total number of entries that have been returned by scans on a particular index. |
Returned table rows | The total number of table rows that have been returned by scans on a particular index. |
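Most of the table-level metrics above map to columns of pg_stat_user_tables, and the index-level ones to pg_stat_user_indexes. A minimal sketch (n_ins_since_vacuum requires PostgreSQL 13 or later):

-- Per-table activity and vacuum statistics
SELECT relname,
       seq_scan, seq_tup_read,
       idx_scan, idx_tup_fetch,
       n_tup_ins, n_tup_upd, n_tup_hot_upd,
       n_live_tup, n_dead_tup, n_ins_since_vacuum,
       last_vacuum, last_autovacuum, last_autoanalyze,
       vacuum_count, autovacuum_count
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Per-index usage: index entries returned and live table rows fetched
SELECT relname, indexrelname, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes;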
This category encompasses metrics related to query execution, number of connections, and transaction rates. Here are a few examples (sample queries follow the table):
Metric | Description |
---|---|
Total active connections | The total number of connections to this database that are currently active. |
Total idle connections | The total number of connections to this database that are currently idle. |
Max connections | The maximum number of concurrent connections opened by this database since the instance started. |
Most expensive queries (historical) | The queries that have taken the most time to execute since the database instance started. |
Expensive (slow-running) queries | Currently active slow-running queries. Strive to optimize these queries for better overall performance. |
Total write queries | The total number of queries that performed write operations on the database. |
Total read queries | The total number of queries that performed read operations on the database. |
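The connection-related metrics can be read straight from pg_stat_activity, while historical query costs require the pg_stat_statements extension (it must be added to shared_preload_libraries and created in the database). A sketch, using the column names from PostgreSQL 13 onward (earlier versions use total_time and mean_time):

-- Connection counts broken down by state (active, idle, idle in transaction, ...)
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;

-- Historically most expensive queries (requires pg_stat_statements)
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;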
Replication metrics allow you to track the status and latency of a database’s replication processes. Pay particular attention to these metrics (a sample query follows the table):
Metric | Description |
---|---|
State | Current state of the write-ahead log (WAL) sender. Possible values are startup, streaming, catchup, and backup. |
Last WAL location sent | The last write-ahead log location that has been sent on a particular connection. |
Last WAL location written | The last WAL location that has been written to disk by this standby node. |
Sync state | The synchronous state of this standby server. Some possible values are: sync, async, and potential. |
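On the primary, these metrics are exposed through pg_stat_replication, which has one row per connected standby. A minimal sketch, using the column names from PostgreSQL 10 onward:

-- Replication status and WAL positions for each standby
SELECT client_addr, state, sync_state,
       sent_lsn, write_lsn, flush_lsn, replay_lsn
FROM pg_stat_replication;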
Postgres has a robust mechanism for collecting statistics. Whether you want to see currently running queries, check replication lag, or track the size of your tables and indexes, you can use built-in Postgres views to do so.
Statistics views are divided into two categories: dynamic and collected. Dynamic statistics views display real-time information, while collected views can be used to track cumulative statistics (over a period or since startup).
The pg_stat_activity view contains all the currently executing processes, along with their state and queries. It includes metrics like client IP address, query start timestamp, state change timestamp, and query text, which can be useful in identifying slow-running queries or bottlenecks and their causes.
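For example, a query along these lines surfaces the longest-running statements that are still active (the five-minute threshold is an arbitrary illustration):

-- Active statements running for more than five minutes
SELECT pid, client_addr, state,
       now() - query_start AS runtime,
       query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '5 minutes'
ORDER BY runtime DESC;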
The pg_stat_wal view can be used to retrieve statistics related to write-ahead logging (WAL). For example, you can track metrics like WAL buffers full, WAL bytes, WAL syncs, WAL write time, and WAL sync time, which show how efficiently the instance is writing and flushing WAL, a key factor in both crash recovery and replication performance.
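A minimal sketch against pg_stat_wal (the view is available on PostgreSQL 14 and later; the timing columns are only populated when track_wal_io_timing is enabled):

SELECT wal_records, wal_fpi, wal_bytes, wal_buffers_full,
       wal_write, wal_sync, wal_write_time, wal_sync_time
FROM pg_stat_wal;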
The pg_stat_database view is a good place to look for database-wide statistics. It allows you to keep tabs on key metrics like the number of committed and rolled-back transactions, the number of rows fetched by index and sequential scans, and the number of inserted, updated, and deleted rows in a database.
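One particularly useful figure derived from this view is the buffer cache hit ratio; as a rough rule of thumb, it should stay close to 1 on a well-tuned OLTP workload:

-- Fraction of block requests served from the buffer cache, per database
SELECT datname,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 4) AS cache_hit_ratio
FROM pg_stat_database
WHERE blks_hit + blks_read > 0;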
There are several other views that let you track different aspects of a Postgres database, such as pg_stat_all_tables, pg_stat_ssl, pg_stat_progress_copy, and pg_stat_user_functions. Depending on your specific use case and monitoring requirements, you can use a combination of these views to effectively track both the historical and real-time metrics of your Postgres database.
Several tools display collected and dynamic statistics from Postgres views in a user-friendly manner. Here are a few examples:
Site24x7 covers all the key aspects of Postgres monitoring. It allows administrators to track metrics like idle user count, database conflict count, DB commits, and cache usage ratio in real time from a user-friendly graphical dashboard.
Since the Python-based plugin is open-source, you can also extend it to monitor as many Postgres metrics as you want.
Use the following tips and tricks to fine-tune a Postgres instance for peak performance:
a) To fetch the total number of pages for a table:
SELECT pg_relation_filepath(oid), relpages
FROM pg_class
WHERE relname = 'tableName';
b) To display index sizes for a table:
SELECT c2.relname, c2.relpages
FROM pg_class c, pg_class c2, pg_index i
WHERE c.relname = 'tableName' AND
c.oid = i.indrelid AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
c) To find the tables and indexes that take up the most space:
SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
Postgres is a popular, open-source, multi-faceted database system that caters to wide-ranging use cases. However, regular monitoring is essential to ensure that it delivers peak performance. Use the metrics, tools, and tips detailed in this article to monitor Postgres and ensure that your database runs smoothly and efficiently.