Postgres has been a staple of IT infrastructures for decades. It offers a diverse feature set, scales well to handle enterprise data sets, and is actively maintained by a community of experienced developers. It’s also fully extensible, allowing you to create new data types or functions to meet your specific business needs.
In today’s data-driven landscape, the success of applications is increasingly dependent on the performance of the underlying database. A poorly optimized database can lead to latency, bottlenecks, unexpected downtime, and even data loss. This is why it’s important to actively monitor Postgres.
In this article, we will share a detailed guide to Postgres monitoring. We’ll touch upon its architecture, provide a list of its most important performance metrics, and discuss the tools that you can use to monitor its performance in real time.
Postgres, short for PostgreSQL, is an advanced open-source relational database whose origins trace back to the POSTGRES project at the University of California, Berkeley, in 1986. It conforms to most of the features required by the SQL standard, which ensures compatibility and ease of migration from other SQL-based databases.
Postgres follows a client/server architecture. The server process is responsible for opening and maintaining client connections, managing the database files, executing queries, and reading and writing data to the disk.
The client is the application that interacts with the server to perform operations on the database. It can be a command-line utility, a web application, or any other tool capable of opening a TCP connection with the database server.
A Postgres server can maintain multiple concurrent connections with its clients. To achieve this, the server spawns an independent backend process for each new connection, which allows it to serve many clients at once without one connection blocking another.
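You can observe this process-per-connection model directly from SQL. The sketch below assumes PostgreSQL 10 or later (for the backend_type column) and a role with enough visibility, such as a superuser or a member of pg_monitor:

-- The PID of the backend process serving the current connection
SELECT pg_backend_pid();

-- One row per server process; client connections appear as 'client backend'
SELECT pid, usename, client_addr, state
FROM pg_stat_activity
WHERE backend_type = 'client backend';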
Postgres offers a wide range of features that make it a great choice for businesses of all sizes and industries. Some key highlights include:
Owing to its versatility and flexibility, Postgres can be used as the primary database for a wide range of applications, including:
PostgreSQL is a powerful and resilient database, but it still needs to be monitored to ensure it performs optimally. Monitoring PostgreSQL can help you to:
As the primary database for applications, Postgres acts as the backbone of an infrastructure. Data-driven applications rely on Postgres for timely execution of read, write, and update queries. Any latency in Postgres can create a detrimental ripple effect in performance across the entire system. For example, let's say application A executes a query before sending a response to application B. If the query times out, the response to B may get delayed, leading to a chain reaction of delays and disruptions.
Monitoring performance metrics such as query execution time, slow-running queries, and memory utilization allows administrators to gauge performance in real time and take any corrective action if necessary.
Even a well-built database like Postgres can encounter bottlenecks due to factors such as suboptimal query execution plans, missing indexes, or insufficient resources. For example, if a slow-running query locks multiple tables for writes, it can create a system-wide bottleneck that prevents applications from writing data to the database.
Monitoring helps to pinpoint bottlenecks and identify improvement avenues. For example, a monitoring tool may issue an alert if memory utilization spikes. Looking at the dashboard, the administrator may find that an unknown client is executing too many concurrent requests. Force-terminating the client connection can reduce memory usage significantly and restore optimal performance.
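As a sketch of that remediation (the PID and client address below are hypothetical), you can identify the offending backends in pg_stat_activity and terminate them; pg_blocking_pids() is also handy for spotting sessions that are holding up others:

-- Sessions that are blocked, and the PIDs blocking them
SELECT pid, pg_blocking_pids(pid) AS blocked_by, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;

-- Terminate a single runaway backend (hypothetical PID)
SELECT pg_terminate_backend(12345);

-- Terminate every connection from a hypothetical client address
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE client_addr = '10.0.0.42';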
Unplanned downtime can be costly and disruptive. Your customers may be unable to access your applications, or your valuable transactional data may not get logged. For example, if a database server goes down before an ongoing transaction gets committed, the transaction data may be lost forever.
Regular monitoring enables administrators to quickly detect and contextualize issues and proactively address them to prevent system failure and minimize downtime. This approach ensures high availability and a consistent user experience.
Postgres provides a wide range of configuration options, and the optimal settings vary based on specific infrastructure requirements. For example, the memory needs of Postgres in a small-scale web application would differ significantly from those of a carrier-grade enterprise billing system.
Monitoring the database's performance under different workloads allows you to fine-tune configuration parameters, maximizing resource utilization and throughput.
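For example, assuming superuser access, you can inspect and adjust memory-related parameters directly from SQL. The value below is purely illustrative and should be sized for your own workload:

-- Inspect current memory settings
SHOW shared_buffers;
SHOW work_mem;

-- Persist a new value (written to postgresql.auto.conf) and reload the configuration
ALTER SYSTEM SET work_mem = '64MB';
SELECT pg_reload_conf();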
Postgres, like any other database, is a security-critical resource. If a cybercriminal gains unauthorized access to a Postgres server, they can steal confidential data or cause downtime. For example, they may leverage privilege escalation to truncate critical database tables.
Regular monitoring helps detect suspicious activities that might indicate security breaches or unauthorized access attempts. For example, if someone tries to drop a database table on a production instance, the monitoring tool can issue a critical alert to the administrators who can then investigate and mitigate the threat.
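One way to make such activity visible is to log all data-definition statements (DROP, ALTER, and so on) so that a monitoring or log-management tool can alert on them. A minimal sketch, assuming superuser access:

-- Log every DDL statement (DROP TABLE, ALTER ROLE, etc.) to the server log
ALTER SYSTEM SET log_statement = 'ddl';
SELECT pg_reload_conf();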
The most important performance monitoring metrics for Postgres can be divided into the following broad categories:
Database-wide metrics provide an overall view of the database’s health and performance. Let’s look at a few examples (a sample query that retrieves them follows the table):
Metric | Description |
---|---|
Current state | The current state of this instance. For example, active, idle, or disabled. |
Committed transactions | The total number of committed transactions in this database. |
Rolled back transactions | The total number of transactions in this database that were rolled back. |
Read blocks | The total number of disk blocks that this database has read. |
Hit blocks | The total number of times a disk block was found in the buffer cache and therefore didn’t require a disk read. |
Total inserted rows | The total number of rows that have been inserted into this database via queries. |
Total deleted rows | The total number of rows that have been deleted from this database via queries. |
Total updated rows | The total number of rows that have been updated in this database via queries. |
Total fetched rows | The total number of rows that were retrieved via index scans from this database. Track this metric’s value to ensure that your indexes are being properly used. |
Temporary file count | The total number of temporary files that were created during query execution on this database. |
Total deadlocks | A cumulative count of deadlocks that have occurred in this database. Strive to maintain a zero value for this metric. |
Temporary file bytes | The total amount of data (in bytes) written to temporary files during query execution on this database. |
Last checksum failure | The timestamp at which the last checksum failure was detected in this database. Ideally, this metric should have a NULL value (indicating that no checksum failures have occurred). |
Total checksum failures | The total number of checksum failures detected in this database. This metric should ideally have a zero value. |
Total block read time | The total time (in milliseconds) spent on reading data blocks by this database. |
Total block write time | The total time (in milliseconds) spent on writing data blocks by this database. |
Total session time | The cumulative time (in milliseconds) of all the sessions created by this database. |
Total active time | The cumulative time (in milliseconds) that was spent on query execution by this database. |
Total sessions | The total number of sessions created by this database. |
Total sessions fatal | The total number of sessions that were terminated due to fatal errors. Non-zero values of this metric should be investigated. |
Total sessions killed | The total number of sessions that had to be manually terminated. |
Idle sessions | The number of currently idle sessions in the database. Investigate unexpectedly high values of this metric. (The definition of high differs based on operational requirements and SLAs.) |
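Most of these metrics come from the pg_stat_database view. The query below is a minimal sketch that pulls several of them for the current database; note that the checksum columns require PostgreSQL 12 or later, the session columns require PostgreSQL 14 or later, and the block timing columns stay at zero unless track_io_timing is enabled:

SELECT datname,
       xact_commit,                               -- committed transactions
       xact_rollback,                             -- rolled back transactions
       blks_read,                                 -- read blocks
       blks_hit,                                  -- hit blocks (served from the buffer cache)
       tup_inserted, tup_updated, tup_deleted, tup_fetched,
       temp_files, temp_bytes,
       deadlocks,
       checksum_failures, checksum_last_failure,
       blk_read_time, blk_write_time,
       session_time, active_time,
       sessions, sessions_fatal, sessions_killed
FROM pg_stat_database
WHERE datname = current_database();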
These metrics allow you to track resource usage and make sound decisions regarding capacity planning. Keep a special eye on the following metrics (a sample sizing query follows the table):
Metric | Description |
---|---|
Table pages | The total number of pages associated with a particular database table. A page is typically 8 kilobytes. This metric can be used to determine the disk usage of a table. |
Largest table | The database table that’s taking up the most disk space. |
Index pages | The total number of pages associated with a particular database index. A page is typically 8 kilobytes. This metric can be used to determine the disk usage of an index. |
Current memory utilization | The amount of memory (in bytes) that’s currently being used by this instance. |
Available memory | The amount of memory (in bytes) that’s available for use to this instance. |
Max memory utilization | The maximum amount of memory (in bytes) used by this instance since it started. |
Current disk usage | The amount of disk space that’s currently being used by this instance. |
Max disk usage | The maximum amount of disk space used by this instance since it started. |
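As a sketch of how to gather the disk-related figures, the built-in sizing functions report per-database and per-table usage (memory metrics are typically collected at the OS level or by a monitoring agent rather than from SQL):

-- Total size of the current database
SELECT pg_size_pretty(pg_database_size(current_database()));

-- The ten largest tables, including their indexes and TOAST data
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;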
Monitoring metrics related to tables and indexes helps you structure the database for better performance and scalability. Here are a few examples (a sample query follows the table):
Metric | Description |
---|---|
Total sequential scans | The total number of sequential scans performed on this table. |
Total index scans | The total number of index scans performed on this table. |
Sequential rows read | The total number of rows retrieved via sequential scans performed on this table. |
Index rows read | The total number of rows retrieved via index scans performed on this table. |
Total updated rows | The total number of table rows that have been updated since the table was created. |
Total inserted rows | The total number of rows that have been inserted into this table since it was created. |
Total hot updated rows | The total number of table rows that have been HOT (Heap-Only Tuples) updated. |
Total dead rows | The total number of dead rows in the table. |
Total live rows | The total number of live rows in the table. |
Inserts since last vacuum | The total number of rows inserted into this table since it was last vacuumed. |
Last vacuum time | The timestamp at which this table was last vacuumed manually. |
Last auto-vacuum time | The timestamp at which this table was last vacuumed by the auto-vacuum daemon. |
Last auto-analyze time | The timestamp at which this table was last analyzed by the auto-vacuum daemon. |
Vacuum count | The total number of times this table has been manually vacuumed. |
Auto vacuum count | The total number of times this table has been vacuumed automatically by the auto-vacuum daemon. |
Total table indexes | The total number of indexes in a database table. |
Returned index entries | The total number of entries that have been returned by scans on a particular index. |
Returned table rows | The total number of table rows that have been returned by scans on a particular index. |
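Most of the table-level metrics above map to columns of pg_stat_user_tables, and the index-level ones to pg_stat_user_indexes. A minimal sketch (n_ins_since_vacuum requires PostgreSQL 13 or later):

-- Per-table activity and vacuum statistics
SELECT relname,
       seq_scan, seq_tup_read,
       idx_scan, idx_tup_fetch,
       n_tup_ins, n_tup_upd, n_tup_hot_upd,
       n_live_tup, n_dead_tup, n_ins_since_vacuum,
       last_vacuum, last_autovacuum, last_autoanalyze,
       vacuum_count, autovacuum_count
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Per-index usage: index entries returned and live table rows fetched
SELECT relname, indexrelname, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes;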
This category encompasses metrics related to query execution, number of connections, and transaction rates. Here are a few examples (sample queries follow the table):
Metric | Description |
---|---|
Total active connections | The total number of connections to this database that are currently active. |
Total idle connections | The total number of connections to this database that are currently idle. |
Max connections | The maximum number of concurrent connections opened by this database since the instance started. |
Most expensive queries (historical) | The queries that have taken the most time to execute since the database instance started. |
Expensive (slow-running) queries | Currently active slow-running queries. Strive to optimize these queries for better overall performance. |
Total write queries | The total number of queries that performed write operations on the database. |
Total read queries | The total number of queries that performed read operations on the database. |
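The connection-related metrics can be read straight from pg_stat_activity, while historical query costs require the pg_stat_statements extension (it must be added to shared_preload_libraries and created in the database). A sketch, using the column names from PostgreSQL 13 onward (earlier versions use total_time and mean_time):

-- Connection counts broken down by state (active, idle, idle in transaction, ...)
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;

-- Historically most expensive queries (requires pg_stat_statements)
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;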
Replication metrics allow you to track the status and latency of a database’s replication processes. Pay particular attention to these metrics (a sample query follows the table):
Metric | Description |
---|---|
State | Current state of the write-ahead log (WAL) sender. Possible values are startup, streaming, catchup, and backup. |
Last WAL location sent | The last write-ahead log location that has been sent on a particular connection. |
Last WAL location written | The last WAL location that has been written to disk by this standby node. |
Sync state | The synchronous state of this standby server. Some possible values are: sync, async, and potential. |
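On the primary, these metrics are exposed through pg_stat_replication, which has one row per connected standby. A minimal sketch, using the column names from PostgreSQL 10 onward:

-- Replication status and WAL positions for each standby
SELECT client_addr, state, sync_state,
       sent_lsn, write_lsn, flush_lsn, replay_lsn
FROM pg_stat_replication;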
Postgres has a robust mechanism for collecting statistics. Whether you want to see currently running queries, check replication lag, or track the size of your tables and indexes, you can use built-in Postgres views to do so.
Statistics views are divided into two categories: dynamic and collected. Dynamic statistics views display real-time information, while collected views can be used to track cumulative statistics (over a period or since startup).
The pg_stat_activity view contains all the currently executing processes, along with their state and queries. It includes metrics like client IP address, query start timestamp, state change timestamp, and query text, which can be useful in identifying slow-running queries or bottlenecks and their causes.
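For example, a query along these lines surfaces the longest-running statements that are still active (the five-minute threshold is an arbitrary illustration):

-- Active statements running for more than five minutes
SELECT pid, client_addr, state,
       now() - query_start AS runtime,
       query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '5 minutes'
ORDER BY runtime DESC;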
The pg_stat_wal view can be used to retrieve statistics related to write-ahead logging (WAL). For example, you can track metrics like WAL buffers full, WAL bytes, WAL syncs, WAL write time, and WAL sync time, which show how efficiently the instance is writing and flushing WAL, a key factor in both crash recovery and replication performance.
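A minimal sketch against pg_stat_wal (the view is available on PostgreSQL 14 and later; the timing columns are only populated when track_wal_io_timing is enabled):

SELECT wal_records, wal_fpi, wal_bytes, wal_buffers_full,
       wal_write, wal_sync, wal_write_time, wal_sync_time
FROM pg_stat_wal;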
The pg_stat_database view is a good place to look for database-wide statistics. It allows you to keep tabs on key metrics like the number of committed and rolled-back transactions, the number of rows fetched by index and sequential scans, and the number of inserted, updated, and deleted rows in a database.
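One particularly useful figure derived from this view is the buffer cache hit ratio; as a rough rule of thumb, it should stay close to 1 on a well-tuned OLTP workload:

-- Fraction of block requests served from the buffer cache, per database
SELECT datname,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 4) AS cache_hit_ratio
FROM pg_stat_database
WHERE blks_hit + blks_read > 0;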
There are several other views that let you track different aspects of a Postgres database, such as pg_stat_all_tables, pg_stat_ssl, pg_stat_progress_copy, and pg_stat_user_functions. Depending on your specific use case and monitoring requirements, you can use a combination of these views to effectively track both the historical and real-time metrics of your Postgres database.
Several tools display collected and dynamic statistics from Postgres views in a user-friendly manner. Here are a few examples:
Site24x7 covers all the key aspects of Postgres monitoring. It allows administrators to track metrics like idle user count, database conflict count, DB commits, and cache usage ratio in real time from a user-friendly graphical dashboard.
Since the Python-based plugin is open-source, you can also extend it to monitor as many Postgres metrics as you want.
Use the following tips and tricks to fine-tune a Postgres instance for peak performance:
a) To fetch the total number of pages for a table:
SELECT pg_relation_filepath(oid), relpages
FROM pg_class
WHERE relname = 'tableName';
b) To display index sizes for a table:
SELECT c2.relname, c2.relpages
FROM pg_class c, pg_class c2, pg_index i
WHERE c.relname = 'tableName' AND
c.oid = i.indrelid AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
c) To find the tables and indexes that take up the most space:
SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
Postgres is a popular, open-source, multi-faceted database system that caters to wide-ranging use cases. However, regular monitoring is essential to ensure that it delivers peak performance. Use the metrics, tools, and tips detailed in this article to monitor Postgres and ensure that your database runs smoothly and efficiently.