Memcached is a popular in-memory key-value store used to speed up dynamic applications. By keeping data in RAM, Memcached cuts data retrieval time and boosts overall performance. However, its multithreaded, shared-memory design poses unique challenges for troubleshooting and monitoring.
This article will discuss the importance of monitoring Memcached and provide a comprehensive guide to tracking key metrics using native tools.
Memcached belongs to the class of software known as in-memory data stores. It predominantly acts as a caching layer sitting in front of a primary database and storing frequently accessed data in memory. This approach allows applications to get data directly from the cache, avoiding the need for time-consuming database queries that would require disk I/O.
Memcached has a distributed architecture in which a virtual memory pool is created and shared among multiple servers in a cluster. This approach enables Memcached to handle high traffic volumes and cater to growing data demands by adding more nodes to the cluster.
As a key-value store that is agnostic to data structures, Memcached expects all uploaded data to be serialized and stores it as key-value pairs in which each key uniquely identifies the corresponding value. This simple storage model lets Memcached achieve average O(1) time complexity for most commands. On the most performant servers, it can deliver a throughput of millions of keys per second.
Memcached uses an LRU (Least Recently Used) eviction policy to manage memory. It communicates over TCP, UDP, and Unix domain sockets using simple text and binary protocols, which makes it easy to integrate Memcached with a wide range of applications.
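To make the eviction policy concrete, here is a minimal, self-contained sketch of the LRU idea in Python. This is an illustration only, not Memcached's actual implementation (which maintains LRU state per slab class in C); the class name and capacity are hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """Toy illustration of LRU eviction: when the cache is full,
    the least recently used entry is discarded first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self.items:
            return None  # cache miss
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now the most recently used key
cache.set("c", 3)  # capacity exceeded: "b" (least recently used) is evicted
```

The same principle explains why a working set larger than the configured memory limit produces a steady stream of evictions in production.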
It can be used for a variety of use cases:
While Redis, Aerospike, and Elasticsearch are compelling alternatives, Memcached provides distinct advantages that make it an ideal choice for specific use cases:
Regular monitoring of a Memcached cluster is essential to ensure bottleneck-free operation. Here are the key reasons why you should actively monitor Memcached:
Memcached has a shared-memory architecture in which multiple worker threads access the same memory at the same time. This can make performance issues harder to track down, as a problem may be caused by any one of several threads or clients. However, by regularly monitoring key metrics, you can detect issues early, before they escalate.
For example, if you notice that the cache eviction rate suddenly spikes and coincides with an increase in overall memory usage, you can assume that the cache is running out of memory. This insight can help you to take corrective action in a timely manner, avoiding service disruption.
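The eviction-spike scenario above can be sketched as a simple check over two snapshots of Memcached's stats counters. The field names (evictions, bytes, limit_maxbytes) are real fields from the stats command output; the threshold values are arbitrary examples you would tune to your workload.

```python
def eviction_spike(prev, curr, interval_s, threshold_per_s=100.0):
    """Flag when evictions are spiking while memory is nearly full.
    prev/curr are dicts of `stats` counters sampled interval_s apart."""
    rate = (curr["evictions"] - prev["evictions"]) / interval_s
    mem_pct = 100.0 * curr["bytes"] / curr["limit_maxbytes"]
    return rate > threshold_per_s and mem_pct > 90.0

prev = {"evictions": 1000, "bytes": 50_000_000, "limit_maxbytes": 64 * 1024 * 1024}
curr = {"evictions": 8000, "bytes": 63_000_000, "limit_maxbytes": 64 * 1024 * 1024}
print(eviction_spike(prev, curr, interval_s=60))  # True: ~117 evictions/s at ~94% memory
```

Because evictions is a cumulative counter, rates must always be computed from deltas between snapshots, never from a single reading.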
Monitoring Memcached allows you to track key performance metrics, including cache hit ratio, cache miss ratio, and response times. By analyzing trends in these metrics, you can gauge whether a cluster is performing at full potential or requires optimization.
For example, if you notice an unusually high cache miss ratio after every deployment, you can surmise that you need a better cache warming strategy at application startup.
As data stored in Memcached resides solely in memory, it is vital to monitor its integrity. Regularly inspecting cache consistency and validating data accuracy ensures that clients receive reliable and fresh information from the cache.
For example, by analyzing older data records, you may detect a key-value pair that has long been invalidated. After investigating the root cause of the staleness, you can manually update the data to ensure that it is accurate and up to date.
Even fault-tolerant systems like Memcached require monitoring to maintain high availability. Because Memcached keeps all data in memory, tracking memory and resource utilization is essential to keep the cluster up and running.
For example, a monitoring tool can alert administrators if a node is approaching peak memory utilization. This allows them to investigate the issue and respond accordingly, by, for example, increasing the memory resources of the node.
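A memory-headroom alert like the one described can be derived from two fields of the stats command output, bytes (memory in use) and limit_maxbytes (the configured cap). This is a minimal sketch; the function name and alert wiring are hypothetical.

```python
def memory_headroom_mb(stats):
    """Remaining cache memory in MB, computed from the `bytes` and
    `limit_maxbytes` fields of Memcached's `stats` output."""
    return (stats["limit_maxbytes"] - stats["bytes"]) / (1024 * 1024)

stats = {"bytes": 58 * 1024 * 1024, "limit_maxbytes": 64 * 1024 * 1024}
print(memory_headroom_mb(stats))  # 6.0 MB left before evictions begin
```

An administrator could page on this value dropping below a few percent of the limit, well before the node starts evicting live data.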
Memcached has several configuration options, and it performs best when it’s tuned according to business needs and the underlying hardware. Proactive monitoring provides insights into the resource usage of the cluster, enabling you to optimize configurations based on the actual workload.
For example, you may tweak the cache size, eviction policy, or connection limits for better memory utilization and improved throughput.
Quick contextualization of an issue is crucial for its timely resolution. Regular monitoring and alerting enable you to contextualize and diagnose issues as they arise. This helps in decreasing the mean time to resolution (MTTR) and ensuring business continuity.
For example, if you notice a few unexpected connection timeouts or failures, you can analyze logs to establish context and apply a timely fix.
For holistic monitoring of a Memcached cluster in production, focus on the following key metric categories:
Metrics related to the actual cache performance should be at the top of the list of metrics to monitor. These metrics can help you assess how well Memcached performs as the caching layer in your deployment.
Metric | Description |
---|---|
Cache hit rate | The percentage of requests served from the cache. |
Cache miss rate | The percentage of requests for which data wasn’t found in the cache. |
Cache eviction rate | The frequency at which entries are being evicted from the cache. |
Cache fill ratio | The percentage of cache space used to store data. |
Total entries | The total number of key-value pairs stored in the cache. |
Average time to live | The average time to live value of all the records in the cache. |
Cache turnover rate | The rate at which cache entries are being replaced. |
New items | The number of new items added to the cache within a specific time. |
Reclaimed items | The total number of expired items that were evicted by Memcached to create space for new entries. |
Evicted and unfetched items | The total number of valid data items that were evicted from the cache and were never fetched by any client. High values of this metric should be investigated. (The definition of high differs based on operational requirements and SLAs.) |
Expired and unfetched items | The total number of expired data items that were reclaimed, and were never fetched by any client. High values of this metric should be investigated. (The definition of high differs based on operational requirements and SLAs.) |
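As a sketch of how these counters combine into a useful signal, the ratio of evicted-and-unfetched items to total evictions shows what share of evicted data no client ever read. The field names (evictions, evicted_unfetched) are real stats counters; the example values are made up.

```python
def wasted_eviction_ratio(stats):
    """Share of evicted items that were never fetched by any client
    (`evicted_unfetched` / `evictions` from the `stats` output).
    A high value suggests the cache is storing data nobody reads."""
    if stats["evictions"] == 0:
        return 0.0
    return stats["evicted_unfetched"] / stats["evictions"]

print(wasted_eviction_ratio({"evictions": 500, "evicted_unfetched": 200}))  # 0.4
```

A persistently high ratio is a hint to revisit what the application caches, or to shorten TTLs on low-value keys.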
Request metrics offer insights into the command processing layer of Memcached. Let’s look at a few examples:
Metric | Description |
---|---|
Timeouts | The number of times a request to Memcached timed out. A high number of timeouts may be a sign that Memcached is overloaded. |
Request rate | The overall rate at which requests are being made to Memcached. |
Response time | The average time taken by Memcached to respond to requests. |
Request throughput | The number of requests processed by Memcached per unit of time. This metric is a great way to gauge an instance’s instantaneous health. |
Request latency | The time taken for individual requests to be processed by Memcached. |
Miss latency | The average time taken to access an item that was not found in Memcached. A high miss latency can indicate that Memcached is not caching enough data. |
Check and Set (CAS) requests | The total number of CAS requests received by Memcached. |
Check and Set (CAS) bad requests | The total number of CAS requests received by Memcached in which the compared value didn’t match the currently cached value. |
Check and Set (CAS) hit requests | The total number of CAS requests received by Memcached in which the compared value matched the currently cached value. |
Check and Set (CAS) miss requests | The total number of CAS requests received by Memcached in which the requested key wasn’t found. |
Get commands | The total number of get commands received by the cache. |
Flush commands | The total number of flush commands received by the cache. |
Set commands | The total number of set commands received by the cache. |
Decrement hits | The total number of decrement requests received by Memcached in which the requested key was found. |
Decrement misses | The total number of decrement requests received by Memcached in which the requested key wasn’t found. |
Get hits | The total number of get commands received by Memcached in which the requested key was found. |
Get misses | The total number of get commands received by Memcached in which the requested key wasn’t found. |
Delete hits | The total number of delete commands received by Memcached in which the requested key was found. |
Delete misses | The total number of delete commands received by Memcached in which the requested key wasn’t found. |
Increment hits | The total number of increment requests received by Memcached in which the requested key was found. |
Increment misses | The total number of increment requests received by Memcached in which the requested key wasn’t found. |
Replace commands | The total number of replace commands received by the cache. |
Append commands | The total number of append commands received by the cache. |
Prepend commands | The total number of prepend commands received by the cache. |
Gets commands | The total number of gets commands received by the cache. |
Tracking memory metrics is another crucial aspect of monitoring Memcached. Focus on the following important memory metrics:
Metric | Description |
---|---|
Memory usage | The amount of memory currently utilized by Memcached. Ensure that this metric’s value doesn’t approach the max memory threshold. |
Available memory | The remaining available memory that Memcached can use. |
Fragmentation | The extent of memory fragmentation within the cache. Take steps to minimize fragmentation. |
Cache item size | The average size of a key-value pair stored in the cache. |
Memory utilization ratio | The percentage of used memory compared to the total memory. |
Cache memory allocation | The total amount of memory allocated for caching data in Memcached. |
Total bytes read | The total number of bytes that the cache has read from the network. |
Total bytes used for caching | The total number of memory bytes used for caching data. |
Total bytes written out | The total number of bytes that the cache has written to the network. |
Total bytes used for hashing | The total number of memory bytes used for storing data in hash tables. |
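A rough fragmentation estimate can be built from two of these memory figures: bytes (item data, from the stats command) versus the total memory the slab allocator has claimed (total_malloced, reported by the stats slabs subcommand). This is a simplified sketch; real slab accounting has more nuance.

```python
def slab_overhead_ratio(bytes_used, total_malloced):
    """Rough estimate of slab allocator overhead: the share of
    allocated memory not holding item data. `bytes` comes from
    `stats`; `total_malloced` comes from `stats slabs`."""
    return 1.0 - (bytes_used / total_malloced)

print(slab_overhead_ratio(48 * 1024 * 1024, 64 * 1024 * 1024))  # 0.25, i.e. ~25% overhead
```

If this ratio climbs over time, tuning the slab growth factor (the -f startup option) so chunk sizes better match your item sizes can reclaim wasted space.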
Network metrics enable you to identify and debug network bottlenecks, latency, and degradations. Focus on the following metrics:
Metric | Description |
---|---|
Current connections | The number of active connections formed by the cache. |
Stale connections | The number of stale connections. Strive to keep this value to a minimum (ideally zero). |
Total connections | The total number of connections formed by the cache since startup. |
Failed connections | The total number of failed connection attempts. |
Total connection limit reached | The number of times Memcached reached its max connection limit. Non-zero values of this metric should be investigated immediately. |
Accepting connections | A Boolean value to indicate whether the cache is currently accepting connections or not. A false value for this metric warrants immediate investigation. |
Network latency | The average time taken for data to travel between clients and Memcached. Strive to reduce this metric’s value as much as possible. |
Network errors | The total number of network-related errors that the cache has encountered since startup. |
Network throughput | The rate at which data is transferred over the network to/from Memcached. |
Packet loss | The amount of packet loss that has occurred. A high packet loss indicates a network problem. |
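Several of these connection metrics can be combined into one saturation check. In the sketch below, curr_connections and listen_disabled_num are real fields from the stats output (listen_disabled_num counts how many times the connection limit was reached), while maxconns is the configured limit reported by stats settings; the 90% threshold is an arbitrary example.

```python
def connections_saturated(stats, maxconns, warn_pct=90.0):
    """Flag connection pressure: either usage is near the configured
    limit, or the limit has already been hit at least once."""
    used_pct = 100.0 * stats["curr_connections"] / maxconns
    return used_pct >= warn_pct or stats["listen_disabled_num"] > 0

print(connections_saturated(
    {"curr_connections": 950, "listen_disabled_num": 0}, maxconns=1024))  # True (~93% used)
```

Alerting on this condition gives you time to raise the -c connection limit or fix a client connection leak before new clients start being refused.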
Keeping an eye on server health metrics allows you to optimize resource utilization and cache efficiency. The following server health metrics are the most important:
Metric | Description |
---|---|
CPU utilization | The current CPU utilization of the Memcached instance. |
Max CPU utilization | The maximum CPU utilization of the cache since startup. |
Uptime | The amount of time the instance has been up. |
Server load | The average load on the Memcached server over a given time. Track this metric over time to study usage patterns and perform adequate capacity planning. |
Node failure rate | The frequency of node failures within the cluster. A non-zero value of this metric should be investigated immediately. |
Active threads | The total number of currently active worker threads. |
Total threads | The total number of threads spawned by the Memcached instance since startup. |
Waiting threads | The total number of worker threads spawned by the Memcached instance that are currently in the “Waiting” state. |
Memcached offers a built-in command, stats, which can be used to track its performance in real time. The stats command displays the following key statistics:
Current connections, accepting connections, total connections, flush command counter, cache sizes, evicted elements, total bytes read, CAS hits, CAS misses, increment hits, increment misses, authentication commands, evicted non-zero elements, total pages, and others.
The stats command also offers several subcommands, including stats settings, stats items, stats slabs, stats sizes, and stats conns, which expose configuration values along with per-slab, per-item, and per-connection details.
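The stats command replies over the text protocol with lines of the form STAT name value, terminated by END, so its output is easy to parse programmatically. A minimal parser might look like this (the sample reply below is illustrative, not captured from a live server):

```python
def parse_stats(raw):
    """Parse the text-protocol reply of Memcached's `stats` command
    (lines like `STAT get_hits 4096`, terminated by `END`)."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT":
            key, value = parts[1], parts[2]
            try:
                stats[key] = int(value)  # most counters are integers
            except ValueError:
                stats[key] = value       # e.g. version strings
    return stats

sample = "STAT uptime 3600\nSTAT get_hits 4096\nSTAT get_misses 1024\nSTAT version 1.6.21\nEND"
print(parse_stats(sample)["get_hits"])  # 4096
```

In practice you would read this reply from a TCP socket or via a client library such as pymemcache rather than from a hard-coded string.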
You can combine different statistics from the output of the stats command to gather even more key insights. For example, you can calculate the global hit rate as get hits / (get hits + get misses). Similarly, you can find the number of free connections by subtracting current connections from the configured connection limit (maxconns, reported by stats settings); note that total connections is a cumulative counter since startup, so it cannot be used for this purpose.
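These derived metrics can be computed in a few lines. The counter names (get_hits, get_misses, curr_connections) are real stats fields; maxconns is the configured connection limit, and the sample numbers are made up.

```python
def derived_metrics(stats, maxconns):
    """Compute the global hit rate and remaining connection headroom
    from Memcached `stats` counters."""
    gets = stats["get_hits"] + stats["get_misses"]
    hit_rate = stats["get_hits"] / gets if gets else 0.0
    free_conns = maxconns - stats["curr_connections"]
    return hit_rate, free_conns

print(derived_metrics(
    {"get_hits": 900, "get_misses": 100, "curr_connections": 40}, maxconns=1024))
# (0.9, 984)
```

Since get_hits and get_misses are cumulative, a hit rate computed this way reflects the lifetime average; for a current-window hit rate, compute it from deltas between two snapshots instead.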
The Site24x7 Memcached plugin allows you to monitor all aspects of your Memcached cluster from a single pane of glass. You can track several key metrics in real time, including hit ratio, miss ratio, bytes read, bytes written, current connections, number of threads, evictions, latency, and throughput.
The free Python-based plugin can be downloaded directly from GitHub. It seamlessly integrates with the Site24x7 Linux agent, allowing you to view real-time performance metrics on the Site24x7 web client.
Memcached's simplicity and efficiency as an in-memory key-value store make it a great choice for improving the performance of dynamic applications. While it may not offer the extensive feature set of Redis or the persistence of Aerospike, it excels at what it was designed for: lightning-fast caching.
This article aimed to provide you with all the information you need to monitor key Memcached metrics using native tools. By proactively tracking these metrics, you can ensure the seamless operation of your Memcached cluster and deliver optimal performance to your users.