Performance Metrics of APM Insight

Interpret APM Insight monitoring results

Monitoring your applications with Site24x7 APM Insight, an application performance monitoring tool, lets you track and measure important metrics, including Apdex score, app server throughput, response time, and exceptions, from a customizable, unified console. In the top-right corner, you can choose the time frame for which you need the metrics. Options to edit, unmanage an instance, delete an instance, copy the web script, and export to PDF are also available in the top-right corner of the console. Various parameters and metrics can be obtained by accessing the following tabs:

Overview
- Data throughput
- Events timeline
Web transaction
Database
Background
Traces
- Filter and advanced filter
- Un-instrumented block of code
Trace details tab
Node VM (For Node.js agent)
JVM (For Java agent)
IIS (For .NET agent)
Exceptions
RUM analytics tab
Milestone tab
Server metrics tab
Outages tab
Data collection stats tab
Instance split up tab

Overview:

The basic function of this tab is to give a bird's eye view of all the major parameters from an application's point of view.

Overview tab

Also, you can view details about the fastest and slowest transactions by response time as well as the recent exception traces generated.

Data Throughput:

The sizes of request and response objects are tracked and shown in data throughput. Request size is captured as bytes in, and response size is captured as bytes out.

Data Throughput

Data throughput helps you assess the size of incoming requests and gives you an idea of how much data your app server is handling.

Transactions

For instance, from the above image you can understand that the particular transaction 'arh/trace' has a higher request size in comparison with other requests.

This helps you assess the general size of incoming requests, which comes in handy when there is a sudden spike in request size.

For example, during a DDoS attack, incoming requests could be unusually large. By knowing your average request size, you can easily spot such anomalies.
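The idea of comparing a new request against your usual request size can be sketched in a few lines of Python. This is only an illustration, not how Site24x7 detects anomalies: the function name, the sample "bytes in" values, and the choice of "mean plus three standard deviations" as the cut-off are all assumptions.

```python
from statistics import mean, pstdev


def is_request_size_spike(history_bytes_in, new_bytes_in, k=3.0):
    """Return True when a new request is far larger than the recent baseline.

    history_bytes_in: recent "bytes in" values (hypothetical sample data).
    k: how many standard deviations above the mean counts as a spike (assumed).
    """
    baseline = mean(history_bytes_in)
    spread = pstdev(history_bytes_in)
    return new_bytes_in > baseline + k * spread


recent = [1200, 1150, 1300, 1250, 1100]       # hypothetical request sizes in bytes
print(is_request_size_spike(recent, 1350))    # close to the usual range
print(is_request_size_spike(recent, 50_000))  # far above the baseline
```

A larger k makes the check less sensitive. In practice the right value depends on how much your request sizes normally vary.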

Events timeline:

The Events Timeline widget records all the events of your selected Application/Instance for a selected time range. You can identify past events of the following types: Down, Critical, Trouble, Maintenance, Anomaly, and Suspended. Each event is color-coded for easy identification. You can drill down into an event to get more detail and troubleshoot faster.

Events Timeline

Web transaction:

The various transactions running in the application are listed here along with the following metrics:

Parameters	Description
Transaction	Name of the transaction
Apdex	A numerical measure of user satisfaction, with 1 being the highest and 0 the lowest
Count	Number of times a particular transaction has been called by a user
Errors (%)	The percentage of errors that occur in a particular transaction
Avg. Resp. Time	The average amount of time a particular transaction takes to respond to a user request
Min	The lowest amount of time taken by a particular transaction to respond to a user request
Max	The highest amount of time taken by a particular transaction to respond to a user request
Total	Total time taken by a particular transaction to respond to all user requests
Avg. CPU Time	The average amount of time taken by the CPU to respond
Fatal	Number of fatal errors that have occurred in a particular transaction

You can sort the available data by the values of any parameter. If you select a specific transaction, you will also get a graphical view of its metrics.

Web Transactions
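To make the relationship between the columns concrete, here is a minimal Python sketch that derives Count, Errors (%), Avg. Resp. Time, Min, Max, and Total from a list of individual requests and then sorts by one column. The Request class, the field names, and the sample data are hypothetical, and the sketch assumes failed requests still contribute their response time. It does not describe how the agent itself aggregates data.

```python
from dataclasses import dataclass


@dataclass
class Request:
    transaction: str      # transaction name (hypothetical sample values below)
    response_ms: float    # response time of this single request
    failed: bool = False  # whether the request ended in an error


def summarize(requests):
    """Group requests by transaction and derive the table's columns."""
    rows = {}
    for req in requests:
        row = rows.setdefault(req.transaction, {
            "count": 0, "errors": 0, "total_ms": 0.0,
            "min_ms": float("inf"), "max_ms": 0.0,
        })
        row["count"] += 1
        row["errors"] += int(req.failed)
        row["total_ms"] += req.response_ms
        row["min_ms"] = min(row["min_ms"], req.response_ms)
        row["max_ms"] = max(row["max_ms"], req.response_ms)
    for row in rows.values():
        row["avg_ms"] = row["total_ms"] / row["count"]
        row["error_pct"] = 100.0 * row["errors"] / row["count"]
    return rows


def sort_by(rows, column, descending=True):
    """Return (transaction, row) pairs ordered by one column."""
    return sorted(rows.items(), key=lambda item: item[1][column], reverse=descending)


sample = [Request("login", 100.0), Request("login", 300.0, failed=True), Request("home", 50.0)]
for name, row in sort_by(summarize(sample), "avg_ms"):
    print(name, row["count"], row["error_pct"], row["avg_ms"])
```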

Database:

This section describes the SQL queries executed for the application, including how many times each database operation ran.

Parameters	Description
Database Operation	The name of the db operation being performed on the database
Count	The number of times the db operation has been performed by users
Errors (%)	The percentage of errors that have occurred in the db operation
Avg. resp. Time	The average amount of time a particular db operation takes to complete
Min	The lowest amount of time taken by a db operation to complete
Max	The highest amount of time taken by a db operation to complete
Total	Total time taken by a db operation to complete

Background:

Information on the various background tasks, such as maintenance, schedulers, and messaging, is provided here.

Parameters	Description
Transaction	Name of the background transaction
Count	Number of times a particular background transaction has been executed
Errors (%)	The percentage of errors that occur in a particular background transaction
Avg. Resp. Time	The average amount of time a particular background transaction takes to respond
Min	The lowest amount of time taken by a particular background transaction
Max	The highest amount of time taken by a particular background transaction
Total	The total time taken by a particular background transaction
Avg. CPU Time	The average amount of time taken by the CPU to respond

Traces:

The various traces that have been run on various applications and all details pertaining to them are listed in this section.

Parameters	Description
Start Time	Detailed start date and time when the trace was launched
Transaction	The name of the transaction on which the trace was launched
Resp. Time	The time taken by the trace to complete its execution
Avg. Resp. Time	The average amount of time taken by the trace when run multiple times
CPU Time	The amount of time taken by the CPU to respond

On selecting any particular trace, you can dive into its individual metrics. The drill-down is divided into the following tabs:

Summary
Trace Details
SQL Statements
Remote calls
JVM Metrics
Server Metrics

Traces - Summary:

Gives an overall summary on the trace selected and the slowest components that it was able to identify.

Traces summary

Parameters	Description
Slowest components	The names of all the components traced with the slowest being shown first
Count	Number of times the component was called
Duration	The time duration the component took to execute out of the total time that the trace took to execute
Percentage(%)	Percentage value of the duration the component took to execute
Custom Parameter	The user defined parameters for the respective transactions are shown
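As a rough illustration of how the Count, Duration, and Percentage(%) columns relate, the following Python sketch totals span durations per component and expresses each total as a share of the trace duration, slowest first. The function name, the (component, duration) input shape, and the sample numbers are assumptions made for this example only.

```python
def slowest_components(spans, trace_duration_ms):
    """Return (component, count, duration_ms, percentage) rows, slowest first.

    spans: iterable of (component_name, duration_ms) pairs (hypothetical shape).
    """
    totals = {}
    for component, duration_ms in spans:
        count, total = totals.get(component, (0, 0.0))
        totals[component] = (count + 1, total + duration_ms)

    rows = [
        (component, count, total, round(100.0 * total / trace_duration_ms, 1))
        for component, (count, total) in totals.items()
    ]
    rows.sort(key=lambda row: row[2], reverse=True)  # largest duration first
    return rows


sample_spans = [("MYSQL", 120.0), ("APPCODE", 30.0), ("MYSQL", 30.0)]
for row in slowest_components(sample_spans, trace_duration_ms=200.0):
    print(row)
```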

External calls

Traces - Trace Details:

This tab allows you to dig deeper into all of the entities involved in the trace to identify the anomalous spans causing latency.

APM Trace Details tab

This displays the total duration of the trace and the total number of spans in the trace. You can perform a quick search based on any of the trace types like APPCODE, MYSQL, WEBREQUEST, HANDLED_EXCEPTIONS, and ALL, by selecting it from the Filter drop-down menu.

You can also search for a span by its name using the Search by span name box as shown in the screenshot below.

Minimap

Gives a condensed view of the trace timeline. You can click and drag your mouse over the map to filter the spans of that time range. The filtered spans will be listed in the main timeline. If you want to select a different time range, click Reset and then select again.

Trace Details- Minimap

Minimap will not be generated for traces containing more than 1,500 spans.

Timeline

Shows the list of spans within the trace. You can also expand or collapse the span to view the children spans.

The screenshot below shows the expanded span.

Trace Details- Timeline

The timeline bar is color-coded based on the span type.
By default, all spans are expanded except those with a time duration less than 30% of the total trace duration.
The spans containing exceptions are highlighted in red.
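The default-expansion rule above can be expressed as a simple check. This Python sketch only restates the 30% rule from the list. The function name and the sample spans are hypothetical, and treating a span at exactly 30% as expanded is an assumption.

```python
def expanded_by_default(span_duration_ms, trace_duration_ms, cutoff_percent=30):
    """Return True unless the span takes less than cutoff_percent of the trace."""
    # Integer-style comparison avoids floating-point rounding at the boundary.
    return span_duration_ms * 100 >= cutoff_percent * trace_duration_ms


trace_ms = 1000
for name, duration in [("controller", 900), ("db query", 400), ("cache lookup", 100)]:
    state = "expanded" if expanded_by_default(duration, trace_ms) else "collapsed"
    print(f"{name}: {state}")
```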

Traces - SQL Statements:

Information on all the SQL queries executed by the trace.

Parameters	Description
Timestamp(second)	The time when the sql query was executed by the trace
Execution time(ms)	How much time the query took on its own to complete
Query	Name of the query executed
No. of Queries	Total number of queries executed

Traces - Remote calls:

Lists all the external remote calls made during the execution of the trace. The calls are grouped under two classifications, and the total count for each is shown. The two types of external calls identified are:

WEBREQUEST
WCF

To know more about how external calls are tracked, please go through our blog on the same topic.

Traces - JVM Metrics:

For Java applications, you can navigate to Traces > JVM Metrics to get the graph views of the important JVM metrics such as JVM CPU Usage, JVM Classes Count, Heap Memory, and Non-Heap Memory before and after the trace start time. The red mark on the graph indicates the trace's start time. The user can obtain troubleshooting information by comparing the key metrics before and after the tracing.

Traces - Server Metrics:

Navigate to Traces > Server Metrics to see graph views of the server's important metrics before and after the trace start time. The red mark on the graph indicates the trace's start time. This helps the user to get information on the status of the corresponding server at the specified time.

Traces- Server Metrics

Traces - NodeVM's metrics:

For Node.js applications, you can navigate to Traces > NodeVM's Metrics to get the graph views of CPU metrics, garbage collection data, and event loop data, before and after the trace start time. The red mark on the graph indicates the trace's start time. The user can obtain troubleshooting information by comparing the key metrics before and after the tracing.

Filter and advanced filter:

Site24x7 uses Filter and Advanced Filter options to identify traces based on multiple search conditions. You can then perform the required action on the filtered traces. You can apply the filters to any of these three categories:
All: View all traces, including error traces and distributed traces.
Errors: View the error traces.
Distributed: View the traces that flow to other applications.

Filter

You can perform a quick search based on any of the trace metrics by selecting it from the Filter drop-down and specifying the threshold value in the provided is above box.

Example: If you choose Response Time from the Filter drop-down and set the threshold value as three seconds in the is above box, all traces with response times greater than three seconds will be displayed.

Basic Filter

Advanced filter

The Advanced Filter option has built-in AND conditions, allowing you to specify an unlimited number of search conditions. When the filter is applied, the conditions are combined with AND, meaning only traces that match all of them are displayed.

You can click on the Add Filters option to select the required field type—be it Transaction Name, Exception Class, Component Name, or another filter—and the field value. The field values will be listed based on the field type selected.

You can add multiple filters one by one. Each filter added will be considered as an AND condition.

The defined filter value in the Filter drop-down will also be included in the AND condition.

For example, in the image below, the search criteria have three conditions: Transaction Name as zylker/settings/, Exception as java.lang.NullPointerException, and Response Time above 2 seconds.

Advanced Filter
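The AND behavior described above can be sketched in Python as follows. The trace records, field names, and helper function are hypothetical. The three conditions mirror the example criteria (transaction name, exception class, and response time above 2 seconds).

```python
def filter_traces(traces, conditions):
    """Keep only traces that satisfy every condition (logical AND)."""
    return [trace for trace in traces if all(cond(trace) for cond in conditions)]


# Hypothetical trace records for illustration only.
traces = [
    {"transaction": "zylker/settings/", "exception": "java.lang.NullPointerException", "response_time_s": 3.1},
    {"transaction": "zylker/settings/", "exception": None, "response_time_s": 4.0},
    {"transaction": "zylker/home/", "exception": "java.lang.NullPointerException", "response_time_s": 2.5},
]

conditions = [
    lambda t: t["transaction"] == "zylker/settings/",
    lambda t: t["exception"] == "java.lang.NullPointerException",
    lambda t: t["response_time_s"] > 2,  # the basic "is above" filter joins the AND
]

matched = filter_traces(traces, conditions)
print(matched)
```

Each added condition can only narrow the result, which is why adding more filters never returns traces that an earlier filter had excluded.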

You can click the icon to view the performance of that particular transaction.

Performance of individual transaction

You can view the whole picture of the required trace by clicking directly on it.

Traces

Un-instrumented block of code:

In general, the APM Insight agent captures known frameworks and methods in your applications. Components involved in a transaction, including its method calls and functions, are listed under the Traces tab.

While inspecting a transaction trace, you may encounter a field called un-instrumented block of code.

You may get this message under two circumstances:

A. When you have used custom methods or functions in your application code.

B. Even in known frameworks, the agent may not be able to track every method or function called between two instrumented methods or functions. In such cases, that method or function is marked as an un-instrumented block of code. This helps you identify exactly where the method occurs. Knowing the timestamp and the instance of occurrence, you can deploy custom instrumentation to investigate the issue.
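One way to picture an un-instrumented block is as time inside a parent span that no instrumented child span accounts for. The Python sketch below finds such gaps. It is purely a conceptual illustration under that assumption and does not describe the agent's actual algorithm. The function name and the millisecond offsets are hypothetical.

```python
def uninstrumented_gaps(parent_start, parent_end, child_spans):
    """Return (start, end) ranges inside the parent not covered by any child span.

    child_spans: list of (start, end) offsets in ms for instrumented calls.
    """
    gaps = []
    cursor = parent_start
    for start, end in sorted(child_spans):
        if start > cursor:
            gaps.append((cursor, start))  # time with no instrumented call
        cursor = max(cursor, end)
    if parent_end > cursor:
        gaps.append((cursor, parent_end))
    return gaps


# A 100 ms transaction with three instrumented calls (hypothetical offsets).
print(uninstrumented_gaps(0, 100, [(10, 30), (30, 50), (70, 90)]))
```

The timestamps of the gaps indicate where custom instrumentation would be most useful.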

JVM(Only available for Java agent users):

The JVM tab is available only if you are using the Java agent. This tab gives you all the necessary data on the JVM instance. Various parameters and metrics can be obtained by accessing the following tabs:

Summary
Garbage Collector
Threads
Configuration

JVM - Summary:

Gives an overall summary on the JVM instance with information on CPU usage, runtime memory, JIT, classes, heap and non-heap memory etc.

JVM summary

Runtime memory

Heap memory

Non-heap memory

Just In Time compiler

JVM classes count

JVM - Garbage collector:

Detailed information on how runtime memory is being managed in the JVM instance. It lists all the garbage collectors that are executed, along with the collected objects count and the time spent.

JVM garbage collector- objects

Total collections and time spent

JVM - Threads:

Data on all the different paths followed in executing transactions in the application.

JVM threads

CPU Time and User Time

JVM - Configuration:

Complete description of the JVM instance running and other related information such as:

Host name and version
Memory type and how much is available
Up Time
Thread Count etc.

You can configure alerts for JVM metrics. Refer here to set up alerts.

Node VM

The Node VM tab is available only if you are using the Node.js agent. The Node.js agent uses Node VM, a native Node.js add-on that collects key metrics data from the Google Chrome V8 engine. The agent collects CPU metrics, which are usually analyzed together with garbage collection (GC) metrics. These metrics help you improve the performance of your application.

IIS (Only available for .NET agent users):

This tab is available only if you have enabled APM Insight in IIS monitor console and only if you are using a .NET agent. To know more about how to enable APM Insight in IIS monitoring, please go through our blog on the same topic. Various parameters and metrics can be obtained by accessing the following tabs:

Summary
Application Pools

IIS - Summary:

Gives an overall view on the IIS server and the application accessing the server.

Microsoft IIS server details

IIS application details

IIS cache performance

IIS - Application pools:

Information on all the application pools that are running on the IIS server.

IIS heap performance

IIS CLR performance

Application performance

Memory statistics

Exceptions:

Describes all the exceptions that have occurred, with a count of how many times each has taken place.

Transaction splitup by exceptions and error code

Exception count and Error code ratio splitup by error code

RUM analytics:

This tab shows all the important data collected by the Site24x7 APM Insight RUM agent.

RUM Analytics

Milestone tab:

In general, you can mark milestones to review your application performance before and after a feature update, issue fix, or performance enhancement. You can view all such milestones under the Milestone tab. Milestones created for the corresponding application within the chosen time period are listed here.

Milestone Tab

Only milestones created at the monitor level are displayed here. Milestones created at the group level and global level can be viewed under the Admin tab.

Clicking on a particular milestone displays your application metrics before and after the selected time period.

The following metrics can be compared using milestone markers:

Apdex score
App Server Response Time
Request Throughput
Data Throughput
Error Count
Exception Count
HTTP Error Rate
Web, Background and Database transactions

For example, clicking on the 'APM TEST' milestone displays the application performance for one hour before and one hour after the milestone.

Milestone compare

Image displaying performance of transactions before and after six hours.

Milestone transactions comparison

Metrics for a particular milestone will be displayed only if the milestone created time and the chosen time period are in the same range. You cannot compare metrics when no milestones were created for a particular time period.
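A before-and-after comparison of this kind amounts to averaging a metric over equal windows on either side of the milestone. The Python sketch below shows the idea for response time with a one-hour window. The function names, the milestone timestamp, and the sample values are all hypothetical and are not taken from the product.

```python
from datetime import datetime, timedelta


def average(values):
    """Return the arithmetic mean, or None when there are no samples."""
    return sum(values) / len(values) if values else None


def compare_around_milestone(samples, milestone_time, window):
    """Average a metric over `window` before and after the milestone.

    samples: list of (timestamp, value) pairs, e.g. response times in ms.
    """
    before = [v for ts, v in samples if milestone_time - window <= ts < milestone_time]
    after = [v for ts, v in samples if milestone_time <= ts < milestone_time + window]
    return average(before), average(after)


milestone = datetime(2024, 1, 1, 12, 0)  # hypothetical milestone time
samples = [
    (milestone - timedelta(minutes=45), 420.0),
    (milestone - timedelta(minutes=15), 380.0),
    (milestone + timedelta(minutes=10), 250.0),
    (milestone + timedelta(minutes=50), 230.0),
    (milestone + timedelta(hours=2), 900.0),  # outside the one-hour window
]
before_ms, after_ms = compare_around_milestone(samples, milestone, timedelta(hours=1))
print(before_ms, after_ms)
```

If either window contains no samples, the sketch returns None for that side, which parallels the note that metrics cannot be compared when the milestone and the chosen time period do not overlap.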

You can also compare and view the performance metrics directly by clicking on the milestone view.

To view this,

Go to Milestone View
Click on the Milestone name
Select the time range and click Apply.

Server Metrics tab:

In general, you can see a comprehensive list of all the server monitors mapped to the instances of your application. You can also view the server metrics at the instance level and application level separately.

Application-level metrics

If you choose the application name from the top-left menu, you will get the complete list of the server monitors associated with your application.
APM-Server Integration App Level List

You can view the whole picture of performance metrics by clicking on the server monitor.
APM-Server Integration Instance Details

Instance-level metrics

If you choose the instance name from the top-left menu, you will get a thorough picture of all the major performance metrics of the associated server.
Server Integration- Instance level metrics

Outages tab:

Gives an overall summary of the Down, Trouble, and Critical history of the selected Application/Instance, with information on Start Time to End Time, Duration, Reason, and Comments.

Parameters	Description
Start Time to End Time	The start and end time of the detected outage
Duration	The time duration of the detected outage
Reason	The reason for the detected outage for quick troubleshooting
Comments	Comments added by the user for reference

Outages tab

You can click the icon to mark an outage as maintenance, edit comments, or delete an outage if it is irrelevant.

Outages- hamburger options

Mark as Maintenance

You can mark a specific outage period as maintenance using the Mark as Maintenance option. After marking an outage as maintenance, the status icon changes to maintenance and the record is still available under the Outages tab. The maintenance can be reverted to outage if needed.

Data Collection Stats

You can get a detailed data report for a specific outage by clicking the hamburger icon and selecting the Data Collection Stats option.

Edit Comments

During any detected outage, Site24x7 will auto-populate the reason for the outage under the Reason section. Anyone, irrespective of the user role, can edit/delete these system-generated comments using the Edit Comments option.

Delete

You can delete any outage or maintenance that is irrelevant using the Delete option.

You can also add an outage manually if needed using the Add Outage button.
You can use the Download CSV button to export the displayed Outage report.

Add outages option

Data Collection Stats Tab:

The Data Collection Stats tab gives a detailed data report of the selected application or instance for the chosen time period. The data report will be available for the last 30 days.

Refresh the page manually to view the data report for the most recent polls.

Data Collection Stats

Parameters	Description
Status	The status of the monitor, like Up, Down, Trouble, or Critical
Apdex	A metric for user satisfaction, with 1 being the highest and 0 being the lowest
Average Response Time (ms)	The average time it takes to respond to a user request
Count	The number of requests with successful responses
Error Count	The total number of errors that occurred
Fatal Exception Count	The total number of fatal exceptions that occurred
Throughput (rpm)	The number of requests received per minute
Error Rate (%)	The percentage of errors that occurred
JVM CPU usage (%)	The percentage of CPU usage by the Java virtual machine (JVM)
Heap Memory Usage (%)	The percentage of heap memory used by the JVM
GC Count	The number of global garbage collections that occurred
GC Time (ms)	The time taken to perform garbage collection

Instance Split up tab:

In general, you can see a comprehensive list of all the instances in your application, along with the metrics associated with them.

Instance split up tab

When you click on an individual instance, you will be taken to the respective Instance details page.

Parameters	Description	Available For
Apdex Score	A numerical measure of user satisfaction, with 1 representing the highest and 0 representing the lowest.	Java, .NET, Node.js, PHP, Ruby, Python
Satisfied (count)	The number of transactions labeled Satisfied. A transaction is labeled Satisfied if its response time is below the Apdex threshold.	Java, .NET, Node.js, PHP, Ruby, Python
Tolerating (count)	The number of transactions labeled Tolerating. A transaction is labeled Tolerating if its response time equals the Apdex threshold or falls between the Satisfied and Frustrated thresholds.	Java, .NET, Node.js, PHP, Ruby, Python
Frustrated (count)	The number of transactions labeled Frustrated. A transaction is labeled Frustrated if its response time is above four times the Apdex threshold.	Java, .NET, Node.js, PHP, Ruby, Python
Resp.Time (ms)	The average time taken by the instance to respond to user requests.	Java, .NET, Node.js, PHP, Ruby, Python
Throughput (rpm)	The number of requests received per minute.	Java, .NET, Node.js, PHP, Ruby, Python
Req.Count	The total number of requests received.	Java, .NET, Node.js, PHP, Ruby, Python
Errors (%)	The percentage of errors that occurred.	Java, .NET, Node.js, PHP, Ruby, Python
Status	The status of the instance, like Up, Down, Trouble, or Critical.	Java, .NET, Node.js, PHP, Ruby, Python
Host	The hostname of the instance.	.NET
IP	The IP addresses of the instance. Hovering over the value displays the complete IP list.	Java, .NET
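The Satisfied, Tolerating, and Frustrated counts in the table feed into the Apdex score. The Python sketch below classifies response times using the boundaries described in the table and then applies the standard Apdex formula, (satisfied + tolerating / 2) / total. This page does not spell out the formula, so treat its use here as an assumption about how the score is derived. The function names, the 0.5-second threshold, and the sample response times are hypothetical.

```python
def classify(response_time_s, threshold_s):
    """Label one transaction using the boundaries described in the table."""
    if response_time_s < threshold_s:
        return "satisfied"
    if response_time_s <= 4 * threshold_s:
        return "tolerating"
    return "frustrated"


def apdex(response_times_s, threshold_s):
    """Return (score, counts) using the standard Apdex formula."""
    counts = {"satisfied": 0, "tolerating": 0, "frustrated": 0}
    for rt in response_times_s:
        counts[classify(rt, threshold_s)] += 1
    total = sum(counts.values())
    if total == 0:
        return None, counts  # no transactions, so no score
    score = (counts["satisfied"] + counts["tolerating"] / 2) / total
    return score, counts


score, counts = apdex([0.2, 0.5, 1.0, 2.0, 2.5], threshold_s=0.5)
print(score, counts)
```

With these sample values, one transaction is satisfied, three are tolerating, and one is frustrated, so the score is (1 + 3 / 2) / 5 = 0.5.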

Click the hamburger icon and select Export as PDF to export the Instance metrics report.
