Monitoring whether your server is up or down, along with its CPU usage, is simply not enough for today’s applications. If you really want to monitor the performance and health of your applications, there are many application metrics available. These metrics come from a wide variety of sources, and viewing all of them in one place has long been a problem.
Retrace solves this problem by combining several monitoring tools in one place: server monitoring, application framework metrics, custom metrics, error tracking, log monitoring, and full code-level performance statistics. Retrace provides comprehensive application monitoring.
Monitor everything about your application
Find application problems instantly by having a single monitoring platform for all of your application metrics. Retrace provides robust application metrics monitoring.
Monitor your key app metrics, not just your servers
Read more below about best practices for monitoring all of these types of application metrics.
Analyze, trend, and chart all of your metrics
Powerful charting and dashboards make it easy to analyze and trend your software metrics.
Application Framework Metrics
Depending on which programming language you are using, the application framework can provide a wealth of information that you should monitor about your app, including garbage collection statistics, context switches, requests queued, requests per second, and more. These application performance metrics can be very helpful when trying to identify application performance problems.
ASP.NET Performance Counters
The .NET framework provides a wealth of Windows Performance Counters about your .NET application process, IIS Application Pool, and more. Microsoft recommends monitoring several Windows Performance Counters for your ASP.NET applications. These counters are automatically created and are always available on any Windows Server. Retrace has fantastic support for monitoring common .NET Performance Counters by default and can monitor any Windows Performance Counter.
Monitoring the Java JVM & Managed Beans
Java uses JMX Managed Beans (MBeans) to provide statistics about your Java application. Application servers such as Tomcat or JBoss also expose useful MBeans to monitor. Retrace can connect via JMX and monitor any Managed Bean.
Web Server Performance Stats
Depending on which web server you are using, it can provide a wealth of different metrics. Some are provided via Windows Performance Counters or Java MBeans. Others, like Apache or NGINX, may expose performance statistics in other ways.
Create Your Own Custom Application Metrics
Sometimes it makes the most sense to create your own metrics that are specific to your application. For example, perhaps you want to track the batch size of some incoming data when it is uploaded to a REST API that you provide.
You could implement this via a Windows Performance Counter or custom MBean. Another option is to use Retrace’s .NET or Java libraries.
Creating a custom metric is really simple!
StackifyLib.Metrics.Average("Incoming Data", "Batch Size", 25);
Retrace can also collect metrics that are reported as MBeans or Windows Performance Counters, but you may find our API much simpler to use, and it may provide more functionality as well. Metrics reported through our API just work and show up no matter where you deploy your app. If Retrace has to collect the MBeans or Counters, you have to manually configure that in your Retrace monitoring configuration for every app.
Tracking 3 Types of Error Rates
Tracking all of the errors in your application is a good first line of defense. It helps you find little bugs in your software that happen occasionally and big problems that are happening thousands of times a minute. Performance problems can also be caused by very high error rates.
There are three different types of error rates you should track: HTTP errors, the total number of exceptions thrown in your code, and the number of errors you handle and log via your application logging. All three are critical application metrics that should be tracked and monitored.
1. HTTP Error %
You can monitor your web server for how many 500-level HTTP errors are occurring. If you are using an APM solution like Retrace, it can help you track this as well. For ASP.NET there is a counter called “Errors Total/Sec” for your specific application that you can track.
A raw error count is more useful when you express it as a percentage of all HTTP requests, since the same number of errors means something very different at low and high traffic. Retrace provides this automatically.
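As a rough illustration of turning raw status codes into an error percentage, here is a minimal Python sketch. The function name `http_error_percent` and the sample data are hypothetical; this is not how Retrace computes the figure internally, only one straightforward way to do the arithmetic.

```python
def http_error_percent(status_codes):
    """Return the percentage of requests that ended in a 5xx status."""
    total = len(status_codes)
    if total == 0:
        # No traffic means no measurable error rate; report 0 rather than divide by zero.
        return 0.0
    server_errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return 100.0 * server_errors / total


# Hypothetical sample: status codes observed during one minute of traffic.
sample = [200, 200, 404, 500, 200, 503, 200, 200, 302, 200]
print(f"HTTP error rate: {http_error_percent(sample):.1f}%")
```

Note that only 500-level responses count as errors here; a 404 is treated as a client-side issue, which matches the focus on 500-level errors above.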
2. Total Exceptions Thrown Counts
For .NET you should track the counter called “.NET CLR Exceptions – # of Exceps Thrown”. This number includes ALL exceptions, even those that are caught and thrown away. Sometimes your app may seem to be functioning correctly while this number is in the thousands, which is really bad. Throwing that many exceptions has a big performance impact, and it may point to hidden problems.
3. Logged Exceptions
In your code you should have good application logging to a logging framework like log4net, log4j, etc. You should forward all of those errors to an error or bug tracking system like Retrace. From there you can see how many errors per minute your code is logging. These types of error tracking systems can also send you an alert whenever a new type of error is found. This is highly valuable!
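To make the two ideas above concrete (errors per minute, and spotting a new type of error), here is a small Python sketch. The event format, a list of `(timestamp, error_type)` tuples, and both function names are assumptions for illustration; a real error tracking system would work from your forwarded log data.

```python
from collections import Counter
from datetime import datetime


def errors_per_minute(events):
    """Count logged errors per minute. `events` holds (datetime, error_type) tuples."""
    counts = Counter()
    for timestamp, _ in events:
        # Truncate each timestamp to the start of its minute.
        counts[timestamp.replace(second=0, microsecond=0)] += 1
    return dict(counts)


def new_error_types(events, known_types):
    """Return error types not seen before, in the order they first appear."""
    seen = set(known_types)
    fresh = []
    for _, error_type in events:
        if error_type not in seen:
            seen.add(error_type)
            fresh.append(error_type)
    return fresh


# Hypothetical logged errors.
events = [
    (datetime(2024, 1, 1, 12, 0, 5), "NullReferenceException"),
    (datetime(2024, 1, 1, 12, 0, 40), "TimeoutException"),
    (datetime(2024, 1, 1, 12, 1, 10), "NullReferenceException"),
]
print(errors_per_minute(events))
print(new_error_types(events, known_types={"NullReferenceException"}))
```

An alert on the second function's output is what lets you hear about a brand-new failure the first time it happens rather than after it has piled up.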
Monitor Application Availability
Another key software metric to track is your availability or SLA %. In larger corporations you probably have contractual obligations to have your software working 99+% of the time or some similar sort of guarantee. It may even cost you money every minute your application is down!
Tracking application availability is a complicated subject that we can’t fully cover here. For your application, you need to decide whether simply being online counts towards the SLA or whether the definition is more complex, such as how fast transactions are processed. A company like VISA or Mastercard would need an SLA that requires credit card processing not only to be online, but also to respond within a certain number of seconds.
Retrace currently tracks SLA in a simple manner by pinging a certain HTTP endpoint to make sure it responds. However, by combining this with a measurement of “user satisfaction”, which is calculated from application response times, you could come close to covering more complex scenarios.
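The simple ping-based approach boils down to dividing successful checks by total checks. The sketch below shows that arithmetic in Python; the function names, the one-check-per-minute schedule, and the 99% default target are assumptions chosen for illustration, not Retrace settings.

```python
def availability_percent(check_results):
    """Percentage of successful checks. `check_results` is a list of booleans."""
    if not check_results:
        raise ValueError("no checks recorded")
    # True counts as 1 and False as 0, so sum() gives the number of successes.
    return 100.0 * sum(check_results) / len(check_results)


def meets_sla(check_results, target_percent=99.0):
    """Return True when measured availability reaches the SLA target."""
    return availability_percent(check_results) >= target_percent


# Hypothetical day of one-minute checks: 1,440 checks, 10 of which failed.
checks = [True] * 1430 + [False] * 10
print(f"Availability: {availability_percent(checks):.2f}%")
print("Meets 99% SLA:", meets_sla(checks))
print("Meets 99.9% SLA:", meets_sla(checks, target_percent=99.9))
```

Ten failed minutes in a day is enough to pass a 99% target but miss 99.9%, which is why the exact SLA figure matters so much.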
Measuring Application Performance & User Satisfaction
Retrace uses the popular Apdex formula to calculate a simple-to-understand performance score. It works by specifying a goal for how long a transaction should take and then sorting requests into buckets: satisfied (fast requests), tolerating (slow requests), and, in the standard Apdex definition, frustrated (requests too slow to tolerate).
The result of this formula is a number between 0 and 1. This makes for a nice, easy-to-track application metric that you can use uniformly across all of your applications. Without it, tracking performance by raw response times is difficult, because response times vary widely and what counts as good or bad is relative to each application.
Retrace uses this methodology for calculating user satisfaction for your entire application and for individual web requests or transactions.
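For reference, the standard Apdex calculation can be sketched in a few lines of Python. With a target time T, requests at or below T are satisfied, requests up to 4T are tolerating, and anything slower is frustrated; the score is (satisfied + tolerating / 2) / total. The function name and sample timings are hypothetical, and Retrace’s own bucketing may differ in its details.

```python
def apdex(response_times, threshold):
    """Standard Apdex score for a list of response times and a target time T."""
    if not response_times:
        raise ValueError("no response times recorded")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    # Frustrated requests (slower than 4T) add nothing to the numerator.
    return (satisfied + tolerating / 2) / len(response_times)


# Hypothetical response times in seconds, scored against a 0.5 second target.
times = [0.2, 0.4, 0.5, 1.2, 1.9, 3.0]
print(f"Apdex: {apdex(times, threshold=0.5):.2f}")
```

In this sample, three requests are satisfied, two are tolerating, and one is frustrated, so the score is (3 + 2 / 2) / 6.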
Monitor Application Dependency Performance Metrics
Today’s applications utilize a wide array of external dependencies and services like SQL databases, Redis, Elasticsearch, MongoDB, queues, external HTTP web services and more. It is important to monitor these services to ensure they are working correctly and to understand the performance impact they are having on your application.
Monitoring SQL queries can help you understand which queries are used the most and which are the slowest. By monitoring external HTTP calls, you can also identify whether a third-party service is causing your application performance problems.
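If you already record the duration of each dependency call, finding the most-used and slowest ones is a simple aggregation. The Python sketch below assumes a hypothetical list of `(name, duration_ms)` tuples; the function name, query text, and URL are made up for illustration.

```python
from collections import defaultdict


def summarize_calls(calls):
    """Aggregate (name, duration_ms) tuples into count, total, and average per name."""
    stats = defaultdict(lambda: {"count": 0, "total_ms": 0.0})
    for name, duration_ms in calls:
        stats[name]["count"] += 1
        stats[name]["total_ms"] += duration_ms
    return {
        name: {**s, "avg_ms": s["total_ms"] / s["count"]}
        for name, s in stats.items()
    }


# Hypothetical recorded dependency calls.
calls = [
    ("SELECT * FROM orders WHERE id = ?", 12.0),
    ("SELECT * FROM orders WHERE id = ?", 18.0),
    ("GET https://api.example.com/rates", 450.0),
    ("SELECT * FROM users WHERE email = ?", 30.0),
]
summary = summarize_calls(calls)
most_used = max(summary, key=lambda name: summary[name]["count"])
slowest = max(summary, key=lambda name: summary[name]["avg_ms"])
print("Most used:", most_used)
print("Slowest on average:", slowest)
```

Looking at both rankings matters: a fast query that runs constantly can cost more in total than a slow call that runs once.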
Basic Server Metrics
It is important to monitor whether your servers are online. Server CPU and memory usage are also important stats to monitor. You should also monitor the CPU and memory usage of your specific application, not just the server itself.
You may also find it useful to monitor network and disk performance, as well as disk space. That said, if you are like us and host your apps via a service like Azure, you may not care about these server metrics at all.
Turning Application Logs into Metrics
Sometimes your application or server logs contain really valuable information that can be useful for application monitoring. You could change your application to report custom metrics, but you could instead set up a query that monitors your log data and achieves the same goal. A good example would be searching the logs from all of your application’s servers for a specific log message that denotes an important event. You could then turn this into a metric that can be tracked, charted, and alerted on with Retrace.
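The idea of turning a log query into a metric can be sketched as counting matching log lines per minute. The Python example below assumes a hypothetical log format of `YYYY-MM-DD HH:MM:SS LEVEL message`; the function name, the pattern, and the sample lines are illustrative and are not Retrace’s query syntax.

```python
import re
from collections import Counter

# Assumed line layout: "YYYY-MM-DD HH:MM:SS LEVEL message".
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2} \S+ (.*)$")


def count_matches_per_minute(lines, pattern):
    """Count log messages matching `pattern`, grouped by minute."""
    matcher = re.compile(pattern)
    counts = Counter()
    for line in lines:
        parsed = LINE.match(line)
        # Skip lines that do not follow the assumed layout.
        if parsed and matcher.search(parsed.group(2)):
            counts[parsed.group(1)] += 1
    return dict(counts)


# Hypothetical log lines collected from several servers.
logs = [
    "2024-01-01 12:00:03 INFO Order submitted id=1",
    "2024-01-01 12:00:41 INFO Order submitted id=2",
    "2024-01-01 12:00:59 WARN Cache miss key=abc",
    "2024-01-01 12:01:07 INFO Order submitted id=3",
]
print(count_matches_per_minute(logs, r"Order submitted"))
```

The resulting per-minute counts are an ordinary time series, so they can be charted and alerted on like any other metric.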
Retrace Combines Powerful Application Metrics Monitoring & Alerting
Tracking and monitoring all of these types of metrics may seem overwhelming. Retrace automatically tracks many of the metrics mentioned above, while others are more advanced features that you can implement over time. The metrics tracked automatically include key application framework performance metrics, error rates, user satisfaction scores, basic server metrics, and more.