APM refers to application performance management or application performance monitoring. You could argue that they are the same thing, or that management implies being proactive while monitoring is only reactive when it comes to the performance of your application. Either way, APM is an essential tool to help optimize and monitor the performance of your apps.
What is Application Performance Management (APM)?
By my definition, APM, or application performance management, is largely an industry- or vendor-created term for anything that has to do with managing or monitoring the performance of your code, application dependencies, transaction times, and overall user experience.
Wikipedia says, “Since the first half of 2013, APM has entered into a period of intense competition of technology and strategy with a multiplicity of vendors and viewpoints. This has caused an upheaval in the marketplace with vendors from unrelated backgrounds (including network monitoring, systems management, application instrumentation, and web performance monitoring) to adopt messaging around APM. As a result, the term APM has become diluted and has evolved into a concept for managing application performance across many diverse computing platforms, rather than a single market.”
Since APM has become a catch-all term for anything and everything performance related, vendors use it to mean totally different things. APM can span several different types of vendor solutions.
3 Types of APM monitoring tools
- App Metrics based – Several tools use various server and app metrics and call it APM. At best they can tell you how many requests your app gets and potentially which URLs might be slow. Since they don’t do code level profiling, they can’t tell you why.
- Code level performance – Stackify Retrace, New Relic, AppDynamics, and Dynatrace are the typical type of APM products you think of, based on code profiling and transaction tracing.
- Network based – Extrahop uses the term APM in regard to their ability to measure application performance based on network traffic. There is a whole product category called NPM (network performance monitoring) that focuses on this type of solution.
Some other tools that monitor server and application metrics, rather than code level performance, are sometimes referred to as application performance monitoring solutions. Knowing your server CPU or the average response time of your web server is important and helpful, but APM aims to go much deeper.
By leveraging code profiling and other data collection techniques, application performance monitoring tools can provide detailed transaction tracing.
APM is all about understanding the “why” as fast as possible
If you want to measure the performance of a web application, it is pretty trivial to parse the access logs and get an idea of how long web requests take. This would give you an idea about overall performance and which pages are slow. Unfortunately, it doesn’t answer the key question of why.
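As a rough illustration of the access log approach, here is a minimal Python sketch. The log layout (method, path, status, duration in milliseconds) and the `slowest_paths` helper are assumptions made up for this example; real web server logs need their own pattern.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed log layout for this sketch: "<METHOD> <path> <status> <duration_ms>"
LINE_RE = re.compile(
    r"^(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) (?P<ms>\d+(?:\.\d+)?)$"
)


def slowest_paths(lines, top=3):
    """Return (path, average_ms, hits) tuples, slowest average first."""
    durations = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        durations[match["path"]].append(float(match["ms"]))
    ranked = sorted(
        ((path, mean(ms), len(ms)) for path, ms in durations.items()),
        key=lambda row: row[1],
        reverse=True,
    )
    return ranked[:top]


sample = [
    "GET /home 200 35",
    "GET /reports 200 1200",
    "GET /reports 200 800",
    "POST /login 302 90",
    "not a log line",
]

for path, avg_ms, hits in slowest_paths(sample):
    print(f"{path}: {avg_ms:.0f} ms average over {hits} requests")
```

This tells you that a page is slow, but nothing in the log explains what the code was doing during that time.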
The heart of APM solutions is understanding why transactions in your application are slow or failing.
For example, a development or operations team can instantly tell from this visual that their database is causing some performance spikes. They can also leverage their APM to identify exactly which database query and web requests were affected.
APM solutions can help identify common application problems quickly:
- Track overall application usage to understand spikes in traffic
- Find slowness or connection problems with application dependencies including SQL, queues, caching, etc
- Identify slow SQL queries
- Find highest volume and slowest web pages or transactions
10 Critical Application Performance Management Features for Developers
For developers, APM is really all about data, and I mean lots of data. But they need more than data. They need actionable insights from that data so they can quickly get to the root cause of application problems.
Here are some of the key features that most APM tools support.
1. Performance of every web request and transaction
At the heart of APM you have to be able to measure the performance of every web request and transaction in your application. You can then use this to understand which requests are accessed the most, which are the slowest, and which ones you should add to your backlog to improve.
Knowing the performance of every web request is just the start though. You could potentially get that from a web server access log. The real key is understanding the why.
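To make the idea of timing every request concrete, here is a minimal sketch of a WSGI middleware in Python. `TimingMiddleware` and `demo_app` are hypothetical names for this example, not part of any APM product, and the timer only covers the call into the app, not the streaming of the response body.

```python
import time
from collections import defaultdict


class TimingMiddleware:
    """Wrap a WSGI app and record how long each request path takes."""

    def __init__(self, app):
        self.app = app
        self.timings = defaultdict(list)  # path -> list of durations in ms

    def __call__(self, environ, start_response):
        started = time.perf_counter()
        try:
            return self.app(environ, start_response)
        finally:
            # Runs on success and on exceptions, so failed requests are timed too.
            elapsed_ms = (time.perf_counter() - started) * 1000
            self.timings[environ.get("PATH_INFO", "")].append(elapsed_ms)


def demo_app(environ, start_response):
    """Tiny stand-in application used only for this example."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


wrapped = TimingMiddleware(demo_app)
wrapped({"PATH_INFO": "/home"}, lambda status, headers: None)
print(dict(wrapped.timings))
```

From the collected timings you can rank requests by volume or by average duration, which is the starting point described above.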
2. Code level performance profiling
If you want to understand why your application is slow, throwing errors, or has weird bugs in it, you have to get down to the code level. Knowing that a certain web request doesn’t work is important and actually pretty easy. Figuring out why it doesn’t work is hard, sometimes really hard.
By tracking what your application is doing all the way down to the code level, you can potentially gain way more insights about what is occurring:
- What key methods in your code are even being called?
- Which methods are slow?
- Is your app slow due to things like JIT, garbage collection, etc?
- What dependencies are being called?
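For a feel of what code level profiling looks like, here is a small sketch using Python's built-in `cProfile` module. The `handle_request` and `slow_sum` functions are made up for the example, and commercial APM agents typically rely on lighter-weight instrumentation than a full deterministic profiler.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately CPU-heavy helper so it shows up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def handle_request():
    """Hypothetical request handler that calls the slow helper."""
    return slow_sum(200_000)


def profile_call(func):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
    return result, buffer.getvalue()


result, report = profile_call(handle_request)
print(report)
```

The report lists which functions were called, how often, and how much time each one accounted for, which is the kind of answer the questions above are after.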
3. Usage and performance of all application dependencies like databases, web services, caching, etc
Why your application is slow usually comes down to a spike in traffic or a problem with one of your application dependencies.
It is very common to have these types of problems:
- A particular SQL query is slow
- SQL database server is down
- External HTTP web services calls are failing
- Noisy neighbors in the cloud causing problems
As one example, we recently had some issues accessing Hubspot’s API. They were throttling us, and the only reason we knew is that we track all of our exceptions and could see in our APM that the affected transactions were also failing.
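Here is a minimal sketch of how dependency calls could be tracked so that slow or failing ones stand out. The `track_dependency` helper and the dependency names are hypothetical; an APM agent would normally hook into database and HTTP client libraries automatically.

```python
import time
from contextlib import contextmanager


@contextmanager
def track_dependency(name, calls):
    """Time a block of code that talks to a dependency and record any failure."""
    started = time.perf_counter()
    error = None
    try:
        yield
    except Exception as exc:
        error = type(exc).__name__
        raise  # the caller still sees the original exception
    finally:
        calls.append({
            "dependency": name,
            "ms": (time.perf_counter() - started) * 1000,
            "error": error,
        })


calls = []

with track_dependency("sql", calls):
    pass  # a database query would run here

try:
    with track_dependency("crm_api", calls):
        raise TimeoutError("simulated throttling by an external API")
except TimeoutError:
    pass

failing = [call["dependency"] for call in calls if call["error"]]
print(failing)
```

Because every call is recorded with its outcome, a throttled external API shows up as a cluster of failures on one dependency instead of a vague slowdown.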
4. Detailed traces of individual web requests or transactions
Troubleshooting problems in production is very difficult. Transaction traces make this a lot easier by letting you see exactly what is happening in your code and how that affects your users.
Traces can contain these types of data:
- Web request info like URL, etc
- Who the user was
- What dependencies did your code call (SQL, caching, HTTP calls, etc)
- Logging statements
- Application errors
- Key methods in your code
Seeing all of this data in a single trace can save you from having to reproduce a problem in QA. Getting to the root cause can be nearly instantaneous with an APM solution that collects detailed traces.
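To picture what a single trace might hold, here is a small Python sketch of a trace record. The `Trace` and `Span` classes and their fields are assumptions for illustration, not the data model of any particular APM product, and the spans are treated as a flat, sequential list.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    kind: str    # e.g. "sql", "cache", "http", "method"
    detail: str  # query text, URL, or method name
    ms: float    # duration in milliseconds


@dataclass
class Trace:
    url: str
    user: str
    spans: list = field(default_factory=list)
    logs: list = field(default_factory=list)
    errors: list = field(default_factory=list)

    def total_ms(self):
        """Sum of span durations, assuming spans do not overlap."""
        return sum(span.ms for span in self.spans)

    def slowest_span(self):
        """Return the span that took longest, or None for an empty trace."""
        return max(self.spans, key=lambda span: span.ms, default=None)


trace = Trace(url="/orders/42", user="alice")
trace.spans.append(Span("method", "OrderController.show", 12.0))
trace.spans.append(Span("sql", "SELECT * FROM orders WHERE id = ?", 480.0))
trace.spans.append(Span("http", "GET https://example.com/shipping", 95.0))
trace.logs.append("Loading order 42")

print(trace.slowest_span())
```

With everything attached to one request, the slow SQL call is visible next to the user, the URL, and the log lines that surrounded it.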
5. Basic server monitoring and metrics like CPU, memory, etc
Application problems can occur for a lot of reasons. Thanks to virtualization and the cloud, a server going down isn’t nearly as common these days. However, it still happens and is something you need to monitor for. It is also critical to monitor things like server CPU and memory. Many modern web applications are not CPU bound, but they can still use a lot of CPU, and CPU usage is a useful indicator for auto-scaling your application in the cloud.
6. Application framework metrics like performance counters, JMX MBeans, etc
Server metrics like CPU and memory are interesting, but for developers, application metrics can be a lot more valuable for true application performance monitoring. Developers need to monitor metrics around things like garbage collection, request queuing, transaction volumes, page load times, and much more. Developers can monitor a wide variety of Windows Performance Counters and JMX MBeans. It can also be critical to monitor things like Redis, Elasticsearch, SQL, and other services for key metrics.
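Performance counters and JMX MBeans are specific to Windows and the JVM, but most runtimes expose something similar. As a loose analogy only, this sketch reads garbage collection counters from Python's `gc` module; the metric names and the `gc_snapshot` and `gc_delta` helpers are made up for the example.

```python
import gc


def gc_snapshot():
    """Collect a few garbage collector counters as a flat metrics dict."""
    stats = gc.get_stats()  # one dict of counters per GC generation
    return {
        "gc.collections": sum(gen["collections"] for gen in stats),
        "gc.collected": sum(gen["collected"] for gen in stats),
        "gc.uncollectable": sum(gen["uncollectable"] for gen in stats),
    }


def gc_delta(before, after):
    """Difference between two snapshots, e.g. taken one minute apart."""
    return {name: after[name] - before[name] for name in before}


before = gc_snapshot()
junk = [[i] * 10 for i in range(50_000)]  # allocate some objects
del junk
gc.collect()
after = gc_snapshot()

print(gc_delta(before, after))
```

Sampling counters like these on a schedule and charting the deltas is the general pattern behind framework metrics, whatever the runtime.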
7. Custom applications metrics created by the dev team or business
Standard server and application metrics can be very helpful for monitoring your applications. However, you may get way more value by creating and monitoring your own custom metrics. At Stackify we use them to do things like monitor how many log messages per minute are being uploaded to us or how long it takes to process a message off of a queue. These types of custom metrics are easy to create and can be very useful for application performance monitoring.
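A custom metric can be as simple as a counter bucketed by minute. The sketch below is one hypothetical way to do it in Python; the `PerMinuteCounter` class and the metric name are assumptions, and a real setup would ship the values to a monitoring backend instead of keeping them in memory.

```python
from collections import Counter


class PerMinuteCounter:
    """Count events per metric name, grouped into one-minute buckets."""

    def __init__(self):
        self.buckets = Counter()

    @staticmethod
    def _minute(timestamp):
        # Round a Unix timestamp (in seconds) down to the start of its minute.
        return int(timestamp // 60) * 60

    def record(self, name, timestamp, amount=1):
        self.buckets[(name, self._minute(timestamp))] += amount

    def value(self, name, timestamp):
        """Return the count for the minute that contains timestamp."""
        return self.buckets[(name, self._minute(timestamp))]


metrics = PerMinuteCounter()
metrics.record("log_messages_uploaded", timestamp=0)
metrics.record("log_messages_uploaded", timestamp=30)
metrics.record("log_messages_uploaded", timestamp=61)

print(metrics.value("log_messages_uploaded", timestamp=59))
print(metrics.value("log_messages_uploaded", timestamp=60))
```

In application code you would pass `time.time()` as the timestamp; fixed values are used here so the bucketing is easy to follow.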
8. Application log data
Whenever something goes wrong in production the first thing you will hear a developer say is “send me the logs”. Log data is usually the eyes and the ears of developers once their applications are deployed. Developers need access to their logs via a centralized logging solution like a log management product. Fortunately, log management is an included APM feature in Retrace. Most APM solutions don’t support the #1 thing developers want to see… their logs!
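Centralized logging tends to work better when each log line is structured. Here is a minimal sketch using Python's standard `logging` module to emit JSON lines; the `JsonFormatter` class, the `apm_demo` logger name, and the `request_id` field are assumptions for the example, not a format any specific log management product requires.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # request_id is only present when passed via the `extra` argument
            "request_id": getattr(record, "request_id", None),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("apm_demo")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
logger = build_logger(stream)
logger.info("order saved", extra={"request_id": "abc123"})
print(stream.getvalue())
```

Carrying a request identifier on every line is what lets a tool line up log statements with the transaction that produced them.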
9. Application errors
The last thing we ever want is for a user to contact us and tell us that our application is giving them an error or just blowing up. As developers, we need to be aware any time this occurs and be constantly watching for it. Errors are the first line of defense for finding application problems. We need to find and fix the errors, or at least know about them, before our customers call to tell us, because odds are most of them won’t even call. They will just go somewhere else.
Excellent error tracking, reporting, and alerting are absolutely critical to developers in an application performance management system. I would highly recommend setting up alerts for new exceptions as well as for monitoring overall error rates. Anytime you do a new deployment to production you should be watching your error dashboards to see if any new problems have arisen. Odds are, you will find some type of new error that you can then quickly identify and hotfix.
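The two alerts recommended above, new exceptions and overall error rate, can be sketched in a few lines of Python. The `ErrorTracker` class, its thresholds, and the alert strings are all hypothetical; a real product would group errors by stack trace and evaluate rates over a time window.

```python
class ErrorTracker:
    """Flag exception types seen for the first time and a high overall error rate."""

    def __init__(self, rate_threshold=0.05, min_requests=100):
        self.rate_threshold = rate_threshold
        self.min_requests = min_requests  # avoid rate alerts on tiny samples
        self.seen_types = set()
        self.requests = 0
        self.errors = 0

    def record_request(self, exception=None):
        """Record one request and return a list of alert messages (possibly empty)."""
        alerts = []
        self.requests += 1
        if exception is not None:
            self.errors += 1
            name = type(exception).__name__
            if name not in self.seen_types:
                self.seen_types.add(name)
                alerts.append(f"new exception: {name}")
        rate = self.errors / self.requests
        if self.requests >= self.min_requests and rate > self.rate_threshold:
            alerts.append("error rate above threshold")
        return alerts


tracker = ErrorTracker(rate_threshold=0.5, min_requests=4)
for _ in range(3):
    tracker.record_request()
print(tracker.record_request(ValueError("bad input")))
```

After a deployment, the "new exception" alert is the one to watch, since it surfaces errors that did not exist in the previous release.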
10. Real user monitoring (RUM)
Is APM expensive? It doesn’t have to be!
Traditionally, application performance management tools have been an expensive luxury item that only large IT enterprises could afford. Many APM vendors still cater to the larger enterprises, still charging $2,000-$4,000 per year per server. Ouch!
Most APM solutions are very complex to configure and use. So much so that development teams don’t even use them. They end up being expensive traffic lights and dashboards. Some vendors have put a huge focus on making their products affordable and very easy to use so they can be available to development and operations teams of all sizes. Our product, Retrace, starts at just $10 a month.