Is it safe to run APM on production servers?

By: mwatson | March 11, 2024

We are often asked: is it safe to run APM on production servers? Will APM affect our application’s performance? We can’t answer for every APM solution on the market, but after extensive testing we can confidently say YES for ours. Stackify APM+ is safe to run in production and has only a minimal effect on performance, even under extreme conditions.

Stackify APM+ combines lightweight code profiling, application metrics, error tracking, and log management. Bringing these together in one easy-to-use platform gives developers a powerful way to find and fix almost any kind of application problem. From the beginning, we designed Stackify to be lightweight and safe for production servers.

How Stackify APM+ is optimized for production usage:

  • Code profiling is limited to key application framework methods.
  • The profiler is implemented in highly optimized C++ code.
  • Data processing is done in a separate process, outside of your application code.

Below we will dive into more detail on these items.

Optimizing APM+ for low overhead to make it safe on production servers

Microsoft provides a CLR profiling API for implementing code profiling solutions like Stackify APM+, the .NET debugger, Visual Studio performance analyzer, ANTS, and other tools. Sounds simple enough, right? Not exactly. The CLR API is written in C++, and if that wasn’t bad enough for a group of C# developers, the interface between C++ and the .NET CLR is sort of like the movie Inception, except with nightmares. It is full of pointers to weird memory layouts full of more pointers. Add generics and inheritance to that and you have a nightmare inside of a nightmare inside of a nightmare.

To ensure that we do not impact performance, we limit which methods we intercept and inspect through the .NET CLR profiler. We only intercept certain application lifetime methods and key methods in third-party libraries (See list here). Some other profilers potentially inspect every single method, or at least have to decide, for every single method as it occurs, whether they want to intercept it. Either way, this takes up a lot of extra processing cycles and can really slow down a busy web application. Stackify APM+ only looks at a minimal set of methods to ensure that it is very lightweight and safe for very busy web applications in production environments.
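The selective interception described above can be pictured as a lookup against a small, fixed allowlist. This is a minimal sketch under stated assumptions, not Stackify's actual code: the `MethodFilter` class is hypothetical, the two method names are illustrative .NET methods rather than entries taken from the real list, and a real profiler would make this decision from inside its CLR profiler callbacks.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical sketch: decide by fully qualified name whether a method is
// worth instrumenting. Everything not in the allowlist is skipped.
class MethodFilter {
public:
    explicit MethodFilter(std::unordered_set<std::string> allowlist)
        : allowlist_(std::move(allowlist)) {
        assert(!allowlist_.empty() && "an empty allowlist would instrument nothing");
    }

    // Average O(1) hash lookup; no per-method analysis is performed.
    bool should_instrument(const std::string& qualified_name) const {
        return allowlist_.count(qualified_name) != 0;
    }

private:
    std::unordered_set<std::string> allowlist_;
};

// Illustrative entries only (an assumption of this sketch).
inline MethodFilter make_example_filter() {
    std::unordered_set<std::string> names{
        "System.Data.SqlClient.SqlCommand.ExecuteReader",
        "System.Net.HttpWebRequest.GetResponse",
    };
    return MethodFilter(std::move(names));
}
```

With a filter like this, a call such as `should_instrument("MyApp.Helpers.FormatName")` returns false, so ordinary application methods never pay for instrumentation.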

We have done load tests with apps doing up to 300 web requests per second, which is more than 10x what most web applications receive. For every ASP.NET web request our profiler may inspect 50+ method calls. At 300 requests per second, that is 15,000 method calls a second. At that volume, every processor cycle matters.
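The arithmetic above is easy to turn into a rough overhead estimate. The sketch below is illustrative only: `calls_per_second` and `core_fraction` are hypothetical helpers, and the per-call cost passed to `core_fraction` is an input you would have to measure, not a figure from our tests.

```cpp
#include <cassert>

// Inspected method calls per second = requests/sec * inspected calls/request.
constexpr long calls_per_second(long requests_per_second, long calls_per_request) {
    return requests_per_second * calls_per_request;
}

// Fraction of one CPU core consumed if every inspected call costs
// `nanos_per_call` nanoseconds. The per-call cost is a hypothetical input.
inline double core_fraction(long calls_per_sec, double nanos_per_call) {
    assert(calls_per_sec >= 0 && nanos_per_call >= 0.0);
    return calls_per_sec * nanos_per_call / 1e9;
}

// 300 requests/sec * 50 inspected calls/request, as in the text.
static_assert(calls_per_second(300, 50) == 15000, "matches the figure in the text");
```

For example, at a purely hypothetical cost of 1,000 ns per inspected call, 15,000 calls a second would consume about 1.5% of one core. Halving the per-call cost halves that figure, which is why small per-call savings matter at this volume.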

In v2.3 we reduced the CPU overhead of our profiler by 80% through extensive performance tuning of our C++ code and hundreds of load tests. We spent days changing and testing single lines of C++ code to find the absolute highest level of performance. The biggest improvement came from optimizing how we use hash maps and aggressively caching CLR-related data wherever possible to avoid additional CLR API calls.
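The caching idea can be illustrated with a small memoizing wrapper around an expensive lookup. This is a sketch of the general technique, not our profiler's code: `FunctionId`, `MethodNameCache`, and the `Resolver` callback are hypothetical stand-ins for a profiler function identifier and a slow CLR metadata call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for the profiler's function identifier (hypothetical alias).
using FunctionId = std::uintptr_t;

// Caches the result of an expensive metadata lookup so that each function
// is resolved at most once, no matter how often it is called.
// Not thread-safe; a real profiler would need synchronization.
class MethodNameCache {
public:
    using Resolver = std::function<std::string(FunctionId)>;

    explicit MethodNameCache(Resolver slow_lookup, std::size_t expected_methods = 1024)
        : slow_lookup_(std::move(slow_lookup)) {
        assert(slow_lookup_ && "a resolver is required");
        cache_.reserve(expected_methods);  // reduce rehashing on the hot path
    }

    const std::string& name_of(FunctionId id) {
        auto it = cache_.find(id);
        if (it == cache_.end()) {
            // Cache miss: pay for the slow lookup once and remember the result.
            it = cache_.emplace(id, slow_lookup_(id)).first;
        }
        return it->second;
    }

private:
    Resolver slow_lookup_;
    std::unordered_map<FunctionId, std::string> cache_;
};
```

Reserving buckets up front and returning a reference to the cached string keep the hit path down to a single hash lookup with no allocation.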

Splitting the baby in half to improve performance

One of the biggest differences between Stackify and most other APM solutions is that we designed our APM solution in two parts: one that collects data about your application’s performance, and another that crunches it outside of your application process. This allows us to optimize data collection to be as fast as possible, with as little impact on your application as possible. Some APM solutions perform this data crunching and uploading within the main profiling process, which is also your application. This can cause random performance problems in your application.
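The split between cheap collection and expensive crunching can be sketched as two small pieces of code. This is an illustration under stated assumptions, not our implementation: `Sample`, `Collector`, `MethodStats`, and `aggregate` are hypothetical names, dropping samples when the buffer is full is a choice made for this sketch, and the transport that would carry a batch from the application process to the separate process is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// One timing record. Kept small and fixed-size so recording it is cheap.
struct Sample {
    std::uint32_t method_id;
    std::uint64_t elapsed_ns;
};

// Collection half: lives inside the application process and does as little
// work as possible. Not thread-safe; a real collector would need per-thread
// buffers or synchronization.
class Collector {
public:
    explicit Collector(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
        buffer_.reserve(capacity_);
    }

    // Append only. When the buffer is full the sample is dropped (an
    // assumption of this sketch) rather than blocking the application.
    void record(std::uint32_t method_id, std::uint64_t elapsed_ns) {
        if (buffer_.size() < capacity_) {
            buffer_.push_back(Sample{method_id, elapsed_ns});
        } else {
            ++dropped_;
        }
    }

    // Hand the accumulated batch to whatever transport ships it out of process.
    std::vector<Sample> drain() {
        std::vector<Sample> batch;
        batch.reserve(capacity_);
        batch.swap(buffer_);  // buffer_ is now empty with capacity reserved
        return batch;
    }

    std::size_t dropped() const { return dropped_; }

private:
    std::size_t capacity_;
    std::size_t dropped_ = 0;
    std::vector<Sample> buffer_;
};

struct MethodStats {
    std::uint64_t calls = 0;
    std::uint64_t total_ns = 0;
};

// Crunching half: in a two-part design this would run in the separate
// process, after the batch has been shipped to it.
inline std::unordered_map<std::uint32_t, MethodStats>
aggregate(const std::vector<Sample>& batch) {
    std::unordered_map<std::uint32_t, MethodStats> stats;
    for (const Sample& s : batch) {
        MethodStats& m = stats[s.method_id];
        ++m.calls;
        m.total_ns += s.elapsed_ns;
    }
    return stats;
}
```

The point of the design is that `record` does nothing but append a few bytes, while everything in `aggregate` can be as slow as it needs to be without touching the application's request path.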

The following chart shows a load test of 100 requests per second against a leading competitor’s product. The background processing it performs, seemingly at random, has a huge impact on application performance.

[Figure: APM effect on production servers]

By doing the data crunching in a separate Windows service that runs at a low process priority, we can be assured that it will not impact the performance of your application.
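Running a helper process at reduced priority takes a single operating system call. The sketch below shows one way to do it and is not our service's code: `lower_own_priority` is a hypothetical helper, and the priority values chosen are illustrative. On Windows it uses `SetPriorityClass`; elsewhere it falls back to POSIX `setpriority`.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/resource.h>
#endif

// Lower the scheduling priority of the calling process so that it yields
// the CPU to normal-priority work such as the web application.
// Returns true on success.
inline bool lower_own_priority() {
#ifdef _WIN32
    // BELOW_NORMAL_PRIORITY_CLASS is an illustrative choice for this sketch.
    return SetPriorityClass(GetCurrentProcess(), BELOW_NORMAL_PRIORITY_CLASS) != 0;
#else
    // Higher "nice" values mean lower priority; 19 is the lowest on Linux.
    return setpriority(PRIO_PROCESS, 0, 19) == 0;
#endif
}
```

A background process would call this once at startup, before it begins any heavy processing.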

Same load test with Stackify shows no impact to performance:

[Figure: No effect on production server using APM by Stackify]

Stackify now uses up to 50% less CPU than other leading APM solutions

One complaint we have heard for a long time is that some of our competitors’ products are not safe for production use and cause performance problems in the applications they monitor. Our goal is to be safe for production applications, even high-volume ones. We are glad to say that, in our testing, Stackify APM+ causes little to no impact on application response times and throughput, which, based on our research and customer feedback, is something our competitors can’t claim. Measured by the CPU overhead added to your IIS worker process (w3wp), Stackify uses up to 50% less CPU than some other APM products while actually doing more, such as supporting async applications and dozens of third-party libraries.

Conclusion: Stackify APM+ is safe for production servers

We believe we have achieved our goal of not impacting application response times or throughput while adding as little CPU overhead to the process as possible. Stackify APM+ is an incredibly powerful tool that is safe to use on production servers.

Read more about this topic in our support docs: Is Stackify safe for production?
