Retrace Power User Tips and Tricks – Error and Log Management

By James Michaelis, in Developer Tips, Tricks & Resources

The explosive growth of ecommerce has slowed in the last year. But the need for businesses to deliver a great digital user experience continues to grow. Companies that don’t rely on online customer purchases can still suffer blows to revenues due to a poor online experience. Market conditions are raising the importance of Application Performance Monitoring (APM) tools to ensure every digital interaction with your company is positive.

APM tools vary by design, features and functionality. Most are designed for IT operations; Retrace is designed and built for developers, by developers. Why is that important?

A true APM “solution” needs to do more than alert ITOps about problems. Retrace is a full lifecycle APM solution for use from development and QA through production. By fixing performance issues and bugs earlier in the lifecycle, Retrace helps you avoid application issues that impact users.

But as with all software solutions, some Retrace features go unused or are misunderstood by customers and would-be users. To help everyone get the most from Retrace, we are introducing our “Power User” series on leveraging valuable Retrace features.

Error and Log Management

One of the main goals of Retrace is to arm developers with all the information they need to create applications that work and perform as their users expect. Retrace users can easily track down errors or spot performance issues with the core features of Error and Log Management, Application Performance Monitoring (APM), and Host/Server Monitoring.

Using these core features will go a long way in ensuring that applications are working as they should. There are, however, a number of “pro tips” that our biggest Retrace power users rely on every day that go beyond the aforementioned core features of Retrace.

And when we say “Pro Tips,” we mean covering how our Retrace Engineering team uses Retrace internally … so these tips should help your applications perform well, too.

1. Log Query Monitors

One of the monitors our Engineering team uses most internally is the Log Query Monitor. These are proactive monitors you create based on a query match, a field/filter match, or a combination of the two. If you want to know when a payment process is failing in a production environment, for example, you can create a monitor that matches the log statements that appear when that type of issue occurs. This is how you could configure the query:

To see other examples of how to configure log query monitors, check out “Log Query Monitor Best Practices”.

Once these monitors have been created, you can set up alerts that fire whenever the configured criteria are met, so you know about an issue before a user reports it. And since a log query monitor can be created as a Resource Monitor (a standalone monitor that does not need to be tied to a server or application), these monitors are versatile and powerful when it comes to proactively monitoring your environments.
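To make the matching idea concrete, here is a minimal Python sketch of a monitor that combines a query match with a field/filter match and alerts once a threshold is reached. It is only an illustration of the concept: `LogRecord`, `LogQueryMonitor`, the `env` field and the `threshold` parameter are hypothetical names invented for this example and are not part of the Retrace API or its query syntax.

```python
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    """A simplified log statement: free text plus structured fields."""
    message: str
    fields: dict = field(default_factory=dict)


@dataclass
class LogQueryMonitor:
    """Hypothetical stand-in for a log query monitor (not Retrace's API)."""
    query: str                                    # text that must appear in the message
    filters: dict = field(default_factory=dict)   # field values that must match exactly
    threshold: int = 1                            # matches required before alerting

    def matches(self, record):
        # Query match: case-insensitive substring search on the message.
        text_ok = self.query.lower() in record.message.lower()
        # Field/filter match: every configured field must equal the given value.
        fields_ok = all(record.fields.get(k) == v for k, v in self.filters.items())
        return text_ok and fields_ok

    def should_alert(self, records):
        hits = sum(1 for r in records if self.matches(r))
        return hits >= self.threshold


# Assumed scenario: alert when payments fail at least twice in production.
monitor = LogQueryMonitor(
    query="payment failed", filters={"env": "production"}, threshold=2
)
records = [
    LogRecord("Payment failed for order 1001", {"env": "production"}),
    LogRecord("Payment failed for order 1002", {"env": "staging"}),
    LogRecord("Payment failed for order 1003", {"env": "production"}),
    LogRecord("Payment accepted for order 1004", {"env": "production"}),
]
print(monitor.should_alert(records))
```

The staging failure is ignored because of the field filter, which is the point of combining the two kinds of match: the alert reflects only the environment you care about.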

2. Log Tagging

Error and Log Management tools can collect millions of log statements. One of the trickiest and most arduous parts of using such a tool is sifting through enormous logs. As mentioned above, Log Query Monitors are great at proactively catching issues and alerting you to them. Still, searching through the noise of irrelevant data collected during an outage remains a challenge.

One easy solution is to add tags (#) to your log statements. Here is how to tag a log statement:

Once tagged, log statements are indexed. You can quickly filter and search log statements by the tags they contain, or by clicking a tag anywhere you see it highlighted in the output.
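As a rough illustration of the idea, the Python sketch below writes tagged messages with the standard `logging` module and then filters them by tag. It assumes a tag is simply a `#` followed by word characters inside the message text; the in-memory `ListHandler`, the `extract_tags` helper and the `#payment` tag are hypothetical and only stand in for the indexing and search that Retrace performs for you.

```python
import logging
import re

# Assumption: a tag is '#' followed by letters, digits or underscores.
TAG_PATTERN = re.compile(r"#(\w+)")


def extract_tags(message):
    """Return the lowercase set of #tags found in a log message."""
    return {tag.lower() for tag in TAG_PATTERN.findall(message)}


class ListHandler(logging.Handler):
    """Collects log messages in memory so they can be searched later."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("tagging_demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep demo output out of the root logger
handler = ListHandler()
logger.addHandler(handler)

# Tagging is just a matter of including the tag in the message text.
logger.info("Charging card for order %s #payment", 1001)
logger.info("User signed in")
logger.error("Card declined for order %s #payment", 1001)

# Filtering by tag keeps only the statements that carry it.
payment_only = [m for m in handler.messages if "payment" in extract_tags(m)]
for line in payment_only:
    print(line)
```

Because the tag lives in the message itself, no change to the logging framework or its configuration is needed to start tagging.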

Retrace Trials are a common use case for our engineering team. When a new user creates a Retrace Trial, we tag the internal log statements written while provisioning the new client database:

By using the “provision” tag in our logging for related provisioning events, we can search #provision to quickly and easily see only the information we need to troubleshoot issues. Retrace is a big time saver here, letting you see the provisioning error and all the relevant context around it while filtering out everything else.

Tags can also be insanely useful to track a transaction across many boundaries, such as multiple apps and servers. For example, many transactions start in a web app but get passed via queue to some sort of background worker. Logging a tag for the same subsystem/subject across multiple apps involved in a process greatly simplifies debugging.
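The cross-boundary case can be sketched in Python as follows. Two loggers stand in for a web app and a background worker, a `queue.Queue` stands in for the message queue between them, and a shared in-memory handler stands in for a central log store. All of these names, and the `#export` tag, are assumptions made for the illustration; in a real deployment the two apps would run as separate processes that ship their logs to the same place.

```python
import logging
import queue


class SharedHandler(logging.Handler):
    """Stands in for a central log store that receives logs from every app."""

    def __init__(self):
        super().__init__()
        self.entries = []  # (logger name, message) pairs, in arrival order

    def emit(self, record):
        self.entries.append((record.name, record.getMessage()))


store = SharedHandler()
web_log = logging.getLogger("demo.webapp")
worker_log = logging.getLogger("demo.worker")
for log in (web_log, worker_log):
    log.setLevel(logging.INFO)
    log.propagate = False
    log.addHandler(store)

jobs = queue.Queue()  # hypothetical queue between the web app and the worker


def handle_request(order_id):
    # The web app logs with the shared tag, then hands the work off.
    web_log.info("Queued export for order %s #export", order_id)
    jobs.put(order_id)


def run_worker():
    # The worker logs with the same tag while it processes the job.
    while not jobs.empty():
        order_id = jobs.get()
        worker_log.info("Started export for order %s #export", order_id)
        worker_log.info("Finished export for order %s #export", order_id)


web_log.info("Health check OK")  # unrelated noise
handle_request(42)
run_worker()

# Searching for the shared tag reconstructs the flow across both apps.
export_trail = [(app, msg) for app, msg in store.entries if "#export" in msg]
for app, msg in export_trail:
    print(app, "|", msg)
```

The design choice worth noting is that both sides agree on the tag up front; neither needs to know anything else about the other for the combined trail to be searchable.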

To improve searching and filtering even more, we routinely include multiple tags in our log messages, often including a tag for a subsystem/subject, one for the operation, and one for state (e.g., #api #validatekey #failure). By following this convention, we can open up a range of options to quickly identify the log statements we care about. We can get a broad view of all failing API operations via “#api #failure”, look at all “#validatekey” operations, or isolate this single event by searching for all three.
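The subsystem/operation/state convention can be illustrated with a small Python sketch. It assumes that a search for several tags returns only statements containing all of them, as the examples above imply; the `search` helper and the sample messages are hypothetical and do not reproduce Retrace's actual search implementation.

```python
import re

# Assumption: a tag is '#' followed by letters, digits or underscores.
TAG_PATTERN = re.compile(r"#(\w+)")


def tags_of(text):
    """Return the lowercase set of #tags found in a string."""
    return {t.lower() for t in TAG_PATTERN.findall(text)}


def search(messages, query):
    """Return the messages that contain every tag named in the query."""
    wanted = tags_of(query)
    return [m for m in messages if wanted <= tags_of(m)]


# Each message carries a subsystem tag, an operation tag and a state tag.
messages = [
    "Key rejected: expired #api #validatekey #failure",
    "Key accepted #api #validatekey #success",
    "Rate limit exceeded #api #throttle #failure",
    "Nightly cleanup finished #jobs #cleanup #success",
]

failing_api = search(messages, "#api #failure")            # broad view
validate_ops = search(messages, "#validatekey")            # one operation
single_event = search(messages, "#api #validatekey #failure")  # one event

for line in failing_api:
    print(line)
```

Each additional tag in the query narrows the result, so the same three-tag convention serves both the broad view and the single-event lookup.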

Conclusion

Take a look at the Retrace Documentation for more details or for answers to other questions you may have. And keep an eye out for our next Power User Tips and Tricks post on core APM features in Retrace. Better still, start your own Retrace Trial today!