Why is monitoring microservices so important? Because we work in an era of complex systems distributed across multiple microservices. Even a simple e-commerce app, for example, may have Ordering, Product Catalog, and Shipping services, and our tooling and practices sometimes struggle to keep up with that complexity.
It reminds me of music in video games. In the early days of video games, much of the music came as MIDI files played on limited sound hardware that could play only a few simple voices at once, which made the music feel flat. Now, game music can be fully fleshed out. We hear every instrument playing together in a symphony, and the combination of rhythms and instruments makes the music more beautiful than any of them alone.
One System at a Time?
In software, we’ve practiced monitoring one system at a time. When we need to look at a different system, we switch to a different screen or tool. But microservices are like instruments in a symphony. We need a way of looking at them together, seeing the music that they produce in one spot.
In this post, we’ll look at five steps you can take to monitor your microservices and see whether they’re working together in harmony, so you can also find and fix the places where they’re out of tune.
What Do We Mean by Microservices?
Microservices are autonomously deployable, business-centric units, but not every deployment artifact is a separate service. For example, I may have one microservice with a runtime component and an ETL component. Because they’re tightly coupled, I want to deploy them together, and for the most part I want to monitor them as one service. Across microservices, though, I want to see each one as a separate service while also assessing how they correlate with one another.
Now, let’s look at the five steps you’ll need when monitoring microservices.
1. Determine Just a Couple of Services to Start
When implementing or learning something new, it can be tempting to try to get all the wonderful features up and running in one go. In software development, we call this Shiny Tool Syndrome. We get so caught up in the realm of possibilities that we lose ourselves in the complexity of just setting things up.
To avoid this, limit the scope of your microservices monitoring setup. Choose two or three services to connect to your new monitoring tooling, then work through steps 2 through 5 below for each one.
Which Services Should You Choose?
To continue the comparison from the beginning of this post, choose the loudest instruments in your symphony. Which services have the most strategic importance to your business? Choosing the most strategic services will give you the highest value for your efforts.
You may be constrained in choosing the most strategic services. Perhaps some of them are legacy applications that don’t play well with newer tooling or lack the tests that would let you reconfigure them safely. In that case, choose the two or three services that are safest to change and that you can run locally, so you can verify that you’ve wired everything up correctly.
The overall goal is to limit your focus to just a couple of services—however you choose them.
2. Determine the Few Things to Measure First
In line with the idea of limiting your focus, you’ll want to home in on the one to three metrics that matter most. Some tools have many features. Retrace, for example, offers performance metrics, alerts, centralized logging, and error tracking. If you haven’t yet chosen a tool for monitoring microservices, reviewing all of those features can be daunting.
In contrast, looking at only your one to three most important metrics lets you sidestep that analysis paralysis. That way, you can get started on step 3 sooner instead of trying to understand everything at once.
Which Metrics Should Be Your Focus?
To figure out which metrics to focus on when monitoring microservices, you need to understand your business needs. Where have most of the customer or operational complaints come from? Are services down too often, or are requests too slow? Perhaps too many database errors pop up under high load.
The more deeply you understand your business, the easier it will be to pick the metrics that should drive your choice of tooling. It will also be easier to make the case to your boss for the budget you need to buy that tooling.
What if you find yourself siloed (separated from other groups or departments) or unsure what matters most to your business? Start with the four golden signals: latency, traffic, errors, and saturation. Popularized by Google’s Site Reliability Engineering book, these signals are a well-proven way to get immediate value from monitoring a service, and they’re useful to almost any business. Retrace supports these out of the box and has smart defaults depending on the type of service you’re monitoring.
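To make the four signals concrete, here is a minimal Python sketch that summarizes one service’s requests over a time window. It assumes a hypothetical `RequestRecord` type, counts only 5xx responses as errors, uses a nearest-rank 95th percentile for latency, and takes saturation as a utilization figure you supply, since that comes from the host rather than from the requests. A real monitoring tool computes these for you; the sketch only shows what the numbers mean.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """One finished request (hypothetical shape; real agents record much more)."""
    duration_ms: float  # how long the request took
    status: int         # HTTP status code returned


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a sorted, non-empty list; pct is an int from 1 to 100."""
    rank = (pct * len(sorted_values) + 99) // 100  # integer form of ceil(pct * n / 100)
    return sorted_values[rank - 1]


def golden_signals(records, window_seconds, saturation):
    """Summarize one service's requests over a time window as the four golden signals.

    Saturation (how "full" the service is, e.g. CPU or queue utilization from 0 to 1)
    is passed in because it comes from the host or runtime, not from the requests.
    """
    if not records:
        return {"latency_p95_ms": None, "traffic_rps": 0.0,
                "error_rate": 0.0, "saturation": saturation}
    durations = sorted(r.duration_ms for r in records)
    errors = sum(1 for r in records if r.status >= 500)  # assumption: only 5xx counts as an error
    return {
        "latency_p95_ms": percentile(durations, 95),   # latency
        "traffic_rps": len(records) / window_seconds,  # traffic
        "error_rate": errors / len(records),           # errors
        "saturation": saturation,                      # saturation
    }


# Usage: 20 requests in a 10-second window, two of which failed with a 503.
window_records = [RequestRecord(duration_ms=10 * i, status=503 if i in (7, 13) else 200)
                  for i in range(1, 21)]
print(golden_signals(window_records, window_seconds=10, saturation=0.7))
```

Each key maps to one signal, which is a useful mental model when you read the corresponding panels in whatever tool you choose.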
3. Commission APM and Logging Software
Now that you’ve done your homework, you can choose a monitoring tool. Let your one to three important metrics guide your evaluation. If a vendor offers a demo instance, play around with it to your heart’s content to understand how the tool actually works. This is the ideal experience, because screenshots can only take you so far.
No matter the metric, you’ll want to see a few things in your monitoring tool. First, you’ll want an easy overview of your entire system, including not just your runtime services but also your databases and other back-end components. This lets you get a feel for the rhythm of your system as a whole, just like hearing a song where all the instruments play together.
You’ll also want dashboards that let you correlate across services. These may reveal relationships that aren’t easy to see in your code. (If you follow step 5 below, you can jump quickly from these dashboards to your centralized logs to see exactly what is happening.)
What the Tool Should Do
Finally, the tool should let you easily break down your overviews into specific services and sections of your system, slicing by a few different dimensions such as service, endpoint, or host. The tool should do most of the hard work of pinpointing potential problems. A tool with smart defaults makes this easy, because it sets up the combination of monitors you need at multiple granularities.
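Slicing an overview is essentially grouping the same measurements by different dimensions. The sketch below assumes request records are plain dictionaries with hypothetical `service`, `endpoint`, and `host` fields, and it shows how the same data answers different questions depending on which keys you group by.

```python
from collections import defaultdict


def slice_by(records, *dimensions):
    """Group request records by the given dimension keys and summarize each group."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[d] for d in dimensions)  # e.g. ("ordering", "b")
        groups[key].append(record)
    return {
        key: {
            "requests": len(group),
            "avg_latency_ms": sum(r["duration_ms"] for r in group) / len(group),
            "errors": sum(1 for r in group if r["status"] >= 500),
        }
        for key, group in groups.items()
    }


# Hypothetical sample data: the same three requests, sliced two different ways.
sample_requests = [
    {"service": "ordering", "endpoint": "/orders", "host": "a", "duration_ms": 120, "status": 200},
    {"service": "ordering", "endpoint": "/orders", "host": "b", "duration_ms": 480, "status": 503},
    {"service": "catalog", "endpoint": "/products", "host": "a", "duration_ms": 40, "status": 200},
]
by_service = slice_by(sample_requests, "service")       # which service is struggling?
by_host = slice_by(sample_requests, "service", "host")  # is it one bad host?
```

Grouping by service shows Ordering averaging 300 ms with one error; adding the host dimension narrows the slow, failing request down to host `b`.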
You should look not only at a monitoring tool but also at one for centralized logging. I bring up centralized logging because monitors always aggregate data in some way. They’re great at showing trends and overall health, but at some point you’ll need to dig in to find out exactly what is going on. You shouldn’t need to open multiple log files to follow a request across multiple microservices. Instead, you should be able to see logs from all your services in one place.
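As a rough illustration of why one place beats many files, the sketch below merges hypothetical per-service log streams into a single timeline. It assumes each stream is already sorted by a `ts` timestamp field, which lets `heapq.merge` interleave them lazily without re-sorting everything.

```python
import heapq


def merged_timeline(*service_logs):
    """Merge per-service log streams (each already sorted by ts) into one chronological list."""
    return list(heapq.merge(*service_logs, key=lambda entry: entry["ts"]))


# Hypothetical entries from two services handling the same order.
ordering_logs = [
    {"ts": 1.00, "service": "ordering", "msg": "POST /orders received"},
    {"ts": 1.30, "service": "ordering", "msg": "order 42 saved"},
]
shipping_logs = [
    {"ts": 1.10, "service": "shipping", "msg": "quote requested for order 42"},
    {"ts": 1.25, "service": "shipping", "msg": "quote returned"},
]
for entry in merged_timeline(ordering_logs, shipping_logs):
    print(f'{entry["ts"]:.2f} {entry["service"]:<9} {entry["msg"]}')
```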
Retrace has both monitoring and centralized logging, but that isn’t necessary. Your monitoring tool can be separate from your logging tool as long as you have a way to correlate your logs to what you see in your monitors—for example, an app ID and timestamp. (In step 5, we’ll talk about how you can make strong correlation IDs.)
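Here is a minimal sketch of that app-ID-plus-timestamp correlation: given the moment a spike appears on a dashboard, pull the log entries from the same app within a small window around it. The field names `app_id` and `timestamp` and the two-minute window are assumptions for illustration.

```python
from datetime import datetime, timedelta


def logs_near_spike(logs, app_id, spike_at, window=timedelta(minutes=2)):
    """Return one app's log entries whose timestamps fall within +/- window of a spike."""
    start, end = spike_at - window, spike_at + window
    return [entry for entry in logs
            if entry["app_id"] == app_id and start <= entry["timestamp"] <= end]


# Hypothetical centralized log entries from two services.
app_logs = [
    {"app_id": "ordering", "timestamp": datetime(2024, 1, 1, 12, 0, 30), "msg": "db timeout"},
    {"app_id": "ordering", "timestamp": datetime(2024, 1, 1, 12, 10, 0), "msg": "order saved"},
    {"app_id": "shipping", "timestamp": datetime(2024, 1, 1, 12, 0, 45), "msg": "retrying carrier API"},
]
spike_at = datetime(2024, 1, 1, 12, 1, 0)  # when the error spike shows up on the dashboard
for entry in logs_near_spike(app_logs, "ordering", spike_at):
    print(entry["timestamp"].isoformat(), entry["msg"])
```

An app ID and a timestamp only narrow things down to a time window; they can’t tell you which of many concurrent requests a log line belongs to, which is the gap the correlation IDs in step 5 fill.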
4. Instrument Metrics at Extension Points
A good tool will have some way of automatically instrumenting your services. Usually, this means adding a library and configuring a few properties so it connects to the right server. Make sure the tool supports instrumentation for your language and framework. Otherwise, you’ll have to find seams in your framework’s request life cycle and instrument them yourself. Some frameworks, such as Spring Boot, come with monitoring extensions built in.
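If you do have to instrument a seam yourself, the idea looks roughly like this Python WSGI middleware, which wraps an app, times each request, and captures its status code. The `record` callback is a hypothetical stand-in for whatever exporter your monitoring tool provides. This is a sketch, not a production agent: it measures only until the app returns its response iterable, so time spent streaming a large body is not included, and it assumes the app calls `start_response` before returning.

```python
import time


class TimingMiddleware:
    """WSGI middleware that times each request and records its path and status."""

    def __init__(self, app, record):
        self.app = app
        self.record = record  # hypothetical callback that ships a measurement to your tool

    def __call__(self, environ, start_response):
        start = time.perf_counter()
        captured = {}

        def capturing_start_response(status, headers, exc_info=None):
            captured["status"] = int(status.split(" ", 1)[0])  # "200 OK" -> 200
            return start_response(status, headers, exc_info)

        try:
            return self.app(environ, capturing_start_response)
        finally:
            # Runs even if the app raises; an unset status is reported as 500.
            self.record({
                "path": environ.get("PATH_INFO", ""),
                "status": captured.get("status", 500),
                "duration_ms": (time.perf_counter() - start) * 1000,
            })


# Usage with a tiny app and a fake server-side start_response.
def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


measurements = []
app = TimingMiddleware(hello_app, measurements.append)
body = app({"PATH_INFO": "/hello"}, lambda status, headers, exc_info=None: None)
```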
Even with auto-instrumentation, a good tool should let you add custom instrumentation. That lets you reach the odd corners of your app that evolved under less-than-ideal circumstances.
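Custom instrumentation for those odd corners can be as small as a timing context manager. In this sketch, `timed` and the `custom_metrics` list are hypothetical; a real tool supplies its own API for custom metrics, and you would send the measurement there instead of appending to a list.

```python
import time
from contextlib import contextmanager

custom_metrics = []  # hypothetical stand-in for your monitoring tool's custom-metric API


@contextmanager
def timed(operation, **tags):
    """Time the wrapped block and record it as a custom metric, even if the block raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        custom_metrics.append({
            "operation": operation,
            "duration_ms": (time.perf_counter() - start) * 1000,
            **tags,
        })


# Usage: wrap an odd legacy code path that auto-instrumentation doesn't see.
with timed("legacy_nightly_reconcile", service="ordering"):
    time.sleep(0.01)  # placeholder for the real work
```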
Once you’ve instrumented and configured your service, run it locally, pointing it at your monitoring server, and make sure data is actually arriving in your tool. If you’re using Retrace, here’s a great article describing how you can verify that your app is working.
5. Instrument Tracing to Your Logs
Even after you get everything connected and running, I recommend one more thing. I mentioned centralized logging earlier and how important it is for ultimately tracking down problems. With microservices, it can be hard to follow a trail of events through your system, which makes cross-service bugs very difficult to find.
I recommend implementing trace IDs in each of your services. You need to do this in a standard way across your services so that a single trace ID can flow through your entire system. OpenTelemetry, which has superseded the earlier OpenTracing standard, is a great way to do this, and many frameworks have instrumentation libraries that support it. With this in place, you’ll be able to query logs across multiple services easily and explore how problems ripple through your software.
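The sketch below shows the core mechanics in Python using only the standard library: reuse the caller’s trace ID or start a new one, keep it in a `contextvars.ContextVar` for the current request, stamp it onto every log record with a `logging.Filter`, and pass it along in outgoing headers. The `X-Trace-Id` header name is a hypothetical choice for brevity; OpenTelemetry and the W3C Trace Context specification define a standard `traceparent` header that you would use in practice. Both services run in one process here only so the example is self-contained.

```python
import contextvars
import io
import logging
import uuid

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name; W3C Trace Context uses "traceparent"
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Stamp every log record with the trace ID of the request being handled."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True


def begin_request(incoming_headers):
    """Reuse the caller's trace ID if present; otherwise start a new trace."""
    trace_id = incoming_headers.get(TRACE_HEADER) or uuid.uuid4().hex
    current_trace_id.set(trace_id)
    return trace_id


def outgoing_headers():
    """Headers to attach to calls made to downstream services."""
    return {TRACE_HEADER: current_trace_id.get()}


def make_logger(service_name, stream):
    """A logger whose lines start with the current trace ID."""
    logger = logging.getLogger(service_name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(trace_id)s %(name)s %(message)s"))
    handler.addFilter(TraceIdFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# Usage: simulate Ordering calling Shipping within one request.
log_stream = io.StringIO()
ordering_log = make_logger("ordering", log_stream)
shipping_log = make_logger("shipping", log_stream)

trace_id = begin_request({})        # no incoming header: Ordering starts a new trace
ordering_log.info("order received")
headers = outgoing_headers()         # would be sent with the HTTP call to Shipping
begin_request(headers)               # Shipping picks the same ID up from the header
shipping_log.info("quote calculated")
print(log_stream.getvalue())
```

Because both log lines carry the same ID, a query for that ID in your centralized logging tool returns the request’s full path through both services.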
Monitoring microservices is like monitoring most systems, but with a couple of twists. You’ll need a tool that can monitor multiple services side by side, and you’ll need to add trace information to each service so you can understand how they interact. With these in hand, you’ll have a beautiful symphony of services giving you insight at every moment, allowing you to make smart decisions about your scaling and architecture.