Prometheus has emerged as the de facto standard for monitoring in cloud-native environments for several key reasons. It offers a highly scalable time-series database capable of handling millions of metrics, along with a pull-based architecture that simplifies network configuration and enhances security.
In this blog post, we’ll explore the four primary Prometheus metric types: counter, gauge, histogram, and summary. We’ll discuss what each type is, how it works, and provide real-world use cases. We’ll also cover when to use (and when not to use) each metric type and touch on how Prometheus metrics can be complemented by other monitoring solutions.
Prometheus is an open-source monitoring and alerting system used by many companies to understand how their workloads perform. It is widely used for application and infrastructure monitoring in cloud-native environments; companies like SoundCloud, Docker, and CoreOS rely on it for real-time metrics collection and analysis.
Prometheus excels in monitoring microservices architectures, containerized applications, and dynamic cloud environments. For instance, a large e-commerce platform might use Prometheus to monitor request latencies, error rates, and resource utilization across hundreds of microservices.
Prometheus employs a pull-based model to collect metrics, periodically scraping configured targets to retrieve their metrics data. Metrics are stored as time series, identified by a metric name and key-value pairs called labels. This structure allows for efficient storage and retrieval of multidimensional data.
Prometheus provides a powerful query language called PromQL (Prometheus Query Language). PromQL allows users to select and aggregate time-series data in real time. Here are some basic query examples:
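For example, assuming a counter named http_requests_total like the one shown later in this post (the time windows below are illustrative), a few basic queries look like this:
http_requests_total{method="get"}
rate(http_requests_total[5m])
sum by (method) (rate(http_requests_total[5m]))
The first selects every GET request series, the second computes the per-second request rate over the last five minutes, and the third aggregates that rate by HTTP method.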
These queries can be used in Prometheus’s web UI, Grafana dashboards, or alerting rules as the foundation for creating insightful visualizations and proactive monitoring systems. Understanding Prometheus metrics and queries is crucial for effective monitoring.
Prometheus offers four fundamental metric types, each designed to capture different aspects of system and application behavior. These metric types form the building blocks of effective monitoring and observability strategies. Understanding each type’s characteristics and use cases is crucial for implementing a robust monitoring solution. Let’s explore counter, gauge, histogram, and summary metrics in detail.
Counters represent cumulative measurements that only ever increase; their value can go up or be reset to zero, typically when the process restarts. Counters are ideal for tracking the total occurrences of an event, such as the number of requests served, errors raised, or tasks completed. For instance, you could use a counter to track the total number of HTTP requests to a web server. Here’s an example of how to use it:
http_requests_total{method="get"} 1234
http_requests_total{method="post"} 567
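Because the raw value of a counter depends on when the process last restarted, counters are almost always wrapped in rate() or increase() when queried. A simple illustrative query (the one-hour window is arbitrary):
increase(http_requests_total{method="post"}[1h])
This returns how many POST requests were served over the last hour and automatically accounts for counter resets.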
You might want to use counters for metrics that always increase, like request counts or error totals. Avoid counters for values that can decrease, such as current memory usage.
A gauge represents a single numerical value that can fluctuate over time, increasing or decreasing as needed. Gauges are suitable for measuring current states, like temperature, memory usage, or active connections. For instance, you could use a gauge to monitor the current CPU usage of a system. Here’s an example of how to use it:
cpu_usage_percent{core="0"} 65.3
cpu_usage_percent{core="1"} 42.7
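Gauges can be graphed and queried directly, or smoothed with functions such as avg_over_time(). An illustrative query (the ten-minute window is arbitrary):
avg_over_time(cpu_usage_percent{core="0"}[10m])
This returns the average CPU usage of core 0 over the last ten minutes, which is often less noisy than a single sample when alerting.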
You might want to use gauges for metrics that can increase or decrease, like temperature or queue size. Avoid gauges for continuously increasing values, such as total request count.
A histogram samples observed values and counts them in predefined, configurable buckets, where each bucket counts the observations less than or equal to its upper bound (the le label). It also exposes a running sum and a total count of observations. Histograms are useful for measuring the distribution of values, like request durations. For instance, you could use a histogram to analyze the distribution of HTTP request durations.
Here’s an example of how to use it:
http_request_duration_seconds_bucket{le="0.1"} 12345
http_request_duration_seconds_bucket{le="0.5"} 23456
http_request_duration_seconds_bucket{le="1"} 34567
http_request_duration_seconds_bucket{le="+Inf"} 45678
http_request_duration_seconds_sum 87654.321
http_request_duration_seconds_count 45678
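The cumulative bucket counters above are what make percentile estimation possible in PromQL, typically via histogram_quantile(). An illustrative query (the five-minute window is arbitrary):
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
This estimates the 95th-percentile request duration over the last five minutes from the bucket counts; the accuracy of the estimate depends on how well the bucket boundaries match your actual latencies.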
Use histograms when you need to calculate percentiles or analyze value distributions. Avoid histograms for values that don’t require distribution analysis; a simple counter or gauge is easier to work with.
A summary shares characteristics with histograms, exposing a sum and a count of observations, but it also calculates configurable quantiles on the client side over a sliding time window. For instance, you could use a summary to measure the 95th percentile of API response times.
Here’s an example of how to use it:
api_response_time_seconds{quantile="0.5"} 0.123
api_response_time_seconds{quantile="0.9"} 0.456
api_response_time_seconds{quantile="0.95"} 0.789
api_response_time_seconds_sum 1234.567
api_response_time_seconds_count 1000
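Because the quantiles are computed by the instrumented application itself, querying them is simply a matter of selecting the series, while the sum and count can still be combined for an average. Two illustrative queries (the five-minute window is arbitrary):
api_response_time_seconds{quantile="0.95"}
rate(api_response_time_seconds_sum[5m]) / rate(api_response_time_seconds_count[5m])
The first reads the precomputed 95th percentile directly, and the second calculates the average response time over the last five minutes.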
Use summaries when you need precise quantiles calculated on the client over a sliding time window. Avoid summaries if you don’t need quantile calculations or if histograms suffice; keep in mind that, unlike histogram buckets, summary quantiles cannot be meaningfully aggregated across multiple instances.
Now that we’ve explored the four fundamental Prometheus metric types, let’s examine a real-world scenario where all these metrics can be effectively utilized together. This example demonstrates how each metric type contributes to a holistic monitoring solution, providing valuable insights into different aspects of system performance and user behavior.
Imagine you’re responsible for monitoring a high-traffic e-commerce platform that processes thousands of transactions daily. Your goal is to ensure optimal performance, identify potential issues, and improve the user experience during the critical checkout process.
Here’s how you could leverage all four Prometheus metric types:
1. Counter: Track the total number of completed purchases.
checkout_completions_total 15234
This counter helps you monitor overall sales volume and track long-term trends in purchase completions.
2. Gauge: Monitor the current number of active shopping carts.
active_shopping_carts 327
This gauge provides real-time insights into user engagement and potential server load.
3. Histogram: Measure the distribution of checkout process durations.
checkout_duration_seconds_bucket{le="10"} 5432
checkout_duration_seconds_bucket{le="30"} 12345
checkout_duration_seconds_bucket{le="60"} 14321
checkout_duration_seconds_bucket{le="+Inf"} 15234
checkout_duration_seconds_sum 436782.5
checkout_duration_seconds_count 15234
This histogram allows you to analyze the distribution of checkout times, helping identify performance bottlenecks.
4. Summary: Calculate quantiles for payment processing times.
payment_processing_seconds{quantile="0.5"} 1.23
payment_processing_seconds{quantile="0.9"} 3.45
payment_processing_seconds{quantile="0.99"} 6.78
payment_processing_seconds_sum 28976.54
payment_processing_seconds_count 15234
This summary provides insights into payment processing performance, highlighting potential issues with payment gateways.
By combining these metrics, you can create a monitoring dashboard that shows sales volume over time, the number of active shopping carts, the distribution of checkout durations, and payment processing latency quantiles side by side.
Additionally, you could set up alerts based on these metrics, for example when checkout durations or payment processing times exceed an acceptable threshold, or when checkout completions stall while active shopping carts keep growing. One such alert expression is sketched below.
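As a minimal sketch, the PromQL expression behind such an alerting rule could look like this (the 95th-percentile target, five-minute window, and 60-second threshold are all illustrative):
histogram_quantile(0.95, rate(checkout_duration_seconds_bucket[5m])) > 60
When this expression returns a result, the estimated 95th-percentile checkout duration currently exceeds 60 seconds, and Prometheus can fire an alert on it.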
This approach allows you to proactively monitor the e-commerce platform, quickly identify and resolve issues, and continuously improve the checkout process for customers.
While Prometheus excels at collecting and storing time-series metrics, it primarily focuses on infrastructure-level monitoring. Building out a scalable Prometheus setup requires expertise in running your own monitoring stack and significant dedicated engineering resources, and developers still have to maintain the Prometheus instances and define which time-series metrics to collect. Even large enterprises find the challenges of scaling Prometheus daunting and prefer to spend valuable engineering resources in more productive ways.
On the other hand, Stackify’s Retrace APM (Application Performance Management) complements Prometheus by providing deeper, more context-rich insights into application performance. Stackify and Prometheus create a turn-key solution with almost instant time to value for developers.
Stackify APM offers comprehensive application monitoring capabilities that go beyond what Prometheus typically captures, including deep application-centric insights, code-level performance data, and integrated logging.
Revisiting the e-commerce platform example, Stackify APM could enhance the monitoring setup by adding code-level visibility into the checkout and payment services and by correlating application logs with the Prometheus metrics described above.
By combining Prometheus with Stackify APM, you create a powerful monitoring ecosystem. Prometheus provides broad, system-wide metrics and alerting, while Stackify APM offers deep, application-specific insights. This synergy allows you to not only detect issues quickly but also understand and resolve them more effectively. This monitoring strategy empowers you to maintain a high-performance system, quickly resolve issues, and continuously improve your applications. With a Stackify free trial, you can experience these benefits firsthand and see how it complements your existing Prometheus setup.
Prometheus offers powerful metrics collection capabilities with its four metric types: counter, gauge, histogram, and summary. Each serves specific use cases in monitoring and observability. Understanding these types helps developers choose the right approach for their monitoring needs.
However, comprehensive application performance monitoring often requires more than what Prometheus alone provides. Stackify’s APM complements Prometheus by offering deep, application-centric insights, code-level performance data, and integrated logging.
So, when monitoring with Prometheus, choose the appropriate metric type based on your specific monitoring requirements. Consider combining Prometheus with Stackify APM for a turn-key solution that doesn’t require developer cycles spent on maintenance, offers a more comprehensive monitoring strategy, and continuously analyzes your metrics to maintain high-performing, reliable applications. See how Stackify APM extends the functionality of Prometheus and start your free Stackify APM trial today.