About a decade ago, the Ruby programming language made a big splash in the software engineering industry thanks to the Ruby on Rails web framework. The terse and friendly syntax of Ruby and the “they thought of everything” feeling of Rails offered web startups the ability to move quickly and nimbly.
However, as an interpreted language, Ruby is slow compared to compiled languages. The general solution adopted by Rubyists was to throw more hardware at the problem. “Hardware is cheaper than salaried engineers,” went the common maxim. This plan worked well for a lot of high-profile sites built on Rails, but that doesn’t mean we should ignore possible bottlenecks. If you want to figure out why your code is slow, you’ll need a Ruby profiler. Here are the main types of profiling available and some tools to help you with each.
You should always ask yourself if you even have a problem. You definitely don’t want to waste time trying to optimize code that doesn’t need it. Premature optimization is the root of all evil!
Another important thing to think about is what you’re trying to measure. What questions are you trying to answer? Do you have a clear and pressing problem with slow performance? Are your processes getting killed due to memory usage? Is some task or part of your code pegging the CPU? Are you spending too much time reading and writing to disk?
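As a rough first check before reaching for a full profiler, you can compare CPU time with wall-clock time using Ruby's standard Benchmark module. This is only a sketch: `cpu_vs_wall` is a made-up helper name, and the two workloads are stand-ins for your own code.

```ruby
require "benchmark"

# Compare CPU time with wall-clock time for a block.
# `cpu_vs_wall` is a hypothetical helper name used only in this sketch.
def cpu_vs_wall
  tms = Benchmark.measure { yield }
  { cpu: tms.utime + tms.stime, wall: tms.real }
end

# Stand-in workloads: one mostly waits, one mostly computes.
waiting   = cpu_vs_wall { sleep 0.2 }
crunching = cpu_vs_wall { 200_000.times { |i| Math.sqrt(i) } }

puts format("waiting:   cpu=%.3fs wall=%.3fs", waiting[:cpu], waiting[:wall])
puts format("crunching: cpu=%.3fs wall=%.3fs", crunching[:cpu], crunching[:wall])
```

If wall time is much larger than CPU time, the code is probably waiting on something (disk, network, a database) rather than pegging the CPU.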
If you’re diagnosing issues on a database-backed application, the code talking to the database is a classic source of problems. A query on a large table that doesn’t hit an index can totally ruin your day, and it’s likely to give your manager an aneurysm. Sometimes it’s better to do multiple queries rather than join against a giant table.
Really, you can find the answers to all of these questions using one Ruby profiler or another.
There are surprisingly few books on Ruby performance and profiling. The Pragmatic Programmers book, Ruby Performance Optimization, is the closest thing to the bible on the subject and well worth a read, but it’s a few years old now. You can find more current information on blogs, and there’s a wealth of blog posts on Ruby performance and Ruby profilers out there too.
Brandon Weaver is a prolific writer about all things Ruby and recently wrote a series of posts about using the TracePoint Ruby class to profile your code. He goes into plenty of detail and has lots of examples. He’s also working on his own library called TraceSpy, which wraps around TracePoint.
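To give a feel for what TracePoint offers, here's a minimal sketch that counts Ruby method calls made while a block runs. It isn't taken from Brandon's posts or from TraceSpy; `count_calls` and `Greeter` are hypothetical names invented for the example.

```ruby
# Count Ruby-level method calls made while a block runs, using TracePoint.
# `count_calls` and `Greeter` are hypothetical names for this sketch.
def count_calls
  counts = Hash.new(0)
  trace = TracePoint.new(:call) do |tp|
    counts["#{tp.defined_class}##{tp.method_id}"] += 1
  end
  trace.enable { yield } # tracing is active only inside the block
  counts
end

class Greeter
  def greet(name)
    format_name(name)
  end

  def format_name(name)
    "Hello, #{name}!"
  end
end

counts = count_calls do
  greeter = Greeter.new
  3.times { greeter.greet("Ruby") }
end

counts.each { |name, n| puts "#{name}: #{n}" }
```

The `:call` event only fires for methods written in Ruby. Methods implemented in C, such as `Integer#times`, fire `:c_call` instead, so they don't show up here.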
Julia Evans is another prolific Ruby profiler blogger. She took a sabbatical a couple of years ago to learn about Ruby profiling in depth and to work on her own profiler. The result was Rbspy, which we’ll take a look at below. She blogged about the process and has also spoken about it at conferences. She has a funny and breezy style that’s very welcoming. Definitely no stiff academic writing and speaking here!
Additionally, there are plenty of other Ruby luminaries, including Aaron Patterson, aka Tenderlove, who has written about Ruby performance in one way or another. This GitHub page also has a good list of blog posts and profiling tools.
Different Ruby profilers are going to have different features, like reports on memory usage or profiling your program’s CPU usage. Some will even give you graphs that are so beautiful you’ll want to print them out and put them in a frame on your wall! Be sure to check their documentation to know what you’re getting.
Profilers that measure CPU usage or the time spent in various parts of your code can be broken down into two basic approaches: tracing and sampling.
A tracing profiler follows the full flow of the execution of your program. It’s like your mom following you around all day and watching everything you do. The benefit here is that you leave no stone unturned. You’re going to get excruciating detail about everything your code does so you can easily see where the most time is spent. The downside of this approach is that the profiling code is going to add overhead to your code’s execution. This could make it risky for code running in production. You don’t want to take the site down just because you want to know why it’s slow!
Sampling profilers, on the other hand, mostly stay out of the way and only check on what’s going on at certain intervals. It’s like your mom calling you occasionally to see what you’re doing. Okay, this kind of check-in is actually happening tens or hundreds of times a second, and that would be pretty annoying if your mom did it (if not downright physically impossible)! The danger here is missing some important information due to your sampling interval not being tuned right. But if you’re profiling production code, sampling is safer.
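The sampling idea can be sketched in plain Ruby: a background thread wakes up at an interval and records what the main thread is doing at that moment. This is a toy for illustration only, not how any of the tools below are implemented, and `sample_profile` and `busy_work` are hypothetical names.

```ruby
# A toy sampling profiler: a background thread periodically records the
# top stack frame of the thread that called `sample_profile`.
# Both method names here are hypothetical.
def sample_profile(interval: 0.001)
  target = Thread.current
  samples = Hash.new(0)
  done = false
  sampler = Thread.new do
    until done
      frame = target.backtrace_locations&.first
      samples[frame.label] += 1 if frame
      sleep interval
    end
  end
  yield
  samples
ensure
  done = true
  sampler&.join
end

# Stand-in workload: spin for roughly `seconds` of wall-clock time.
def busy_work(seconds)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  count = 0
  count += 1 while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
  count
end

samples = sample_profile { busy_work(0.5) }
samples.sort_by { |_, n| -n }.each { |label, n| puts "#{label}: #{n}" }
```

You'll likely see far fewer samples than the 1 ms interval suggests, because the sampler thread has to compete with the busy thread for Ruby's global VM lock. That's a small demonstration of how a badly tuned (or badly placed) sampler can miss information.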
The list of Ruby profiler tools stretches back into the rough and wild early days of Ruby, or at least back to when the world fell in love with Rails. Some have been abandoned or sport stern warnings not to use them anymore. You want to make sure that the tool you choose works with your version of Ruby and has the features you need. And if you wind up needing some help or having an issue, choosing an actively maintained tool gives you a better chance.
One of the earliest and most popular Ruby profilers was Zenprofile. It was written by Ryan Davis, aka Zenspider, one of the founding members of the Seattle Ruby Brigade. It soon went defunct, however, due to Ruby language changes. Another early tool—but one that’s still kicking—is rblineprof. It’s a tracing profiler and installs as a gem. Then you write some code that uses it to profile other bits of code. Not a lot of hand-holding here.
Google, which has long made a big deal about performance, built their own profiling library called Google Performance Tools. This was the inspiration and basis for perftools.rb, a sampling profiler with various modes of operation and methods of integration. You can require the library in your project and use it to profile a specific bit of code. This is like rblineprof, but since you can dial in the sampling interval, it’s safe to use in production. You can also run it externally to profile your app without modifying it.
There’s also a Rack middleware gem called rack-perftools_profiler that you can use to benchmark your web app. More on examining web app performance below.
However, there hasn’t been much work on perftools.rb in recent years because the same authors started over again with stackprof. It has all the features of perftools.rb and more. It has an API that you can hook into your code. Or you can run it externally to monitor your running program. It’ll also spit out various helpful reports. You can generate neat graphs with graphviz. And you can generate those gorgeous “FlameGraphs.”
The ruby-prof gem is one of the best-known tracing profilers for Ruby. It has been around for a long time but is also under active development. You can hook it into your code using its API, or you can use its command-line program to run and profile a whole script without modifying it. You can spit out a variety of reports. There’s also another gem to convert ruby-prof output into a FlameGraph. Note, however, that this is a tracing profiler. As it says on the GitHub page: “Most programs will run approximately twice as slow while highly recursive programs (like the Fibonacci series test) will run three times slower.”
The newest player on the Ruby profiler block is Rbspy. This is the library written by Julia Evans, as mentioned above. (I told you we’d get back to it!) One notable thing about Rbspy is that it’s written in Rust. Most other Ruby profilers are written in C. It also has great documentation. Being a sampling profiler, it’s safe to run on your production code. But note that this tool is strictly external. It’s a command-line app that you run to watch and report on your running Ruby process. This means there’s no API you can use to hook into your code directly.
Rbspy can generate FlameGraphs too. Just look at how pretty they are!
Okay, you’re probably wondering what these graphs do for you. They’re called FlameGraphs because of the colors and because they show the “hottest” code paths. Ironically, this graph generated by Rbspy is called an Icicle Graph because it’s inverted. It’s an icy graph showing you how hot your code is!
Regardless of the direction of the graph, it shows the time spent in function calls and the resulting call stack for each invocation. Each box represents a function call with the call stack growing downward. So the y-axis shows stack depth. The x-axis doesn’t technically represent time, though the width of a box does represent the time spent in the function.
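To make the box widths concrete, here's a sketch of the kind of aggregation that sits behind such a graph: identical call stacks are counted, and the count becomes the width. The "folded" text layout used here (frames joined by semicolons, then a count) is a common input for flame graph scripts, but the sample data and the `fold_stacks` name are invented for this example.

```ruby
# Collapse stack samples into "folded" lines: frames joined by ";" plus a count.
# `fold_stacks` is a hypothetical helper name.
def fold_stacks(samples)
  samples.tally.map { |stack, count| "#{stack.join(';')} #{count}" }.sort
end

# Made-up samples, outermost frame first.
samples = [
  %w[main render query],
  %w[main render query],
  %w[main render template],
  %w[main boot],
]

puts fold_stacks(samples)
```

In this made-up data, `render` appears in three of the four samples, so its box would span three quarters of the graph's width, with `query` taking up two thirds of that.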
The really neat thing about a FlameGraph that this screenshot doesn’t convey is that they’re SVG files, and each box is a link that allows you to drill down into that level of the call stack. Basically, you can “zoom in.”
The tools we’ve looked at so far are mostly concerned with time spent in various parts of your code. This can tell you if the code is banging on the CPU or hanging around waiting for a disk/network. But sometimes you want to know what is using all your memory, especially if you have a memory leak. It’s so lame when mean old Monit comes along and kills your process because it was sucking up all your memory!
For this, you want a memory profiler. Note that ruby-prof has a memory mode that you can run in addition to all its CPU usage, object allocation, and garbage collection goodness. If you want a dedicated tool, there’s the memory_profiler gem. You have to work it into your code. There’s no command-line app to spy on a separate process here. But it will tell you all about what piece of code is allocating memory.
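If you just want a quick number before installing anything, Ruby's built-in `GC.stat` can tell you how many objects a piece of code allocates. This is a stdlib-only sketch, not the memory_profiler API, and `allocations_during` is a hypothetical helper name. It gives you a count but, unlike a real memory profiler, not the lines responsible.

```ruby
# Count objects allocated while a block runs, using the interpreter's
# cumulative allocation counter. `allocations_during` is a hypothetical name.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

allocated = allocations_during do
  1_000.times { "x" * 10 } # each iteration builds at least one new String
end

puts "objects allocated: #{allocated}"
```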
Considering how many developers work on Ruby on Rails web apps, profiling your web app is a must. The Derailed Benchmarks gem is one such tool. After installing it, you can run its various tasks to profile your app, both statically (without running the app) and dynamically (booting up the app and watching over it).
Here are some examples:
bundle exec derailed bundle:mem
This tells you how much memory your gems are using.
bundle exec derailed bundle:objects
This gives you detailed information about objects created when your dependencies are loaded.
bundle exec derailed exec perf:objects
When you’ve figured out you have a memory leak, this task will help you track it down. Note that it uses the memory_profiler gem to do so.
bundle exec derailed exec perf:stackprof
As you might have guessed, this uses the stackprof gem to perform sampling benchmarks on your app.
bundle exec derailed exec perf:ips
Finally, this last command uses the benchmark-ips gem to give you “iterations per second” benchmarking. This tells you how many times the code under test—like loading your homepage—runs in a second. You can then try some improvements and see if you can make the number go up.
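The idea behind “iterations per second” is simple enough to sketch with nothing but the standard library. This is not the benchmark-ips API, which also handles warm-up and statistics for you; `iterations_per_second` is a hypothetical helper and the workload is a stand-in.

```ruby
# Run a block repeatedly for about `duration` seconds and report how many
# times per second it completed. `iterations_per_second` is a hypothetical name.
def iterations_per_second(duration: 0.5)
  clock = Process::CLOCK_MONOTONIC
  start = Process.clock_gettime(clock)
  iterations = 0
  loop do
    yield
    iterations += 1
    break if Process.clock_gettime(clock) - start >= duration
  end
  iterations / (Process.clock_gettime(clock) - start)
end

# Stand-in for "the code under test".
ips = iterations_per_second { (1..100).map { |n| n * n }.sum }
puts format("%.0f iterations per second", ips)
```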
Derailed Benchmarks is an excellent tool that gives you a lot of insight into potential problems in your web app. But wouldn’t it be nice if you could have a profiling tool directly integrated into your website? Surprise! You can.
Rack Mini Profiler is a popular gem that plugs directly into your Rack-based web app and provides page speed analysis on a pop-up overlay embedded in the page. Being Rack middleware means you can use it with any web framework based on Rack. That includes Rails, Sinatra, Hanami, and many others. It provides database profiling, call-stack profiling (via stackprof), and memory profiling (via memory_profiler). It even gives you FlameGraphs! You can read a lot more about all of its features and benefits in this detailed blog post.
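Rack middleware is easy to picture in code. A Rack app is any object that responds to `call(env)` and returns a status, headers, and body, and middleware simply wraps one app in another. The sketch below times each request and reports it in a response header. It has nothing to do with how Rack Mini Profiler is implemented; `TimingMiddleware` and the header name are made up for the example.

```ruby
# A minimal Rack-style middleware that times each request.
# `TimingMiddleware` and the header name are hypothetical.
class TimingMiddleware
  HEADER = "x-sketch-runtime-ms"

  def initialize(app)
    @app = app
  end

  def call(env)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000
    [status, headers.merge(HEADER => format("%.2f", elapsed_ms)), body]
  end
end

# A stand-in app: any callable that returns [status, headers, body].
app = lambda do |_env|
  sleep 0.01
  [200, { "content-type" => "text/plain" }, ["hello"]]
end

status, headers, _body = TimingMiddleware.new(app).call({})
puts "#{status} took #{headers[TimingMiddleware::HEADER]} ms"
```

In a real Rack application you'd register a class like this with `use` in your `config.ru` or your framework's middleware stack.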
Since web app performance is so important, it’s not surprising that companies have sprung up to do it for you. They call this Application Performance Monitoring or APM. There are several players in this space. But since you’re reading the Stackify blog, clearly we’re going to talk about Stackify!
Stackify’s flagship product is called Retrace. It’s a SaaS application that provides a wealth of insight into all facets of how your Ruby web app is doing. After signing up with Stackify (there’s a free trial), you install the stackify-ruby-apm gem and add a YAML configuration file to your app. The Stackify agent runs as a background process on your server and sends performance information to Stackify.
Once the agent is hooked up and sending data, you can log into the Stackify site and start benefitting from all the profiling and monitoring Retrace provides. The dashboard page is like your command central, showing you the key health metrics that you need to focus on. This includes a graph of app performance over an adjustable time range. This graph breaks down the time spent in your stack: Ruby code, database queries, external requests, etc.
The dashboard also shows your top requests, including how many hits each received, its average time, and its percentage of overall time. By clicking on a request, you get a detailed overview of how that endpoint is performing. This includes a user-satisfaction score and a performance breakdown. You can also see if there have been errors thrown during that request. That’s definitely something important to know! The request details screen even includes a sequence breakdown, showing how much time was spent in each phase of the request.
Retrace lets you ship your server logs to them as well. This means that even though you’ve sprung for a dozen servers to keep your site nimble and mean, all of those individual logs wind up in Retrace. In the log viewer, you can go from a log statement to the transaction trace for that request. You can also leap from errors to transaction traces.
Retrace is also smart enough to keep track of when you deploy new code. You’re going to want to keep a close eye on the dashboard and error reporting after changes are rolled out so you know quickly if a nasty bug slipped by QA (sorry QA). Oh, and you can set up Retrace to notify you of problems via email and text message.
As we’ve seen, there’s no shortage of tools available to help figure out why your code is making you a sad panda. If you’re working on a library or small app that needs to scream, you should probably just grab a tracing profiler like ruby-prof and get excruciating detail about what your code is spending its time on.
If you want to profile some production code, however, plugging in a tracing profiler could mean shooting yourself in the foot. For that, you want a sampling profiler like rbspy. This tool is a great addition to our profiling toolbox and probably has the best documentation of any of them. It’s also currently very actively maintained, so if you have a problem, you’re likely to get some help.
If you’re trying to gain insights into where your web app requests are spending all their time, you can use rack-mini-profiler or sign up for an APM service. The former is great if you’re on a budget or working on your own small Rails site. But if you’re building your dream web startup and trying to convince your users to plunk money down on your service or keep customers engaged and happy, it’s definitely worth springing for the cost of an APM service. You’ll find that you spend a lot of time in its UI, gaining all manner of profound insights and web-app wisdom.
Now, go forth and profile!