Everybody climb aboard the hype train with me. Today, we’re going to study a new job title: the DevOps engineer. This role is getting popular in the same way that the full-stack developer role became popular before it. In fact, one could argue that the DevOps engineer is an extension of the full-stack developer in that both seek to extend our ownership of our software.
We see ‘DevOps engineer’ as a valuable job title, but how do we position ourselves as one? This starter guide will show you how you can enter the field. First, we’ll uncover what a DevOps engineer is. Then, we’ll take a look at skill sets that are key to DevOps. Learning just one of these skill sets and then dipping into the others will let you grow into the DevOps engineer role.
What is a DevOps Engineer?
The spirit of DevOps is to broaden the ownership a development team has over its product as widely as possible. Only by owning things all the way from inception to customer use can we make decisions that will strengthen our product in the hands of our users. This includes understanding how to get software in production. It also means we need to monitor our system’s health and use. We should know how customers are using our products. This lets us prioritize and innovate on features.
A DevOps engineer embodies this spirit of ownership. In this role, you’ll specialize not only in getting the code to production very fast, but you’ll also understand its behavior after it has been deployed. You’ll understand and use a myriad of instrumentation, tools, and techniques to stabilize and improve your production features. And you’ll be able to bring feedback insights from production to the rest of the team.
The DevOps Engineer Ingredients for Success
So what does it take to become a successful DevOps engineer?
The idea here is that if you’re looking to become a DevOps engineer, you can start with any of these skills and expand over time. There’s a lot here and this graph isn’t exhaustive. But that’s okay because you only need to capitalize on one of these skills to start. We’ll cover each skill in depth and explain why it matters.
Cloud and Serverless
The first skill set we’ll cover is the ability to leverage cloud and serverless platforms. Back around 2010, I would have labeled this skill set infrastructure as automation. There used to be value in scripting your servers and database configurations as part of end-to-end ownership—from server to application. And while you can still gain some value out of infrastructure automation, it’s healthier to pay others to do that for you. Cloud offerings have evolved to the point where, with the push of a button or a couple of lines from a command prompt, we can have a configured infrastructure up and running. This includes databases, load balancers, authentication services, and more.
To be a DevOps engineer, it is and will continue to be key to understand these cloud offerings. The big players in this space are Google, Microsoft, and Amazon. You do have other platform providers who have value, like Heroku and Digital Ocean, but the three big players currently have the highest variety of tools from which to build a software system. I recommend that you learn at least one of these three platforms.
You’ll also want to understand how to make an application work well in cloud environments. Many of the other skills I discuss here will enable you to do this. The key is that you don’t want to be overly dependent on specific operating systems and environments. You’ll also want your application parameterized enough to work and scale it without needing to change any code.
Let’s go even further here. Serverless platforms all go beyond infrastructure and provide ways to run your code without even needing an application to start up. AWS Lambda is one such offering. It’ll host a piece of code for you, handle scaling concerns automatically, and let you delegate that code from your application. There are also offerings that let you spin up database schemas without caring about where they’re hosted. These things are evolving fast and will continue to dominate the scene.
Cloud and serverless matter to a DevOps engineer because a large part of your work will take place after the code has already been deployed. We need to scale, measure, and configure the production code so that it can handle our customer needs. Cloud and serverless offerings make this cheaper than ever before, giving you an advantage over your competition and allowing you to gain positioning in the ability to quickly deliver value to customers. If you’re stuck using in-house tools, there’s value in learning some infrastructure automation instead. But with the exception of a few niches, your organization has likely already fallen too far behind for that to be of much use. By focusing on using cloud tools over automating your own infrastructure, you’ll have an easy time provisioning the tools you need to run your code.
Continuous delivery is part of a spectrum: continuous integration > continuous delivery > continuous deployment. The idea here is to reduce risk by deploying and eventually releasing our code in smaller increments. This idea is also part of Lean Flow, which we’ll discuss next.
The path to continuous delivery and deployment starts with continuous integration, a classic technique that lets a development team get faster feedback on their code and work together better. Continuous delivery takes it to the next level where we build our code and then, with one button, push it to all of our lower environments quickly. This assumes we have an automated test suite that we can run to vet every deployment, which we’ll discuss later. Finally, continuous deployment is when we aren’t just pushing to lower environments, but also pushing to production on every code commit. Continuously deploying tends to scare many development teams, especially those in a large enterprise. It brings a perception of risk, even though it actually reduces risk. There are, however, strategies, such as feature toggles, that make these fears negligible.
Continuous delivery brings a lot of intrinsic benefits. I recommend applying this practice within any development team. But it has even more value for a DevOps engineer. This is because taking on more ownership over our software means knowing stuff. Yet our mental capacity has not changed. There’s only so much we can care about when delivering software. Building an automated deployment pipeline with continuous delivery practice frees us from the worry of botched configurations, deployment procedures, and the like. We make that stuff boring so we can shift our focus to the other skills listed here.
I love lean flow. The ideas are simple, yet very powerful, in quickening our ability to put software in a customer’s hands. Lean flow stems from lean manufacturing, where you monitor and manage products coming through warehouses or assembly lines as a flow of value from the first bolt until it gets shipped to the customer. In the same way, we can view software as a flow of code, often packaged in user stories, from inception until it sits stably in production. Note that I use the term stable because in DevOps we pay as much attention to how the code behaves after it’s deployed as we do before it’s deployed.
When we apply lean flow, we make visible how work travels through the development team. We use Kanban boards to identify the status of work. The team pulls work in instead of waiting for it get pushed on us. We look at the queues of work building at various places in order to identify bottlenecks. Anywhere we see large batches, we break them down to get more a continuous flow. And we aggressively and continuously review and improve our processes to increase quality and throughput.
The techniques of lean flow let us get value from delivered software more quickly and with fewer defects. It also allows us to identify handoffs so that we can bring those handoffs into the team. Over time, this lets us have complete ownership over our software. It also plays well into other skill sets, like continuous delivery, which is based on these principles. One of the best books to read about this, though a heavy read, is The Principles of Product Development Flow. The principles discussed in this book underpin almost every skill on this list.
You may not have heard of the term observability, but if you’re looking into being a DevOps engineer, you have certainly heard things surrounding it. These include concepts such as monitoring, metrics, and logging. But each of these ideas is a smaller part of a larger whole. Observability is the ability to ask your software questions and receive answers, even if you don’t know what those questions will be when you deploy. It’s the mindset of designing a system to give you insight into what’s happening.
For example, let’s say you toiled for weeks and released a new feature. How do you know people are even using that feature? Or let’s say your system keeps crashing at midnight. How will you know what’s causing the crash? Building observability into our system lets us answer these questions.
Ultimately, we want our system to broadcast context-rich events about what’s happening at any given moment. Tools like Retrace make this easy for us. But there will always be an element of thinking we, as DevOps engineers, have to take full advantage of such tools.
Without observability, we can’t even do the Ops part of DevOps. This is the key ingredient that gives us the ability to change our system after it has been deployed to production. Observability tools allow us to see error rates and request counts. These tools let us set alerts when we break our service level objectives. And without observability, it would be really hard to make our systems reliable, which we’ll talk about next.
In the minds of developers, everything just works. We wire up complex systems that talk to other complex systems. We then diagram it out, showing how everything flows together smoothly. Then reality hits. Services go down. Someone accidentally alters a database record. The network is flaky, and we miss a customer request. Reliability is the skill of being able to deal with this reality.
To make a system reliable, we have to observe it and scale it to demand. We must design our services so that they expect the inevitable. This involves applying things such as retries and circuit-breakers. This may mean designing failure states into the decision logic of our code. It also means we look to simplify what we can in our system so that we can mitigate reliability problems.
We not only want to look at ways to automate reliability into our system, but we also want to look at how to quickly fix things manually. Every feature has a secret persona: that of the Ops person. And that person is now us. We need to plan for and design maintenance shafts into our code so that we can recover quickly from incidents and keep downtime low.
The idea of reliability is so important that places like Google made an entire job title out of it. Most of the Ops side of our work as a DevOps engineer will be learning how to make our system more reliable. Also, as DevOps engineers, we’re going on-call to support our system. Measuring and aggressively baking in reliability will keep our days on-call bearable.
It’s interesting to me how easily security gets left in the dust. We all know it’s important. We all have seen a news article on security breaches and customer data leaks. Yet, it still sits as a niche in our industry. Security is the art of keeping our systems safe from malicious, or even ignorant, attacks.
As a DevOps engineer, we’ll be plugging in things like security vulnerability scans. We’ll need monitoring in place that detects abnormal use. We’ll need to install shields against denial of service attacks. It will be valuable for us to bring penetration testers to try to break into our system. We’ll need to do threat modeling to our systems to see where potential weaknesses lie.
Fortunately, we have resources such as OWASP that make security a skill we can onboard onto easily. And we can automate some of the protections listed above, like vulnerability scans.
In an ideal world, security should be table stakes for developers to learn when building software. Alas, it’s often a rare thing for a team to think about. But neglecting security can cost a company millions, if not billions, of dollars. One customer data leak could bankrupt your organization. I also believe it’s ethical to ensure that data is kept safe. Your customers are trusting you with this information with the expectation that it’ll be secure. As a DevOps engineer, it’s ultimately on us. We own our entire system now, so we can’t rely solely on an external team to think about these things.
The need for security has become so important it has even garnered its own title: DevSecOps. Whatever we call it, security will always be an essential part of owning our software system.
Test automation is one of the most freeing things we can learn as we step into DevOps engineering. I also believe it’s essential for any software system, no matter what your level of ownership is. Test automation is the skill of building and running repeatable tests that validate our system is working. The simplest and most familiar form is what we call unit tests. Most developers I know are aware of or are building unit tests. The most astute practice is test-driven development, a practice where the tests actually drive out the design and implementation of the code.
But test automation goes way beyond unit tests. In a healthy software ecosystem, you’ll have a pyramid of tests that go from large end-to-end ones to your unit tests. You have contract tests to ensure you’ll work well with external systems. There are smoke tests that ensure you deployed correctly to your various environments. You may even have acceptance tests that your business partners will write for you to implement against. This provides a low-friction way to ensure you’re meeting acceptance criteria.
Most talk of testing deals with preventative testing: tests that will stop defects from getting into production. But there’s a slew of tests that you can actually run in production! Things like canary releases, blue/green deployments, and health checks let you keep the cost of defects low, even after the code has been deployed. The key thing here is that certain testing may be very hard to implement in lower environments, but easy to implement in production. So it’s important to keep the cost of defects low, not to stop them altogether.
At the start of this section, I mentioned that test automation is freeing. This is the same value that continuous delivery gives us: we free up our mental capacity for the other skill sets. I don’t want to spend most of my time manually checking things because I’m worried that I broke the system. I want to script that check once and be done with it. And now that we’re paying attention to software after it hits production, testing in production is even more key. We want to ensure everything is running smoothly for our customers as cheaply and effectively as possible. The more we automate it, the cheaper it becomes.
Transformation is the runt of the litter here, but still very important. The thing that’s easy to forget is that DevOps isn’t just a set of tools, but an ownership mindset change. You’ll get pushback from a lot of people. A manager may not understand why we need to invest in resiliency. Database administrators may not understand why you need production schema access. And your team members may not understand why they need to go on-call. Changing from a dev-only to a DevOps mindset is a mental and physical transformation.
Because it’s a transformation, you’ll see responses that may seem illogical to you. Your boss may say, “Why would we need to buy a monitoring tool? Our customers tell us when something is wrong.” It’s very easy to become frustrated with all this. But humans are as emotional as we are logical. We can’t expect people to understand everything. And purely logical arguments won’t always work. Keep this in mind: how we say something will be as important as what we say. Everyone is coming from a different perspective. We must understand that and connect to that perspective.
This is a skill set that’s hard to enter into as a first step. Usually, people are coming from a developer background and the ability to empathize, sell ideas, and influence others may be lacking. But there are those for whom software development is a second career. I know people who have owned their own businesses or have been teachers before going through a software engineering boot camp. For those people, the skills you learn that allow you to communicate effectively can be a great thing to leverage into a DevOps engineer role.
If we don’t transform the mindsets of the people around us, we’ll find friction at every turn. Tools will be harder to procure. It’ll be more difficult to get user stories around reliability into the backlog. Being able to empathize and consult with our peers will allow our other skill sets to flourish more frequently.
The DevOps Engineer Roadmap
We covered quite a bit on what ingredients make up the cake of DevOps. I want to reiterate that you don’t need to immediately know them all. If you’re just getting started with DevOps, pick one and grow into it. If you’re already an expert in one, start learning another of these. They’re all connected and learning one can ease us into learning another. Learning continuous delivery can help us understand the principles of lean flow. Knowing the principles of lean flow can let us find the best place to put in automated tests.
If we treat ourselves as perpetual students and keep learning as we do our work, we’ll understand all of these skill sets and be a proud DevOps engineer in no time.